Semantic Layers and You: How to Trust Your Data

Transcript
Speaking of five after, we have hit that time. So I want to say thank you all for joining us today on our webinar, Semantic Layers and You: How to Trust Your Data. Hopefully by the end of this, you will have a little bit more understanding of what semantic layers are right now, where they are going, how to implement them, and, in general, what their pros and cons are. My name is Rachel Kurtz. I am a senior analytics architect here at InterWorks. I myself am based in Raleigh, North Carolina. I have been an East Coast girl pretty much my entire life, living in New York City for a while and Boston for a while, but I have family in North Carolina, so here I am again. I have a background in data science, data preparation, data engineering, all things data at this point. I am joined today by my colleague Kim, so I'll give her a minute to introduce herself.

Hi everyone, my name is Kim Aagaard. I am first on every list, yes, when it comes to alphabetical order by last name. I am an analytics architect here at InterWorks, working with Rachel on a lot of our analytics programs. I have a background in Tableau as well as general analytics and data analysis, but I also have a background in foreign policy and international trade, so I've really enjoyed my slow career pivot over the years, and I have been enjoying getting to do a deep dive in analytics with InterWorks. As mentioned earlier, I'm based out of Portland, Oregon, but I actually lived in Washington DC for many years. So I've lived on both coasts of the US, but I'm happy to be back on the West Coast and avoiding all of the snow on the East Coast, as we've seen. I will be here monitoring the chat, so if you've got any questions, please do feel free to throw those into the chat and I will make sure that Rachel gets those. Please use the chat rather than raising your hand; we won't have the capability for you to unmute on this call today. So appreciate it.
Yeah, put your questions into chat or Q&A, whichever feels more comfortable for you, and we will go from there. So, what we are doing today: semantic layers and you, how to trust your data. Semantic layers are probably something that at this point you have all heard as a buzz phrase, a buzzword, just buzz, buzz, buzz everywhere. You may have been thinking, how do I help create them? What is the point of them? Anything like that. Hopefully you'll come out of this with that kind of information. So the first thing that I want to talk on today is what problem semantic layers are even trying to deal with, trying to work through. I like to summarize that as data chaos, which I'm sure is something we all have faced over the many, many years of working in data analytics, data science, data engineering, whatever it might be: the chaos that happens around all of that data. So we have our data layer, right? This is where probably all of you have at least spent some time. You have your raw data, you have your data transformations, and all that works together to create these data models. Sometimes that's in Snowflake, sometimes that's in Databricks, in AWS, in BigQuery, whatever it might be. This is where your data is living, in its raw and slightly transformed state. Next, you have your consumption layer. This is where a lot of us spend our time actually consuming and understanding the data, trying to get insights into the data that we've been given, moving away from just looking at an Excel table and trying to understand what's going on, to being able to look at a bar chart, something as simple as that, and actually consume the data. And our consumption layer can come in many, many formats.
The most common one that I think we've all dealt with over the years is BI tools, right? This is our Tableau, our Sigma, our Power BI: actually looking at reports and interacting with the data, being able to filter it, those sorts of things. That's how we're able to consume it. There's also web applications: being able to actually interact with the data within a web application, so that vendors or your clients or you yourself can deal with it in a more interactive way. And then finally, APIs and agents, which are becoming ever more prominent. It used to be, I feel, that if you look at these three pillars, these three little bubbles, BI tools would be seventy-five percent of how people are consuming the data, and APIs and agents were maybe five percent. That proportion is clearly changing, and rapidly so. So these are all the different ways that people are consuming the data from the data chaos that is the data models. But then we might have the question: what was revenue in June? That might be a question that I have, and I want to look at my data and actually understand what my revenue was in June. So we're going to say that we have all of our data in a whole bunch of Snowflake tables. I have one group within my company who, when they say what was revenue in June, has a report that looks at this, and they've decided to calculate revenue as Stripe sales, or sales plus the coupon value. And that's great; I look at their report, and that's what revenue is saying. However, I have a second group that is also calculating revenue, but they're subtracting returns from their sales and adding the coupon value. So that's what's happening in theirs. I have another group whose definition is sales plus coupon value, but now they're bringing sales tax into it. I have another one that is using returns and sales tax and coupon values.
And then I have a final one that, for some reason or another, has a base number of thirty-two thousand one hundred and sixty-nine that they like to add into it. The problem in this moment, and you have probably all faced this, is that I now have five separate reports that all say revenue, and they are all giving me different numbers. And I have no real way, when I'm just looking at the report, to know how they're being calculated. Unless somebody has decided to put these equations somewhere in the report for me to look at, I have no way of knowing that. So this is the data chaos that we're talking about. Everybody's trying to define a metric in their own way, but there's no real way to know how it's all being defined unless, say we're using Tableau, I download the Tableau workbook, go into the equation, and understand that. So this is a graph that comes from the 2025 State of Analytics Engineering, a survey that dbt puts out every year to understand how people are interacting with data, some of the problems that they're facing, and some of the future of analytics. This one was on the question: what do you find most challenging while preparing data for analysis? And you can see here that respondents were allowed to select up to three, so these together will not add up to a hundred. But the big one that people are talking about, number one, is poor data quality. And that has been an issue, I think, since the beginning of data. Poor data quality has been a problem that we've been facing. The number three item that I also want to point out is poor stakeholder data literacy. So not only is the data quality not great, but people don't really understand what they're looking at; they don't really understand what's going on. And then finally, number five is this lack of trust in data from stakeholders. And a lot of that is coming from that data chaos.
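To make the revenue example above concrete, here is a minimal Python sketch of the same situation. All column names, sample numbers, and team formulas here are invented for illustration; the point is only that five plausible "revenue" formulas over identical rows yield five different answers.

```python
# Illustrative sketch: five teams, one table, five different "revenue" numbers.
# Column names and values are invented for this example.
june_orders = [
    {"sales": 500.0, "returns": 40.0, "coupon": 25.0, "sales_tax": 35.0},
    {"sales": 300.0, "returns": 0.0,  "coupon": 10.0, "sales_tax": 21.0},
]

def total(rows, key):
    """Sum one column across all rows."""
    return sum(r[key] for r in rows)

# Each team's private definition of "revenue in June".
definitions = {
    "team_1": lambda r: total(r, "sales") + total(r, "coupon"),
    "team_2": lambda r: total(r, "sales") - total(r, "returns") + total(r, "coupon"),
    "team_3": lambda r: total(r, "sales") + total(r, "coupon") + total(r, "sales_tax"),
    "team_4": lambda r: (total(r, "sales") - total(r, "returns")
                         + total(r, "coupon") + total(r, "sales_tax")),
    "team_5": lambda r: 32169 + total(r, "sales") + total(r, "coupon"),  # mystery base number
}

revenues = {team: calc(june_orders) for team, calc in definitions.items()}
# Five reports, five distinct "revenue" values from identical data.
```

A semantic layer addresses exactly this: one governed revenue definition that every report, app, and agent consumes, instead of five formulas buried in five workbooks.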
They're seeing five different reports that claim to be giving them the same thing, and they all show different numbers. So how can you trust that data? These are some of the reasons people are moving toward semantic layers, to help combat some of these issues and problems they're facing. Another stat from that survey was on the question, which of the following best describes how you spend most of your time, and almost sixty percent said maintaining or organizing data sets. Like I said earlier, I have a background in data science myself, which means even in grad school people were telling me that as a data scientist, eighty percent of your time will actually be data engineering or data wrangling. This is still true, and it's true of all data analytics at this point. So maintaining or organizing data sets has become a really difficult thing, especially if you're having to reinvent the wheel every single time you want to calculate revenue. If I'm having to rewrite that calculation every single time, those are small things that add up to spending a lot of time on data sets, time that could be used on the actual insight gathering or investigation into the data. So let's talk about what a semantic layer is. Now that we have a better understanding of the problem, let's talk about what a semantic layer is and how it can help with that problem. This is a slightly older definition of a semantic layer, and we'll get into how this is changing as the technology progresses. So what is it? The concept: a standalone semantic layer is a universal business-logic abstraction, generally decoupled from any single BI tool. And again, we'll talk on that. But it provides a consistent, governed interface for analytics. It's defining those business-friendly terms, those metrics.
It's revenue, active users, conversion, whatever it might be; regardless of the underlying data structure, everyone knows that they're using the same definition for revenue, for active users, for conversion. A more succinct way to put that is: define once, use everywhere. That's generally how I try to think of a semantic layer. That way not everybody is having to create the same calculations; not everyone is having to think on how should I calculate this, how do I join these together, that kind of thing. Define once, use it everywhere. That's a quick little tagline for semantic layers for me. So we're going to bring back the earlier picture, the data chaos that reigns supreme amongst all of this. Now there's this in-between layer, the semantic layer if you will, that goes between our data layer and our consumption layer and provides a lot of this information, the semantics and the context. It has the metric definitions, it gives the user context, it gives the definitions, it gives instructions on how to utilize it. So this is where we start to find some guidance in the chaos. While we talk about what it is, I would also like to discuss what it is not. First, it is not a data warehouse or a storage solution. This is not replacing Snowflake; this is not replacing Databricks. It sits with them, works with them. It is also not a replacement for your existing data infrastructure, very similar. It is also not a tool that physically moves or duplicates data. This isn't going to be your Fivetran or your Matillion. It is, again, sitting on top of where the data is living, acting as the translation between where your data lives and how you're consuming it, whether that's reports or AIs, APIs, or anything like that. It's also not just another reporting layer. Similarly to how it's not replacing your data layer, it's not replacing your consumption layer.
It's not intended to build the report, and it's not intended to build the AI agents; it's meant to be that in-between. And finally, it's not a substitute for data modeling or database design. You still need your bronze, silver, gold medallion architecture. That still needs to exist; you still need to work with that. This isn't to replace any of that. This is to help augment it, so that we're giving a little bit more self-service, which is another buzz phrase, giving self-service to your analysts, to your users. So let's talk about some of the pillars and benefits of the semantic layer, now that we know what it is and what it is not. First, unified business metrics: standardized definitions of key metrics. I'm saying some of these things multiple times, but it's because they are very, very important, and we like to say things a few times. It's definitions of key metrics to ensure consistent reporting and analysis. That's a big one; that's one of the main points of semantic layers. Also, metadata management: documentation of upstream data sources, transformations, and context. So it's not just defining the metrics, it's giving that documentation and those definitions to the people who are using it, to try to combat item number three in that graph we were looking at, where stakeholder data literacy was a hard issue people were facing. Unlimited integrations: obviously, there are a lot of different ways to talk about semantic layers and where semantic layers live. There are the standalone ones, such as dbt, and again, we'll talk on this in a little bit. There are a lot of integrations there, from enabling connections to Tableau, Power BI, Sigma, and other business apps, to AI agents that can consume the semantic layer. So it's not just users, but also other things like that.
Performance optimization: this allows me to make sure that I'm caching, optimizing my queries, and using other techniques to deliver fast response times. I'm not creating a semantic layer that brings in all of these different fields that I don't actually need; I can specify which ones. Right? I don't necessarily always need the, like, EDW update time from Snowflake. That may not be something I need, so I just don't bring it in. So it helps really optimize your performance. Data governance, which is another big buzzword when it comes to data: this is how we're talking about how you can trust your data, that it is governed. There are policies for data quality, access controls, compliance, all of those things, to make sure that within the semantic layer we trust what we're looking at, and that when we're consuming it through reports or APIs or agents, we know what we're looking at. And finally, it's a hybrid data foundation. It's a way of bringing all of this data chaos into, maybe not necessarily a single place, but a lot fewer of the places that you have to go to to get your answers.

Rachel, we've got a few questions in the chat here. We've actually got two questions, both from Dwal and Scott, about who would be the owner of the semantic layer. Who's defining the calculations that we're putting into this semantic layer and those definitions, especially thinking towards mid-sized to large organizations where org structure may be a question.

I think I'm going to give a very typical consulting answer: it kind of depends on what your structure looks like. Like you were just saying, Kim, the type of org structure, the size of the structure, it depends. It also depends on where you want it to be and how you want that process to be put in place.
The biggest thing is making sure that it's a consistent process. Part of what we're hoping to get from semantic layers is making it so that the data engineering team, which has up until this point been in charge of the data process and things like that, is not bogged down by every single question on this. So it is trying to put some control into the hands of the analysts. I think a lot of times, and at this point semantic layers are almost synonymous with dbt (they're a big thing within dbt, though dbt is obviously not the only one), analytics engineers within a company will often be the ones living in that in-between world between data engineers and data analysts or users. The biggest thing to think on is the process: how you want to go about adding in metrics, and how you want to go about creating the governance around them. We have had clients where it's the data engineering team that's in charge of the semantic layer. We've had some clients where there's a subset of the analysts who are almost like admin analysts, and they're the ones in charge of maintaining those semantic models. So it really depends on what you are looking for, as long as you're consistent in it, it's a governed way of doing it, and everyone's clear on it. We have had some clients where it's the wild west and everybody can mess with it. I generally don't suggest that one. I really think, as Christopher said, having data stewards is a necessity. There needs to be somebody who is in charge of it, and everybody needs to know that to be true. And you want that stakeholder buy-in so that you don't have people creating workarounds to the system. You want to make sure that everyone is working together and agreeing upon those definitions. Yes.
That ties in really well to a question from Erin, which asks: have you run into situations where people tend to resist sharing or assuming the same definition of metrics with other functions or departments?

Not personally, but I can see where that does happen. A lot of this, and this has been a shift even in the last year in the way people think about semantic layers and semantic models, used to be the idea of there being one layer to rule them all, for lack of a better phrase; Lord of the Rings will always come back. But that becomes a difficult thing to maintain, a difficult thing to scale, and so people will continue, like you were just saying, Kim, to try to find hacks and workarounds to get to what they want. So you find slightly smaller ways of doing it. If it is something where, within a company, different departments have a harder time sharing some of these calculations across them, then it may be that there are semantic layers for each of them, clearly separate ones for different departments or different functions: the HR function uses this semantic model, marketing uses this semantic model, whatever it might be. That makes it a little bit more sustainable, and people feel a little bit more comfortable in what they are able to control and share within that.

Great. I think maybe one more question, and then I don't want to hold you up too much, Rachel. Yeah, then we'll keep going. We've got a question from Rachel Morris: is a semantic layer by necessity always a live pull and never a snapshot? And if so, are there challenges with response times in many architectures?

Oh, that is also a very good question. Semantic layers, as they have been in the past, have generally had to be live to an extent, but I think there are also options to create snapshots within there.
Basically, if you look at the definition of a semantic layer and squint your eyes a little bit, a published Tableau data source could possibly be considered something like that. So that could be an extract of sorts. The other part, challenges with response times in many architectures: I feel like that's a much deeper question, and I appreciate it, but it depends on what your data architecture is, what's involved within your semantic layers, and how you're consuming the data, so it would be hard to give a hard and fast rule on that. With that, I think we will move on. Please keep sending in your questions.

Absolutely. We will also save time at the end.

I like answering as we're going, but we'll also save some time at the end to answer any that I have missed so far.

Yep. I know we've got a couple, but we'll answer those as we go along.

Alright. Some of the things that I want to talk about with semantic layers are the cost benefits. We talked about the pillars of the benefits; now let's talk about some of the cost benefits specifically. One of them, and this is similar to the question you were asking, Rachel, is that it does reduce cloud computing costs in general. You will replace redundant queries and reworking if you have the semantic model. Again, you won't have five Tableau data sources that are running extract refreshes that are all basically doing the same thing; you can just utilize one semantic layer. Efficient query processing and reduced data movement: similarly, you aren't taking the data and creating five different versions of it that have to be stored somewhere else. Increased productivity and reduced manual labor: again, this is a cost benefit. Accelerated time to market.
Again, if you give access to the semantic layer, you're empowering self-service analytics; that's the second bullet. It makes it so that I am able to create these insights, create these data science models, create these reports, and get those out there faster, because I'm not having to do the heavy lifting of making sure I have the calculations correct and making sure that everybody approves the way I'm calculating something. Right? All of that heavy lifting has been done already. And then less pipeline maintenance: we don't have to find all of the different places something has been defined in order to update it. And finally, improved operational efficiency, similar to what I was just saying, accelerated time to insight. A big reason a lot of people are talking about semantic layers right now is AI usability: making it so that our data is a little bit more available, with more context and more understanding for AI. The very stereotypical phrase that people use a lot with AI is dirty data in, dirty data out. The data still needs to be good and ready, and we need to trust what goes into our AI in order to trust what comes out of it. And then finally, consistent metrics equals consistent language. This will help cut out time wasted making sure that we're all talking about the same thing when we're talking about active users, whether that's somebody who has signed on in the last thirty days or the last year and a half. Right? As long as we say active means this, then we're cutting through the time of trying to make sure we're all aligned on it. So let's walk through a quick real-life example. Here we go. We've got our data warehouse. We've got our bronze, silver, gold. We've got a couple of different tables in there: Inventory360, Sales360, Customer360. Those are our gold, ready-to-go business layer.
Then we've got our Tableau published data sources that reference those, and then we've got a couple of calculated fields, average order value and profitability per product, that live within some of those data sources. Then I have a question: what if I want to build a new metric in Sales360? What's going to happen? What ends up having to happen is that I need to go to my data engineering team, because they are in control of the data mart, the gold layer. I need to get into their sprint. I need to make sure that we can add in this metric. So that'll take a couple of hours, and hopefully I can get in on this sprint and don't have to wait two weeks for the next one, because they are doing this for everybody in our company. Then once that's done and it's updated within Sales360, I have to go to my published data source and refresh and update that. I count that as a one-hour update to the data source. Honestly, that's obviously a slight overestimate, but if you are used to Tableau, if you update something, you break something, is basically the way it goes. If I update anything, a color is going to change, some format is going to change, and I'm going to have to fix it. So I usually like to budget about an hour, because a lot of times that will end up happening. And then I do end up having to update my workbook-level LOD. We'll say that takes about three hours because of the different ways of calculating things with my new metric within my workbook. And then I can also ask: what if I now want to share my profitability data with my vendors? Again, it's the same update that I've got there, but now I need to build a new app, which is going to take about forty hours to make sure I'm connecting to the data correctly and doing all the calculations within there, and I also need to update my REST API. I'm just giving some guesstimates of how long that would take. So that's a pretty long time to update these things.
Now with our semantic layer, what we can do, especially if we're providing this ability to people beyond just the data engineering team and their sprint, is update the code in about an hour. The semantic layer is where I'm defining my metrics and a lot of other things about how people can interact with them. I can pre-aggregate complex metrics. I can define who can use these metrics and how. All of this happens within the semantic layer. So once I have that updated in my code and I push it, it's ready to go. I don't really have to go in and redo everything, because I've done the metric calculations within my semantic layer. I don't have to go in and change my LODs within my workbooks, and I don't have to update the data source, because it's just there now. So we've saved a bunch of time. Obviously, there are still some hours that you have to put into the REST API and the web apps.

We do have a couple of questions from Steven in the Q&A that tie into this, asking about what should really be defined in the semantic layer versus saved for that BI reporting layer. Are there actually certain types of calculations that aren't supported by semantic layers? And then he had a follow-up question asking about the potential for dynamic inputs from your BI tools, like a parameter, let's say the last seven days, informing your semantic layer.

So I'm going to start with the first question, of what metrics should be calculated within the semantic layer. I think at this point the goal is, ideally, that most of the metrics that are going to be aggregated, or that involve calculations that will be utilized across many different reports, many different functions, whatever it might be, should be in the semantic layer. I'm talking ideal state, obviously, knowing that a lot of times the ideal state is not how things end up coming to be.
You can also, as you can see in the second bullet point, define the correct grain for metrics and the relevant dimensions. Right? So you can actually specify in there how these aggregations can be done. Do I want to allow it by only these three dimensions, as the only way this aggregation can happen? Information like that can live within the semantic layer, so it's helping to guide people on how they can use the data, so somebody isn't accidentally aggregating something across a dimension where it doesn't make sense, or aggregating something that doesn't make sense at all. I'll use the example that I always used when I was teaching Tableau years ago: zip codes. Zip codes are technically a number, but you can't take an average of them, things like that. So defining things like that, how people can interact with the data, happens within the semantic layer. The final part of that question, remind me again, Kim?

Yeah, the parameters, and other fields interacting with it, things like that.

That is another it-depends: it depends on where your semantic layer lives and what tool you're using for it. We'll talk a little bit about where the different semantic layers can live and what tools you can use, and that will change how you're able to interact with it and what kind of back and forth you can have with it.

Great. We do have one other question, shifting a little bit, or maybe broadening. It came through from E, asking: how do you balance interaction and customization requirements with a loss in data trust across individual experiences and the business customer spectrum? Is part of the transparency around that regarding introducing the role of the semantic layer? And I'll reread that.
How do you balance interaction and customization requirements with a loss in data trust across individual experiences across the business customer spectrum, as part of the transparency around introducing the role of the semantic layer? Let me come back to that question. I want to make sure I'm understanding it correctly, so let's save that one for some time at the end, and if not, we can always respond to you individually and give you our answer.

The next thing I want to talk on is implementation strategy when it comes to a semantic layer: steps to implement a semantic layer, and moving from pilot to enterprise-wide adoption. Like I was saying a little bit before, the idea is not necessarily to come up with one semantic model for your entire enterprise. That would be really, really tough to maintain, and the user adoption would be absolutely abysmal. So we're talking about how you actually utilize this within your company, even within your function. The first step is identifying some high-value metrics. Start small with a few business-critical metrics: sales, revenue, churn, conversion, things like that. Then you want to choose the right technology, which is what I was just mentioning a little bit before. This depends on where you are. Select a semantic layer platform that integrates with your BI tools or your AI workflows. So know where you are and what it is that you're utilizing within your company, within your function, and make sure the semantic layer platform that you want to use can interact with that. Most of them can at this point, but it's still a call-out that needs to be made. The major ones that I do want to talk about are dbt and Cube.
Those are the big standalones, the major ones, when we're talking about semantic layers. But I would be remiss not to mention that there are BI tools that also have semantic layers within them. Tableau and Sigma, with their data models, may not necessarily call them semantic layers, but for all intents and purposes that's what they are: a governed data model that you can then interact with, where metrics are defined and calculations are defined. This is not an exhaustive list; this is just some of the big players right now. If you look at this slide within a month, it may change. This is a very volatile, not volatile, but very... Dynamic. Thank you. This is a very dynamic situation that we're in right now. So this will constantly be changing; things will be added on. Once you've chosen the right technology for where you are, then you can define and govern your metrics: document the business logic, make sure everybody trusts it, everything's consistent, and get some buy-in with that. Then finally, scale and iterate. Rinse and repeat. Expand gradually to more metrics, more tools, more teams. And, as part of that, realize when you need to be done with a semantic layer and say, this is our semantic layer; there may be another one that we're working on too, and this is how the two are going to relate to each other. So now a quick demo, for those of you who have not seen anything like this before, and by demo I mean I'm showing you some screenshots of how it's being used, just so we can make sure that we save time for some questions. This is what a semantic layer looks like within dbt. You have your models, which are kind of like your tables, on the left-hand side in blue; those are your DIM and FACT tables. Then in red we have our semantic models, and the yellow items are the metrics that are within there.
And you can see here, right, I've got order total and order count that are in kind of that fourth column, I guess, in yellow, and those two together make average order value. Then I have order count and large orders that together make percent of orders that are large. Within this, how I define those metrics is I can say, okay, order count. I've given it a name, order count. I gave it a description. Again, we're talking about making sure that there's this data literacy within there. People understand what they're looking at. I gave it the very interesting name of number of orders. And then I can say what are the actual parameters around that, and what is the measure that I'm looking at. Again, it's order count. So this really didn't do too much heavy lifting. This is taking order count and creating a column called order count within my semantic layer. Here's where the semantic layer comes a little bit more into play. I now have a new metric within there that's called large orders, where that's also using order count. You can see under measure, order count, but at the bottom you see there's a filter. I've decided that large orders, and I put it in my description as well, are a count of orders with an order total over twenty. And again, this is where I need to make sure that I'm putting that information in so that everybody who's looking at large orders from here on out knows what I meant by twenty: greater than or equal to twenty. Maybe we decide that it should be thirty. This is where we would come and change that. So it's creating that filter. And then finally, I'm able to create another metric based off of those two, large orders divided by order count, that gives me my percent of orders that are large. So this is just where I can start defining this information, creating the descriptions, and knowing that I'm going to be consistent with this and everybody's going to know what I'm talking about.
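For readers following along in writing, the metrics described above might look roughly like this in dbt semantic layer YAML. This is only a sketch: the entity and dimension names are illustrative, and the exact syntax depends on your dbt version, so treat it as a shape rather than copy-paste code.

```yaml
metrics:
  - name: order_count
    description: "Number of orders"
    type: simple
    type_params:
      measure: order_count

  - name: large_orders
    description: "Count of orders with an order total >= 20"
    type: simple
    type_params:
      measure: order_count
    # The filter is part of the metric definition, so "large" means the
    # same thing everywhere this metric is consumed.
    filter: |
      {{ Dimension('order_id__order_total') }} >= 20

  - name: pct_of_orders_that_are_large
    description: "Large orders divided by order count"
    type: ratio
    type_params:
      numerator: large_orders
      denominator: order_count
```

These metric blocks sit alongside a semantic model that maps the measures and dimensions onto the underlying fact table; changing the twenty to thirty in one place changes it for every downstream consumer.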
So then I want to connect to that within Tableau, so I'm able to connect to just the semantic layer within Tableau and create this nice view that's looking at my sum of percent of orders, or my percent of orders that are large. And here, as you all may know if you've used Tableau, you can right-click on something and change the type of aggregation that you're looking at. I can then see that I actually am not able to do that, because within a semantic layer, particularly within dbt but in other ones as well, you can change how someone can interact with your metrics. So you can change whether they're able to do averages, sums, whatever it might be. You can actually change how somebody interacts with that and control it. Guide is a better word that I like to use other than control. So you can guide people on how best to interact with your metrics. So I'm going to talk on some future trends and wrap it up, and then we can go on any questions that we have that are left over or any new questions that come from my next couple of slides. So the one thing I want to talk about is the evolution of semantic layers. Like I was saying on a previous slide with the tools that I showed, choose the right tooling for you. That has changed since the last time I presented this four months ago, and it will change if I present this again in four months. So there has been a constant evolution of this. So I just want to talk on how we have evolved in thinking on semantic layers. So originally, and this was a question earlier of who owns and controls the semantic layer, originally it was data teams. The data engineering team was the one in charge of the semantic layer, maintaining it, creating the metrics, defining everything. And it was originally primarily for BI consistency, right, making sure that every report that looks at revenue was looking at the same revenue. And oftentimes, originally, it was separate from user tools. Right? We're talking about dbt. We're talking about Cube.
It wasn't within the BI tools. But what's happening now is a lot of this is moving more towards the user. So dbt is still really, really great for semantic layers. Cube is still there. But there are starting to be some more semantic layers that are being embedded in BI tools, such as Sigma, Power BI, Omni. Again, they may not be called semantic layers that way, but they're data models, or whatever you might call it, DAX within Power BI. So it's talking about how we are actually doing this in a governed way that is then able to be shared with others, where I'm defining metrics, I'm defining how people can interact with my raw data. We're getting a rise of hybrid architectures, and we're starting to get an increased focus on usability, not just governance. So I can document until the cows come home, but if it's not in a way that anybody can actually utilize, or is using, again, trying to avoid that one semantic layer to rule everything, I want people to actually be able to use it and not feel like they have to hack their way around it. And then finally, where are we heading? We're heading towards some more AI-native semantic layers, powering LLMs and agents. All of this is created already within here. You're able to use these semantic layers to help make sure that the AI that you're working with, the agents that you're working with, have the context and the understanding and the metrics to work on, but it's moving towards even more AI-native semantic layers and also semantic layers as a data product interface. So we're heading towards that way. Another thing to call out in the where-we're-heading, as we're going through all of this I have been talking about: traditional semantic layers focused on metrics and aggregations built on tables and joins, and that's great for dashboards and AI. And this is what we're talking about. So this is the traditional semantic layer.
And even within your BI tools, embedded in BI tools, this is what we're mostly talking about. But there have also been, and something to think on as we're moving more and more towards this AI world, graph-based semantic models. What is that doing? It's modeling data as entities and relationships, and it's doing a little bit more on the capturing of context and reasoning between these different entities. So it's even more suited for AI and complex questions. So traditional semantic layers are doing a lot of the heavy lifting and doing really, really great: great for AI and great for reports and great for consumption. Graph-based semantic models are just another thing I wanna call out as something to understand as we're moving more and more towards agentic AI. The real future is not one or the other. It's a hybrid of the two of them. And so finally, we're going to wrap it up, and then we'll look at some questions. So this is to wrap it up. Sorry about the title. That should be Wrap Up, not Steps to Implement a Semantic Layer. We already went over that. So what is the point of semantic layers? They're a good source of truth, not a single source of truth, but a good source of truth. Semantic layers unify definitions across BI and AI tools, ensuring consistent answers everywhere. They're a great foundation for AI, because if you have trusted data fueling your AI, then you're going to trust what's coming out of the AI even more. It's good for governance and trust. Obviously, I'm saying trust a lot, but that is a big part of this. Centralized metric definitions build trust and reduce compliance risks. And finally, cost and efficiency. You really are going to eliminate rework, streamline operations, reduce infrastructure strain, just make people feel a lot better about the day to day. They're not spending eighty percent of their time, like some data scientists do, wrangling with their data.
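To make the entities-and-relationships idea concrete, here is a minimal, library-free Python sketch of a graph-based semantic model. All entity and relationship names are invented for illustration; real graph semantic layers are far richer than this.

```python
from collections import defaultdict

class SemanticGraph:
    """Entities as nodes, relationships as labeled, directed edges."""

    def __init__(self):
        # entity -> list of (relationship_label, target_entity)
        self.edges = defaultdict(list)

    def relate(self, source, relationship, target):
        self.edges[source].append((relationship, target))

    def neighbors(self, entity):
        return self.edges[entity]

# A tiny retail model: Customer -places-> Order -contains-> Product -belongs_to-> Category
g = SemanticGraph()
g.relate("Customer", "places", "Order")
g.relate("Order", "contains", "Product")
g.relate("Product", "belongs_to", "Category")

# An agent can traverse relationships to assemble context for a question
# like "which categories does this customer buy from?"
path = []
node = "Customer"
while g.neighbors(node):
    rel, node = g.neighbors(node)[0]
    path.append((rel, node))
# path -> [("places", "Order"), ("contains", "Product"), ("belongs_to", "Category")]
```

The point of the sketch is the shape of the data: a tables-and-joins layer answers "what is the number," while the relationship labels here carry the context that helps an agent reason about why two entities connect.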
So start small, iterate, and scale your semantic layer journey with this information. So that's a wrap for what I wanted to present. So Kim, let's talk on some questions that we may have. We've got a bunch more, yeah. I love it. Let's go. Alright. Let's keep them coming in the chat, and I'll try my best to group everything as I can to keep these themes going. Like, for example, we've had a few questions on specific platforms for semantic layers. For example, Colleen has been asking about resources within the Microsoft platform if they are using Power BI. Let's say, if they're already fully in on the Microsoft platform, what technology resources would they have? You've mentioned Power BI has its own semantic layers. It has its own data modeling within it that you can generally create within there. That one will be something we probably want to investigate together to determine what that looks like. I do not have an answer for you on that in this moment, I am sorry, but please reach out and we can discuss that a little bit further. Absolutely. Jack in the Q&A was asking about Sigma and asking about the write-back features of Sigma. Are semantic layers an area where that write-back feature in Sigma can come into play? Can users write back to a semantic model in the same way, let's say, they can write back to a cloud data warehouse? I think it is very similar to what I was talking about earlier. It kinda depends on the actual tooling within there. Most semantic layers, the intent of them is to be static in a way that is consistent. So having a user be able to write back to it is not necessarily keeping in the vein of what semantic layers are meant to do. They're meant to be a little bit more locked down for that. So the write-back feature is not necessarily something that would come into play within a semantic layer, from my understanding. Great.
And then Ian asked in the Q&A about, and this is a particular agreement that he had heard discussed at the dbt conference, where companies had agreed to standardize the connectivity between their tools. So have you heard anything, Rachel, about that promise being fulfilled or being in progress? I think that is something that is in progress. And I was also at the dbt conference and heard them talk on that. That was, I mean, it feels silly to say, but that was only six months ago. And to try to get that kind of change across such huge enterprises takes a little bit more time. But I think it is something that is in progress at this point, though I don't think there's anything definitive that has been done that I have seen. So if anybody has seen anything differently, please put that into the chat or in the Q&A, and we can share that with the email that comes out after this as well. Great. We've got some similar questions from Steven Deval in the, oh, sorry, Steven and Monty in the chat. There's more coming through; this is great. Just asking about how end users might be able to see what's in the semantic layer. Will they be able to query that directly, let's say, a dbt semantic layer, outside of the BI end tool? Is that possible, or will they just need to consume that only from that BI tool? So you can generally query it. Again, there are so many different toolings at this point. It kind of depends on how you're able to interact with it, but you are generally able to query it to understand where the definitions are coming from, as well as the fact that, if we're talking dbt specifically, the documentation that gets created just from dbt existing is also usually very beneficial when it comes to that. So that image, excuse me, that image that I showed is one that would be very helpful.
I believe right now within, like, dbt to Tableau, and someone can correct me if I'm wrong, you should be able to see it within the notes. Like, if you do the describe, you should be able to see the description that was created for that metric. And Monty's clarifying in the chat here. For example, we saw that there was a bar chart with all the orders in your slides. Could we actually drill through to see the data that's informing those? Yeah. You could bring order number in and bring in that information as well. It's not that it's masking that; it's just doing a lot of the filtering within there. So you could still add in more dimensions, particularly if that's also specified within the semantic layer, like what dimensions you can bring in, so that you're able to then do that. So, yeah, it's not masking it necessarily unless you want it to. So in our example, I didn't define what dimensions I was allowed to bring in. You can. You can say, I don't want anybody to be able to actually dive further into this. I just want it to be at this aggregate level. You can define that. I just didn't in that moment. So just because it's coming in from a semantic layer doesn't mean it's masking, unless somebody's intentionally doing it so. Great. Similar question from Duall in the Q&A asking, is there a way to separate the data model and the definitions layer within the semantic layer? Duall feels the data model is more on the data teams, whereas definitions are a business function. Mixing the two might create confusion. Duall was curious about your thoughts. A way to separate the data model and the definitions layer within the semantic layer. So, yeah, I think that, if I'm understanding that question correctly, oftentimes the data model will be on the data team to create, and the definitions are more of the business function. So that kinda goes back to the question of who's in charge.
Who owns the semantic layer, the semantic model? You can change it depending on the tooling that you're using within that. I'll just keep using dbt as an example, because between them and Cube, they were the people for a long time when it came to that. You can change who has access to the YAML file that creates the dbt semantic layer. So it may be that only a certain set of users within a group are able to actually edit the semantic layer, and that will kind of, like, resolve who's in charge of those two. So the biggest thing on that is, separating the data model and the definitions layer is more about who's in charge of the actual YAML, the actual semantic layer, on there. So if I go back to my little lineage that I had, sorry for the quick succession. There we go. If we're talking about this, the way we would look at it in here and how we're separating this is: everything that's in the blue, that would be the data team's work. Right? That would be them. They're the ones creating the stage and the dim and fact tables. Then the business users, or whoever is in charge of the semantic layers, they're the ones that are then connecting to fact orders and creating the semantic layer of orders. They're connecting to dim customers and creating the semantic layer of customers. So it's separating kind of that way. Bee had a follow-up question, just asking, if masking isn't inherent to the semantic layer, does this not fundamentally add to usability challenges and a user's understanding of the data context? So if masking isn't, I'm just repeating again to make sure I'm understanding. Yep. So if masking isn't inherent to the semantic layer, does that not fundamentally add to usability challenges and a user's understanding of the data context? Oh, okay. I see what you're saying.
So you're saying, like, if the semantic layer allows people to still dive into the data, doesn't that add some complications and kind of remove the necessity for it? Yes, it can. I think my thought process on that is more like, you need to think on, as you're building the semantic layer, what do I want people to be able to do with the data? So it just requires a little bit more thought process going in, about how you want to lock things down and how you want to make it so that people are able to interact with your data. So masking, I think, yeah, I think it just requires a little bit more thought process as you're building the semantic layer, similar to how much thought process goes into figuring out how to build your gold mart: like, how do I actually want people to interact with the data? So you do end up having to be a little bit more specific. Think of it as, like, you have to specify the columns you wanna bring in, as opposed to a select all or a select star. So I don't think that quite answers your question, but I would love to get a little bit more into that. Yeah. I think there's been a couple questions that you've had. If you wanna reach out to me, we can talk on some of these things. Great. And I think this might tie into a question from Doug from a few minutes ago, asking about those five different definitions of revenue you had at the beginning of your presentation, Rachel. You know, how would the semantic model really work with those five different definitions across different teams? Yes. So that would be a moment where you'd have to make sure that everybody's aligned on it. It may just be one of those moments where nobody realized that they were all defining it differently. And it may be something where it's like, well, we bring in, we do returns within that as well, because we think that that's important to revenue. Right?
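As an editor's aside, here is a toy Python sketch of the five-definitions-of-revenue problem under discussion. All numbers and definitions are invented for illustration; the point is only that identical data can yield several different "revenues."

```python
# Invented order data: gross amount, returned amount, coupon discount.
orders = [
    {"gross": 100.0, "returned": 0.0,  "coupon": 10.0},
    {"gross": 50.0,  "returned": 50.0, "coupon": 0.0},
    {"gross": 80.0,  "returned": 0.0,  "coupon": 0.0},
]

# Three teams, three unspoken definitions of "revenue":
def revenue_gross(rows):
    return sum(r["gross"] for r in rows)

def revenue_net_of_returns(rows):
    return sum(r["gross"] - r["returned"] for r in rows)

def revenue_net_of_coupons(rows):
    return sum(r["gross"] - r["coupon"] for r in rows)

# Same data, three different answers:
assert revenue_gross(orders) == 230.0
assert revenue_net_of_returns(orders) == 180.0
assert revenue_net_of_coupons(orders) == 220.0
```

Either outcome of the conversation, one agreed definition or several explicitly named metrics such as revenue_gross and revenue_net_of_returns, is better than teams silently reporting different numbers under the same name.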
It may be that just getting everybody into the room together to hash it out might actually fix that. Or it may be that you realize that there actually need to be five different metrics. Right? Revenue with returns, revenue with coupon value, that sort of thing. But making sure that you all are aligned on it is the biggest part of semantic layers. A lot of times, and this ties into what I was just answering in the other question about needing to be intentional about what you're doing, the point of semantic layers is to start the bigger conversations on how people are actually consuming the data and to get everybody into the room to make sure that we agree on it. It's very similar to when people started talking about data governance. Right? It's a very similar thing. It wasn't necessarily that everybody was doing anything wrong or intentional. It's just we didn't realize it because we were all in our silos. But creating a semantic layer is starting that conversation and trying to break down those silos. I realize we are coming up on time, and I realize that I did not answer all of these questions, so I will be going through some of this Q&A and these questions, and I appreciate you all sending them in. This has been really great interactivity. Appreciate it very much. And I just want to say a final thank you to you all, and thank you, Kim, for being here and asking these questions. And if anybody has any questions, please feel free to reach out to me. And as you see on here, you'll receive a follow-up email with a video replay of this webinar in a couple of days. And thank you all. Thank you all so much.

In this webinar, speakers Rachel Kurtz and Kim Aagaard from InterWorks discussed semantic layers, covering what they are, why they matter, and how to implement them. They explored the core problem semantic layers solve: the data chaos that results when different teams define the same metrics differently, eroding organizational trust in data. The speakers explained how a semantic layer acts as a governed bridge between raw data and consumption tools, operating on a “define once, use everywhere” principle. They also touched on popular tools like dbt and Cube, emerging AI-driven trends, and fielded audience questions throughout.
