Data 101: The Preeminence of the Semantic Layer

Transcript
Alrighty. We've got a lot more people that have come in since I last chimed in. I suspect we'll have a lot more than that given how many registrations we had. This seems like a fairly popular topic, but it's good enough for us to get going. We've got some intro slides, so I'm sure by the time we get through those, we'll have all of the interested parties inside the webinar. It's worth noting, as you can see there on the bottom of the screen, that if for whatever reason you miss some of this or you want to share it or review it, we will post a recording on the Interworks website. It says within a week, but honestly, our marketing team is so quick on these turnarounds that it's probably a couple of days, so you can certainly check back then. And for anyone that registered, we'll send out a reminder via an email drop to let you know when the recording's live. So let's get going. We are talking about the semantic layer and why it is so important today. It's always been important, but maybe today we could say it's critical. So we're covering a lot of things about the semantic layer: looking back, looking forward, problems, causes, all kinds of stuff. Lots of stuff in here. While I'm thinking about it, we do have a chat and we do have a Q and A, so chuck in any comments or questions you might have. There's a lot that I've put into this deck, so I will certainly try to leave some time at the end. If we do have time at the end, I'm happy to circle around and answer any questions you might have; otherwise, as we reach out afterwards, I'm certainly happy to follow up offline as well. I guess at some point I should introduce myself. My name is Robert Curtis. I'm the managing director looking after Asia Pacific, and I'm based in Melbourne. I've lived in Australia for going on twenty one years, twenty one years plus, I think. So it is a pleasure to see a lot of familiar faces there in the chat, and it is a pleasure to meet a lot of new friends as well.
A little bit about Interworks. I have to put in a little bit of a sales effort here; some of my salespeople, like Carol, who's on the call, will get grumpy if I don't. So what do we do? We do data strategy, solutions, and support. We can help figure out what direction you want to go and how to get there. We can help you build the foundational things and solve use cases. And then, after everything is built, we can support you as you progress. That might be supporting communities, building the skills of individual users, or helping you with the applications themselves. This is a more detailed view of that exact same commentary. If you think about this green dotted line, it's kind of the life cycle of your data. So again, there's strategy, and then we can think about the foundation: building the data platform, the governance policies in terms of what you want to do cloud versus on-prem, security, governance, all those sorts of things. Then, when we solve problems, that's us adding value through solutions for use cases: AI, analytics, data science. And again, we have a pretty broad tool set in terms of the things we work with. We have tools that we prefer because we've done a lot of listening and looking and studying, so we'll have very strong opinions. And then, as I mentioned before, when we get down to support, that's us helping communities, individuals, and your applications sustain and grow. A little bit more about us, and I promise this is the last slide where we talk about Interworks. We've been in the business for, well, that number needs to be changed: thirty years of experience. I keep telling myself every year that I need to change that, and clearly, for three years, I have not. Because we've been in this business for so long, and it's really focused on analytics and data, we have a massive amount of experience and pedigree. Seventy five, maybe even seventy seven now, of the Fortune one hundred have been or are currently Interworks customers.
So the biggest, most complex data sets you can imagine have come to Interworks for help and solutions. That is reflected in our blog at interworks dot com, which gets somewhere closer, I think, to four million page views every year just talking about analytics, data, governance, semantic layers, all of the nerdy stuff that, hopefully, you guys are here to geek out on today. In terms of our customer footprint, we have more than four thousand customers, and I think it might be closer to five thousand at this point, across every industry vertical and just about every size of customer, from SMBs all the way to global internationals. And while we have collected dozens of partner of the year trophies across just about every technical domain you could imagine, one of the things we're most proud of is when Forbes named us a Small Giant; they named twenty five companies. We made that list as a small company, I think about three hundred people globally, but we punch way above our weight because of our effort, our focus on excellence, and being not the biggest, but hopefully the best. So let's get into this. Again, like I said, there's plenty of stuff to talk about. We're talking about the semantic layer today and why it is so important. Here are some quotes that I pulled out. Some of these are from Gartner, some are from tool vendors, and some are from other folks. I won't read them, but there has definitely been a shift in, say, the last twenty four months: where the semantic layer has historically been an aspiration, it is now existential. As you can see from one of the words on the screen, it's business critical. And when people have talked about data being a premier organizational asset, as important as any critical asset that you own, that saying has historically been about the data in your sources; now it obviously includes, without question, the semantic layer itself. Now, most of you are probably data people. Some of you might be data and analytics people.
Some of you might be business people. So let's spend a little bit of time unpacking what a semantic layer is. I am guilty of using some fairly synonymous terms, which have slightly different definitions, when I'm talking about the semantic layer, so it's probably good that we define and demystify exactly what that is. So let's look backwards. The concept of a semantic layer goes all the way back to nineteen ninety one, when Business Objects (this is pre-SAP Business Objects) invented the idea. The whole point was for nontechnical people to be able to look at a business term like revenue or profit and to leverage and access it without having to go and write the code, the SQL, to achieve that. It was very quickly adopted by lots of different organizations at the time. Business Objects released it as the Business Objects Universe. Business Objects was very prevalent, and it has certainly gone the way of the dodo, as the saying goes. But in two thousand and eight, it was bought by SAP for something like ten billion Australian dollars equivalent, somewhere in that neighborhood. I actually went back and looked at the conversion rate at that time. Not a good time for Australian currency, something like one point five, which I guess is similar to what it is today. But that was really what kick-started this whole idea of a semantic layer. Again, there are a lot of terms that people use when they're talking about a semantic layer, and I'm guilty of this too, so I'm going to strictly say semantic layer until we get to those definitions. But I'm going to show you a little bit of the evolution of the semantic layer over five critical stages, starting from nineteen ninety one. So let's start with the monolithic era: Business Objects, and then Cognos, MicroStrategy, all these legacy tools that, quite frankly, I still see migration projects for.
While that was a huge step forward, it was very proprietary. You can imagine it was in Business Objects' business interest to make this proprietary, because they came up with this great idea, and they wanted to make it really hard for you to pull that business logic out and use it in other places; that way, you keep using their tool. Very valuable, but again, vendor lock-in was sort of the way of things. We then moved into the late 1990s through the 2010s. This is where cubes came in. Performance was at a premium, but cubes were very difficult to manage, so it was probably inevitable that there was going to be a transition away from them. I remember being in university at the time thinking that cubes were this massive game-changing thing, and they were. But of course, as needs progress and other things come to the fore, innovation naturally happens. We then get into semantics as code. This came from Looker, probably pre-Google Looker, and the idea of LookML: moving the semantic layer into version-controlled code so that you could put it in Git. That's probably the best way to say that. The challenge there is, of course, that LookML and Looker, even to this day, still have that sort of proprietary lock-in. So if you want to use the semantic layer that's in Looker for AI agents outside of Looker, it's quite difficult. Obviously, Looker's got agents inside of it. But again, as we'll see, the semantic layer is not there to power one application; it's there for a lot of reasons, and I'll show you in a lot of detail exactly what I'm talking about. We then go into this, and I don't know if I like this term, but I'm going to use it, the headless universal era. This is where tools like DBT are really game changing. A lot of the terms that people use when they're referring to a semantic layer come right out of the DBT ecosystem.
Things like the metrics layer or analytics engineers, some of the terms we'll be using today, come from DBT usage and its explosion in popularity. The reason it's called headless is that it sits right alongside your cloud warehouses, Snowflake, Databricks, whatever, and becomes an interface point, a consumption layer, for all of these different sorts of tools, which is where it becomes the most powerful. And of course, that brings us to the last twenty four months, where the semantic layer has stopped being an ambition or a nice-to-have, because of AI. AI, and specifically LLMs, desperately needs highly curated business logic to perform in a way that's going to be highly successful and to make hallucinations or other sorts of AI errors less likely to pop up. We'll talk about what that means a little bit later in this presentation. So you need that really curated layer of business logic, which is what the semantic layer offers. Now I want to take a step into what the semantic layer is. So let's hop over here. I wouldn't call this your PhD-level course on data architecture, but there are some really closely related topics, and I'm guilty, as I said, of using them somewhat interchangeably. So, for my own edification, so I can be very precise in how I present this, as well as for yours, let's take some of these terms that are very interrelated or have intersecting Venn diagrams and really define exactly what they are. Obviously, when we're talking about how data is useful to an organization, we're really thinking about how we embed our business logic into it. The business logic is all of the things that the human beings have decided our business does. So if we're running a bank and somebody makes a late payment, what do we do? That's business logic. How do we differentiate the revenue that finance is thinking about?
And how do we name and define that versus the revenue that we might be using in sales to determine compensation or variable bonuses? All of those things probably involve a bit of finessing where human beings really have to make the decision, and AI can't really decide. I'm not saying that's the core requirement of business logic, but business logic is those rules. Now, when we think about how that gets translated into data, obviously we're talking about code. For the longest time, this was the central interface: we would have coders, software developers, coding. But we don't need to do that, and it's certainly not the most efficient way for us to engage. That's where the idea of abstraction layers comes in. Abstraction layers can sit throughout the different layers of how we think about the data life cycle, or an OSI model; they can occur at any point. In its simplest terms, an abstraction simplifies a lot of complex stuff underneath it. So when we're thinking about the semantic layer, we are abstracting away a lot of raw code, SQL, Python, whatever, and showing you business-usable terms instead. What's important to note is that the semantic layer is an abstraction layer, but it's one of potentially many abstraction layers. That's probably the term I was using most interchangeably, which is a no-no for me. So the semantic layer is the interface between the users who are trying to find business value and insights and all of the preparation done on raw data to make it usable. Data ontology: this one's popped up more and more. You can use an application to map it, but it's something that sits outside; it's more of a logical construction of how all of these things interact. I think the best way to say it is that it's a conceptual blueprint for your business logic, mapping how those things relate.
It's very useful when you start to think about your semantic layer: if you do spend the time to build a data ontology, it will enrich the semantic layer and help your engineers put all those things together. But think about it as more of a theoretical concept that can then influence the technical execution. We then get to the metrics layer. Again, this is a DBT concept, and it sits inside of the semantic layer. This is where you think about your mathematical formulas, your aggregations, the sum of revenue, all those sorts of things. Let's standardize those, and that is the metrics layer, so we don't have to do the complex math again and again and again. It is a semantic layer, but it's a subset of the semantic layer, if that makes sense. Again, we're not covering everything in terms of data architecture, just some key concepts that I think apply when we're thinking about semantics. The other part here that's super important is RBAC. RBAC, or role-based access control, is security based on roles, which users and groups get assigned to, however you want to architect the way people can engage with data. Obviously, you want them to engage through the semantic layer and not with raw data; your engineers and your architects do that. But this is the lock and key, which is why we use the lock icon whenever we're talking about these sorts of access controls. There is going to be PII inside of your semantic layer that you need to have in there, but that doesn't mean everybody needs to have access to it. So if you've got well-defined RBAC, well-defined permissions for different groups of users, then you can make sure they're getting the information they need and none of the stuff they don't. And when you combine that with well-architected governance, you can archive, delete, and retain across everything that's happening in the entire data estate.
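To make the metrics-layer and RBAC ideas concrete, here is a minimal Python sketch, assuming a toy semantic model held in plain dictionaries. The metric names, the `gold.fct_orders` table, the role names, and `compile_query` are all hypothetical; this is not the API of DBT, Snowflake, or any BI tool. Each metric's SQL is written once, a consumer asks for business terms, and the layer both generates the SQL and checks the caller's role before exposing a PII column.

```python
from dataclasses import dataclass

# Hypothetical toy semantic model: not the API of DBT, Snowflake, or any BI tool.


@dataclass(frozen=True)
class Metric:
    name: str
    expression: str  # the aggregation, written once and reused everywhere
    table: str       # hypothetical gold-layer table the metric reads from


@dataclass(frozen=True)
class Dimension:
    name: str
    column: str
    allowed_roles: frozenset[str]  # empty means every role may use it


METRICS = {
    "revenue": Metric("revenue", "SUM(order_amount)", "gold.fct_orders"),
    "order_count": Metric("order_count", "COUNT(DISTINCT order_id)", "gold.fct_orders"),
}

DIMENSIONS = {
    "region": Dimension("region", "region", frozenset()),
    # PII: only the hypothetical finance_admin role may group by it
    "customer_email": Dimension("customer_email", "customer_email", frozenset({"finance_admin"})),
}


def compile_query(metric_name: str, dimension_names: list[str], role: str) -> str:
    """Translate business terms into SQL, enforcing role-based access on dimensions."""
    metric = METRICS[metric_name]
    columns = []
    for dim_name in dimension_names:
        dim = DIMENSIONS[dim_name]
        if dim.allowed_roles and role not in dim.allowed_roles:
            raise PermissionError(f"role {role!r} may not query {dim_name!r}")
        columns.append(dim.column)
    select_list = ", ".join(columns + [f"{metric.expression} AS {metric.name}"])
    sql = f"SELECT {select_list} FROM {metric.table}"
    if columns:
        sql += " GROUP BY " + ", ".join(columns)
    return sql


if __name__ == "__main__":
    # SELECT region, SUM(order_amount) AS revenue FROM gold.fct_orders GROUP BY region
    print(compile_query("revenue", ["region"], "sales_analyst"))
    try:
        compile_query("revenue", ["customer_email"], "sales_analyst")
    except PermissionError as err:
        print("blocked:", err)
```

In practice a semantic layer would more likely lean on the platform's own grants or masking policies than on application code like this; the sketch only shows where the "lock and key" sits relative to the metric definitions.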
But RBAC is super important because it's the keyhole through which people look at your data, and you want to give them the right size of access to do the things they need to do and likely nothing else. Hopefully that's useful. There's another way of thinking about the semantic layer, and this is the Open Semantic Interchange. This is pretty new, late twenty twenty five, early twenty twenty six. It's a group of vendors, like Snowflake and DBT, that got together to put some structural, logical frameworks around these things, mapping them to something very similar to an OSI model equivalent. We don't need to go too deep into layer one, two, three, four, whatever, but you can see the semantic layer there is pretty high up. It is the layer just before the experience or application layer, where BI tools and other things come back to the semantic layer to engage with it. It is the translation layer. We already talked about metrics, but this is where the business logic gets translated into usable artifacts, products, data products for your users. There's another piece of information that we probably need to explore, and this is coming out of Databricks: medallion architecture. But of course, ideas get shared everywhere, so it's pretty pervasive nowadays. There are flavors of this, but we'll present a pretty high-level version of the standardized version. There are three levels in medallion architecture. First is bronze. We've got all this raw data in different source systems, and we're just going to stage it in: raw data, unstructured, no validation. We haven't error checked it, we haven't cleaned it or anything. This is fully abstracted from the business; they don't need to see this at all. We then go to silver, and this is where we fix up a lot of the errors. We're conforming it, we're cleaning it, we're taking out all the duplicates.
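The bronze and silver steps just described, plus a gold aggregate built on them, can be sketched in a few lines of Python. This is an illustration only, assuming hypothetical in-memory order records; a real pipeline would run in the warehouse (for example as SQL or DBT models), and the cleaning rules shown (trim and upper-case the region, drop unparseable amounts, de-duplicate on `order_id`) are assumptions chosen for the example.

```python
from collections import defaultdict

# Bronze: hypothetical raw records, staged exactly as they arrived from source systems.
BRONZE = [
    {"order_id": "A1", "region": " apac ", "amount": "120.50"},
    {"order_id": "A1", "region": "APAC", "amount": "120.50"},        # duplicate from a second extract
    {"order_id": "B7", "region": "emea", "amount": "80"},
    {"order_id": "C3", "region": "APAC", "amount": "not-a-number"},  # fails validation
]


def to_silver(rows):
    """Silver: validate, conform formats, and de-duplicate on the business key."""
    seen = set()
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a real pipeline would quarantine and report this row, not silently drop it
        if row["order_id"] in seen:
            continue  # keep the first copy of each order
        seen.add(row["order_id"])
        clean.append({
            "order_id": row["order_id"],
            "region": row["region"].strip().upper(),
            "amount": amount,
        })
    return clean


def to_gold(rows):
    """Gold: a query-ready aggregate (revenue by region) that a semantic layer could sit on."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


if __name__ == "__main__":
    print(to_gold(to_silver(BRONZE)))  # {'APAC': 120.5, 'EMEA': 80.0}
```

The point of the layering is that nobody downstream of gold has to repeat the de-duplication or format fixes; the semantic layer then names and governs what gold exposes.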
Data lineage could probably go pretty far back, but silver is probably the earliest place that data lineage and catalog could go back to and still be pretty useful, if you're not an engineer, for instance. We then go to gold. This is where your facts and dims live. You would optimize your queries for performance because, again, these are probably going to be powering some pretty useful data sets or tables. The thing that's important to note here is that while gold is the highest level of medallion architecture from an engineering standpoint, it doesn't mean that this is the end of the journey. The semantic layer sits on top of that. In a lot of medallion architectures, people call this platinum, or use other names to designate that, hey, this is a very user-centric view of our data. It's very easy for people in Tableau, Power BI, Sigma, whatever, to click, drag, and drop things in. They don't have to do many calculations, and they don't have to do last-mile preparation. I mean, they might have to do some, but the goal here is to get this as usable as possible. But the other thing that's super important, and we'll talk about it in the very next slide, is that this is what AI should be pointing at. I love this idea of thinking about the semantic layer as your load-bearing structure, and we'll talk about why this has become so transformative in just the last two years. The things the semantic layer supports are obviously business intelligence, reporting, analytics, all that kind of stuff. But more critically, and more recently, AI has gone from theory and science fiction and high concept to reality. Or to put it another way, you one hundred percent can start solving use cases today with AI. That might be conversational analytics; there are all sorts of things you can do with it. I'm sure people on this call have been testing and playing with this month by month.
These models are getting better and stronger, and there are different flavors of them, for sure. In doing some of my work, as well as generating ideas and conceptualizing what I wanted to do with this presentation, I worked across multiple different interfaces to come up with ideas and make sure I was pressure testing them. Other things that the semantic layer supports: machine learning and data science, data apps, data products that you might package and sell on a marketplace, and any sort of export that you're doing, whether it's for regulators, other parts of your business, or potential partners, vendors, whoever you've got. All of these things rely on having well-curated data. You do not want to send raw data outside of your organization, because it's likely less governed, and you are then putting the burden on the people receiving it to do something useful with it. So let's unwind, back up a second, and see exactly where the problem has come with the semantic layer. If it's so universally useful, particularly since AI, why isn't everybody doing it? This is a rough workflow of data. You've got sources of data in different flavors; that's why I colored the rocks differently, coming from different places. It could be this database, it could be the CRM, it could be this API or data dump or whatever, and they're coming in whatever state they're in. Some of them are very raw. Some of them might be pretty modeled. But if they're not similarly modeled, there's going to have to be a lot of work to bring them together. So you then bring that into whatever platform you've got: Snowflake, for instance, or Databricks. You then might do your medallion architecture, bronze, silver, gold, or whatever process you use. You then layer in all of the business logic at that semantic layer.
We've got a crown there emphasizing how important it is, and then you would pull it downstream into reports, like BI. That's the goal. But the reality is this: if we look at how most organizations are set up, as a rule of thumb, for roughly every data engineer you've got working in a centralized capacity, the data team, you might have five people or more out in the business trying to do stuff with the data. These might be centralized analysts, or they might be business analysts. But the ratio has historically been very strongly skewed towards the number of people building reports versus the people building data. That's partly because of the BI two point zero paradigm, where every insight needed a dashboard. So we needed a lot of dashboards, and oh my gosh, we have a lot of dashboards, and we need a lot of people to curate and support those dashboards. You can probably already guess where this is going. As a data analyst, if I'm trying to create stuff and I don't have the data I need, I'm just going to start bringing in my own data. Whatever data I'm actually getting from the engineers, I'm enriching with other stuff. That might be me doing marketing stuff and pulling in data from Meta or Google or whatever. I might be bringing in my own spreadsheets. (I'm just noticing some of these questions, but I'll get back to those, I promise.) I'm going to be building integrations and relationships at the workbook level to serve these bespoke needs, because with that ratio we're just not getting the throughput on all the data products I need. As a result, the semantic layer doesn't actually exist as a centralized, unified repository of business logic. In most instances, and I'm sure a lot of you can relate to this, it actually sits in your workbooks. And that is potentially a massive disaster, because if you're using Tableau, for instance, that might be a whole bunch of Tableau functions.
Even worse, it might be dozens or hundreds of lines of custom SQL, and that is isolated and invisible to the rest of the business: invisible in that we can't reuse it, invisible in that I don't know how you're calculating it, and invisible in that we might all end up with different numbers. There are lots of reasons why that's a bad idea, but it was sort of inevitable in the old paradigm. Something else that makes this even trickier is that a lot of places don't just have a single warehouse. They might actually have multiple, maybe intentionally, maybe accidentally. So where does the data actually come from? And a lot of organizations, because of a deficiency in data engineering, or perhaps because they're using legacy tools that are very cumbersome, can only manage to get everything, or most things, to a staged raw level. And they say, well, the analysts are smart and they've been doing their own data preparation in the workbooks for years; they can just continue to do that, and we'll just keep staging raw data. This becomes a bit of a cycle of disaster, so we're going to use a little Picard facepalm. But it can actually get worse now, and this is why the semantic layer has been elevated to critical. As soon as you throw in AI, what is AI going to consume? Naturally, it's going to consume whatever you've got in your semantic layer. And if all you've got in there is bronze-level raw data, or incomplete silver or gold data, or gold-layer data that no one has agreed to actually standardize and govern, then you run into real problems when you try to implement enterprise AI at scale. This, I think, is where the light bulb has come on for executives. The semantic layer is not just a nice-to-have. It is not an aspiration. It is existential now. So if we go back to this diagram, I've got ten things that I'll unpack on the very next slide.
But this is probably the number one thing I need you to think about: the semantic layer, as a curated asset, supports all of these things. And at least the first two, probably more for your organization, are critical to your day-to-day operations and your future. So let's break this down into ten specific things. Number one, at a high level: if we centralize our business logic as a single source of truth, that means we do the hard work of defining it once and getting everybody to agree on it, and then we can use it again and again and again. So let's take it out of isolated Tableau reports or Power Query, or wherever your business logic might be, and bring it into a single unified layer. No more duplication, and a lot more agility: if you do want to change how we calculate something, like what a customer is, what a sale is, what this product is, we can do it in a centralized place and it flows down, versus having to go across a thousand dashboards and change it. Obviously, data silos are a critical problem. People have been dealing with them for ten years, but the semantic layer is the solution. You don't need to rebuild the same logic across multiple tools, and that's especially true for a lot of folks that have multiple warehouses: make a choice, standardize. Foundations for governance. Just like the semantic layer, and I think we've talked about this in previous webinars, data governance used to be: we have to spend this budget because we've got a compliance or regulatory requirement, or some consultancy said we've got a gap in the way we classify or monitor, or a really important part of our business is saying nobody's telling them anything and sales ended up with a different number than they did. So let's just go buy a governance tool and put it on the shelf.
Like most governance tools prior to the age of AI, it sat on the shelf: half of the data maybe got in there, half of that got classified, and one tenth of the users actually used it. That's not the case anymore. Governance is now, just like the semantic layer, mission critical, because AI is going to look at that stuff. You need to be able to curate, govern, and protect assets that you don't want looked at, and identify ones that are not functioning well. And then there are all of the ways you can empower users, not just users in the form of agents, but human beings, to find this stuff and make it easier for them to go out and make decisions, get insights, and innovate. The other thing that's really important about governance: I've run across this multiple times with organizations. I ask them, hey, roughly how big is your data estate? They say, oh, we've got fifty systems and it's probably fifty terabytes, we guess. And I ask, are you a hundred percent comfortable that there's no PII sitting out there? And they say, no, we're not. Well, if you're not sure, that means somebody else could find it. That could mean regulatory problems, compliance problems, brand problems, all sorts of things. Governance is essential to helping you classify data and protect it. Obviously, RBAC is part of governance. We're also talking about data quality: if your users get bad data and make bad decisions, that attacks the very foundations of building a data culture, and governance is part of that solution. We've already talked a couple of times about AI, but again, it's super important to reiterate: you cannot do enterprise, production-ready AI without the curation of your semantic layer. Garbage in, garbage out. Context is king. Anyone that's been playing around with AI will know this. Obviously I do it for work, but I've also been goofing around with it on a bunch of little personal projects.
AI, to me, is almost like a combination of my oldest teenage son and a legal brief. I have to constantly make sure that it's doing exactly what I asked and not adding more complexity, or saying, hey, how about this? I added this. I don't want that. But if I don't tell it that some of the ideas it put into the last output are not what I want, it just assumes I'm happy with them and carries them forward. I don't know if you guys have done a lot of contracts, but it's the same: if you don't say I disagree with this, and I disagree with this, anything you don't specifically point out gets carried forward as if you agree with it. So AI requires a lot of context. Productivity gains. Obviously, if we can make building insights and generating actionable intelligence easier at the data analyst level, that's going to deliver benefits for everybody. Let's say, for instance, that to get my data to where I need it, because I'm working off pretty bronze-level stuff, I have to write hundreds of lines of custom SQL, or I have to build a bit of a dashboard until I get it close. Then I download it into Excel or Google Sheets and do the rest of the transformation there, because I can just change the numbers to what I know they should be, even though the data doesn't reflect it. If I don't have to do that, and I can literally just click and drag without having to understand all the super complex underlying data structures, integration, transformation, and error checking that should be done at the engineering level, then all I have to do is click, drag, and explore.
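The "define it once, change it in one place, and it flows down" point from the list of ten can be illustrated with a small Python sketch. Everything here is hypothetical: `SEMANTIC_LAYER` stands in for a governed, centralized definition, and `dashboard_tile` and `agent_answer` stand in for two different consumers, such as a BI workbook and a talk-to-your-data agent. Neither consumer re-implements revenue, so when the agreed definition changes centrally, both pick it up.

```python
# Hypothetical example: one governed definition of "revenue", two different consumers.
ORDERS = [
    {"amount": 100.0, "status": "complete"},
    {"amount": 40.0, "status": "refunded"},
    {"amount": 60.0, "status": "complete"},
]


def revenue_all_orders(orders):
    """Original definition: every order counts."""
    return sum(o["amount"] for o in orders)


def revenue_net_of_refunds(orders):
    """Revised definition (hypothetical): exclude refunded orders."""
    return sum(o["amount"] for o in orders if o["status"] != "refunded")


# The single source of truth: consumers look the metric up here instead of
# re-implementing it in their own workbooks.
SEMANTIC_LAYER = {"revenue": revenue_all_orders}


def dashboard_tile(orders):
    """Stand-in for a BI dashboard tile."""
    return f"Revenue: {SEMANTIC_LAYER['revenue'](orders):.2f}"


def agent_answer(orders):
    """Stand-in for a conversational AI answer."""
    return f"Total revenue is {SEMANTIC_LAYER['revenue'](orders):.2f}."


if __name__ == "__main__":
    print(dashboard_tile(ORDERS), "|", agent_answer(ORDERS))
    # Revenue: 200.00 | Total revenue is 200.00.
    SEMANTIC_LAYER["revenue"] = revenue_net_of_refunds  # one central change
    print(dashboard_tile(ORDERS), "|", agent_answer(ORDERS))
    # Revenue: 160.00 | Total revenue is 160.00.
```

Had each consumer carried its own copy of the revenue formula, the same change would have to be found and repeated in every workbook, which is exactly the drift that produces different numbers in different meetings.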
The other thing, and I talk to a lot of customers about this, is centralizing your data outside of your BI tool. A lot of BI tools want to own that data, like Data Cloud in Salesforce, or Looker with LookML. If you centralize it in a tool built for it, which might be inside Snowflake or Databricks, or something more specialized like DBT, then if you do want to change your analytics tool, it becomes much easier, because it's literally just changing visualizations versus having to harvest the business logic trapped in those workbooks. You will save yourself a lot of time if you do that. Or, for instance, say you're a large organization with the need, desire, and want for multiple tools. Our finance people use Sigma because they like the Excel-style interface. Our manufacturing teams in Asia use Power BI; they're very cost sensitive and they run Microsoft over there. And we've got some operational reports for the executives in Tableau. So let's say you've got that sort of hybrid environment. Again, a centralized ecosystem really, really helps, because everything can consume from it. The same is true for data science. I do want to talk about this next point, because it's something I've been seeing more and more over the last twelve months: a new paradigm in how people look at and build their semantic layer, given that the analytics paradigm is changing. I think the very last webinar we did on this was called BI three point zero, so if you want, go to the Interworks website and look at our recordings. I can't remember the exact name of that section, but you'll find it under events or the library, whatever it's called. We talk a lot about this there, but I'll just summarize. BI two point zero, which I referenced already, basically meant you could ask more advanced questions using tools like Tableau and Power BI.
But the answers take the form of a dashboard. A lot of the folks on this call probably have hundreds, thousands, maybe tens of thousands of dashboards, most of which were created for a particular purpose and have never been revisited, and they all add up to this sprawl: an organizational debt where, now that we've got ten thousand dashboards, I don't know where to look or which numbers to trust. BI 3.0 means that we don't actually need a dashboard for every form of an answer. Instead, we can potentially have fifty really tight, well-thought-out dashboards with a lot of key metrics and storytelling built into them. All of the other ad hoc requests can go to an AI bot or agent sitting on top of a really well-curated dataset, taking questions and giving simple answers. The benefit is that most of the time, when I give somebody a dashboard and say, here's your answer, they say, well, now that I've actually seen this, I have two, three, four, five more questions. The AI can facilitate that interchange using natural language. So one of the things we've been really working on with a lot of our customers is how to facilitate the migration to that capability. And as the weeks and months go by, the models and tools inside these applications, Genie, Gemini, Cortex, are getting that much stronger. It's literally at that pace, so it's very, very exciting. But as a result, the role of the data analyst, i.e. the dashboard developer, can migrate. We don't need somebody who can write custom SQL and all these functions and formulas. In fact, we don't want them to do that. We want all of that stuff in the semantic layer. And the semantic layer has traditionally been the bottleneck for everything we want to accomplish with our data.
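To make "we want all of that stuff in the semantic layer" a little more concrete, here is a minimal Python sketch under stated assumptions: each metric is defined once, outside any BI tool, and rendered to identical SQL for whichever tool asks. The `Metric` class, the `compile_query` helper, the table `gold.fct_orders` and its columns are all hypothetical; in a real stack this job would fall to something like dbt or the warehouse's own semantic features, not a Python dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """One governed business definition, stored once outside any BI tool."""
    name: str
    expression: str  # SQL aggregate expression
    table: str       # curated (gold) table the metric reads from


# Hypothetical central definitions; table and column names are illustrative.
METRICS = {
    "revenue": Metric("revenue", "SUM(net_amount)", "gold.fct_orders"),
    "order_count": Metric("order_count", "COUNT(DISTINCT order_id)", "gold.fct_orders"),
}


def compile_query(metric_name: str, group_by: list[str]) -> str:
    """Render the same governed definition as SQL for any consuming tool."""
    metric = METRICS[metric_name]
    cols = ", ".join(group_by)
    select_cols = f"{cols}, " if cols else ""
    group_clause = f" GROUP BY {cols}" if cols else ""
    return (
        f"SELECT {select_cols}{metric.expression} AS {metric.name} "
        f"FROM {metric.table}{group_clause}"
    )


# Sigma, Power BI and Tableau ask the same question and get the same SQL.
queries = {
    tool: compile_query("revenue", ["region"])
    for tool in ("Sigma", "Power BI", "Tableau")
}
```

The design choice being illustrated is that the BI tools only render results: swapping one tool for another changes visualizations, not the definition of revenue.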
So instead of hiring more data engineers, the trend right now is to take our data analysts, make them less focused on visualization (because, again, a lot of that can be satisfied with conversational AI or a talk-to-your-data agent, same thing), and give them basic skills in data transformation. That might be SQL or Python. It might be playing around in dbt. Because those former data analysts, whom we're retraining as analytics engineers, sit in the business or in the domain, they are much closer to the business logic they actually have to author. Everyone who has done data transformation knows that when finance comes calling, oh my gosh, their numbers are so complex; I have to know finance to help them. Well, if I could give them the power to write their own business logic into the semantic layer, while applying our standards, our best practices and our governance protocols, that unlocks the ability for the semantic layer to grow quite quickly, because every analyst then becomes an engineer. Maybe not every analyst, but a lot of them should. That works so long as I put a well-defined RBAC model on top of it. So yes, there is enterprise-worthy finance data, or sales data, marketing data, product data or whatever. The finance people have full access and can do whatever they want with it, but everybody else can leverage it within the context of their role, so everybody benefits. So it can be quite data mesh, which is business units having a sort of contract or SLA back to the organization to provide their domain business logic in the form of the semantic layer. It doesn't have to be fully mesh. Some parts of your business are going to be more data progressive than others.
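A minimal sketch of the "well-defined RBAC model" idea, assuming hypothetical role and domain names: each domain's analytics engineers can write business logic for their own domain, and everyone else reads within the context of their role. In a real stack these grants would be enforced by the data platform itself (for example, through warehouse roles and grants), not by application code; the Python below only illustrates the shape of such a policy.

```python
from enum import IntEnum


class Access(IntEnum):
    NONE = 0
    READ = 1   # may query the domain's curated data and metrics
    WRITE = 2  # may author or change the domain's business logic


# Hypothetical policy: domain analytics engineers own (WRITE) their domain's
# semantic models; other roles get READ on what their job requires.
POLICY: dict[str, dict[str, Access]] = {
    "finance_analytics_engineer": {"finance": Access.WRITE, "sales": Access.READ},
    "sales_analytics_engineer": {"sales": Access.WRITE, "finance": Access.READ},
    "marketing_analyst": {"marketing": Access.WRITE, "sales": Access.READ},
}


def can(role: str, domain: str, needed: Access) -> bool:
    """True if the role's grant on the domain meets the required level."""
    granted = POLICY.get(role, {}).get(domain, Access.NONE)
    return granted >= needed
```

For example, `can("sales_analytics_engineer", "finance", Access.READ)` is true while the same role asking for `Access.WRITE` on finance is refused: finance logic stays with finance, but everyone benefits from reading it.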
And in the more data-progressive parts of the business, you absolutely should be exploring how to build analytics engineers in, so that they can do some analytics and build data apps with AI elements, but, most importantly, help the rest of us by building out the semantic layer for the areas of the business they shepherd and steward. So let's take a look at what this actually means in the scope of the slide we just looked at. The data engineer was looking after transformation and the semantic layer, and the analyst was then doing all of the BI stuff, or potentially your data scientists were doing the ML stuff, whatever. But if I take that analyst and make them into an analytics engineer, that extends their ability to contribute to the semantic layer. Now, instead of a one-to-five ratio, maybe I've got four or five people in that ratio who can add to my semantic layer. We have to facilitate AI and the opportunity it presents. We have to accelerate our ability to add more curated data. Data is not going to stop growing. Your data estate has not reached capacity. If industry trends remain the way they are globally, it's going to double about every two years, if not sooner. So you have to increase the rate at which you can actually make this data usable. AI is coming. There are a lot of tools adding AI to help you build the semantic layer, but human beings are absolutely going to have to be part of that for the long term. Human beings have to decide what revenue is, what a transaction is. So we need people in the engineering space, and if we can get people from the analytics space to contribute, wonderful. So here's the problem. The semantic layer is so valuable that people are probably asking, well, if it's so valuable, why hasn't everyone done it? The challenge is a couple of different things. One, there's a lot of data.
I could have pulled stats, and I've got them in other decks, but the amount of data a standard medium-sized organization has, particularly one with any sort of online footprint, and how quickly it's growing, is inconceivable. But the real challenge here is that some initiatives related to technology and data come from the bottom up. That's often finding a new analytics tool that is a land and expand: I found a tool, it works great, other people start using it, and it becomes an enterprise tool. That could be Tableau. That could be Sigma. That could be whatever. Other initiatives have to be top down, like data strategy. The users cannot tell the engineers and architects what the data strategy is; the executives have to. Governance is another one of those. Governance has to be mandated from the top down. It cannot come from the bottom up. And so one of the biggest challenges with the semantic layer is getting the business, the organization, to come together and agree on what the basic terms are, and then do the hair-splitting to figure out what a term means in this context and what it means on the other side. If they don't do that, the semantic layer becomes less useful. The less this shared responsibility model is honored (an agreement between the technical people who build this stuff and the business people who tell them what to build), the less useful the semantic layer is, and it can get to a point where it becomes useless. Decentralized or delegated governance means that you've got highly centralized controls, with your RBAC all centralized, but the stewards have to be in the domains of the business. They're the ones saying: these metrics mean this to us; this is how we calculate these things.
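Once the business has agreed on what its basic terms mean, that agreement can be enforced mechanically. The sketch below is a hypothetical `MetricRegistry` that applies a "one name, one definition" rule: re-registering an identical definition is fine, but reusing a name for different logic is rejected, so the teams must either consolidate or differentiate (for example, `sales_revenue` alongside `revenue`). The class, the metric names and the SQL expressions are illustrative assumptions, not a feature of any particular product.

```python
class MetricRegistry:
    """Hypothetical registry enforcing 'one name, one definition'."""

    def __init__(self) -> None:
        self._definitions: dict[str, str] = {}

    def register(self, name: str, definition: str) -> None:
        existing = self._definitions.get(name)
        if existing is not None and existing != definition:
            # Same name, different logic: consolidate or pick a distinct name.
            raise ValueError(
                f"'{name}' is already defined as {existing!r}; "
                "consolidate the definitions or register under a distinct name"
            )
        self._definitions[name] = definition

    def definition(self, name: str) -> str:
        return self._definitions[name]


registry = MetricRegistry()
registry.register("revenue", "SUM(net_amount)")
registry.register("revenue", "SUM(net_amount)")  # identical definition: allowed

try:
    registry.register("revenue", "SUM(booked_amount)")  # same name, new logic
    conflict_rejected = False
except ValueError:
    conflict_rejected = True

registry.register("sales_revenue", "SUM(booked_amount)")  # differentiated name
```

The check deliberately fails loudly instead of silently overwriting: the disagreement goes back to the domain stewards to resolve, which is where it belongs.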
And so, between the business and the center, hub and spoke, it's a two-way arrow of everybody working together to cultivate trust. That's mission critical. So this is the big thing: people have to come together before the semantic layer can be built. Often, that sort of marriage counseling is what takes the most time. If you think about these as vehicles starting from zero, building your dataset is like a semi truck loaded with coal or iron trying to get going from the gas station. It takes a while. Getting people to talk about the semantic layer is like a locomotive with five thousand cars behind it. It takes a long time for people to actually start moving, because you've got a lot of non-data people who have to be convinced why it's important for them to stop doing things in their own silos, to stop chasing the urgent and instead talk with everybody about how everybody can benefit, i.e. the important. That takes a bit of time. Cost optimization. There is a financial benefit to centralizing all of this. If you're using a cloud-based data warehouse or an ETL tool, it is consumption based. That is true of Snowflake, Databricks, BigQuery, etc. It's true of most of the extraction tools and a lot of the transformation tools. There is a usage cost. So if you centralize, you can probably get to a critical dataset or critical table that a lot of people use, rather than having everyone run their own queries to support their own individual analytics. The semantic layer can centralize that, so it only runs once per day. Also, you'll have your best, most technical people applying best practices to how the query is written, and it can be optimized for performance. All of these things add up so that you can oversee your credit spend and keep it predictable.
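A back-of-the-envelope sketch of the "run it once per day" argument. It rests on simplifying assumptions: credits scale linearly with the number of queries, each user's private version of the heavy query costs about as much as one central build, and reading the prepared table is much cheaper. Every number below is made up for illustration; real consumption pricing differs by platform and workload.

```python
def decentralized_daily_credits(users: int, credits_per_query: float) -> float:
    """Every user re-runs the same heavy transformation for their own analysis."""
    return users * credits_per_query


def centralized_daily_credits(
    users: int, credits_per_build: float, credits_per_read: float
) -> float:
    """The semantic layer builds the critical table once per day;
    users then run cheap reads against the prepared result."""
    return credits_per_build + users * credits_per_read


# Illustrative numbers only: 40 users, a heavy query costing 2 credits,
# and a read of the prepared table costing 0.25 credits.
before = decentralized_daily_credits(users=40, credits_per_query=2.0)
after = centralized_daily_credits(users=40, credits_per_build=2.0, credits_per_read=0.25)
```

With these invented inputs, `before` is 80 credits a day and `after` is 12. The point is the shape, not the figures: decentralized cost grows with users times heavy queries, while centralized cost is one build plus light reads, which is also what makes the spend predictable.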
When you have the wild, wild west in analytics, you end up with ten thousand dashboards. If you have the wild, wild west in your virtual warehouses, you get a consumption bill that's three times what you thought it would be. So there is a direct financial benefit to centralizing and not letting people just go crazy. In terms of next steps (whoops, too far), I think there are probably three things you can do. Obviously, the number one thing you could do is give InterWorks a call, and we'll walk you through all of this. I have to say that. But if you wanted to start on your own, I think doing a bit of an inventory of your data would be an essential thing to do. Where does our business logic lie? You'll be surprised how much of it is probably in your BI tools, either in extracts sitting on your analytics server, whether that's on-prem or online, or, even worse, inside individual workbooks. As you do that data source inventory, mark criticality on it. You can determine criticality in a number of different ways. It could be the number of users, or the risk to the business if we don't have it, which could be from a compliance perspective. Or it could be: we have a very customer-centric model, so we need to make sure product, logistics, supply chain, whatever, is well looked after. Then you can start to look at the semantic layer options available to you and make the best decision for you. As these data platforms get bigger and better, Snowflake and Databricks both (our preference, obviously, is Snowflake; we love it, and you'll see it all over our website), they're able to do more and more natively, to the point where Snowflake can go look at any publicly available data source, i.e. an API, etc., pull it into Snowflake, and then you can start doing transformations natively in Snowflake. I'm gonna have to take a drink. I've been talking too much. Thank you.
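The inventory-and-criticality step above could be started with something as simple as the following sketch. The criteria mirror the ones mentioned (number of users, compliance risk, customer impact, and whether the logic is trapped in a workbook), but the `DataSource` fields, the weights and the example sources are hypothetical assumptions for illustration, not a standard scoring method.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    name: str
    logic_location: str  # where the business logic lives: "warehouse", "bi_extract" or "workbook"
    users: int
    compliance_risk: bool
    customer_facing: bool


def criticality(src: DataSource) -> int:
    """Hypothetical score; the weights are illustrative, not a standard."""
    score = min(src.users // 10, 5)  # more users -> more critical, capped at 5
    if src.compliance_risk:
        score += 5                   # losing it creates compliance exposure
    if src.customer_facing:
        score += 3                   # product, logistics, supply chain, etc.
    if src.logic_location == "workbook":
        score += 2                   # logic trapped in one workbook is hardest to reuse
    return score


inventory = [
    DataSource("hr_headcount", "warehouse", users=30, compliance_risk=False, customer_facing=False),
    DataSource("finance_close", "workbook", users=25, compliance_risk=True, customer_facing=False),
    DataSource("sales_pipeline", "bi_extract", users=85, compliance_risk=False, customer_facing=True),
]

# Highest criticality first: candidates to move into the semantic layer early.
ranked = sorted(inventory, key=criticality, reverse=True)
```

Here `finance_close` scores 9, `sales_pipeline` 8 and `hr_headcount` 3. The exact weights matter less than agreeing on them with the business, which is the same people work the rest of the talk describes.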
There is value in looking at other tools to sit alongside something like Snowflake. dbt has a lot of unique value. Coalesce, for instance, has a visual interface. So if you like the idea of analytics engineers adding to your semantic layer, visual workflow tools can be a way for them to contribute without having to go learn a lot of Python. There are a lot of different ways to bring this together into the best solution architecture for you, which is why a consultancy like us can be so helpful: you don't have to guess, and you can tailor it to what you need rather than to what some white paper on the internet said, which might not reflect your particular business, your scale, your size or your complexity. The other thing you need to do to move forward with your data is start to bring the people element into this. Think about these as councils, so that you can have conversations about a dataset and come to agreements. Very quickly, if you get the right people into these workshops, you'll find the fault lines of disagreement. That's what you're really going to be mining: where people don't agree is where you'll take the biggest step forward in building a semantic layer that's really impactful and valuable. I would say these are the critical first steps. If you haven't done these things, any interest you have in trying to do AI should only be considered an experiment at best. We can help. There are a lot of ways we can help, but I'm gonna give you two specific things. The first is if you're interested in digging deeper into your options for the semantic layer, whether that's the data cloud, the lake, the lakehouse, the warehouse, whatever: we can help. If you want to go a layer deeper, say you want to look at ELT, or maybe just the E or just the T,
ELT is extract, load, transform. That's all the stuff you need for your medallion architecture to transform and get your data ready. We can help there too. If you're interested in your analytics tools, and your analytics tool is saying, hey, we would really like to have your semantic layer embedded, we can talk to you about the ramifications. And if it's a tool you're really, really wedded to, we can talk about how you might replicate that logic outside of it so that other things can use it too. There are a lot of ways to solve these problems. There are easier ways, but if you have a particular way you want to go, we can help you solve it. Lastly, we have a custom assessment, a tactical review that we call a DART assessment. DART stands for data and analytics review and tactics. Basically, we run surveys with different groups in your business covering five technical domains we think are critical to using data effectively, and which in turn determine your readiness for productionized AI. Those five domains are culture, analytics, data, governance and infrastructure. We then score you, give you back a visualization of how you're doing, and give you some tactics along with a bit of a report and assessment. The good thing about the DART is that it's pretty painless. It takes about a week, five or six different sessions to have these conversations. Then in a year or two, when you've made some progress, doing a second DART is actually quite useful, because you can visually see your maturity and how your capabilities have grown, or potentially where they haven't, and then refocus those efforts. We're offering a free DART to anybody who has attended our webinars. I would caveat that it's most useful as part of another piece of work, but we can talk about it however you want. As I mentioned, here's more detail on it.
Those five maturity frameworks: culture, analytics, data, governance, and platforms or infrastructure, same thing. So we'd certainly love to chat with you about that. If you would like to contact us, you can just scan that. Otherwise, you can go to the website. You'll get emails from us with the recordings, and you can simply reply to those. Several of you in the chat we know quite well, so you can certainly just reach back out to us. Okay. I was good. That only took me about fifty minutes. So now I can go through some of these questions, and it looks like there's a whole bunch. Some of you may have put them in both the Q&A and the webinar chat, so let's just work our way through. Amrita wrote: can you please explain the difference between an abstraction layer and a semantic layer? A semantic layer is an abstraction layer. It is a form, or a type, of abstraction, so think about abstraction layers as the umbrella definition here. I might not do a perfect job explaining this, but let's pretend we're thinking about the cloud: you've got all of the really technical stuff happening on AWS, and the interface you log into to pick a virtual machine to run something is an abstraction layer over all the complexity underneath it. The semantic layer is the abstraction layer for how business users interact with your data. So a semantic layer is an abstraction layer; it's a particular type focused on your business logic and your data. Hopefully that answered the question. Oops. Yeah. Okay. Cool. Next question, also from Amrita. Lots of questions. Medallion architecture: where should complex business calculations, aggregations, etc., sit? The gold layer or the semantic layer? Assume the data has daily-level granularity. That's a good question. I think it would probably require a conversation with a data architect to really figure out what's best for you.
Obviously, you want it in the semantic layer, but that doesn't mean it couldn't start in gold. You certainly wouldn't want that type of stuff in the silver layer. Amrita, if that's something you're really interested in, reach out and we can get one of our solutions architects to walk through it with you in more detail, since they can gather more of your requirements than I can now. This one is from, I'm gonna attempt this name, I apologize, Musumi: What is the advice or ground rule when there are many slightly different versions of the same data and semantic layers? This is where the confusion starts, as departments are often resource poor, or they can't come to one agreement and create their own version. I was going to say, your question kind of contains the answer. Either we need to consolidate, or we need to differentiate. And if they can't do either, then we have to lock them in a room until they can. If one group insists they're gonna call this revenue, and another group over here is gonna call something different revenue, we have to say: listen, that doesn't work. Either that has to be revenue and you become sales revenue or compensation revenue, or we otherwise differentiate it so that the end users, and AI most importantly, don't get confused. That's the hard work: the people work, the diplomacy, the politics that have to happen to get the organization to act in its own interests. And if there are slightly different versions of the semantic layer, I'd say you're probably not semantic-ing correctly. I'm gonna try to make a joke out of this: semantically, you're not semantic-ing correctly. The whole idea of the semantic layer is standardization. Where we can differentiate is by giving different definitions different names, but we can't use the same name for both. Hopefully that helps. Wow, Amrita, you've been busy. Regarding the analytics engineer, I hope you are suggesting self-service analytics.
If the reporting tool is Power BI, do you recommend the semantic layer in Power BI itself? No, I don't. Every stack has its own preferred place for the semantic layer to live. With Fabric, I think that's where Microsoft would prefer it, or, what is it called, OneLake or whatever. Confession: I'm not fully across Power BI. Google wants it to sit in Looker. I think that makes it tough because, again, Looker is a BI tool with AI agents in it, and that kind of makes your semantic layer invisible to everything else around Looker, which is why tools like dbt, or the ability to build the semantic layer in your cloud warehouse, are so vital. But the analytics engineer needs the domain knowledge to write business logic back to wherever your universal, centralized semantic layer sits, not hidden in a BI tool. Let's see here. I'm gonna jump over to May: If you have a one-to-one ratio of data engineers to data analysts, is there any merit in training the data analysts to become analytics engineers? Or does it become too many cooks in the kitchen regarding building the same semantic layer? I would be shocked if you're one-to-one data engineers to data analysts. If that's true, you probably have a fairly small or smallish team, where you might have three and three. Where you really start to see that ratio skew is when you get to a larger organization. My guess is that if you are at that scale, or you've got that ratio, you probably have a significant population of business users who are underserved in terms of their ability to use data to make business decisions. But on the specific question, does it become too many cooks in the kitchen when building the semantic layer? This is one of the benefits of dbt. It has version control and Git, so you can actually work in collaboration with each other.
I think the other idea is that we want the analytics engineers working on their domain-specific stuff, with your centralized engineers overseeing them to make sure they're doing it correctly and efficiently in terms of how we use and consume credits, and that they're using best practices in the model, the architecture, the definitions and the way we actually program things, so that everything is as extensible and easy to understand for other users as possible. May, that might be a good one to reach out on and give us a bit more context, so one of our solutions engineers can give you a better answer; I'm sort of speculating here. Let's see, some of these other ones are kind of funny. Here's one from Andres: How do you trust what AI does with that semantic layer? AI hallucinates so much, for example, in light of stuff in the news about AI going rogue. I agree. It is a massive problem, and it will only continue to be funny as long as it happens to someone else. My suggestion is to start your use cases with the platinum or semantic layer. Thinking about medallion, I would call it platinum if you think semantic, but most people think of the semantic layer as all of your medallion architecture, or as gold. Either way, I would start with the best version of the data you have, and start very narrow. If you are fairly nascent in this stuff, then I would say: take the people handling ad hoc data requests, the data analysts supporting nontechnical business users, give them the AI, and let them use it to accelerate their role answering questions on behalf of other people. Because they're specialized in data, they can verify what they're seeing. So if the data isn't performing and the AI is giving back wrong results, they'll be the ones most likely to spot it and not pass on the answer. So in summary, one: start with really technical people who can make sure the AI is functioning the way we expect.
Two, pick a very narrow use case with less latitude for open-ended questions that could lead to failure. And three, start with your best data when prioritizing your use cases. Just looking at some of these other ones, I think that's probably all of it. Apologies if I didn't get to all the questions, but we are certainly at time. Thank you so much for jumping in. I hope you found this useful. We'll have the recording out, and we will probably put together some form of blog series or white paper around this topic. Nish, I'm sorry we're out of time. If you've got a question, email us and I'm happy to answer it. Otherwise, thank you so much for jumping on, and I look forward to seeing you next time. Thanks, everybody. Bye.

In this webinar, Robert Curtis gives a comprehensive overview of the semantic layer and its evolution from a nice-to-have feature to a business-critical component of modern data architecture. The presentation covers the historical development of semantic layers from Business Objects in 1991 through to today’s AI-driven requirements. Curtis explains how semantic layers serve as abstraction layers that translate complex technical data into business-usable terms, supporting everything from business intelligence to AI applications. He discusses the challenges organizations face when business logic is distributed across, and trapped in, individual dashboards and workbooks, creating data silos and inconsistencies. The webinar emphasizes how AI has made semantic layers existential rather than aspirational, because AI requires highly curated business logic to function effectively. Curtis introduces the concept of analytics engineers, former data analysts retrained to contribute to the semantic layer, as a way to scale semantic layer development. He covers medallion architecture (bronze, silver and gold layers), the shift from the BI 2.0 to the BI 3.0 paradigm, and practical implementation strategies. The session concludes with recommendations for conducting data inventories, building cross-functional councils to standardize business logic, and InterWorks’ DART assessment methodology for evaluating data maturity across the culture, analytics, data, governance and infrastructure domains.
