Mastering Data Governance EMEA

Transcript
Hi, everybody. Thanks for joining us. Happy to see everybody here. Welcome to Mastering Data Governance. Today, we're gonna be talking about some green flags and some red flags, thinking about what governance is in the eyes of data and analytics generally, and how we can map out our path to success, whatever that means to you and your organization.
My name is Max. I'm a services lead at InterWorks, and I run our EMEA solutions team as well from my home in Edinburgh in Scotland. I've been with InterWorks for about eight years now, near enough. I think it's coming up in June, potentially. Within the solutions team, what we do is take various components from our different practices, whether that's data engineering for more core database and analytics pipeline work, and marry that up with some of our other practices, like our analytics team, who can build analytics and consider governance and the processes that we put in place, as well as developing gold-star content for our clients, as well as things like experience with graphic designers and even IT and platforms work. So at InterWorks, we've kinda got all of the analytics pipeline and ecosystem covered for anything you might need us for in the data space.
So today, we're gonna be talking about something that is actually quite close to my heart, in that it does cover all of those areas. We're talking about governance and what that means with the different lenses that you might have on those functions. We're starting off by talking about what governance is, as I mentioned, and following that up with the hierarchy of needs. So, spoiler alert, we're gonna be putting this in the framework of Maslow's ladder. We'll get into that very, very shortly.
So we're thinking about how we get better in a particular discipline while retaining the core tenets of that discipline: the fundamentals stay in place, and we get better by climbing up that ladder. Then we're gonna look into a framework for success. We're trying to put that into a real-world context of an organization and what steps we can take in order to climb that ladder and get closer to our goals in terms of giving us the appropriate level of data governance for our needs. We've got a lot of content to get through today, but hopefully we'll leave some time to allow for some Q&A at the end. As we go, please feel free, as Vicky mentioned, to drop some questions in the chat, or compliment my music choice of Dave Brubeck. Feel free to do that as well. Thanks, David. We can hopefully answer some of these questions as we go, so that will mean if people do have to drop off before the end, we've got our bases covered. If we don't manage to answer all the questions as we go, then we will follow up afterwards. And please feel free to get in touch with me or anybody else at InterWorks if you want to follow up and talk about some more specific requirements that you might have with data governance in your organization.
InterWorks are a full data and analytics stack consultancy. We've been in business since around 1996, and we're primarily based out of Stillwater, Oklahoma, so US-based for the majority, I think, of our consultants. We do have a global presence, though, within the UK, obviously, like myself, as well as in Germany and Australia, giving us that global coverage, which allows us to support our clients with 24/7 support if that's required, and facilitate any projects in any of those areas. We focus on strategy, solutions, and support.
So we like to deliver our full strategic advice and development consulting across various solution types within data and analytics. Walsford team have been with us since the start of our journey in 1996, and can map those current trends that we might be seeing at the moment, with AI and governance being very much central to those, to the historical context of analytics. So whether we're anticipating the next meaningful change in analytics for the industry, or we're just trying to become better in a particular direction, we might be on a well-formed path but just struggling to get to that next step. Our strategic consultants can advise you and help your organization in taking that next step. And we can fit some of our laundry list of analytics solutions and services to help you get to that next step on that ladder. These trends, I suppose, tend to grow organically from the needs of the industry: things like moving to cloud, preparing for AI, building data warehouses, or even adopting governance tools, which may be a part of the conversation today. We understand that business context and the prerequisites, and we'll talk about a lot of those prerequisites as we get through today's session.
We work with lots of small and large firms across the globe, some of which you might recognize from the screen here. We also partner with the products that we think best represent business value and fit their particular niche. There are lots of different components to that entire data ecosystem and the life cycle of our data, whether that's ETL or ELT, data visualization in terms of BI and dashboarding tools, or even data warehousing, for instance. We like to think that we have the best scope and lens on what tools are bringing what to the table.
And therefore, if you have the requirement for a particular type of tool that has particular constraints, we can recommend something from the wild that we think would fit that in the best way and be the best value for you. So we're happy to advise on any sort of tool selection strategy if you're looking to adopt a new technology, even if it's outside the tools that we have on screen. This is just a high-level view of some of the tools that we partner with, but we're always looking to increase this list, so if you see anything that you think is particularly interesting, we'd love to roll up our sleeves and get our hands dirty with it.
Our track record here hopefully speaks for itself. And if you've found your way to this workshop, then hopefully you already know us. We were Tableau's original services partner and one of the first with Snowflake. There are so many accolades here on screen that I won't go through them all, but hopefully this gives us some credibility in the space that we're gonna be discussing today, as a strong governance strategy is generally something fairly central to all of the tools that we saw on that last slide and even on this slide here.
So over the last two years, our team in Asia Pacific, in APAC and Australia, have been conducting informal surveys of data analysis professionals on the biggest challenges that they commonly face as organizations. Total sample size here is nearly 300 people, I think, leaders of those organizations, and they range from attendees at conferences to workshops and webinars and so on. We talked to these leaders across different geographies and verticals and different degrees of experience in organizational science.
Governance came out of this one as a clear front runner: of all the people asked what their biggest challenge is, 30 percent said governance, almost double the second one down, which is people resourcing. It's the largest concern, I suppose, to summarize this, for the 300 leaders that we were asking this question. And one of the reasons for this, I think, anyway, is that it's such a broad term. Everyone kind of thinks they understand what governance is, and governance, I suppose, has a wide range of impact when we don't get it right, depending on where that issue has arisen. So if we're in IT, we might think of data governance as one thing. Data engineering might think of it as another. Analytics might think of it as something slightly different, with a nuanced difference to that. And security and trust of the data comes to mind for everybody when thinking about governance.
So it leads me on to a general question, and hopefully we're gonna have a poll that will pop up for us here. What do you think governance means? If your manager asked you to define data governance, how confident would you be in your answer? A, very; B, somewhat; or C, you shudder and run away from the question? I'll give people some time to think about what they think governance is. We've got 13 out of the 24 of you who have responded, so about nine people waiting to respond. We'll wait for that to be a hundred percent, just in case. I was gonna say gentle nudge there, Max. Gentle nudge. Gentle nudge. Yeah. I think we've got the answers that we're gonna get. So I'll stop the poll, and I will share those results. Perfect. Excellent. So hopefully we can see the results of that, and I think that is pretty indicative of what we would expect.
So 71 percent of you have said you're somewhat confident in your ability to answer that question of what data governance is, or how you can define it, rather. I think this absolutely speaks to the main point. Right? So what do we mean by governance, within our particular lens or within the entire data analytics lens? Well, we might think of policies and procedures, or data guardrails that we put in place as an organization. That's still very broad. There are subsections of governance: things like usability, discoverability, lineage, cataloging, roles and permissions. Single source of truth is something that was more commonly heard a few years ago, but it's definitely still a very large consideration. There are also things like compliance with frameworks, so things like GDPR and ISO 27001. All of this comes under this umbrella term, which is data governance.
So we've tried to help here. We've tried to be kind, and we've put a bit of a definition on this slide of what we think data governance is. And, unfortunately, it's still quite broad. It's always gonna be, unfortunately. So we think of it as, within the modern data stack, encompassing how organizations manage, protect, and contextualize the data assets across the entire data life cycle and technology ecosystem. So I guess the three verbs there give us management, protection, and contextualization, which I'm gonna say is a word. All three of these play a role here. And to help us dig into those, if we pop over onto this slide, within management, protection, and contextualization we have concepts like data ownership, data stewardship, and data quality. These are three that I've kind of randomly pulled out across the slide, so these three here.
So these three are really about people taking responsibility and understanding, when something does go wrong, or when protecting against something going wrong, how we can solve those problems quickly and effectively and stop them happening again in the future, and then building up that layer of data governance. We also have considerations like privacy and security, so making sure nobody gets their hands on data they shouldn't have their hands on. Again, that's probably more of an IT and structural consideration: making data in transit encrypted, and making sure that there are no backdoors onto our data infrastructure that people can sneak in via. And then there are also things like the data life cycle: making sure we're understanding, documenting, and making that entire data life cycle clear and transparent to the right people, and easy to adopt, hand over, and learn from, which again comes back to fixing problems as quickly as they arise.
So I promised you a little bit of Maslow's hierarchy of needs here, so let's get into a little bit of psychology in this session. Abraham Maslow was a humanist psychologist in the 1940s, and he developed his hierarchy of needs, which anybody who is familiar with Maslow may already be aware of. We all have a requirement for various elements in our life, or various needs, and these are hierarchical. At the base level here, we have things like air, food, and water, without which none of us will survive. Right? That's a consistent, fundamental human need. Above that, we have security, health, and family, and that's the second tier of human need. As we go further up this ladder, we get into things that allow us to get closer to our full human potential, the high peak of that, which allows us to be creative and gives us a sense of purpose and a sense of morality.
And in between, we have things like friendship and confidence and respect and individualism that again become more aspirational as we go up the ladder here. Now, importantly, as we go further up that ladder, we never lose the requirement for air, water, and food. And we've attempted to map this needs hierarchy onto the governance hierarchy because of one fundamental fact: as you get further up this ladder, let's call it, you don't have the ability to ignore the steps further down. We still need to maintain and practice good governance health, just as, in Maslow's case, we still need to look after those core human needs as we go up the ladder.
So let's map this over onto what we think of as that governance hierarchy. At the base level here, level one and level two, I kinda think of these as almost together. So we have things like data acquisition and then safety and security. Data acquisition, accessibility, and availability: this is really saying that we have data that is available to people. When we have data, any amount of data, we should probably be considering the security of that data, especially if it's sensitive, and we might need to be applying permissions accordingly. Again, as we get larger as an organization, as we get more data-driven, we hopefully get further up this ladder. We need to be honest about where we are in this journey before we progress to that next step, and identify the gaps within our data platforms and practices. It's also important here to highlight that we need to continue with the previous levels, as I mentioned, as we progress.
So just because we've moved into a data mesh architecture, and we're really going big guns on the fact that we want to have data products and data contracts, and we want everybody to be managing their own data, and we expect great data coming out of that architecture, which we can touch on later, we still need to maintain our ability to have security and permissions, compliance, acquisition, accessibility, and availability. All these things still stay true. So these have to run in parallel in order for us to evolve and progress up to that next step. Not all of these might be relevant for your organization specifically, but generally we see that this is the hierarchy of where we can go to next, essentially. So we're thinking about where we are. We'll ask the question later of where you think you might be as your organization evolves in this landscape, and what the next thing would be for you to try and tackle in order to move up your governance needs hierarchy.
So let's look at each of these levels individually. We're gonna step through each of these sections, and first of all we're gonna look at that base level: data acquisition, accessibility, and availability. Our base-level needs here are the fundamentals of just generally having data that can be utilized by the organization. I'd imagine everybody on this session has data somewhere. Right? We have that data hopefully being accessible to people, and that data should be available to the users that need to leverage it. So data acquisition is the processes and technologies used to collect and ingest data from various ecosystems into your ecosystem. Now, that might be that you're downloading spreadsheets from public sources.
It might be that you've got proprietary data coming from an app that you manage or a health care database that you're maintaining. There's also the ability, once we have that data, for people to use it, maybe for clinical information, or for scheduling, or for financial input and projections. So your accessibility here is your physical access layer: your physical access methods, your query capabilities, and those retrieval mechanisms. Are they well understood? And we need to make sure that these systems are operational, that they're available, that they don't go down half the time, and that they are reliable. And if they do go down, what happens? Have we got redundancy and reliability mechanisms in place? So these are our fundamentals, and we would hope, again, that people have these to some extent ticked off and nailed down. They're almost the givens sometimes.
So we promised some green flags and red flags. Let's talk about the ones for this core layer. Documentation is not something that is always in place, and especially not always in place to an equal standard across the board. So, documentation about ingestion processes, especially for critical data sources. Sometimes we forget, I think, in analytics and data strategy generally, that we need to cater for several different methodologies of data acquisition. Sometimes data is getting piped into our data warehouse through several steps of transformation and then being fired out into wonderful dashboards. Sometimes we might actually just have a spreadsheet that we've created to look at a project and map out the next four weeks of that project in Excel. These are all important pieces of data, especially for the particular use cases you're looking at. But we would probably expect better documentation when it comes to the former rather than the latter.
So not all of these are gonna apply to absolutely every single piece of data that you might hold, but having a documentation structure and a repository of sorts in one place that documents especially those critical data assets that you have would very much be recommended. This documentation needs to be clear. And when I think of documentation, I always think of Word documents that have been created and sit there and gather dust for a while, so we can tick the box that says we documented it. That's not the only type of documentation we can have. Right? We might have a drag-and-drop ETL process that gives us a self-documenting interface, where we can see that it goes through step one, step two. It's not buried in lots of nested code, for instance. Some of these products do tend to document themselves as they go, notwithstanding the additional requirement of maybe having some supplementary documentation on top. But it does go some distance to say, if I need to hand this over to somebody else, I can do that quicker than if it was a large script that was running, for instance, handed to somebody who doesn't have the knowledge to be able to digest that and understand it quickly. So clarity of your solution is gonna be important generally, and documentation can help with that.
We want things to be as standardized as possible. In a perfect world, everything, as I mentioned before, would be following the same pipeline, the same logic. It's not always feasible for that to happen, because we have so many different use cases in the modern world for where we get data from. But if we are, say, using a particular tool for visualization, then there's a very useful strategy in continuing to use that, rather than having, within our 20-person company, four different visualization tools. That wouldn't work.
So as you scale out, we want to make sure we have standardized methods for data access, data visualization, and data ingestion in place as much as we possibly can, which speaks to our first red flag, which is fragmentation. If you have a large organization, especially with lots of different departments, lots of different domains, as we might call them in the data sphere, then we might have fragmented solutions with no clear strategy or alignment across them. So we want to avoid that as much as possible and make sure people within the different departments are benefiting from that economy of scale, so we can use the same tools and adopt the same practices and processes for the various arms and legs of the data world that we find ourselves in across the different resource groups.
So, the second step on our journey upwards. Now that we have some data, we want to make sure that it's compliant with the different compliance regulations. So again, things like HIPAA in the US for health care data, with very stringent compliance guidelines, GDPR in the UK and Europe, and ISO 27001 are very common examples that we need to comply with, but it might be different within your organization or the vertical that you're in. Obviously, health care and finance are talked about a lot when it comes to data compliance, because there are lots of very stringent guidelines and people have to follow their own frameworks. But within your industry, I'm sure you'll know much better about the specific guidelines that you have to adopt and make sure that you're compliant with in your organization. Permissions and data security are also a part of this conversation. But these are, again, just above those fundamentals of acquisition.
We want to make sure we've got good policies in place so that we don't expose ourselves to theft, data corruption, or any unauthorized access. Again, it sounds obvious, but we'll get further up into the interesting ones shortly. So, permissions: we can do permissions well, and we can do permissions badly. We want to avoid making ad hoc permission adjustments based on who you might know in the IT team. Right? Go and speak to Jeff, or he'll tell you to reach out to Zandra, and then you'll get those permissions added. That's what we want to avoid. We want to have good policies in place so that we understand what people are gonna need access to, and we're implementing a principle of least privilege a lot of the time. And when we need to get more privilege on those data sources, it's a smooth and relatively frictionless process to get there.
So again, your green ticks and your red crosses here. Within the green, RBAC is role-based access control. Modern databases are fantastic at implementing these sorts of access controls, so that the right people are grouped together and provided the right policies on their accounts, and when they log in to those data repositories, they have the ability to do the functional things that they need to do on the data, whether it's classic CRUD stuff: create, read, update, delete. We want to make sure that the right people can do the right things to the data. And we probably want to automate that as much as possible. Again, we don't want this to be an ad hoc manual effort every single time, which leads to backlogs, means that everything slows down, and gives us a bottleneck in our provisioning of data. We also want audits, and we want people to be able to run those, whether they do that on a daily or yearly basis or not.
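As an editorial illustration of the role-based access and audit ideas described above, here is a minimal Python sketch. It is not from the session and not any particular database's API: the role names, the `sales` table, and the `ROLE_PERMISSIONS`, `AccessLog`, `is_allowed`, and `request_access` names are all hypothetical. It assumes a default-deny policy as one simple way of expressing least privilege.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical role -> {(table, action)} grants. Real platforms manage this
# through their own grant systems; this dict only illustrates the idea.
ROLE_PERMISSIONS = {
    "analyst": {("sales", "read")},
    "engineer": {("sales", "create"), ("sales", "read"), ("sales", "update")},
    "admin": {("sales", "create"), ("sales", "read"),
              ("sales", "update"), ("sales", "delete")},
}


@dataclass
class AccessLog:
    """In-memory audit trail of every access decision."""
    entries: list = field(default_factory=list)

    def record(self, user, role, table, action, allowed):
        self.entries.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "user": user, "role": role,
            "table": table, "action": action, "allowed": allowed,
        })


def is_allowed(role, table, action):
    # Least privilege: anything not explicitly granted is denied,
    # including requests from roles we have never heard of.
    return (table, action) in ROLE_PERMISSIONS.get(role, set())


def request_access(log, user, role, table, action):
    # Decide, then always write the decision to the audit log.
    allowed = is_allowed(role, table, action)
    log.record(user, role, table, action, allowed)
    return allowed


log = AccessLog()
print(request_access(log, "vicky", "analyst", "sales", "read"))    # True
print(request_access(log, "vicky", "analyst", "sales", "delete"))  # False

# A periodic audit could start from the denied requests.
denied = [entry for entry in log.entries if not entry["allowed"]]
print(len(denied))  # 1
```

Because grants live on roles rather than on individuals, giving someone more access in this sketch means changing their role or the role's grants in one place, rather than asking a favour of whoever you know in IT.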
Either way, we want to be able to look back at security and assess the vulnerabilities, to make sure we understand who's been accessing what, and how often. And if there's anything in there that looks a bit suspicious, we might want to harden our infrastructure so that we prevent that from happening in the future. At least we understand everything that's happening, and we understand any strengths and weaknesses in the architecture that we've put in place.
We don't want informality here. Right? We want everything to be going through a shared, formal policy network, to make sure data access and sharing is not being done via email. I'm not sending a spreadsheet to somebody. We know that's very bad in terms of data governance and data security: it can be intercepted, and it can be forwarded to the wrong person. There are lots of vulnerabilities when it comes to that form of data sharing. Nowadays, we have these architectures where everything should be self-contained within the platform, or platforms, that we have adopted, and that leads us to having a much more formal approach to how we provide access to various people centrally.
We don't want overly broad classifications here either. There's a sweet spot, I think. And again, your organization may be slightly different from others, and we wouldn't expect this to be a blanket approach across all organizations. But taking classifications of data sensitivity as a good example of this, we don't want to say nothing is sensitive, and we don't want to say everything is as sensitive as possible. So the way I think of this, at least: a nice analogy here might be a crime drama from the US, where maybe there's an FBI agent who's saying, I can't look at that document, it's top secret, or maybe it's got unique secret clearance on top of this document. Right?
So we have an understanding of government documents in terms of their sensitivity and the labeling method that we have for that sensitivity. And that might apply to various levels of seniority. Maybe your director of the FBI has a certain level of clearance, and the intern that just started has a different one. And maybe a different part of the government might have its highest level being slightly lower than that when it comes to FBI data, and so on and so forth. So we can map a policy as to who has access to see what data, based on the classification of the particular document that you're looking at.
And when we're talking about data in an organization, for instance, that could be customized. PII is a very commonly used phrase, for personally identifiable information. And we can map that to particular columns within a table, to say this customer table has an email address, which is personally identifiable. It may have a first name, a last name, a date of birth. All of these things might be considered sensitive data, PII data. But it doesn't necessarily mean you can't use that table at all. There might be some things in there that aren't personally identifiable. Maybe you can count the number of customers in that data without having any identifiable information exposed. So the way we treat our data sources is gonna be related to the way we classify them, and we shouldn't be afraid of having four or five different levels of classification for those data sources, but we don't want to have 200. So, again, there's that sweet spot in how we classify data and how we think about the data as it's being ingested and processed.
Now, we'll talk about implementation of tools, for instance, later. And this is one of those things, which I'll touch on shortly, that is really useful to do upfront.
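As an editorial illustration of the column-level classification idea described above, here is a small Python sketch. Everything in it is an assumption made for the example: the four sensitivity levels, the column tags on a customer table, the `intern` and `director` clearances, and the helper names `visible_columns` and `mask_row`. It only shows how a handful of levels, rather than hundreds, can drive who sees which columns.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """A deliberately small set of levels (assumed for illustration)."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    PII = 3


# Hypothetical classification tags for the columns of a customer table.
CUSTOMER_COLUMNS = {
    "customer_id": Sensitivity.INTERNAL,
    "email": Sensitivity.PII,
    "first_name": Sensitivity.PII,
    "last_name": Sensitivity.PII,
    "date_of_birth": Sensitivity.PII,
    "country": Sensitivity.INTERNAL,
}

# Hypothetical clearance per role, echoing the seniority analogy.
ROLE_CLEARANCE = {
    "intern": Sensitivity.INTERNAL,
    "director": Sensitivity.PII,
}


def visible_columns(role):
    # Unknown roles fall back to PUBLIC, so they see nothing in this table.
    clearance = ROLE_CLEARANCE.get(role, Sensitivity.PUBLIC)
    return [col for col, level in CUSTOMER_COLUMNS.items() if level <= clearance]


def mask_row(row, role):
    # Replace values the role is not cleared for, keeping the row shape.
    allowed = set(visible_columns(role))
    return {col: (val if col in allowed else "***") for col, val in row.items()}


rows = [
    {"customer_id": 1, "email": "a@example.com", "first_name": "Ada",
     "last_name": "Smith", "date_of_birth": "1990-01-01", "country": "UK"},
    {"customer_id": 2, "email": "b@example.com", "first_name": "Bob",
     "last_name": "Jones", "date_of_birth": "1985-05-05", "country": "DE"},
]

# Counting customers needs no PII: the masked rows still carry customer_id.
customer_count = len({mask_row(r, "intern")["customer_id"] for r in rows})
print(visible_columns("intern"))  # ['customer_id', 'country']
print(customer_count)             # 2
```

The table stays usable for someone without PII clearance: the identifiable columns are masked, but a customer count still works.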
So we'll talk about what we could do in the framework of getting better in the world of governance in the second half of today, fairly shortly. But having a bit of this forethought upfront, before we maybe purchase some tools to help us with this, is about mapping out the lay of the land at the moment and where we want to go to. And some of that doesn't require a toolkit. It might just require internal processes, and people to own those processes and agree upon them.
The third section here is your connections: cataloging, semantics, and COE. Now, some of these terms might be new to people. Catalogs are becoming very popular in the world of governance. And actually, I would say that catalogs are often assumed to be what governance is all about, especially if you're an analyst. Right? So we've just procured this new governance tool. It might just be a catalog. It might not do anything other than mapping out all the data assets that you have and suggesting a description, with maybe some lineage in there to say this table comes from these various underlying tables, and so on and so forth. There is more to governance than just cataloging, but again, it depends on your lens on the whole process. They can be very, very useful tools to map out that inventory, as I said, and to map ownership and do simple things like that. Say Max owns this data source: if you have any further questions that aren't answered in the documentation that you can see here, then get in touch with Max, and he'll explain more to you. So that can be incredibly valuable and remove some of those friction points from your analytics journey, I suppose.
We also have semantic layers. With a semantic layer, again, we forget so quickly how recently these concepts have been birthed, especially in our industry. Right? We talk about these new concepts like agentic AI.
And we think that they've been around the whole time, but if we went back to January, we might never have heard of these concepts. So semantic models are a way of establishing how data relates to each other. Thinking about your core database and your core schemas, previously we had lots of primary keys and foreign keys. That's less common nowadays, because it will impact how you ingest the data. So sometimes we just have that semantic layer over the top that says this table is related to this table, and then outputs into this table, and that allows us to draw those lineage diagrams, or DAGs, and to configure things that might be useful in the long term, for AI especially. And this is becoming much more common when we talk about tooling for semantics. There's interoperability with tools like Cube.js, for instance, which works with everything and is API-driven and that kind of good stuff. A semantic layer might also hold things like business definitions of KPIs, how the tables relate, and what the join keys are between these tables in particular circumstances, and it cascades them centrally to the wider authors of data insights. Within the world of AI, it might host some more metadata in there as well, like synonyms: revenue might also be sales. So if somebody types into a chat box, show me the sales for last year, it knows to go to this table and get the revenue column from that table. So there's lots of great stuff that you can put in a semantic layer.
And then finally, we've got COE, which stands for center of excellence. So in this case, this might be a Tableau COE or a Power BI COE that is distributing information and training materials and license keys and information about how to get Tableau Desktop installed on your machine. They're owning that entire product, or maybe family of products, that are in use in your organization.
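Returning to the semantic layer idea described above, here is an editorial Python sketch of how a synonym such as "sales" could resolve to a revenue column. It does not use Cube.js or any real semantic-layer product: the `SEMANTIC_MODEL` structure, the `orders` and `customers` tables, the extra "turnover" synonym, and the `resolve_measure` and `measure_sql` helpers are all hypothetical.

```python
# A toy semantic model: measure definitions with synonyms, plus join keys.
SEMANTIC_MODEL = {
    "measures": {
        "revenue": {
            "table": "orders",
            "column": "revenue",
            "aggregation": "SUM",
            "synonyms": ["sales", "turnover"],
        },
    },
    "joins": [
        {"left": "orders", "right": "customers", "key": "customer_id"},
    ],
}


def resolve_measure(term):
    """Map a business term or one of its synonyms to a measure definition."""
    term = term.strip().lower()
    for name, spec in SEMANTIC_MODEL["measures"].items():
        if term == name or term in spec["synonyms"]:
            return name, spec
    return None


def measure_sql(term):
    """Build a simple aggregate query for a resolved measure."""
    match = resolve_measure(term)
    if match is None:
        raise KeyError(f"No measure defined for {term!r}")
    name, spec = match
    return (f"SELECT {spec['aggregation']}({spec['column']}) "
            f"AS {name} FROM {spec['table']}")


print(measure_sql("Sales"))
# SELECT SUM(revenue) AS revenue FROM orders
```

The point of the sketch is that the definition lives in one central place, so a chat interface and a dashboard author would both land on the same table and column.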
They might even put workshops on like this one with us at InterWorks, as part of the pro licenses, in order to increase the learning and knowledge that's available within the community of practice within your teams and organizations in a particular discipline. So again, green ticks, red crosses. Green ticks here would be having centralized glossaries, catalogs of terms, lexicons, if you will. There's lots of ways of talking about what this means. We had a conversation this morning about some of the things that we refer to at InterWorks, whether we're talking about training or enablement or however we like to put in these little euphemisms quite regularly. How is that being done in your organization? And how do we centralize what we're terming as a particular thing? In the world of data, that might be as simple as what is a customer. When we say we're counting the number of customers in this chart, what is the definition of a customer? Is it widely agreed across the organization? Is it a subscriber to a particular product? Is it someone who's bought something in the last two years? What is the definition? Is it someone that signed up to the mailing list? Probably not, because they may not have bought anything. So these can be very, very useful to write down in a central location and then tag in the different data assets that use those concepts. Data lineage is also a great step, especially for data engineering teams that might need to know very quickly what the impact of changing a particular table is. Is it a core table that affects a hundred different reports? Or is it just used by Vicky in marketing, and we can, you know, delete it without much impact? So that's where data lineage comes in. We also want to avoid having silos here.
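The data lineage point above (what is the impact of changing a particular table?) can be sketched as a walk over a dependency graph. This is only an illustration: the asset names and the `LINEAGE` mapping are made up, and a real catalog would supply the graph through its own interface.

```python
from collections import deque

# Hypothetical lineage: each key feeds the assets listed in its value.
LINEAGE = {
    "raw_orders": ["orders_clean"],
    "orders_clean": ["sales_mart", "marketing_extract"],
    "sales_mart": ["revenue_dashboard", "board_report"],
    "marketing_extract": ["campaign_report"],
}


def downstream_assets(lineage, changed):
    """Breadth-first walk returning every asset affected by changing `changed`."""
    seen = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)
```

Calling `downstream_assets(LINEAGE, "orders_clean")` lists everything that depends on that table, while a leaf such as `"campaign_report"` returns an empty list, which is the "safe to delete" case described above.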
So we don't want, sorry, we want to not have a siloed understanding of what these key business terms mean. We want everybody to be singing from the same hymn sheet when it comes to the processes and the terms that we're using, and that makes everything easier when we get to rolling out a governance strategy or a change to our policy, because we know that we're talking about the same things across the organization. And, ultimately, we want to have a consensus on what those things mean. So we want to, again, avoid having those silos, avoid a lack of consensus, and make sure that everybody is on the same page when it comes to the resources that they have to access more information. And, ultimately, we'll get through these quicker. Self-service. So thinking about metadata, data mesh, and data products. Now data mesh is an architecture that has popped up probably in the last three, four years. Particularly in larger organizations, we might have the requirement for various departments, business areas, or domains, as they're called, to maintain their own data. This used to be called data siloing. Right? Now it's actually being restructured so that you don't necessarily have to pipe everything into the same database. You might want to have good data products owned by their own domains, and they all sign up at the start of this architecture to put in data contracts, so that we can say this is the format of the data, and it's not gonna change in the next two months, so you can build those dashboards. Maybe we require a certain number of hours of downtime as a maximum a year, for instance. So there's lots of things that you can put inside those contracts to allow for that data mesh to be feasible within your organization.
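To make the data contract idea concrete, here is a minimal sketch of a contract that records an agreed format and a maximum downtime per year, plus a check against it. Every name here (`DataContract`, `contract_violations`, the example columns) is hypothetical and not taken from any particular data mesh tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract a domain team publishes for one data product."""
    product: str
    owner: str
    columns: dict[str, type]  # agreed column names and Python types
    max_downtime_hours_per_year: float


def contract_violations(contract, rows):
    """Return readable violations of the agreed format (empty list if none)."""
    problems = []
    for i, row in enumerate(rows):
        missing = contract.columns.keys() - row.keys()
        extra = row.keys() - contract.columns.keys()
        for name in sorted(missing):
            problems.append(f"row {i}: missing column '{name}'")
        for name in sorted(extra):
            problems.append(f"row {i}: unexpected column '{name}'")
        for name, expected in contract.columns.items():
            if name in row and not isinstance(row[name], expected):
                problems.append(f"row {i}: '{name}' should be {expected.__name__}")
    return problems


def downtime_within_contract(contract, downtime_hours_this_year):
    """True while recorded downtime stays inside the agreed yearly maximum."""
    return downtime_hours_this_year <= contract.max_downtime_hours_per_year
```

A consuming team could run `contract_violations` on a sample of incoming rows before refreshing a dashboard, so that a format change shows up as a list of messages rather than a broken chart.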
Standards and practices are gonna be defined and created, and a shared, wider audience is then gonna be able to benefit from those tools. Think of the NHS, for example. There might not be one owner of all the data in the NHS. Right? It's probably gonna be democratized across various different ICBs, sub-organizations, and national datasets, for instance. So we can all fit into a central framework, in an ideal world, and that ownership can be mapped across. And we understand the policies and the contracts that are in place for those to be reliable for everybody else. Metadata in this circumstance might be systematic collection and organization. We might also be thinking of usage metrics. So how often is the data being collected into that table? Who's accessing the charts, and how often are they viewing them? So we do want to map ownership here. That seems very fundamental, but it's definitely something we want to make sure is in place. And we want those owners to be accountable. And, again, that can sometimes be a bit of a bad word in the organizational lexicon. So accountability and ownership, these are things you might want to push away from. Don't have time. Don't do it. But it's really fundamental for good governance. It's making sure that you know who is gonna be ultimately responsible for a particular dataset or asset. We don't want to have a lack of domain-level expertise. So some teams are gonna be really technical and some teams are gonna be less technical. So we want to make sure that we have a good spread of technical expertise across those teams if they're managing and maintaining their own data. And we want to avoid bottlenecks, especially when it comes to provisioning access, for instance, or making sure that if we're changing a data source or making an update or including some more data, we know where to go and everybody, again, is on the same page there.
We don't want to introduce any unnecessary friction when it comes to our governance process. And the keyword there might be unnecessary. Sometimes, if we do want to have good policies around things, we might need to fill in a web form in order to get access, and structure it and track it and audit it. But we want those processes to be as smooth as they possibly can be. And then finally, we're getting fully aspirational. Hopefully not. Hopefully this is still achievable, but we might want to think about automation, contracts, and AI-driven analysis, and even further than analysis. So think about AI-driven to begin with, because it's such a big talking point, obviously. We might want to use AI to spot mistakes in the data. So things like observability, for instance. We might want it to recommend policies, or even parse the data and provide some default documentation based on parsing that data, to say this is all of our customers, this is the way we've defined them as customers, and then people can add to that documentation, but it doesn't take them two hours to write. No. It takes ten minutes, and they just fill in the gaps. Data contracts we've talked about in terms of data mesh especially. This is something where we encourage accountability and participation down to the individual level and business units, and how they're going to be using the data, in formal agreements. And then working backwards, automation. We want to make sure we're automating as much as possible for two reasons. One is that we don't want to necessarily be spending all our time in manual processes, doing things that are somewhat robotic and can take up a lot of our time. The second part of that is that we don't want to be making mistakes, because manual processes are prone to human error. Trust me, I've made enough of them. So green ticks, red crosses.
We want automated policy enforcement and compliance monitoring to make sure that we're not making more mistakes as we go. Again, sensitive, secret, top secret. These are gonna be driving permissions. And we want this to be a formal process as much as we possibly can, especially for those consistent data sources that we're using and that are critical to our business. What we don't want is too much manual intervention, so the opposite of the automation. And we don't want that kind of lack of vision. We want to bring everybody on board here, explain why we might be making changes to data governance initiatives, and get them on board early so they can be part of the journey. So our second poll for today. Let's think about where you are, I suppose, as an organization. So thinking about those five levels, what level do you think you're at? Are you very early? Are you almost reaching that full potential? Are you at your full potential? What side of that scale are you on? Let me put it back on screen here just so that you remember what those are. Most people are sitting around number three. The third one, the middle one, seems to always be the most common. The safe one, Max, I think. That way. Yeah. Lukewarm. Excellent. Okay. Some people have full potential. Love that. Brilliant. Okay. Should we end the polls? Everyone finished? Yeah. I'll end the poll, and I will share the results. Share the results. Go for it. So we actually see quite a, I guess, a southerly skew. In the context of the hierarchy, it would be a northerly skew there. So we have most people sitting further towards the top of that than I would expect. So yeah. Amazing. So we've got our data needs met. Right? If people were still at the base level needs and they didn't have, you know, permissions on stuff, then, yeah, we would expect that to be a bit rarer.
So connection being the most common. So we understand that there's still more work that we can be doing here, but we're not necessarily there yet. So, yeah, hence joining the session. Good stuff. So let's talk about a framework for success. We've got about fifteen minutes left to talk about how we get to a better place. We want to take this further and implement a framework to enhance our data governance throughout our organization. We want to continue to think about data and governance, again, in the context of that ladder. We still want to be doing the stuff that's lower down, but we want to be growing into the stuff that's further up that ladder. Again, there's a significant amount of work to potentially do upfront before we consider onboarding tools to help. Software vendors will tell you this tool will fix everything, or some of them will. Maybe a lot of them will be a bit more honest. But when we think about purchasing tools, maybe we're talking about tools like Atlan, data.world, or DataHub in the governance world, or even Monte Carlo for observability. We want to do quite a lot of this prework upfront to make sure that we've got the right definitions in place, and we want to align with those green ticks rather than the red crosses before we even start configuring these tools or welcoming users on board. We want to make sure that they're set up properly to begin with. If you do want more help on strategy here, then please feel free to get in touch, and we can walk through these frameworks and how they apply to your organization in more detail. So we're gonna have a bit of a theme of looking at a fake organization, you'll be surprised to hear, run by my chicken, Josie, who is the chief executive officer. So she's looking to pay her way in my garden. She's actually out there now sunbathing.
And she's looking to pay her way in food and bedding by succeeding as an entrepreneur. And a few years ago, Josie started Clockworks. Clockworks are a fictional egg production company. Although they don't boast having the best people, the best workers, they're not the brightest, they do have a lot of data that the other chickens are desperate to peck at. They keep their overheads really low. The things that they produce, primarily eggs, are cheap for them to produce. And, yeah, they're looking for investors. So how does data governance apply to an organization like Clockworks, where we have a few different stakeholders, like Josie? We may have an HR director, a chief information officer, a director of analytics, and then they have the teams underneath each of those different practices. So we might want to step through those six bullets that we talked about previously. We're gonna start with meeting with key stakeholders to enable multi-domain collaboration and buy-in, make sure everybody's on the same page, and take the opportunity to gather requirements. We want to take that and ensure the data governance policies meet the organization's goals and standards, and agree on what those standards are. And then we want to set some goals. We're gonna be thinking about that in a second, and those goals should be SMART, s m a r t, right, the acronym, which is specific, measurable, achievable, realistic, and time based. Some people have different letters in there, but that's the gist. So to think about some of these goals we might have, Josie might have the goal of wanting to increase the outreach and efficacy of marketing campaigns. She's aligned that as being a top priority for her. The HR director wants increased PII privacy, and the CIO might want to increase data security. The director of analytics might want analysts to know when their reports are delayed.
So they all have these gaps in their current solution, and they might want to improve those and plug those holes. So let's take those goals and make them SMART. So if we think about the time-based nature of this, and being a bit more specific as to what it means to get better in each of these areas. Well, Josie might want a twenty percent higher open rate and a fifteen percent increased conversion rate within the year. So that within the year might be twenty twenty five in this case. Our HR director might want a hundred percent of all the fields that have been flagged PII to be private within the year, and not feed through into ultimate analytics or be shared externally. From a CIO perspective, we might also want to have role-based access controls and encryption in place for each and every data object. And the analytics lead might want day-of notifications for data delays deployed along with those pipelines. So if something's not ready, they want to be proactively informed in a specific way. So once we've got our goals, we want to define the roles on top of those. So governance roles are gonna allow us to make sure that we're democratizing a lot of this process, and that we know who is owning what and how they're doing that. Clarify responsibilities and accountabilities. And it doesn't need to all land on one person or, you know, the head of data governance, who might not exist. It should be a multi-practice collaboration because it affects everybody. We have three layers of this. These are general roles for data governance. And when we see the chicken, we're moving back to Clockworks here. They're widely available publicly, these three roles. We've not created them as part of this conversation, but we might have a governance committee, which we're gonna talk about here, data owners, and data stewards.
And these will probably resonate with most of the collateral available online. Think of it like a pyramid. So you've got your data governance committee at the top. These are the steering committee. Underneath these, you might have specific data owners of different disciplines or domains. And then underneath them, you might have the data stewards, who are very much the hands-on-keyboard people responsible for making these work and making any changes to them. So we're gonna step through all three of these layers, starting with the governance committee, who are responsible for creating and maintaining that data governance framework. These would be people that we want to bring in early if we were looking to make a change or invest more in the governance of our data at our organization. The senior vice president of marketing might be somebody on this committee. They might want to ensure the data governance policies are properly implemented across the department or wider, and have an impact on the policies that are being spread around the organization. And they might want to resolve conflicts on access to campaign data, for instance, and regularly review the performance of the data governance program. Our data owners, as I said, are responsible for the decisions around their data domain, and will be feeding up to that steering committee and down to the data stewards. So our senior marketing executive works with the governance committee to define the policies for campaign strategies and marketing outreach, why it might be slightly different from financial data, for instance, and how it maps onto their part of the organization, and works with data stewards to ensure that it's properly implemented. The final one is the data steward. Again, this is hands on keyboard.
This might be someone who's creating your Tableau published data sources or, potentially, analytics views that you might have in Snowflake or your data warehouse. So a marketing analyst might fill some of this role, ensuring campaign data is accurate and consistent, and feeding back if something further upstream is happening that is affecting that. They might maintain the marketing data dictionary, or lexicon, or glossary of terms, and the object relationship definitions, how those different tables interact, and the metadata therein. They might also educate data consumers on proper campaign data usage, and they'll be responsible for the documentation, the format of which is dictated by the steering committee. So this might be how it maps out. Again, these are not necessarily accurate for your organization. This will change. But this is an example org chart of this made-up organization. And these chickens all have responsibilities in line with the vertical tier here. The hierarchy between data owners and the committee is not necessarily one of seniority. We might see that we have the chief marketing officer who's a data owner but not necessarily on the steering committee. We might have the vice president on the steering committee. Might be a slightly worse set of that, but just because you're the lead of your department doesn't necessarily mean that you have to be on the steering committee. So, yeah, as we get down the table here, we have the data stewards, which tend to be data managers, administrators, data product owners, and so on and so forth. So there is a correlation maybe with seniority, but not necessarily in the top two elements there. Policies and procedures. When we're defining policies, we want to relate them to data objects and assets.
So these should be well documented to reduce confusion, rather than creating that confusion, and we want to be conducting periodic compliance audits. So with our goals for Clockworks, marketing might want a twenty percent higher open rate or a fifteen percent increased conversion rate. This might mean data validation on ingestion, with required fields for CDP matching, that's customer data platform. And we might have data cleansing and fixing the typos in order to make the most of the data that we have there. In HR, we might want a hundred percent of the PII fields to be private. That might involve tagging those PII fields, and then that gives us a target. Once we know that they're PII, then we can say a hundred percent of those fields that have been tagged are then masked, as well as providing some training and awareness, not only as to what those masked fields are gonna return and what to expect and why, but also, yeah, the benefit of why we're implementing such a step in the organization. The CIO as well might be implementing that RBAC and encryption at rest. This is something that is probably more project delivery. It's either done or it isn't. So they can prioritize potentially setting up quarterly access audits and monitoring of data security, so that we've got things standardized within our secure environment. And the encryption at rest and in transit could be something that we have natively in the tool. If it's a SaaS product, that's maybe coming natively. If it's something that we're doing in our own on-premise environment, we might need to be working with dedicated IT resource to make sure that the disks the data is stored on have been properly encrypted. Analytics want the day-of notifications for data delays. Again, these policies can be implemented in several different ways.
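As one illustration of the HR policy above (tag the PII fields, then check that a hundred percent of tagged fields are masked), here is a rough sketch. The column tags, function names, and the hash-based masking are all assumptions made for the example. Truncated hashing is used only to keep the sketch short and is not a recommendation for real anonymization.

```python
import hashlib

# Hypothetical column tags, as a catalog or tagging step might record them.
COLUMN_TAGS = {"name": "pii", "email": "pii", "eggs_ordered": "public"}


def mask_value(value):
    """Replace a value with a short token (illustrative masking only)."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]


def mask_row(row, tags):
    """Mask every column tagged as PII and pass other columns through."""
    return {
        col: mask_value(val) if tags.get(col) == "pii" else val
        for col, val in row.items()
    }


def pii_masking_coverage(tags, masked_columns):
    """Share of PII-tagged columns that are masked, from 0.0 to 1.0."""
    pii = {col for col, tag in tags.items() if tag == "pii"}
    if not pii:
        return 1.0
    return len(pii & set(masked_columns)) / len(pii)
```

The coverage number is what turns the goal into something measurable: a periodic audit can report it, and anything below 1.0 points at tagged fields that are still exposed.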
And in the interest of time, I'll move past the ways that we could actually be handling those. Now tools. Again, we've talked about tools a few different times. They're not always necessary here, and they come in all different shapes and sizes, with lots of features that may or may not be useful for your organization. You might be doing half of them already. Right? But these tools also don't need to be one single tool to rule them all. They could be a suite of, you know, interoperable tools that might do something very specific for a small use case. And, again, we've got to balance that out with the fragmentation conversation that we had earlier. So making sure that you have the smoothness of an interoperable tool set, and that we also know what's already available in the organization. We talked about Monte Carlo previously. Maybe we've got Terraform that's spinning up and deploying our role-based access controls. And PII is being managed by Alation, which is, again, one of these more classic governance cataloging tools that might help us expose where we've got sensitive data. And then we want to track these metrics. We want to measure them. We want to make sure that we're continuing to measure them, and that we're implementing any changes that might occur as actions out of the back of that measurement process. Within marketing, again, we've got that data profiling to identify duplicates and clean up that data. We've also got audits of the unmatched PII data from the HR perspective, and from the CIO's perspective we're monitoring any ad hoc permissions that might have been added by a particular person, to flag that up and ask why that's being added. And data observability, in terms of the downtime percentages and incident response time, is being tracked as we go there as well. Then we want to iterate on these processes.
We want to make sure we're continually feeding that back up to the committee, the steering group, and establishing whether or not these are being a success. Is there any continued training that we might need, and have things been actioned appropriately? We're not necessarily expecting perfection. We continue to change our thought process and maybe level those up gradually as we go. Examples of these might be, again, training. We might want to have regular committee meetings and discuss them consistently. We don't want these data policies to be too rigid nor too lax, because then we end up being further down that ladder. So some common challenges, just to finish off here. We've got maybe three or four minutes left. Internal reluctance can be a challenge, as can clashing with current procedures, trying to be perfect from day one, and over-engineering. So let's have a look at this within the context of some memes that we've pulled off the Internet. So internal reluctance. The why and the how. We don't want to just expect everybody to willingly adopt these processes, which can be viewed as red tape, essentially, from day one. We're not expecting that. You want to slowly grow this as you go. You want to strive for better in every single iteration. Challenges might also be clashing with the current state. So lots of companies in this space may prioritize interoperability with other tools. We want to make sure that we're not managing these lists of assets in a massive spreadsheet somewhere. So we want to leverage these tools that can interact with the various different products that you might have within your ecosystem. And POCs are a great way to try out these different products that are emerging. We also want to, you know, don't let perfect be the enemy of the good, I think, is a phrase here.
If your policies are too strict, ensure that you've got that feedback loop in place. We want to find that balance, and we want to iterate on that balance to improve our culture of data governance as we grow as an organization. And we don't want to over-engineer this, thinking about the Maslow scale. Our base needs and goals are gonna be really important, and everybody's path is different. It's not a one-size-fits-all approach here, which is why we recommend reaching out to us if you want more help here, or if you want to fully understand the context of the industry and how you can continue down your own path when it comes to data governance. And with that beautiful segue, this is the end of the path for today. So thanks very much. Hopefully, this has been food for thought for you, to think about where you might be within your organization. And, hopefully, you don't have too many of those red flags we discussed today, and you're keen to get started on creating some of those green flags, green ticks, in order to get to the higher tier of Maslow's ladder. Thanks again for your attention. I hope we speak again, and please get in touch if you want to discuss this further.

In the video, Max, the Services Lead at InterWorks, discusses the complexities of data governance, highlighting both green and red flags organizations face. He emphasizes the importance of a structured framework, mapping governance needs to Maslow’s hierarchy, and the necessity of clear documentation and role-based access controls. The conversation also touches on the significance of stakeholder engagement, automation, and the democratization of data ownership to enhance governance efficiency. Max illustrates these concepts using a fictional example, Clockworks, to demonstrate effective governance practices. Ultimately, he encourages organizations to adopt a tailored approach to governance, ensuring clarity and accountability.
