Packaging data in the form of data products greatly increases the value and flexibility of your data assets. New business functionality can be delivered up to 90% faster, and the overall total cost of ownership (TCO) of your data assets can be reduced by 30% (refs 1 and 2). Limiting data sharing to well-defined data products can also make data governance processes more transparent and simpler to enforce. So, if data products are so great, why isn’t everyone doing it already?
Not long ago, people spoke of data as “the new oil.” Oil is valuable, but it’s a commodity. While there are different grades of crude oil depending on where it was extracted, one barrel of oil is still pretty much the same as any other. Data isn’t like that. Individual datasets differ greatly in their value, their sensitivity and how they are consumed.
Instead of thinking of data as a homogeneous commodity like a barrel of oil, it makes much more sense to think of data as a product, like a tin of baked beans or a tub of paint. To make data as accessible as possible for data consumers, it needs to be packaged in a way that is convenient to handle and clearly labelled, so they can tell whether they’re dealing with beans or paint.
What Makes Up a Data Product?
A data product is a set of clean, curated data packaged together with metadata that describes:
- What the data represents
- Where it comes from
- How often it is updated
- Example usage
- Legal or licensing restrictions
- Who to contact to get more information about the data
Making this kind of information easily available widens the audience for the data and makes it easier to use and reuse. Cleaning and reformatting the data into an easy-to-digest format shields data consumers from the underlying complexities of the data sources.
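As a rough, hypothetical sketch, the metadata listed above could be captured as a simple structured record in a central catalogue. The DataProductEntry class and its field names below are illustrative assumptions, not a reference to any particular catalogue tool:

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sketch of a data product catalogue entry.
# Field names are illustrative only, not tied to any specific catalogue product.
@dataclass
class DataProductEntry:
    name: str                   # what the data represents
    description: str
    source_systems: List[str]   # where the data comes from
    refresh_schedule: str       # how often it is updated
    example_usage: str          # a short worked example or sample query
    licensing_notes: str        # legal or licensing restrictions
    owner_contact: str          # who to contact for more information

# Example entry for an imaginary customer master data product
customer_master = DataProductEntry(
    name="customer_master",
    description="Deduplicated customer records shared across the business",
    source_systems=["CRM", "billing"],
    refresh_schedule="daily at 06:00",
    example_usage="SELECT customer_id, region FROM customer_master",
    licensing_notes="Internal use only; contains personal data",
    owner_contact="data-products@example.com",
)
```

Whatever the exact format, the point is that all of this information lives in one discoverable place rather than being scattered across documents and code comments.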
When new business requirements come along, new data products can be created by combining existing data products rather than having to build from scratch every time. This has been shown to allow delivery of new business use cases up to 90% faster (ref 1).
Easily discoverable data products can highlight situations where different parts of an organisation are duplicating effort, processing similar raw data in a similar way.
Feedback and accountability mechanisms bring a renewed focus on data quality, which can then lead to process improvements and better quality over time.
Where data is packaged as a product, access to that data can be monitored and cross-charged to consumers, allowing a marketplace of data products. In large and complex organisations, a market mechanism can provide an efficient way to identify value and allocate resources appropriately.
Improved data discoverability makes the data available to a broader set of potential consumers. Improved data quality leads to a greater level of trust in that data. Trust is needed the most when the data suggests something unexpected. The most valuable insights can be those that are counter-intuitive.
Packaging data in the form of data products has been shown to reduce the total cost of ownership for data assets, including technology, development and maintenance by as much as 30% (ref 2).
Why Doesn’t This Happen Everywhere?
Lack of a Central Data Catalogue
There are lots of places where the kind of metadata described above might already exist. It may be distributed across several project documentation files, and sometimes important information exists only in the comments attached to program code. There’s often no central place that a data consumer can go to browse through the different types of data available to them.
No Funded Path from Data to Product
Creating data catalogue entries, cleaning data and maintaining SLAs incurs significant additional costs for the data product producer. If a data product is required for regulatory compliance, that cost can be borne centrally. Outside of what is legally required, there must be a potential source of revenue to offset those costs. If there is no internal marketplace or cross-charging mechanism for data products, it makes little sense from the point of view of an individual department to incur these additional costs, even if some of the data they handle might greatly benefit the wider organisation.
Security and Compliance Doubts
The costs of illegal or inappropriate sharing of data can be severe. Where data is not well documented, there can be uncertainty about whether a given piece of data should be shared more widely. For employees to feel safe sharing data, there must be clear, accessible guidance for each data product that specifies whether and how it can legitimately be shared. This kind of information can and should be documented as part of the data catalogue entry for any dataset.
Sharing data via data products and a centrally provided data catalogue does not mean a free-for-all. Data security can still be applied so that sensitive data is restricted to those with a legitimate reason to access it. Data products make it easier to enforce some of the quite complicated rules that can exist around data sharing, and a central data catalogue can make it clear to anyone wishing to use the data exactly what restrictions apply, including restrictions on how the data may be used.
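To make that concrete, here is a minimal, hypothetical sketch of how sharing restrictions recorded against a catalogue entry might be checked before access is granted. The SharingPolicy structure, the purpose labels and the can_access function are illustrative assumptions, not a description of any specific governance tool:

```python
from dataclasses import dataclass
from typing import Set

# Hypothetical sketch: sharing rules recorded alongside a catalogue entry.
# Names and rules are illustrative, not a real governance framework.
@dataclass
class SharingPolicy:
    classification: str         # e.g. "public", "internal", "restricted"
    allowed_purposes: Set[str]  # uses the data owner has approved
    requires_approval: bool     # whether access must be signed off

def can_access(policy: SharingPolicy, purpose: str, has_approval: bool) -> bool:
    """Return True if a consumer's stated purpose is permitted by the policy."""
    if purpose not in policy.allowed_purposes:
        return False
    if policy.requires_approval and not has_approval:
        return False
    return True

# Example: a restricted customer dataset approved only for billing and fraud checks
policy = SharingPolicy(
    classification="restricted",
    allowed_purposes={"billing", "fraud-detection"},
    requires_approval=True,
)
print(can_access(policy, purpose="marketing", has_approval=True))  # False
print(can_access(policy, purpose="billing", has_approval=True))    # True
```

Even a simple rule set like this, attached to every data product, removes much of the uncertainty that stops people from sharing data in the first place.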
Reverse Dunning-Kruger
In a situation where data is produced, managed and consumed within a single department, those who use that data to make decisions are likely to be working closely with those who produce it, and so will have a good understanding of its meaning as well as its quirks and limitations. Where that data is shared more widely, it can be easy for those who understand it to overestimate how readily it will be understood by others. It’s hard sometimes to know what others don’t know.
Organisational Change Takes Time and Effort
A data marketplace that brings about reallocation of resources within an organisation is bound to create some organisational friction. Data teams that are used to working together may find themselves seconded to individual departments and adhering to new departmental priorities. Business changes that involve new ways of working and changes to allocation of resources are always going to be trickier to manage than changes that are purely technological.
So, What to Do?
Find Some Seed Funding to Establish a Pilot Product
There will still need to be some initial investment to build the first pilot products and demonstrate value, but showing value early on will help justify further investment and the establishment of a marketplace with recharging mechanisms. Subscription-based cloud services eliminate the need for large capital investments up front. The business case still needs to be made, but it can be built upon internal successes rather than requiring an initial leap of faith.
Establish Some Core Infrastructure
A centralised data catalogue is the first step, so your data consumers know where to go if they are looking for data.
Review Your Existing Data Assets
Most data projects start with a requirements gathering exercise, identifying a problem and then designing a solution to solve it. Productising your data is different. In this case, you already have a substantial number of data assets. The goal is to get more value out of those existing assets by ensuring that they meet minimum quality requirements and are sufficiently well documented and advertised that they can be discovered, picked up and used by someone unfamiliar with them.
Master data can be a good place to start. A data product that provides well-documented information about your customers or suppliers across the business can be of help to many different parts of an organisation. Data licensed from outside the organisation may also be a good candidate. Depending on the licensing restrictions, externally sourced data that is properly advertised internally may offer value to departments that didn’t previously have access to it.
Managing and sharing data in the form of data products has value on its own. It can also form part of a larger strategy to manage data within a data mesh.
At InterWorks, we can help you with every step of the journey: from raw data to finished data products, from on-premises data to the cloud, and from a centralised data warehouse to a distributed data mesh. Contact us for an informal chat.
About the Author
Mike Oldroyd has worked in data and business intelligence for over 20 years. In 2023, he was the lead data architect designing and implementing a data mesh architecture for a major global client.