In the last post, I provided guidelines on security and preventing unauthorized access to your data. I warned you that an overly restrictive security model could encourage bad behavior.
BI system security is like my wife’s purse. I might want to look inside, but I wouldn’t dare to without her permission, and if I did, I wouldn’t be able to find anything without her help. Data governance is akin to getting her help locating the car keys in her purse.
This section discusses data governance and how to provide a framework that makes data discovery easier without being paternalistic. Create conditions in which people work within the rules because they understand how the rules make data easier to find.
Data governance should focus on making data easy to find and determining the appropriate level of auditing your outputs require. In the right hands, unaudited data can be helpful. The most timely information is typically unaudited because it may pertain to ad hoc or emergent circumstances.
Provide easy-to-see data status (audited or unaudited) as part of your dashboard design best practices. Define table and field naming conventions that make your data warehouse more accessible. Clearly define responsibility for using unaudited sources. Write file names and field labels in plain language, and use descriptions that new employees can understand.
Devise policies that respect the need for accuracy but also recognize urgent user needs and the limited time value of some data. Allow for the following categories of output:
Reporting provides information releases, in standardized formats, on a predictable cadence (daily, weekly, monthly, quarterly or annually). The two general types of reporting are mission-critical and ad hoc.
- Mission-critical reports are consumed by many people in your organization or by people outside it, or they are required by law. Mission-critical reports must be vetted and audited.
- Ad hoc reports are used for limited durations by fewer people. These reports focus on a specific issue or opportunity with time-sensitive value for a few users. However, for those few users, the information may be critical to making better decisions.
People using unaudited, ad hoc reports should understand the potential pitfalls they might contain. Timeliness takes precedence over airtight accuracy. The responsibility for appropriate use falls on the information consumer.
Everyone understands that mission-critical reports must be audited and approved before they are released for distribution, but set target time limits for this process. Measure actual performance, and share the results objectively with everyone using the reports. If there are consistent lags, deal with the causes and communicate what is being done to remedy the delays.
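To make "measure actual performance" concrete, here is a minimal Python sketch that compares audit turnaround against a target. The `AuditRecord` structure, the five-day `TARGET_DAYS` value, and the sample records are hypothetical placeholders. Substitute whatever your approval workflow actually logs.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median

# Hypothetical audit log record; field names are illustrative assumptions.
@dataclass
class AuditRecord:
    report: str
    submitted: date
    approved: date

TARGET_DAYS = 5  # assumed target turnaround in days; set your own

def turnaround_days(rec: AuditRecord) -> int:
    """Days between submission for audit and approval."""
    return (rec.approved - rec.submitted).days

def audit_performance(records, target_days=TARGET_DAYS):
    """Summarize how audit turnaround compares with the target."""
    if not records:
        raise ValueError("no audit records to summarize")
    days = [turnaround_days(r) for r in records]
    on_time = sum(1 for d in days if d <= target_days)
    late = [r.report for r, d in zip(records, days) if d > target_days]
    return {
        "on_time_rate": on_time / len(days),
        "median_days": median(days),
        "late_reports": late,
    }

# Made-up sample records, for illustration only.
records = [
    AuditRecord("monthly_revenue", date(2024, 3, 1), date(2024, 3, 4)),
    AuditRecord("regulatory_filing", date(2024, 3, 1), date(2024, 3, 11)),
    AuditRecord("weekly_pipeline", date(2024, 3, 4), date(2024, 3, 8)),
]
print(audit_performance(records))
```

If you share a summary like this on a regular cadence, everyone gets the same objective view of where delays occur.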
Interactive dashboard reports should contain more detailed data views than those exposed initially. Dashboard interactions must let end users filter, select, and highlight outliers, and explore potential causes. This interaction is the primary value that dashboard software provides.
The analysis begins when a user looks at the dashboard (report) and seeks to understand trends and outliers. Why is the number bigger or smaller than expected? Why did this happen? The information consumer must be able to interrogate the data without writing code. In this way, interactive dashboards enable deeper insight into why, how, who, what and when.
If the data source used for a dashboard has not been vetted and approved, the dashboard's author must disclose how the data was assembled and which data sources were used. Best practices for dashboard design dictate full disclosure of the data sources and contact information for the work's author.
I refer to digging into detailed data without a preconceived idea for a report as discovery. Discovery uses many different views of the data to identify patterns and interesting trends, and to surface outliers that may be actionable: improving decision-making, identifying opportunities, or resolving problems. Frequently these problems are not visible in any existing report. Encourage all your employees, especially analysts, to develop discovery skills, and provide training to those who want to learn best practices.
Discovery work may draw on both vetted and un-vetted sources. As always, the author of the discovery analysis should identify all sources used and explain potential shortcomings of the data. Discovery analysis does not require formal review and approval before it is shared: its value may be fleeting, and quick action may be necessary.
Development and Experimentation
If discovery work proves to have high utility and permanent value to many users, your process should accommodate moving it through your approval process to become an audited report without interrupting use of the ad hoc version. When the audited version is ready, sunset the unaudited version and replace it with the new, audited report.
Every BI environment produces far more temporary, ad hoc dashboards than audited reports. Well-designed BI systems anticipate this reality.
Most analysis has only temporary value. Ad hoc analysis typically supports a small number of people with particular needs. Auditing this analysis is unnecessary, time-consuming, and costly. Set reasonable guidelines for determining when an unaudited report reaches a level of usage or importance that should trigger auditing.
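One way to make the promotion guideline explicit is to write it down as a rule. The Python sketch below is hypothetical. The `ReportUsage` fields and the 50-viewer threshold are assumptions. The criteria mirror the mission-critical definition earlier in this post: wide internal use, an external audience, or a legal requirement.

```python
from dataclasses import dataclass

# Hypothetical usage metrics for an unaudited report; fields are assumptions.
@dataclass
class ReportUsage:
    name: str
    distinct_viewers_30d: int
    external_audience: bool = False
    required_by_law: bool = False

VIEWER_THRESHOLD = 50  # assumed cutoff for "many people"; tune to your organization

def should_trigger_audit(usage: ReportUsage, viewer_threshold: int = VIEWER_THRESHOLD) -> bool:
    """Return True when an unaudited report should enter the audit queue."""
    # External or legally required reports are mission-critical regardless of usage.
    if usage.required_by_law or usage.external_audience:
        return True
    # Otherwise, promote once internal usage crosses the threshold.
    return usage.distinct_viewers_30d >= viewer_threshold

# Made-up examples, for illustration only.
reports = [
    ReportUsage("promo_lift_check", distinct_viewers_30d=6),
    ReportUsage("regional_sales_daily", distinct_viewers_30d=120),
    ReportUsage("partner_scorecard", distinct_viewers_30d=12, external_audience=True),
]
for r in reports:
    print(r.name, should_trigger_audit(r))
```

Writing the rule down has a practical benefit. Analysts can see in advance when their ad hoc work will be promoted, so promotion stops feeling arbitrary.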
Make it easy for your analysts to test new software releases or brand-new tools by providing a “playground” environment for this work. The output from these activities should be viewed only by those with access to this playground. Creating separate environments for safe experimentation and error-proofing prevents buggy software from getting released before it is ready.
A Framework for Data Governance
We identified the need for three environments:
- Vetted production environment
- Un-vetted ad hoc environment
- Un-vetted test environment
The vetted production environment will have the smallest number of dashboards because the approval process for your interactive dashboards and data sources requires time and resources. Direct only the most mission-critical reporting and analysis through your auditing process.
Most content will live in your ad hoc environment, mainly internal reports with limited distribution. When a significant number of information consumers start using one of these unaudited reports, promote it through your auditing process. As mentioned, design that process so the unaudited report remains accessible until the audited version is released.
The test environment is for experimenting with software or data releases. Its primary purpose is efficacy and performance testing before release into production. These tests must be completed and approved before you upgrade your production and ad hoc environments. Limit access to staff tasked with testing new software releases and dashboard designs that will move to the vetted production environment.
Data Lineage and Data Provenance
Two areas related to data governance are lineage and provenance. They are important considerations for developing a solid data governance system. Prevent the proliferation of redundant data sources and dashboards that consume storage capacity by establishing governance processes and tracking data lineage.
Lineage allows you to trace back from the endpoint (the dashboard) to the data sources behind it. Some dashboard environments include a built-in data-lineage view. Being able to view data lineage engenders confidence because it shows the information consumer where the data comes from. It also reduces redundancy, because authors can find and reuse existing sources instead of building new ones.
Provenance is a concept I’ve borrowed from the art world. Experts in original artworks of significant value go through a thorough validation process to prove the piece’s provenance. Webster defines provenance as “The history of ownership of a valued object or work of art or literature.”
Data provenance is the history of ownership of the data workflow used to produce the data in the dashboard. If erroneous or missing data is affecting a report, it's crucial to identify the owner of the workflow that feeds that data so they can correct the cause. Provenance provides a feedback loop to the workflow owner. It provides the basis for effective measurement and process improvement.
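Lineage and provenance fit together naturally. Lineage tells you which upstream steps feed a dashboard, and provenance tells you who owns them. The Python sketch below makes two assumptions: the lineage graph is stored as a plain dictionary, and workflows follow a hypothetical `wf_` naming prefix. Real BI platforms expose lineage in their own formats. Treat this as an illustration of the idea rather than any tool's API.

```python
from collections import deque

# Hypothetical lineage graph: each node lists the nodes it is built from.
UPSTREAM = {
    "dash_sales_overview": ["ds_sales_clean", "ds_targets"],
    "ds_sales_clean": ["wf_sales_etl"],
    "ds_targets": ["wf_finance_upload"],
    "wf_sales_etl": ["src_erp_orders"],
    "wf_finance_upload": [],
    "src_erp_orders": [],
}

# Hypothetical provenance records: the owner of each workflow.
WORKFLOW_OWNERS = {
    "wf_sales_etl": "data-engineering@example.com",
    "wf_finance_upload": "fp-and-a@example.com",
}

def upstream_lineage(node: str) -> list[str]:
    """Breadth-first walk from an endpoint back to every upstream node."""
    seen, order = {node}, []
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in UPSTREAM.get(current, []):
            if parent not in seen:  # guards against cycles and shared parents
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order

def workflow_owners(node: str) -> dict[str, str]:
    """Map each upstream workflow (assumed 'wf_' prefix) to its owner."""
    return {
        n: WORKFLOW_OWNERS.get(n, "unknown owner")
        for n in upstream_lineage(node)
        if n.startswith("wf_")
    }

print(upstream_lineage("dash_sales_overview"))
print(workflow_owners("dash_sales_overview"))
```

An owner reported as "unknown owner" is itself a governance finding. It marks a workflow that no one is accountable for fixing.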
A final word on governance and lineage. Companies lacking data governance and lineage systems have these attributes:
- Redundant data sources
- Redundant dashboards
- Less confidence in the system due to inconsistency
- Excessive maintenance and software costs
- Out-of-date software
I call these "wild west" environments. They are frustrating places to work. Despite a massive proliferation of content and users, these companies get less value from their BI systems than they should.
Some people like the idea of entirely decentralized resources and control. It lets you deploy quickly, with less oversight, but it causes headaches two or three years down the road.
Provide shared software environments with separate publishing areas so teams can quickly find what they need. Establish consistent governance policies. Ensure employees can easily work within solid but unburdensome security systems, so they are not tempted to use workarounds to avoid long delays. Security must remain responsive. Avoid the wild west by establishing helpful governance policies and reliable data lineage tracking.
Next week, in Part III of this series, we will discuss how to maintain your systems to ensure they perform well and remain secure over the long term.