Optimizing Operations with a Data Product Runbook
Adrian Estala
VP, Field Chief Data Officer
Starburst
Srinivas Paluri
Senior Director, Data Engineering
Zillow
As we begin to see greater maturity across the Data Mesh designs that early adopters have put in place, we can start to better appreciate how to measure data product efficacy. What does a good product look like? Sure, you have read about the basic principles of data products; they should be secure, interoperable, shareable, discoverable, and easily understood. These form a great foundation, but to measure ongoing effectiveness, we need to manage business value and operating efficiency.
Once we define how we are going to measure effectiveness, then we can start to design the operating models that will drive intended metrics. You will need standard procedures that are described in a runbook to help guide the full data product life cycle, from development to consumption. In this post, we will cover some of the basic elements of a data product runbook, and we look forward to your comments and ideas on how to improve this.
In a Data Mesh, we are trying to create a low-friction environment for the consumer. Keep your runbook simple and fit for purpose. Impress the consumer, satisfy the auditor.
In this blog, Srini and I will cover some of the key topics we discussed in Episode 11 of Data Mesh TV, which you can watch here. We have also added a lot more content that we were not able to fully delve into on the show.
The Runbook Framework
A data product runbook establishes a set of standard procedures that can be quickly understood and implemented. If you have a collaboration tool like Confluence, you can consider using this basic framework:
- Start with a Runbook template. The template defines all of the required sections. Every Domain will likely operate under slightly different governance standards, but the template provides you an opportunity to define the “non-negotiables.” There should be standard parts to the template, and guidance for how each Domain can configure the areas that are flexible.
- Define a Release and Change Process. This does not need to be overly governed. Keep it simple. If you have a Data Mesh Governance board, they should review this as part of the setup for any new Domain. If you don’t, then you can rely on a peer review process to help share ideas and lessons learned. For higher-risk domains, including infosec in the review process is a good idea.
- Make runbooks accessible. Producers and consumers should be able to easily find and read these documents. Both parties can provide helpful input to drive greater efficiency and value.
- Define Accountability. The data product owner should own their runbooks, but this can vary depending on how your accountabilities are established. Ownership is critical and should be formally established.
- Link related materials. Include links to Domain Runbooks and training materials, and encourage feedback.
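To make the "non-negotiables" idea concrete, here is a minimal sketch of a template check, assuming a runbook is stored as a simple mapping of section name to text. The section names, the `owner` key, and the `check_runbook` helper are all hypothetical; substitute whatever your own template requires.

```python
from __future__ import annotations

# Hypothetical section names: the enterprise-wide "non-negotiables".
REQUIRED_SECTIONS = ("lifecycle", "metadata", "data_access", "auditability")
# Hypothetical sections that each Domain may configure or omit.
OPTIONAL_SECTIONS = ("consumption_guidelines", "release_management")


def check_runbook(runbook: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the runbook passes."""
    problems = []
    # Every required section must be present and non-empty.
    for section in REQUIRED_SECTIONS:
        if not runbook.get(section, "").strip():
            problems.append(f"missing required section: {section}")
    # Ownership should be formally recorded.
    if not runbook.get("owner", "").strip():
        problems.append("no owner recorded")
    # Flag sections the template does not know about.
    known = set(REQUIRED_SECTIONS) | set(OPTIONAL_SECTIONS) | {"owner"}
    for section in runbook:
        if section not in known:
            problems.append(f"unknown section: {section}")
    return problems


if __name__ == "__main__":
    draft = {"owner": "sales-domain-team", "lifecycle": "See flowchart."}
    for problem in check_runbook(draft):
        print(problem)
```

A check like this could run as part of the peer review step, so reviewers spend their time on content rather than on spotting missing sections.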
The Anatomy of a Data Product Runbook
As noted above, in an enterprise Data Mesh design, the runbook structure should be largely consistent, but it may include some custom features for each Domain. There are many different ways to build a data product, and it is reasonable that any Domain could leverage more than one data product platform. In this short blog, we cannot dissect every detailed approach, so we will focus on describing the topics that your runbook should cover.
If your data products require heavy lifting by data engineers or heavy data migrations, your runbook is going to be materially more complicated. These types of data products should be the exceptions for most organizations. If you take this approach for every product, you sacrifice efficiency and consumer value.
Build 2- to 3-page runbooks that are written in a way that a consumer can understand. Then, motivate consumers to build their own, eventually.
Your runbook should include:
Data Product Lifecycle
Create a flowchart to illustrate the life cycle across these key steps: discovery, design, test, publish, operate, consume, change. Data product development should be easy and fast; your goal is to create a low-friction approach.
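Alongside the flowchart, the same life cycle can be written down as a small state machine. This is a sketch only: the seven stage names come from the list above, but the allowed transitions (for example, a change looping back to design, or a failed test returning to design) are assumptions that each Domain would adjust.

```python
from enum import Enum


class Stage(Enum):
    DISCOVERY = "discovery"
    DESIGN = "design"
    TEST = "test"
    PUBLISH = "publish"
    OPERATE = "operate"
    CONSUME = "consume"
    CHANGE = "change"


# Assumed transitions; not prescribed by the runbook framework itself.
ALLOWED = {
    Stage.DISCOVERY: {Stage.DESIGN},
    Stage.DESIGN: {Stage.TEST},
    Stage.TEST: {Stage.PUBLISH, Stage.DESIGN},  # a failed test returns to design
    Stage.PUBLISH: {Stage.OPERATE},
    Stage.OPERATE: {Stage.CONSUME},
    Stage.CONSUME: {Stage.CHANGE},
    Stage.CHANGE: {Stage.DESIGN},  # a change request starts a new design pass
}


def advance(current: Stage, target: Stage) -> Stage:
    """Move to the target stage, or raise if the step is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


if __name__ == "__main__":
    stage = Stage.DISCOVERY
    for nxt in (Stage.DESIGN, Stage.TEST, Stage.PUBLISH):
        stage = advance(stage, nxt)
        print(stage.value)
```

Keeping the transition table short is one way to check that the approach stays low-friction: every extra edge is another handoff someone has to manage.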
Metadata Requirements
Rich metadata that describes data products makes them valuable to the consumer. Establish strict, enterprise-wide requirements for the elements that every data product should include (e.g. purpose, sources, owner, usage guidance, data quality, version).
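One way to enforce these requirements is to model the elements as a record and reject any data product that leaves one blank. The sketch below uses the six example elements named above; the class name, field types, and the `missing_elements` helper are illustrative assumptions, not a standard schema.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DataProductMetadata:
    """Hypothetical record holding the required metadata elements."""
    purpose: str = ""
    sources: list = field(default_factory=list)
    owner: str = ""
    usage_guidance: str = ""
    data_quality: str = ""
    version: str = ""


def missing_elements(meta: DataProductMetadata) -> list:
    """Return the names of required elements that are empty."""
    return [f.name for f in fields(meta) if not getattr(meta, f.name)]


if __name__ == "__main__":
    draft = DataProductMetadata(
        purpose="Weekly sales by region",
        sources=["orders", "regions"],
        owner="sales-domain-team",
    )
    # Reports the elements still to be filled in before publishing.
    print(missing_elements(draft))
```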
Consumption Guidelines
Data products will be consumed in different ways, and we all want to build truly interoperable products. Most of us are not there yet. Your data products will most likely be focused on a couple of consumption patterns. Make sure the data products are designed so that they can be consumed successfully through those patterns.
Accountability
For each of the lifecycle steps, define clear accountabilities for the producers and/or the consumers. Keep this simple: the fewer people involved in the lifecycle, the faster it will run.
Data Access
Establish the process for requesting data access at the source for the Domain and for requesting data product access for the consumer. Include a reference to your data contracts that establish the responsibilities for protecting data.
Auditability
Include a statement that ensures that all producers and consumers know that all use of the data product will be tracked. Logs will capture when it was queried from the source, how it was processed, and who consumed it.
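As an illustration of what such a log entry might hold, here is a minimal sketch that emits one JSON line per tracked event. The three event types mirror the sentence above (queried from the source, processed, consumed); the field names and the `audit_record` helper are assumptions, and a real deployment would normally rely on the logging built into its data platform.

```python
import json
from datetime import datetime, timezone

# Assumed event types, one for each kind of activity the runbook says is tracked.
EVENTS = ("source_query", "processing", "consumption")


def audit_record(product, event, actor, detail, when=None):
    """Build one JSON audit line describing a tracked event."""
    if event not in EVENTS:
        raise ValueError(f"unknown event type: {event}")
    # Default to the current UTC time when no timestamp is supplied.
    when = when or datetime.now(timezone.utc)
    return json.dumps(
        {
            "product": product,
            "event": event,
            "actor": actor,
            "detail": detail,
            "timestamp": when.isoformat(),
        },
        sort_keys=True,
    )


if __name__ == "__main__":
    print(audit_record("weekly_sales", "consumption", "analyst@example.com",
                       "dashboard refresh"))
```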
Data Governance
Provide links to enterprise guidance for data classification, master data management and data quality. Provide content in the runbook that is specific to the data products you are building. Describe the type of data that is being used, where it comes from, and the guidelines for how to protect it.
Release Management
Establish the process for managing changes to data products, and consider a more formal release management approach for Domains where necessary. The fewer the steps, the faster the process will run, so you will need to be ruthless about keeping the process simple. Equally, you should not sacrifice security where risks are higher.
Getting Started
As a Data Mesh community, we should share our ideas and examples to help each other grow. Data products do provide a competitive advantage, and it’s understandable that some organizations may choose to keep their “leading” examples internal. If you would like some additional ideas, work with your favorite Data Mesh consultant or reach out to Starburst and we can share our own ideas in a data product design workshop.