Build great data products and reduce cognitive load
Regular readers of our blog know that many businesses have found success managing data as a product. In essence, they have learned how to govern, exploit, and optimize value with data products. Skeptics are probably wondering: by just how much? According to some accounts, by managing data as a product, new business use cases can be delivered as much as 90 percent faster.
If you are interested in the finer details of data products, you’re in the right place. In this two-part series, I share what I’ve learned working with organizations as they have adopted Data Mesh and data products over the past two years. Part one focused on why data products are necessary, the three kinds of data products, and how to govern and manage them. This post focuses on the organizational structures responsible for building data products.
Who builds Data Products?
We know from Zhamak’s book that domains are responsible for building and managing data products in a Data Mesh. In practice, domains tend to map to groups of people within ‘lines of business’, with the domain owner role mapping to the leader or senior stakeholders of the respective area.
Encouraging domains to create data products requires motivation and incentivization approaches, and it also requires skilled resources available to the domains to support this endeavor. These skills are likely to come from the central data team within the organization, leading to a potentially significant change in the organizational structure, and in theory, the end of the central data team.
However, the end of the central data team is not quite what we have seen so far. The central data team will inevitably shrink as data engineers, architects and modelers get subsumed into the domains to support the creation and management of data products, but what we are seeing is the rise of a “central enablement team”. This team is responsible for creating reusable assets to improve efficiency and governance across the domains.
User Defined Function (UDF)
A simple example of this is the creation of a User Defined Function (UDF) which creates a standardized customer key. If we have a single UDF that can be used by all domains, it means:
- Each domain does not need to create its own UDF, which improves the efficiency of the domains.
- We know that data shared across domains or across data products will have a consistent customer key structure, which improves data governance efforts.
To set the UDF into motion, we can take a ‘carrot approach’ (incentivizing good work with rewards) by making it easier for a domain to find and use the UDF than to create something themselves. This is a far better approach than the ‘stick approach’ of mandates and policies (using punishment to push people towards goals).
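To make the idea of a shared customer key concrete, here is a minimal sketch in Python. Everything specific in it is an assumption for illustration rather than something prescribed in this post: the function name `standard_customer_key`, the two inputs (a source system name and that system's customer ID), the `CUST-` prefix, and the choice of a truncated SHA-256 hash. In practice, the same logic would be registered as a UDF in whatever query engine the domains share.

```python
import hashlib
import unicodedata


def standard_customer_key(source_system: str, source_customer_id: str) -> str:
    """Return a deterministic customer key in one fixed format.

    Hypothetical rules, chosen only for illustration: normalize both
    inputs, join them with a separator, and hash the result.
    """

    def normalize(value: str) -> str:
        # Unicode-normalize, trim surrounding whitespace, and lowercase,
        # so that " CRM " and "crm" are treated as the same value.
        return unicodedata.normalize("NFKC", value).strip().lower()

    system = normalize(source_system)
    customer_id = normalize(source_customer_id)
    if not system or not customer_id:
        raise ValueError("source_system and source_customer_id must be non-empty")

    digest = hashlib.sha256(f"{system}|{customer_id}".encode("utf-8")).hexdigest()
    # Truncated to 16 hex characters for brevity; a real key design would
    # need to weigh the collision risk of truncation.
    return f"CUST-{digest[:16]}"
```

Because every domain calls the same function, two data products that refer to the same customer in the same source system produce the same key, regardless of casing or stray whitespace in the raw values. That consistency is what makes cross-domain joins and governance checks simpler.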
Factors to consider when aligning data skills to domains
From talking to CDOs that are adopting Data Mesh, there are a number of factors that we need to consider when moving the data skills to the domains.
- How do we use skilled data resources?
- Should the data-skilled resources be ‘personal trainers’?
- Should they be there to mentor and educate the others in the domain to create and manage data products?
- Should we have ‘capability squads’ who actually perform the creation and management of the data products within and effectively on behalf of the domains?
- Should the data-skilled resources be “transformation teams” that come into the domain for a specific project or data product and then move onto something else?
The answers to these questions depend on the organization, including some of the factors listed below.
The move toward data citizenship
In this approach, organizations aim to get members of the domains, who already have a strong understanding of the domain’s business, to build up their data skills and become “data citizens”, further enabled by expert ‘personal trainers’.
A number of organizations have adopted this approach. There are cultural aspects to consider, as well as how to incentivize members to become more effective with data. Supporting this move towards data citizenship requires data literacy.
Data Literacy is critical
Data literacy is a key requirement to enable domains to create and manage their own data before they can create and manage data products for consumption outside the domain. Data literacy is an important focus for CDOs, and getting all members of an organization to be more data literate is critical.
Without a level of data literacy, the ‘personal trainer’ approach mentioned above won’t work. However, to some extent data literacy is only half of the problem. The other half is technology literacy, and combined with domain knowledge, that is a lot to know. Indeed, Zhamak refers to this as “Cognitive Load.”
Technology reduces cognitive load
As new technology is adopted by an organization, it naturally increases the need for broader technical literacy. If members of domains need to understand an ever-larger number of technologies just to create and manage data products, organizations have two choices:
- Increase the members in each domain so that you have the necessary skills available, or
- Adopt technology that abstracts away many of the technology concerns and allows domains to concentrate on the data and the business knowledge related to it. This is where Starburst can play, and has played, a significant role in supporting many organizations’ adoption of Data Mesh.
Reporting hierarchy
So far we have discussed individuals from the central data team being subsumed into the domains. However, there are a number of approaches that have been adopted by different organizations in terms of reporting structures to make that happen.
The next figure outlines 3 of these:
Reporting into domains
The first approach is that the data-skilled resources become part of the domain and report into the domain owner’s reporting line. This benefits the domain because it has the necessary skills. The drawback is that although data workers gain knowledge of the domain, they lose the ability to work cross-functionally on different data problems.
Initiative-based
The second approach is based on the assumption that the central data team still exists, and skilled data workers move into the domain for a fixed period of time to complete a specific initiative, such as the initial creation of a data product. The drawback with this approach is that data products are long-lived, and the skills essential to managing them are lost to the domain when the skilled data workers leave. That’s where the ‘personal trainer’ and ‘data citizenship’ approaches may become important. From a data worker’s perspective, this approach still retains the attractiveness of being able to work on data products across the organization.
Secondment (job rotation)-based
The third approach is a secondment (job rotation) approach, where skilled data workers are rotated through domains for 6 to 12 months at a time to support data product development and management. This approach has many benefits, not least that the domains retain the necessary skills for the long term. Meanwhile, skilled data workers can learn new domain skills while working on data problems across the enterprise.
All of the above considerations are really concerned with being efficient and effective in creating and managing data products. The goal is to enable members of the enterprise to make better data-driven decisions in a timely fashion, and so the timeline of data products is critical.
The timeline of data products
As organizations start their journey of building data products, they should consider how long it takes to develop one. Sure, the design and testing of a data product may take a while, but the actual creation should not take more than a week or so. If it does, your organization might not have the right technology for the job. Technology and data capabilities should enable individuals to produce data products quickly and iterate fast.