Data Mesh

In a Data Mesh, each domain-specific dataset has its own embedded engineers and product owners who manage that data and its availability to other teams. This drives a level of data ownership and responsibility that is often lacking in current data platforms, which are largely centralized, monolithic, and built around complex pipelines.

Data Mesh is a strategic approach that supports an organization’s digital transformation by centering on the delivery of valuable and secure data products. It moves beyond traditional, monolithic, centralized data management built on data warehouses and data lakes.

Data Mesh improves organizational agility by enabling data producers and data consumers to access and manage big data directly, without having to delegate to the data lake or data warehouse team. As a solution for data silos and data integration, Data Mesh allocates data ownership to domain-oriented groups or business units that serve, own, and manage data as a product. All of this improves data-driven decision-making for data leaders.

What are the 4 principles of data mesh?

Core Principles of Data Mesh

#1 Domain-oriented data ownership and architecture

To understand what domain-driven data is, we must know what a domain is. A domain is an aggregation of people organized around a common functional business purpose.

Data Mesh proposes that the domain owns, and is responsible for managing, the data, metadata, and policies created by its business function. The domains are responsible for the assimilation, transformation, and provision of data to end users. Eventually, the domain exposes its data as data products, whose entire lifecycle is owned by that domain.

#2 Data as a product

Data products are produced by the domain and consumed by downstream domains or users to create business value. Data products differ from traditional data marts in that they are self-contained and are themselves responsible for aspects such as security, provenance, and the infrastructure concerns involved in keeping the data up to date. Data products enable a clear line of ownership and responsibility, and they can be consumed by other data products or directly by end consumers to support business intelligence and machine learning activities.
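To make the idea of a self-contained data product more concrete, here is a minimal Python sketch. It is not taken from any particular product or library: the `DataProduct` class and every field name are hypothetical, chosen only to mirror the aspects named above (ownership, security, provenance, and keeping data up to date).

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Set


@dataclass
class DataProduct:
    """Hypothetical descriptor for a self-contained data product."""

    name: str
    domain: str                 # owning domain (clear line of ownership)
    owner: str                  # accountable data product owner
    source_systems: List[str]   # provenance: where the data comes from
    allowed_roles: Set[str]     # security: roles permitted to read
    max_staleness: timedelta    # freshness objective for "up to date"
    last_refreshed: datetime
    # Other data products this one consumes, if any.
    upstream_products: List[str] = field(default_factory=list)

    def is_fresh(self, now: datetime) -> bool:
        """True if the last refresh is within the freshness objective."""
        return now - self.last_refreshed <= self.max_staleness

    def can_read(self, role: str) -> bool:
        """True if the given role is allowed to consume this product."""
        return role in self.allowed_roles


# Illustrative values only; none of these names come from a real system.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
orders = DataProduct(
    name="orders_daily",
    domain="sales",
    owner="sales-data-owner@example.com",
    source_systems=["order_management"],
    allowed_roles={"analyst", "ml_engineer"},
    max_staleness=timedelta(hours=24),
    last_refreshed=datetime(2024, 1, 2, 3, 0, tzinfo=timezone.utc),
)
```

Keeping the freshness and access checks on the product itself, rather than in a central pipeline, is one way to express that the product carries its own responsibilities.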

Related webinar: Empowering modern analytics strategies with data products

Related whitepaper: A guide to data products: creating and managing reusable data assets

#3 Self-serve data platform

A self-serve data infrastructure is made up of numerous capabilities that members of the domains can easily use to create and manage their data products. The self-serve data platform is supported by an infrastructure engineering team, whose primary concern is the management and operation of the various technologies in use. This illustrates the separation of concerns: domains are concerned with data, and the self-serve data platform team is concerned with technology. The measure of success of the self-serve data platform is the autonomy of the domains.

#4 Federated computational governance

Traditional data governance and access controls can be seen as an inhibitor to producing value through data. Data Mesh enables a different approach by embedding governance concerns into the workflow of the domains. There are numerous aspects to data governance; however, when considering Data Mesh, it is imperative that usage metrics and reporting become part of this definition. How data is shared and how it is used are key data points for understanding the value, and hence the success, of individual data products.
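As a rough illustration of governance embedded in the domain workflow, the Python sketch below checks a data product against shared policies and records usage for reporting. The policy names, the dictionary fields, and the `UsageLog` class are all assumptions made for this example; they do not describe any specific governance tool.

```python
from collections import Counter
from typing import Callable, Dict, List

# A policy is a function that returns True when a product complies.
Policy = Callable[[Dict[str, object]], bool]

# Hypothetical organization-wide policies, agreed across domains.
GLOBAL_POLICIES: Dict[str, Policy] = {
    "has_owner": lambda p: bool(p.get("owner")),
    "pii_is_masked": lambda p: not p.get("contains_pii") or bool(p.get("pii_masked")),
}


def violations(product: Dict[str, object]) -> List[str]:
    """Return the names of the global policies this product fails."""
    return [name for name, check in GLOBAL_POLICIES.items() if not check(product)]


class UsageLog:
    """Counts reads per (product, consuming domain) for usage reporting."""

    def __init__(self) -> None:
        self._reads: Counter = Counter()

    def record_read(self, product_name: str, consumer_domain: str) -> None:
        self._reads[(product_name, consumer_domain)] += 1

    def reads_per_product(self) -> Dict[str, int]:
        """Aggregate read counts across all consuming domains."""
        totals: Counter = Counter()
        for (product_name, _consumer), count in self._reads.items():
            totals[product_name] += count
        return dict(totals)


log = UsageLog()
log.record_read("orders_daily", "marketing")
log.record_read("orders_daily", "finance")
log.record_read("shipment_status", "sales")
```

In this sketch a domain would call `violations` before publishing a product, while `reads_per_product` feeds the usage reporting that helps show which products deliver value.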

What are the benefits of a data mesh?

Implementing Data Mesh promotes agility for organizations that want to thrive in an uncertain economic climate. All organizations need to be able to respond to changes in their environment with a low-cost, high-reward approach. Introducing new data sources, complying with changing regulatory requirements, and meeting new analytics requirements are all drivers that will precipitate changes to an organization’s data management activities. Current data management approaches are typically based on complex and heavily integrated data pipelines (ETL, ELT) and data ingestion between operational and analytical data systems, and in the face of these drivers they struggle to change quickly enough to support business needs. The purpose of Data Mesh is to provide a more resilient approach to data, so that organizations can respond to these changes efficiently.

Related reading: 10 benefits and challenges of data mesh

Why is data mesh a good thing?

Data Mesh is a socio-technical approach involving people, process, and technology

Data Mesh is a ‘socio-technical’ approach that requires changes to the organization across all three dimensions of people, process and technology. Organizations that adopt Data Mesh may spend 70% of their efforts on people and processes and 30% on the technology to enable the future Data Mesh state.

People: From central data team to the decentralization of business domains

Embarking on a Data Mesh journey will result in significant organizational changes and adjustments to employees’ roles. Existing workers will be critical to the success of adopting a Data Mesh, as they have invaluable tacit knowledge to contribute. Therefore, the transition of data ownership from a central data team to decentralized domains should also be approached as a realignment of existing data-focused employees. Management hierarchies and reward mechanisms will change as well.

Process: Changes within the organization

To promote a sustainable and agile data architecture, implementing Data Mesh will require process changes within the organization. If we consider data governance, new processes around data policy definition, implementation, and enforcement will be required. These will affect the process of accessing and managing data, as well as the processes for exploiting that data as part of business-as-usual (BAU) operations.

Technology capabilities to implement and operate a distributed data mesh

Technology capabilities are a key enabler to implement and operate a Data Mesh. New technology is likely to be required for a number of reasons:

  • Reduce the friction of exploiting data across technologies; interoperability of those new technologies is likely to be critical.
  • Enable domains to be self-sufficient and focus on their first-class concern, which is data rather than technology.
  • Enable new data platforms to be brought online and the data that they expose to be exploited in a seamless manner.
  • Enable automatic reporting of governance aspects across the data mesh, such as data product usage, compliance with standards, and data product feedback.

Whether you should adopt data mesh: Advice from Zhamak Dehghani, the founder of the Data Mesh paradigm and former director of technology at ThoughtWorks

The truth is that Data Mesh may not be the correct fit for every organization. Data Mesh is primarily aimed at larger organizations that encounter uncertainty and change in their operations and environment. If your organization is small with respect to its data needs and those data needs don’t change over time, then Data Mesh is probably an unnecessary overhead.

Related reading:

Learn all about strategy, implementation, and execution of Data Mesh first hand from Zhamak Dehghani.

Data lake vs data mesh

The data lake is a technology approach whose main objective has traditionally been to provide a single repository into which data can be moved as simply as possible, with a central team responsible for managing it.

Data lakes do provide significant business value: they store data in raw, open file formats and reduce storage costs. However, they also suffer from a number of concerns, the primary one being that once data is moved to the lake, it loses context. For example, we may have many files containing a definition of customer: one from a logistics system, one from payments, and one from marketing. Which one is correct for real-time data analysis?

Furthermore, data in the data lake will not have been pre-processed, so data issues will inevitably arise. The data consumer will then typically have to liaise with the data lake team to understand and resolve data issues, which becomes a significant bottleneck to using the data to answer the initial business question.

In comparison, Data Mesh is more than just technology. It combines technology and organizational aspects, including the ideas of data ownership, data quality, and autonomy. Consumers of data therefore have a clear line of sight to data quality and data ownership, and data issues can be discovered and resolved much more efficiently.

Ultimately, data can be used and trusted.

Related reading: Data Mesh vs Lake vs Warehouse vs Fabric

Data fabric vs data mesh

The difference between a data fabric and a data mesh is that data fabric is a technological approach, whereas data mesh is about organization, people, and technology.

Data fabric concentrates on a collection of technological capabilities that work together to produce an interface for the end users who consume data. Many supporters of data fabric espouse using technologies such as ML to automate many data management tasks, so that end users can access data more simply. For simple data usage there is some value in this; however, in more complex situations, or where business knowledge needs to be integrated into the data, the limitations of data fabric become apparent.

Arguably, a data fabric could be used as part of a Data Mesh self-serve platform, where the data fabric exposes data to the domains, which can then embed their business knowledge into a resulting data product.

As Daniel Abadi, Darnell-Kanal Professor of Computer Science at the University of Maryland, College Park, says, the difference between a data fabric and Data Mesh is not obvious. He advises, “Ultimately, an optimal solution will likely take the best ideas from each of these approaches.”

What does the Data Mesh look like?

Data Mesh Architecture: How to integrate data mesh with your ecosystem

Organizations that are ready to implement Data Mesh will need help connecting their data sources for a quick win with Starburst. Below we highlight how:

#1 Connect to data sources where they reside

As you begin your Data Mesh journey, the first step is to connect to data sources. A key Data Mesh implementation principle is to connect your enterprise data by leveraging your existing investments: lakes or warehouses; cloud or on-premises; structured warehouse or non-structured lake. Unlike the single-source-of-truth approach, which centralizes all your data first, you’re leveraging and querying the data where it resides. It is the first Data Mesh win for many Starburst customers, as our 40+ connectors make it possible to connect to these data sources.
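The idea of querying data where it resides, instead of copying it into one store first, can be sketched with nothing more than Python's built-in `sqlite3` module. This is only an analogy and does not use or represent Starburst's connectors or API: two separate in-memory SQLite databases stand in for a logistics source and a payments source, and the table names, column names, and values are invented for the example.

```python
import sqlite3


def total_paid_per_shipping_customer():
    """Join two separate databases in one query, without copying rows between them."""
    # The main connection stands in for the "logistics" source.
    conn = sqlite3.connect(":memory:")
    # ATTACH adds a second, independent database under the name "payments".
    conn.execute("ATTACH DATABASE ':memory:' AS payments")

    conn.execute("CREATE TABLE main.shipments (customer_id INTEGER, parcels INTEGER)")
    conn.execute("CREATE TABLE payments.invoices (customer_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO main.shipments VALUES (?, ?)", [(1, 3), (2, 1)])
    conn.executemany(
        "INSERT INTO payments.invoices VALUES (?, ?)",
        [(1, 120.0), (1, 30.0), (2, 15.5)],
    )

    # One query spans both sources; each table stays in its own database.
    rows = conn.execute(
        """
        SELECT s.customer_id, s.parcels, SUM(i.amount) AS total_paid
        FROM main.shipments AS s
        JOIN payments.invoices AS i ON i.customer_id = s.customer_id
        GROUP BY s.customer_id, s.parcels
        ORDER BY s.customer_id
        """
    ).fetchall()
    conn.close()
    return rows


print(total_paid_per_shipping_customer())
```

A real federated query engine does this across heterogeneous systems such as lakes, warehouses, and operational databases, but the shape of the query, one statement addressing several named sources, is similar.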

#2 Create logical domains

After establishing connectivity across all the various data sets, the next goal is to create an interface for business and analytics teams to find their data. In data mesh terms, we call that a logical domain. It’s called logical because we’re not moving data into a repository where data consumers can access it. Rather, we’re creating a logical place, a dashboard that acts as a semantic layer, where they can log in to see the data that’s been made available to them.

All the data you need resides in your domain alongside domain teams that are empowered to work autonomously. In essence, we’re promoting the concept of self-service where data consumers are empowered to independently do more on their own.

#3 Enable teams to create data products

Once a domain team has access to the data it needs, the next step is to teach the team how to convert domain data into data products. Then, with those data products, create a library or catalog of data products that you can share.

Starburst has a built-in data catalog that enables you to very quickly search, discover, and identify data products that might be of interest, improving the lives of data scientists and data engineers.

Creating data products is a powerful capability: it enables your data consumers to move very quickly from discovery to ideation to insight, because data products are quickly created and then used across the organization.
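To show what searching a catalog of data products might look like in the simplest possible terms, here is a small Python sketch. It is an in-memory stand-in, not Starburst's catalog or its API; the `CATALOG` entries and the `search_catalog` function are hypothetical.

```python
from typing import Dict, List

# Hypothetical catalog entries published by three domains.
CATALOG: List[Dict[str, object]] = [
    {
        "name": "orders_daily",
        "domain": "sales",
        "description": "Daily order totals per customer",
        "tags": ["orders", "revenue"],
    },
    {
        "name": "shipment_status",
        "domain": "logistics",
        "description": "Latest status of each shipment",
        "tags": ["delivery"],
    },
    {
        "name": "campaign_reach",
        "domain": "marketing",
        "description": "Customer reach per campaign",
        "tags": ["customer", "campaign"],
    },
]


def search_catalog(catalog: List[Dict[str, object]], term: str) -> List[str]:
    """Return names of products whose name, description, or tags contain the term."""
    needle = term.lower()
    hits = []
    for product in catalog:
        searchable = [product["name"], product["description"], *product["tags"]]
        if any(needle in str(text).lower() for text in searchable):
            hits.append(product["name"])
    return hits


print(search_catalog(CATALOG, "customer"))
```

A case-insensitive substring match is used here only for brevity; a production catalog would typically add richer metadata, ranking, and access control.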

How to build and manage a data mesh approach

Those who are eager to get started, or are just getting started, on their Data Mesh journey toward democratization and scalability will find the 90-Day Data Mesh Pathfinder helpful. In fact, many enlist a Pathfinder to help them with this ambitious endeavor. With the right strategy, it is not labor-intensive; it is a low-cost, low-risk, and high-reward exercise.

Start by designing and building your data mesh pathfinder

A pathfinder is an exercise in working out how Data Mesh will fit into your organization from a technology, people, and process perspective. You’ll also identify your strengths and weaknesses so that when you’re ready to begin your Data Mesh transformation program, we can curate all the learnings from the Pathfinder to accelerate in the areas where you can move quickly and slow down in the areas where you need remedial work.

Key activities in your data mesh pathfinder workshop

  1. Select the Pathfinder use case and agree on scope (e.g. 1 Domain, 6 Data Products, 3 Data Sources)
  2. Establish Pre-MVP environment for early design and enablement activities
  3. Data product design, refinement, and consumption
  4. Domain Owner, Data Product Owner, and Consumer Enablement Training
  5. Showcase the MVP
  6. Integrate the Data Mesh into your Data Strategy

Related reading: The Data Mesh Pathfinder eBook

Related workshop: 2 hour Data Mesh Pathfinder workshop

Why decentralized data access and Data Mesh are the future of analytics at scale

Richard Jarvis, CTO, EMIS Group

“Data Mesh is certainly the future for our business, and probably for many others, particularly ones which have a legacy of acquisitions, and the need for merging of different data sets to form a new larger entity. Having the ability to query data where it resides using Starburst is enormously powerful and makes a huge impact on the ability for data to provide answers.”

Ritesh Ranjan, Lead Data Architect, Sky

“Decentralized access is definitely the future…We are currently in the process of creating data products, which Starburst is really helping with. Previously, without a single point of secure data access, creating a data product was not possible. With the abstraction layer that Starburst provides across different data sources, it has become our analytics engine for the Data Mesh.”