The difference between a data mesh and a data warehouse
Let’s look at the distinctions between a data mesh and a data warehouse, and how Starburst can help.
Data mesh vs data warehouse
Data Mesh is a decentralized, distributed approach to enterprise data management. More specifically, Zhamak Dehghani defines Data Mesh as “a sociotechnical approach to share, access and manage analytical data in complex and large-scale environments – within or across organizations.”
In contrast, a data warehouse, or enterprise data warehouse (EDW), was designed to deliver analytics at scale. Typically implemented on-premises, it operates with a fixed schema, which makes it less flexible when data structures change. Loading new data or changing the attributes of a data type requires coordinated updates to the central catalog. Moreover, organizations using a data warehouse may experience rising costs and vendor lock-in.
Related reading: Data lake vs data mesh
Data mesh architecture
To implement a Data Mesh architecture, organizations can follow a structured approach that leverages existing investments and empowers teams. Here’s a breakdown of the process:
1. Connect to Data Sources
The initial step in the Data Mesh journey is to establish connectivity with diverse data sources. Unlike the traditional approach of centralizing all data into a single source of truth, Data Mesh encourages connecting to data where it resides. Starburst facilitates this by providing over 50 connectors, allowing organizations to link to various data sources, whether structured warehouses or unstructured lakes, in the cloud or on-premises.
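To make "connecting to data where it resides" concrete, here is a minimal sketch in Python. It is not Starburst's API: SQLite stands in for the query engine, and two attached in-memory databases (the hypothetical names `crm` and `lake`) stand in for independent source systems. The point is only that one query can join across both sources without first copying either into a central store.

```python
import sqlite3

# Stand-in for a query engine; SQLite is used only so the sketch runs anywhere.
engine = sqlite3.connect(":memory:")

# Each ATTACH creates a separate in-memory database, standing in for an
# independent source system (hypothetical names: crm, lake).
engine.execute("ATTACH DATABASE ':memory:' AS crm")
engine.execute("ATTACH DATABASE ':memory:' AS lake")

# Illustrative tables and rows; these are assumptions, not real data.
engine.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
engine.execute("CREATE TABLE lake.orders (customer_id INTEGER, amount REAL)")
engine.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]
)
engine.executemany(
    "INSERT INTO lake.orders VALUES (?, ?)", [(1, 120.0), (1, 30.0), (2, 75.5)]
)


def revenue_by_customer(conn):
    """Join across both sources in one query, without copying either table."""
    sql = """
        SELECT c.name, SUM(o.amount) AS revenue
        FROM crm.customers AS c
        JOIN lake.orders AS o ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
    """
    return conn.execute(sql).fetchall()


print(revenue_by_customer(engine))
```

In a real deployment the two sources would be separate systems reached through connectors; the shape of the query, which addresses each table by the source it lives in, is the part this sketch is meant to convey.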
2. Create Logical Domains
After establishing connectivity across different data sets, the focus shifts to creating interfaces for business and analytics teams. In Data Mesh terminology, this is referred to as creating logical domains. The term “logical” signifies that the data isn’t moved into a centralized repository. Instead, a semantic layer is established, enabling users to access data through dashboards. This approach promotes self-service, allowing data consumers to independently access the data they need. All required data is organized within logical domains, each owned by an autonomous domain team.
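One way to picture a logical domain is as a team-owned set of pointers to data that stays in its source system. The sketch below is an illustration under that assumption; the class names, catalog names, and team names are all hypothetical and do not correspond to any Starburst interface.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class SourceTable:
    """Pointer to a table in the system where it already lives."""
    catalog: str
    schema: str
    table: str

    def qualified_name(self) -> str:
        return f"{self.catalog}.{self.schema}.{self.table}"


@dataclass
class LogicalDomain:
    """A named, team-owned grouping of pointers; it holds no data itself."""
    name: str
    owner_team: str
    datasets: Dict[str, SourceTable] = field(default_factory=dict)

    def expose(self, alias: str, source: SourceTable) -> None:
        """Make a source table available to consumers under a friendly alias."""
        self.datasets[alias] = source

    def resolve(self, alias: str) -> str:
        """Translate a consumer-facing alias to the physical table name."""
        return self.datasets[alias].qualified_name()


# Hypothetical domain and sources, for illustration only.
marketing = LogicalDomain(name="marketing", owner_team="marketing-analytics")
marketing.expose("customers", SourceTable("crm_postgres", "public", "customers"))
marketing.expose("web_events", SourceTable("s3_lake", "clickstream", "events"))

print(marketing.resolve("web_events"))
```

Consumers work with the aliases, while the domain team remains free to repoint an alias at a different physical table without moving any data into a central repository.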
3. Enable Teams to Create Data Products
Once domain teams have access to the necessary data, the next step is to empower them to convert domain data into valuable data products. These data products can be organized into a library or catalog, fostering collaboration and sharing within the organization.
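As a rough illustration of a data product library, the sketch below models a product as a named, owned query definition and a catalog as a searchable registry. Every name here (`DataProduct`, `ProductCatalog`, the example product and its query) is a hypothetical chosen for the example, not part of any real product.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DataProduct:
    """A curated, owned dataset definition published by a domain team."""
    name: str
    domain: str
    owner: str
    description: str
    query: str  # definition over domain data; illustrative only


class ProductCatalog:
    """In-memory registry that lets other teams discover published products."""

    def __init__(self) -> None:
        self._products: Dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> str:
        """Register a product under a domain-qualified key and return the key."""
        key = f"{product.domain}.{product.name}"
        if key in self._products:
            raise ValueError(f"{key} is already published")
        self._products[key] = product
        return key

    def search(self, term: str) -> List[str]:
        """Return keys of products whose name or description mentions the term."""
        needle = term.lower()
        return sorted(
            key
            for key, p in self._products.items()
            if needle in p.name.lower() or needle in p.description.lower()
        )


catalog = ProductCatalog()
catalog.publish(
    DataProduct(
        name="customer_revenue",
        domain="marketing",
        owner="marketing-analytics",
        description="Total revenue per customer",
        query="SELECT customer, SUM(amount) FROM orders GROUP BY customer",
    )
)
print(catalog.search("revenue"))
```

Keying products by domain keeps ownership visible at the point of discovery, which is one simple way to support sharing across teams.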
Data warehouse architecture
The data warehouse was traditionally designed to support an organization’s business intelligence through analytics, reports, and dashboards. Often regarded as an organization’s “single source of truth,” the data warehouse provided the analytical capabilities an organization needed to make sound operational, tactical, and strategic business decisions.
The characteristics of a data warehouse architecture have largely remained the same and can be described in the following way:
- Data is extracted from operational databases
- Data is transformed into a universal schema
- Data is loaded into the warehouse tables
- Data is accessed through SQL-like querying operations
- Data primarily serves data analysts to produce reports and visualizations
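The steps above can be sketched as a small extract, transform, load pipeline. This is a minimal illustration, assuming two hypothetical operational sources with different row shapes and using an in-memory SQLite database as a stand-in for the warehouse; the table name `fact_sales` and all sample rows are invented for the example.

```python
import sqlite3

# Hypothetical operational rows from two source systems with different shapes.
BILLING_ROWS = [
    {"cust": "Acme", "total_cents": 12000},
    {"cust": "Globex", "total_cents": 7550},
]
SHOP_ROWS = [{"customer_name": "acme", "amount_usd": 30.0}]


def extract():
    """Extract: pull raw rows from each operational source."""
    return BILLING_ROWS, SHOP_ROWS


def transform(billing, shop):
    """Transform: map both shapes onto one schema (customer, amount_usd)."""
    rows = [(r["cust"].title(), r["total_cents"] / 100) for r in billing]
    rows += [(r["customer_name"].title(), r["amount_usd"]) for r in shop]
    return rows


def load(conn, rows):
    """Load: write the unified rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales (customer TEXT, amount_usd REAL)"
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?)", rows)
    conn.commit()


def report(conn):
    """Access: analysts query the warehouse with SQL to build a report."""
    sql = """
        SELECT customer, SUM(amount_usd)
        FROM fact_sales
        GROUP BY customer
        ORDER BY customer
    """
    return conn.execute(sql).fetchall()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(*extract()))
print(report(warehouse))
```

Note that the transform step is where every source must agree on one schema, which is the coordination cost the surrounding text describes.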
Essentially, organizations attempted to build the “enterprise data warehouse.” However, two obstacles stood in the way: reaching consensus on the definition of terms across a wide portfolio of use cases, and relying on a centralized team to create, manage, and retire thousands of ETL jobs, tables, and reports. Over time, organizations moved from a single enterprise data warehouse target to many data warehouses, each focused on supporting a specific part of the business.
Unfortunately, this resulted in a scenario with multiple definitions of data entities, but still a single centralized team responsible for creating, managing, and retiring thousands of ETL jobs, tables, and reports, now spread across several sometimes differing data warehouse technologies. One of the biggest issues with this approach is that a business function would request a change to a table, job, or report and then wait weeks or even months for the central team to respond. Inevitably, this resulted in missed revenue opportunities, increased costs, or poorer risk control for the business.
How Starburst helps with your data mesh and data warehouse journey
“One of the core missions of my team is to make the data mesh happen while still maintaining everything that we need to maintain in terms of policies and data privacy constraints. Starburst is making my life a lot easier by creating the first mesh platform for business metrics, that we can start operating within.” — Alexander Seeholzer, Director, Data Services, Sophia Genetics
“Starburst plays a key role in our Data Mesh strategy. It allows us to not only better integrate and adjust the governance model, but also catalog and understand data access and usage patterns.” – Patrice Linel, Senior Manager Data Science & Data Engineering, GENUS
“The data warehouse and data lake space is constantly evolving, and our enterprise focus means we have to support customer requirements across different platforms. Starburst gives us the ability to move quickly to support ever-changing use cases within complex enterprise environments.” — David Schulman, Head of Partner Marketing, Domino Data Lab
“We evaluated Snowflake, but given the incredibly ad-hoc nature of our business it wouldn’t be cost effective. We would have to increase our cost by 10X to achieve the performance that Starburst offers us at a fraction of the cost.” — Richard Teachout, CTO, El Toro