The data mesh approach to analytics at scale

Get early access to free early release chapters (including the newly released chapter!) of the O’Reilly book, Data Mesh: Delivering Data-Driven Value at Scale, written by Zhamak Dehghani.

I’m excited to share more about Data Mesh, a popular new approach to analytics at scale. But first, I want to talk through how we got here. Over the past 30 years, companies have invested in analytics technologies around an ideal called the “single source of truth.” This was made famous by the enterprise data warehouse model, and it sounded great on paper. If you centralized all of your data in one place, you would finally be able to understand everything going on in your business. As more cost-effective storage options came into the picture, like Hadoop and object storage, data lakes have become increasingly popular.

At Starburst, we love data lakes, and we think they are the most efficient place to store data. However, the idea of centralizing all of your enterprise’s data in one place is still fraught with challenges, no matter what the underlying storage happens to be.

  • First, it’s slow. When you consider how long it takes to move the data, there are significant delays in time-to-insight.
  • Second, your data gets locked in as you’re forced to adopt proprietary formats or pay egress fees to move your data around.
  • Third, it’s incredibly expensive. You’re paying an expensive storage price while also paying your teams to constantly move and copy data, and maintain these pipelines.
  • Fourth, and finally, it’s unachievable. It’s actually impossible to have all of your data in one location.

It’s been clear to me for a long time that we need a better path to data management and analytics at scale.

The answer is Data Mesh. Coined by Zhamak Dehghani, Principal Technology Consultant at ThoughtWorks, Data Mesh is a more modern approach to managing analytics at scale. It embraces decentralization over centralization, treating efficient access to distributed data as a core architectural principle.

Data Mesh addresses many of the flaws in that monolithic data warehouse model. It is really about rethinking organizational, architectural, and technological assumptions to get the best out of your data team and your data. Data Mesh is founded on four main principles:

  1. Domain-driven data ownership, i.e., giving teams greater control over their data sets.
  2. Data as a product, i.e., creating an organizational muscle around how to best package your data for use within or outside your company.
  3. A self-service infrastructure platform, i.e., the central IT team creates a data platform that different domains can use.
  4. Federated computational governance, i.e., ensuring that distributed data remains secure and governed.
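
To make the four principles a little more concrete, here is a minimal sketch in Python of how they might fit together. Everything in it is an assumption made for illustration: the `DataProduct`, `Column`, and `Platform` classes, the `governance_violations` function, and the "PII columns must be masked" rule are hypothetical names and policies, not part of any Data Mesh specification or Starburst product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Column:
    name: str
    dtype: str
    contains_pii: bool = False


@dataclass
class DataProduct:
    """A data set packaged and owned by one domain team (principles 1 and 2)."""
    name: str
    owner_domain: str
    columns: list
    masked_columns: set = field(default_factory=set)


def governance_violations(product):
    """Hypothetical global policy checked for every domain (principle 4)."""
    problems = []
    if not product.owner_domain:
        problems.append("missing owner domain")
    for col in product.columns:
        if col.contains_pii and col.name not in product.masked_columns:
            problems.append(f"PII column '{col.name}' is not masked")
    return problems


class Platform:
    """Shared self-service catalog that any domain can publish to (principle 3)."""

    def __init__(self):
        self.catalog = {}

    def publish(self, product):
        # Governance is enforced automatically at publish time,
        # rather than reviewed by hand by a central team.
        violations = governance_violations(product)
        if violations:
            raise ValueError("; ".join(violations))
        key = f"{product.owner_domain}.{product.name}"
        self.catalog[key] = product
        return key
```

In this sketch a domain team would build a `DataProduct`, call `Platform.publish`, and get an immediate error if a shared rule is broken. The design point is that ownership stays with the domain while the rules are applied uniformly by code.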

Data Mesh is not one type of technology or code that magically solves data problems at the touch of a button. Instead, it’s about rethinking the human side of technology, alongside adopting a more open approach to building data platforms at scale. At Starburst, Data Mesh architecture resonates with us because we’ve been focused on helping companies gain faster access to distributed data for years. Our products are powered by Trino, an open-source distributed engine that can execute SQL queries against data stored in a range of databases and file systems.
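
The idea of querying data where it lives can be sketched with a small stand-in. The Python example below does not use Trino. It uses SQLite from the standard library and attaches two separate in-memory databases to play the role of two domain-owned sources, so that a single SQL statement can join across them. The `sales` and `support` domains, their tables, and the sample rows are all made up for illustration. In Trino, tables are addressed as `catalog.schema.table` in a similar spirit, but the connection setup and SQL dialect differ from what is shown here.

```python
import sqlite3


def build_demo_connection():
    """Create two independent in-memory databases and attach both to one session."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH creates a separate database, standing in for a separate source.
    conn.execute("ATTACH DATABASE ':memory:' AS sales")
    conn.execute("ATTACH DATABASE ':memory:' AS support")

    conn.execute("CREATE TABLE sales.orders (customer_id INTEGER, amount REAL)")
    conn.execute("CREATE TABLE support.tickets (customer_id INTEGER, severity TEXT)")

    conn.executemany(
        "INSERT INTO sales.orders VALUES (?, ?)",
        [(1, 120.0), (1, 30.0), (2, 75.0)],
    )
    conn.executemany(
        "INSERT INTO support.tickets VALUES (?, ?)",
        [(1, "high"), (3, "low")],
    )
    conn.commit()
    return conn


# One query spanning both sources; no data is copied into a central store first.
FEDERATED_QUERY = """
SELECT o.customer_id, SUM(o.amount) AS total_spend, t.severity
FROM sales.orders AS o
JOIN support.tickets AS t ON t.customer_id = o.customer_id
GROUP BY o.customer_id, t.severity
ORDER BY o.customer_id
"""

if __name__ == "__main__":
    connection = build_demo_connection()
    for row in connection.execute(FEDERATED_QUERY):
        print(row)
```

The query returns only customers that appear in both sources, which is the kind of cross-domain question a centralized warehouse would answer only after both data sets had been moved and loaded.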

We believe the Data Mesh architectural approach combined with Starburst’s leading query engine is a fundamentally better way to serve your organization’s analytics – today and tomorrow. If your enterprise is moving towards adopting a Data Mesh, we want to be there to help. Visit our Data Mesh Resource Center for more information.