Despite the investments and effort poured into next-generation data storage systems, data warehouses and data lakes have failed to provide data engineers, data analysts, and data leaders with the trustworthy, agile business insights needed to make intelligent business decisions. The answer is Data Mesh – a decentralized, distributed approach to enterprise data management.
Data Mesh creator Zhamak Dehghani defines Data Mesh as “a sociotechnical approach to share, access and manage analytical data in complex and large-scale environments – within or across organizations.” She’s authoring an O’Reilly book, Data Mesh: Delivering Data-Driven Value at Scale, and Starburst, the ‘Analytics Engine for Data Mesh,’ happens to be its sole sponsor. In addition to providing a complimentary copy of the book, we’re also sharing chapter summaries so we can read along and educate our readers about this (r)evolutionary paradigm. Enjoy Chapter Two: After the Inflection Point!
As we highlighted in our first post, if we continue along the same path in our data management strategy, businesses will plateau. After all, businesses are complex, intricately connected organizations. With the way our economic systems are structured, we expect businesses to continue growing, merging, and acquiring. Naturally, we’ll have new sources of data to manage, increasing both the volume and the velocity of data, not to mention new data-driven use cases. Add a dash of uncertainty with a pandemic and some volatility – the only path forward is to remain agile, embrace change, and respond gracefully.
To stay relevant and get ahead of the competition, we need to respond to change quickly. We simply cannot ignore Data Mesh, a strategic methodology that has emerged to help us manage and access analytical data at scale. Once adopted, here are a few immediate benefits:
Align the Business Domain With Technology And Data
We’ve learned that in order to accelerate the value we’re getting from data in proportion to the investments made, we have to remove the centralized bottleneck (data warehouses and data lakes) that’s preventing us from responding to change quickly. Data Mesh introduces “a peer-to-peer approach in data collaboration when serving and consuming data. The architecture enables consumers to directly discover and use the data right from the source.”
With peer-to-peer analytical data sharing, organizations can orient their “technology staff around their business domains, allowing each business unit to be supported by a dedicated technology capability for that unit’s work.” This way, data will have owners from domains who are most familiar with the source data and are “best able to understand what analytic data exists, and how it should best be interpreted.” Ultimately, “each business unit takes on the responsibility for analytic data ownership and management.”
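The idea of a business domain owning and serving its analytical data as a discoverable product can be sketched in a few lines. This is a minimal illustration, not anything from the book; the class and field names are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    """An analytical dataset served and owned by a business domain."""
    name: str            # discoverable product name, e.g. "orders.daily_summary"
    owner_domain: str    # the business unit responsible for this data
    schema: dict         # column name -> type, documented by the owning domain
    rows: list = field(default_factory=list)

    def read(self):
        """Consumers read directly from the product, peer-to-peer,
        without going through a central warehouse or a pipeline team."""
        return list(self.rows)

# The (hypothetical) Orders domain owns and serves its own analytical data:
orders = DataProduct(
    name="orders.daily_summary",
    owner_domain="orders",
    schema={"day": "date", "total": "float"},
    rows=[{"day": "2021-11-01", "total": 1240.50}],
)

# A consumer in another domain uses it straight from the source,
# and can see exactly which domain owns and interprets the data:
print(orders.owner_domain)   # prints: orders
print(orders.read())
```

The point of the sketch is the ownership field: the domain that knows the source data best is also the one accountable for serving and explaining it.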
Close the Gap Between Analytical and Operational Data
To make business-enhancing decisions, we need analytical data that is accurate and as close to real-time as possible at the moment a decision is made. With analytical and operational data on separate planes, connected only through fragile data pipelines, this ideal is a pipe dream. Once we dissolve those pipelines, we forge a new way of providing up-to-date, trustworthy analytical data to data analysts and scientists.
With Zhamak’s approach, “Data Mesh connects these two planes under a different structure – an inverted model and topology based on domains and not technology stack – where each domain extends its responsibilities to not only provide operational capabilities but also serve and share analytical data as a product.”
Zhamak envisions a future where technology will bring these two planes even closer together, but for now, Data Mesh’s focus is on the analytical plane and integrating with the operational plane.
Reduce Accidental Complexity of Pipelines and Copying Data
Imitation may be the sincerest form of flattery, but copying is no compliment when it comes to data. And we tend to overdo it. Zhamak writes, “Today, we keep copying data around because we need the data for yet another mode of access, or yet another model of computation. We copy data from operational systems to a data lake for data scientists. We copy the data again into lakeshore marts for data analyst access and then into the downstream dashboard or reporting databases for the last mile. We build complex and brittle pipelines to do the copying. The copying journey continues across one technology stack to another and across one cloud vendor to another. Today, to run analytical workloads you need to decide upfront which cloud provider copies all of your data in its lake or warehouse before you can get value from it.”
Data Mesh retreats from copying data and shifts the focus to capturing data insights more quickly. Data is no longer perceived as an asset, but as a product. Data Mesh’s self-serve infrastructure lets anyone with the proper access control discover and use a data product no matter where the data physically resides. Moreover, Data Mesh measures success not by the volume of the data, but by the happiness and satisfaction of the data users. Data Mesh eliminates the need for a specialist data engineer at every domain and instead enables generalists to develop, discover, and deliver data products.
What will this look like in an organization? Zhamak highlights, “A machine learning training function or a report, can directly access independent data products, without the intervention of a centralized architectural component such as a lake or a warehouse, and without the need for an intermediary data (pipeline) team.”
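That combination of self-serve discovery plus access control, independent of where the data physically lives, can be sketched as a toy catalog. Everything here is illustrative, assuming made-up product names and storage locations:

```python
class Catalog:
    """A toy self-serve catalog: any user can discover what data products
    exist; access is granted by role, not by which system holds the data."""

    def __init__(self):
        self._products = {}   # product name -> (physical location, allowed roles)

    def register(self, name, location, allowed_roles):
        self._products[name] = (location, set(allowed_roles))

    def discover(self):
        """Discovery is open: list every registered data product."""
        return sorted(self._products)

    def use(self, name, role):
        """Use requires the right role; the physical location is a detail."""
        location, allowed = self._products[name]
        if role not in allowed:
            raise PermissionError(f"{role} may not access {name}")
        return f"reading {name} from {location}"

catalog = Catalog()
catalog.register("orders.daily_summary", "s3://orders-domain/daily", ["analyst"])
catalog.register("users.signups", "postgres://users-db/signups", ["analyst", "ml"])

# An ML training job reads a product directly, with no lake, warehouse,
# or intermediary pipeline team in between:
print(catalog.discover())
print(catalog.use("users.signups", "ml"))
```

The design choice worth noticing is that consumers never see a central store: each product keeps its own location, and the catalog only brokers discovery and permission.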
Before the Inflection Point: Current Landscape Of Data Architecture
Before we move too far ahead into Data Mesh, in our next post, we’ll take a closer look at the fragile state of our data architecture and why it won’t sustain data-driven organizations in the near future.
Read along with us!
Get your complimentary access now to pre-release chapters from Zhamak Dehghani’s O’Reilly book, Data Mesh: Delivering Data-Driven Value at Scale.