How Data Mesh and Data Fabric are fundamentally changing the role of the data engineer
In the last year, Data Mesh has become one of the most talked-about topics in the data analytics space. As companies struggle to unlock the value in their distributed datasets and explore alternatives to centralized data management, both Data Mesh and Data Fabric are increasingly popular approaches. Because they are often deployed to solve similar problems, the differences between the two can be hard to see. So I was thrilled to moderate a discussion on this subject at Datanova, Starburst’s annual industry conference, between Zhamak Dehghani, the creator of the Data Mesh concept, and Dr. Daniel Abadi, a data management expert and professor of computer science at the University of Maryland.
Until recently I was a research analyst at Gartner, where I ran the Data & Analytics agenda, so I have my own opinion on the subject. The way I understand it, Data Mesh is a techno-social business concept that demands change both at the infrastructure and organizational levels, whereas Data Fabric is more of a data integration pattern. In fact, Gartner has published a research report on the subject that can be downloaded here. Thankfully, the Datanova panel, which I’d encourage you to view on-demand here or below, clarifies the distinctions further.
A few highlights from the conversation include:
- Overviews of the Data Mesh and Data Fabric concepts
- What Data Mesh looks like in practice, including a sample use case
- How Data Fabric and Data Mesh can accelerate data discovery
- Discussion of technologies that could accelerate adoption
After Dehghani explained Data Mesh in detail, our conversation centered on the human side of these two new approaches to data architecture and on what happens as we shift responsibilities away from central data management teams.
Mesh vs. Fabric: Eliminating the Bottleneck
Traditionally, when you want access to a new dataset, you have to appeal to a central data management team. Unfortunately, everyone else across the organization has to work through that same group, so a tremendous bottleneck results. You’re forced to wait in line for months or more.
As Dehghani explains, Data Mesh shifts ownership of datasets away from the centralized team to a new class of domain experts. These domain experts not only know the datasets well, they are responsible for transforming them into easily discoverable products that can be consumed within or even outside the organization.
The Data Fabric approach also takes certain data management responsibilities out of the hands of the central team, but it does so through technology and, specifically, the automation of tasks. For example, Dr. Abadi explains that Data Fabric would call for using AI and Machine Learning tools to automate discovery and recommend datasets to individual users — the sort of task now reserved for the already overworked central team. Data orchestration and governance tasks could be automated as well. “Data Fabric basically says let’s create a lot of metadata about the dataset and use that metadata to automate all the tasks we can and reduce reliance on the central team,” Dr. Abadi explains.
Another way to look at this, according to the panelists, is that Data Fabric deploys technology to move certain tasks away from the data management team. In a sense, it removes humans from the loop. Data Mesh, on the other hand, is about using humans in a smarter way.
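To make the Data Fabric idea a little more concrete, here is a minimal Python sketch of metadata-driven dataset recommendation, the kind of discovery task Dr. Abadi suggests could be automated. Everything in it is an assumption made for illustration: the `DatasetMetadata` fields, the sample dataset names, and the simple tag-overlap (Jaccard) scoring, which stands in for the AI and Machine Learning tools mentioned on the panel.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """Hypothetical catalog entry; the field names are illustrative only."""
    name: str
    domain: str
    tags: set[str] = field(default_factory=set)


def recommend(catalog, interests, limit=3):
    """Rank datasets by Jaccard similarity between their tags and a user's interests."""
    scored = []
    for entry in catalog:
        union = entry.tags | interests
        score = len(entry.tags & interests) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, entry.name))
    # Highest score first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]


# Made-up catalog entries standing in for metadata a fabric would collect.
catalog = [
    DatasetMetadata("orders_daily", "sales", {"orders", "revenue", "customers"}),
    DatasetMetadata("web_clickstream", "marketing", {"sessions", "customers", "campaigns"}),
    DatasetMetadata("warehouse_stock", "logistics", {"inventory", "suppliers"}),
]

print(recommend(catalog, {"customers", "revenue"}))
# → ['orders_daily', 'web_clickstream']
```

The point of the sketch is only that once metadata exists, a recommendation no longer has to pass through a person on the central team. A real fabric would presumably draw on far richer signals than tags.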
Where do we find Data Mesh engineers?
If we focus on Data Mesh, one of the potential problems is that data engineers are already in short supply, so I was interested to hear how our panelists thought enterprises could implement the Data Mesh architecture with this constraint in mind. How do we train or empower engineers with the skills to transform datasets into products? Do we move people from the centralized data engineering team into the domains? Are they ready for or open to this shift? As I’ve written many times before, technology is easy, processes are hard, but people are impossible. Yet both panelists offered encouraging solutions.
Dr. Abadi insists that learning the necessary skills will not be too difficult. “I teach basic data engineering classes at the university, and it’s one semester, and at the end of the semester, students know what they’re doing,” he notes. He believes enterprises will be able to train data engineers to build and maintain the domain-oriented Data Mesh.
The question then becomes whether they would accept or embrace this new role. Dehghani and Dr. Abadi both suggest that cultivating data products will prove to be more meaningful work, as it has the potential to connect a given engineer’s tasks with larger business outcomes. If you were to approach a domain team and ask them to create an ETL pipeline to move their dataset elsewhere, there is no intrinsic motivation. “They don’t see how the data are used,” Dehghani notes. “They don’t see the feedback. There’s no sense of purpose.”
On the other hand, Data Mesh empowers those domain owners to follow the progress and see the ultimate results of the work they did preparing and sharing the data product they oversee. Instead of pushing data through some pipeline to a group they don’t know, and never finding out how it was used, Data Mesh encourages both the domain owner and the business unit or data analysts accessing the data to work together towards a common goal. They’re no longer a bottleneck — they are part of a larger, more cohesive team, and they can see their role in a larger company story.
How are we going to build a Data Mesh?
The panelists agree that we can build the Data Mesh today, but we have to bring together a lot of technologies and tools in new ways, with quite a lot of custom work as well. Dehghani hopes entrepreneurs and technologists will start doing some of the work to augment the technology and make this shift a little easier. Dr. Abadi would like to see a focus on two areas initially. Right now, he says, we are limited in terms of data discovery. He’d like to see new technology that helps you find data relevant to your task and makes recommendations. Without naming a specific product, he points to a Starburst-like technology as one way to help build the mesh.
His other suggestion relates to the human piece of this equation. He’d like to see technology that tracks data usage and effectively allows credit to go where credit is due, as a way of keeping everyone along the path motivated to maintain the data products all the way through to their end use.
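As a rough sketch of what such usage tracking could look like, the Python below tallies accesses to each data product and credits them to the owning domain team. The ownership registry, the team and dataset names, and the shape of the access log are all hypothetical; the panel did not describe a specific design.

```python
from collections import Counter

# Hypothetical ownership registry: data product -> owning domain team.
OWNERS = {
    "orders_daily": "sales-domain",
    "web_clickstream": "marketing-domain",
}


def credit_by_owner(access_log, owners):
    """Count accesses per owning team from (consumer, product) events."""
    credit = Counter()
    for _consumer, product in access_log:
        owner = owners.get(product)
        if owner is not None:  # skip products with no registered owner
            credit[owner] += 1
    return credit


# Made-up access events: (who read the data, which product they read).
access_log = [
    ("analyst_a", "orders_daily"),
    ("analyst_b", "orders_daily"),
    ("ml_team", "web_clickstream"),
    ("analyst_a", "unregistered_table"),
]

credit = credit_by_owner(access_log, OWNERS)
print(credit.most_common())
# → [('sales-domain', 2), ('marketing-domain', 1)]
```

Even a simple tally like this would give domain owners the feedback that Dehghani says is missing from the traditional pipeline model.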
This post is only a brief summary of the discussion, so I encourage you to watch it yourself. This is a very interesting time in the data management and analytics space, and these two concepts deserve all the attention they’ve been receiving. As I noted above, the human piece of any organizational task is often the most challenging, but I found this discussion to be very encouraging. Both Data Mesh and Data Fabric are fundamentally changing the role of the data engineer for the better.