Comcast’s data journey into data mesh with Starburst

  • Bryan Aller, Director, Software Development & Engineering, Comcast


At Comcast, after a long data journey, we first implemented data virtualization. Now we are on our way to building a data mesh using Starburst, along with a host of self-service frameworks and tooling. Comcast has often been among the first to adopt new data technologies as they emerge. When data mesh came along, though, we quickly realized it had the potential to shake up almost any industry.

Recently, I participated in a podcast with Justin Borgman, CEO of Starburst, and Teresa Tung, Cloud First Chief Technologist at Accenture, to discuss the future of cloud in the world of data virtualization and data mesh.

Our rough working definition of data mesh at Comcast is a collection of data platform components: nodes that provide interoperable services in the form of storage, computation, transformation, and egress, plus, most importantly, the culture to support them.

We believe the cultural piece is paramount. Unlike many previous technology trends, data mesh is more than just a technical solution or tool. It’s a fundamental shift in the way the enterprise thinks about data, products, and consumers. 

Data mesh has the ability to unite a variety of independent business units under a common governance model without threatening their engineering sovereignty. 

Comcast’s data journey

Our data journey at Comcast has been a long one. We have traditionally been at the forefront of adopting new technologies as they launch. Over the years, we've evolved through data marts, data warehouses, and data lakes. We've separated storage from compute; we've virtualized different solutions; we've containerized applications.

The challenge has been that each of these evolutions left us with expensive cloud migrations and lingering legacy solutions. We never fully migrated from one solution to the next, because there were still valid reasons those platforms had been put into play.

In the past, a lot of our development and interoperability relied on ETL and data pipelines to centralize data. As time went on, new formats made that more and more difficult: mapping structured data into SQL, for example, or moving from block storage to object storage. The result was a long tail of legacy data, and moving wholesale from one platform to the next would have been quite expensive.

Looking at what we've learned from the big data space, and at how containerization helped with cloud migrations, we're now operating in a hybrid environment. The exciting part about data fabric and data mesh technologies is that they seek to drive harmony.

They allow us to tap into all of these different data stores, level the playing field, and enable interoperability between these different solutions, even if the underlying storage formats differ and the data sits in different places.
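To make that concrete, here is a minimal sketch of a federated query through Trino, the open-source engine underlying Starburst. The host, catalog, and table names are hypothetical, not our actual environment: one table lives in a PostgreSQL database, the other in Hive-managed object storage, and a single SQL statement joins them.

```python
# A minimal federation sketch using the open-source `trino` Python
# client (pip install trino). All hosts and names below are hypothetical.
import trino

# One connection to the virtualization layer, not to any single store.
conn = trino.dbapi.connect(
    host="starburst.example.internal",  # hypothetical gateway host
    port=8080,
    user="analyst",
)

# The join spans two catalogs: a PostgreSQL database and a Hive-backed
# object store. Neither dataset has to be migrated before it can be
# queried alongside the other.
cur = conn.cursor()
cur.execute("""
    SELECT a.account_id, a.plan, e.event_count
    FROM postgresql.billing.accounts AS a
    JOIN hive.telemetry.daily_events AS e
      ON a.account_id = e.account_id
""")
for row in cur.fetchmany(10):
    print(row)
```

The same pattern extends to any catalog the layer can reach, which is what levels the playing field across old and new stores.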

The benefits of data virtualization

Beyond data virtualization enabling a best-of-breed approach, there are valuable benefits to both privacy and security. With GDPR and the California Consumer Privacy Act, there is more of a need now to know where data lives, to govern it in all of those locations, and to control how people access it.

Virtualization technology creates another layer at which we are able to control the end-user experience. We can enact data access policies so that even if data lives in different places across the enterprise, it is accessed and secured in a uniform way.

That in itself was a huge win, because in prior years governance had to happen on each and every individual platform, each with its own way of applying it. Now the standardized tool sets that data mesh brings to the table are simplifying that experience: we're able to implement policies in one place.
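As a hedged illustration of what policies in one place can look like (this shows generic file-based access control in open-source Trino, not our actual setup, and the group and catalog names are hypothetical), a single rules file can govern every connected catalog at once:

```json
{
  "catalogs": [
    {
      "group": "data_engineering",
      "catalog": ".*",
      "allow": "all"
    },
    {
      "group": "analysts",
      "catalog": "(hive|postgresql)",
      "allow": "read-only"
    },
    {
      "allow": "none"
    }
  ]
}
```

Commercial offerings layer richer controls on top, but the principle is the one described above: one policy definition, enforced uniformly no matter where the underlying data lives.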

By adding data virtualization right away, you can unlock your data products from your existing systems. While that is happening, we have a common front door that remains seamless to our customers as we upgrade and modernize the systems underneath. Our users get the benefit of increased performance and the value of new technologies, with no visible change in the data products they're working with today.

User enablement with data virtualization

The key to being successful with a solution like this is to really enable your users. In the past, moving from one solution to another would cause end users to complain. They’d have to go through lengthy training, and there’d be a learning curve.

We began hosting a variety of user group sessions internally to educate users on the query fabric and to talk about mesh principles. We partnered with Starburst and brought them in for instructor-led education; Starburst provided our end users with materials and hosted workshops to teach them new skills. In a lot of cases, our end users have been able to change a connection string and keep using the tools they use today, whether that's a reporting tool, a dashboard tool, or a spreadsheet. For them, the experience has been relatively seamless, and that's the best thing we can do for end-user enablement.
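As a sketch of that change-a-connection-string experience (the endpoint, catalog, and table names here are hypothetical), the same query a script once sent straight to a single warehouse can simply be pointed at the virtualization layer instead:

```python
# A minimal sketch using the open-source `trino` Python client
# (pip install trino). Host, catalog, and table names are hypothetical.
import trino

# Before, this script connected directly to one warehouse. Now only the
# connection details change; the SQL and surrounding tooling stay the same.
conn = trino.dbapi.connect(
    host="starburst.example.internal",  # hypothetical gateway host
    port=8080,
    user="report_user",
    catalog="hive",       # default catalog for unqualified table names
    schema="analytics",
)

cur = conn.cursor()
cur.execute("SELECT COUNT(*) FROM customer_events")  # unchanged query
print(cur.fetchone())
```

For reporting, dashboard, and spreadsheet tools, the equivalent change is typically just the JDBC or ODBC connection details.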

In cases where there's been more of a challenge, we documented job aids, reference architectures, and sample code to make adoption easier. So far, that has lowered the barrier to entry to the point where less technical users, even people strictly in management with no technical background, are able to jump in and leverage data in the big data ecosystem.

At the end of the day, it's about the data products that end users consume. Data mesh makes it super simple for folks to go into the interface and begin consuming data.

A case for data virtualization

When we decided to adopt data virtualization, there was some convincing that needed to happen, but the story spoke for itself. It sounds almost too good to be true. 

When you mention that you can bring in a solution that preserves the value of everything you've done to date, future-proofs you, and offers performance gains, it sounds like a lot. But backing that story with quantifiable examples definitely helped sell the case.

There were the obvious privacy and security benefits. There were huge performance gains, especially with big data. We were seeing speeds 10 to 20 times faster on complex SQL queries, which is nothing to joke about. We’ve demonstrated that we can take a best-of-breed approach and leverage our existing data stores while joining them against new ones. 

Putting live demos in front of senior leadership made it easy to make the case that this really should be the broader data strategy. We are not starting over, and it's not a costly migration.

We're able to avoid or defer those migrations and prolong the value of our current solutions, while also supporting a forward-facing solution that sustains our competitive edge. As new technologies come along, we can bake them into this model. It sets us up to harmonize our overall ecosystem.

Last, but far from least, it sets up a shared context for identity, authentication, authorization, and data access policies. It simplifies the ecosystem, makes for a better customer experience, and puts you in a position where you can make change today and take advantage of these improvements immediately.

For more information about data mesh, listen to the full Accenture podcast: Spotify | Apple | Stitcher
