Debating the future of data engineering

  • Colleen Tartow, Ph.D.

    Engineering

    Starburst

Although a few months have passed, I’m still recovering from moderating a spirited Oxford-style debate on the future of data engineering at Datanova 2023.

We asked bestselling author Joe Reis, who co-wrote Fundamentals of Data Engineering, and our own Andy Mott, Head of Partner Solutions at Starburst, to discuss whether the increasing popularity of decentralized architectures spells doom for data engineering as a profession. 

The occasional insults tossed around the stage kept the debate lively, but between the tongue-in-cheek banter, Joe and Andy shared some great thoughts about the future of data engineering.

We started out by discussing how to define the role of a data engineer within a large organization. 

Here there wasn’t too much debate, as the two sides generally agreed a data engineer is someone who is responsible for taking data from source systems, then preparing or transforming it in some way for downstream use. 

Broadly, Andy and Joe talked about how data engineers take data, add value to it, then make it available to others within the company to drive a decision that delivers a return on investment.

What the future looks like for data engineers gets interesting when you start to think about where in the organization they will operate going forward. 

Data engineers have typically functioned as a central hub for engineering tasks. They work with multiple departments and business units across the enterprise. But as the decentralized, data-product-driven architecture of the data mesh approach becomes more popular, and more organizations find themselves on this decentralization journey, what happens to that centralized data team?

Andy offered a really interesting take here. The shift to a data products approach, in which the domain experts who know their datasets best are charged with preparing this data for broader consumption within an organization, is still going to require data engineering. To achieve this, Andy outlined three potential models: 

  1. In one model, Andy sees data engineers being allocated to specific domains and taking charge of that particular set of data products. 
  2. A second variation would have data engineers being temporarily assigned to a data product team to help push it out the door, then returning to the central engineering group. 
  3. The third model, and in Andy’s opinion the most appealing, would have data engineers moving from one domain to the next for longer periods. This would allow them to broaden their skills and organizational knowledge, and it would preserve much of the appeal of the original, centralized data engineering job, which lets them interact with more aspects of the business. Plus, the organization would be less vulnerable to a single engineer leaving and disrupting the data product team to which they’d been assigned. 

Joe didn’t even interrupt when Andy spelled out this vision, although he did feign boredom. Personally, I agree that moving data engineers into the domains makes the most sense and is the most efficient way to treat data as a product.

Upskilled generalists vs. centralized specialists

The sociotechnical aspects of the future of data engineering are important, as they are a key tenet of the entire data mesh concept. Decentralized data strategies don’t mean we’ll do away with data engineers; rather, the role will continue to evolve. Joe sees data engineers becoming more like upskilled generalists than specialists. 

As the three of us talked through the different possible paths, including how software might automate and simplify some of the processes around productizing data, it became clear that the role of the data engineer is only going to get more interesting. The future of data engineering in this decentralized world looks bright.

Granted, the discussion wasn’t entirely civil. Although they did converge on a few themes, Joe did draw an unflattering picture of Andy, and there was a reference to someone’s mother. Ultimately, the two experts hadn’t resolved their differences by the end of the allotted time. We all agreed that the only way to settle the matter was to have the pair battle it out in the parking lot. 

Luckily, as a remote-first company, we don’t really have a parking lot, so Joe and Andy decided to amicably part ways and return to their respective corners of the data world.

Photo by: Colleen Tartow, Ph.D.