GenAI drives adoption of open data architecture

  • Justin Borgman

    Co-Founder & CEO

    Starburst

  • Ethel Anderson

    Major Accounts Executive

    Starburst

Generative AI (GenAI) continues to reshape industries. It has the potential to revolutionize employee productivity, drive innovation, and enhance operational efficiencies using data. 

To help drive this innovation forward, companies require a data stack capable of feeding GenAI models with high-quality data. Finding that data requires exploratory data analysis. However, in many organizations, data exists in multiple locations, some on-premises and some in the cloud.

To address this, an open approach to data architecture is needed that puts data discovery first. This strategy allows both on-prem and cloud workloads to feed the rising demand for AI, opening new possibilities. 

 

Case study: JP Morgan Chase and the rising adoption of GenAI

If there was any doubt about the rising appeal of AI, consider last Friday's news: CNBC reported on a transformative initiative by JP Morgan Chase (JPMC) to create an “AI Powered Assistant” that leverages GenAI directly. The technology is based on OpenAI’s GPT model and will be used by 60,000 employees across the organization.

The assistant will draw on an internal corpus of data. The insights gained through this model will help streamline processes that would otherwise require significant manual intervention.

It is a powerful example of a company leveraging AI in its own way to solve real problems. 

GenAI reaching regulated industries

JPMC’s announcement symbolizes an important turning point in the acceptance of GenAI as a tool to help automate processes and enhance employee productivity. 

It is particularly interesting given the highly regulated banking environment, a sector not always known for rapid technological adoption. Taken as a whole, GenAI’s adoption in a generally conservative industry bodes well for its adoption in other industries.

If JPMC is doing it, you can bet others are on the journey as well. 

GenAI driving team efficiency 

It’s worth remembering why this approach works. JPMC began this initiative because it promises to help them achieve their organizational goals. Specifically, they hope to improve employee productivity in research, enhance content summarization, and facilitate hyper-personalized customer engagement and behavioral analysis. This makes it clear that OpenAI’s models and other LLMs are already solving real problems by harnessing the value of internal data assets in exciting new ways.

But all of this relies on secure and untethered access to relevant data. 

To keep driving this forward, access to all relevant data distributed across the enterprise is needed to feed AI models. The gold rush has begun to find a data stack that can serve the AI models that businesses increasingly need. 

 

GenAI benefits from strong data discovery

One of the most important things to consider is an AI data architecture capable of data discovery. Without a data discovery strategy, AI cannot return its valuable insights. Starburst lets you discover, train, and deploy AI models using the best data available from across your data ecosystem. This allows for exploratory data analysis, and is true for all sizes of data projects, including: 

  • A small corpus of data exposed to a finite audience of employees;
  • A broad dataset exposed to the entire organization for horizontal operational efficiency.

In all cases, having open, discoverable data benefits GenAI by unlocking the total value of an organization’s data.
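
As a rough illustration of what data discovery can look like in practice, the sketch below builds one Trino SQL statement that searches the `information_schema.columns` view of several catalogs for columns whose names contain a keyword. The catalog names (`onprem_postgres`, `cloud_lakehouse`) and the helper `discovery_query` are hypothetical and used only for illustration. The statement it produces could be submitted through any Trino client.

```python
import re

# Hypothetical catalog names; real ones depend on how your cluster is configured.
CATALOGS = ["onprem_postgres", "cloud_lakehouse"]

_IDENTIFIER = re.compile(r"[a-z_][a-z0-9_]*")
_KEYWORD = re.compile(r"[a-z0-9]+")


def discovery_query(catalogs, keyword):
    """Build a Trino SQL statement that looks for columns whose names
    contain `keyword` in each catalog's information_schema.columns view."""
    if not _KEYWORD.fullmatch(keyword):
        raise ValueError(f"unsupported keyword: {keyword!r}")
    parts = []
    for catalog in catalogs:
        if not _IDENTIFIER.fullmatch(catalog):
            raise ValueError(f"unsupported catalog name: {catalog!r}")
        parts.append(
            "SELECT table_catalog, table_schema, table_name, column_name\n"
            f"FROM {catalog}.information_schema.columns\n"
            f"WHERE column_name LIKE '%{keyword}%'"
        )
    # One result set covering every catalog that was searched.
    return "\nUNION ALL\n".join(parts)


print(discovery_query(CATALOGS, "customer"))
```

The same pattern works for a small corpus or a broad dataset: only the list of catalogs changes.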

Starburst helps build your GenAI data pipeline

Starburst provides an Open Data Lakehouse known for providing fast, scalable, and discoverable access to distributed data. At its core, it is powered by Trino, the leading MPP SQL query engine. With our end-to-end data analytics platform, we believe in granting our customers optionality and the ability to own their data the way they want. 

Open data stack benefits GenAI

At Starburst, we believe strongly in open standards and open architecture. In particular, the combination of the open-source Trino engine running on open table formats such as Apache Iceberg, Delta Lake, or Apache Hudi allows organizations to escape the traditional Enterprise Data Warehouse (EDW) model. Using an open data stack avoids the single-source-of-truth model, which often becomes unsustainable, costly, and confining as data grows.

As your organization pivots into a data-first strategy for AI, using an open data stack with strong data discovery helps to drive growth. As you scale to support more sophisticated LLMs and GenAI initiatives, data discovery becomes even more important. 

Starburst helps serve all of these goals. 

 

3 steps you should take when adopting an open data architecture for GenAI

Consider a data-first strategy across your value chain, including these three best practices.

1) Augment your EDW ecosystem with an open data lakehouse 

Enterprise Data Warehouses are not designed for AI. Consider augmenting your conventional EDW architecture with an open data lakehouse. This approach eliminates the need to move all of your data into a ‘single source of truth’.

In practice, this means building the lakehouse on the cloud object storage of your choice, whether Amazon S3, Google Cloud Storage, or Azure Blob Storage. It can also mean using powerful on-prem solutions that incorporate modernized object storage, like Dell ECS or MinIO. This hybrid approach allows you to combine the scalability of a data lake with the structured query capabilities of EDWs to manage the volume, variety, and variability of data as it scales.

2) Accelerate GenAI insights by enabling data discovery 

Data discovery makes the right data available to AI models. This point is particularly relevant as you scale your GenAI and AI-analytical workloads across various lines of business in your value chain. Companies are quickly realizing that not all data is readily available in a single, central EDW. Often, data cannot ‘sit in the cloud’ for regulatory, compliance, or economic reasons, and must instead remain on-prem or in various databases. 

In each of these cases, Starburst offers a solution. Whether it’s due to security protocols, PII, privacy, or data sovereignty needs, more companies are considering repatriating certain datasets back on-prem and adopting an open data lakehouse for their analytical workload needs. In such cases, Starburst allows you to access data on these systems, and in the cloud, allowing the best of both worlds.
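
To make the idea of querying on-prem and cloud data together more concrete, here is a minimal sketch that builds a single federated Trino SQL statement. It relies on Trino's `catalog.schema.table` naming. The catalogs, schemas, tables, and columns shown (`onprem_postgres.crm.customers`, `cloud_lakehouse.analytics.events`, `customer_id`, `segment`) are assumptions made up for this example, as are the helper functions.

```python
import re

_IDENTIFIER = re.compile(r"[a-z_][a-z0-9_]*")


def qualified(catalog, schema, table):
    """Return a fully qualified Trino name in catalog.schema.table form."""
    for part in (catalog, schema, table):
        if not _IDENTIFIER.fullmatch(part):
            raise ValueError(f"unsupported identifier: {part!r}")
    return f"{catalog}.{schema}.{table}"


def federated_join_query(customers, events):
    """Join an on-prem table with a cloud lakehouse table in one statement.

    The column names (customer_id, segment) are hypothetical.
    """
    return (
        "SELECT c.segment, count(*) AS interactions\n"
        f"FROM {customers} AS c\n"
        f"JOIN {events} AS e ON e.customer_id = c.customer_id\n"
        "GROUP BY c.segment"
    )


# Hypothetical layout: customer records stay on-prem, event data lives in the cloud.
customers = qualified("onprem_postgres", "crm", "customers")
events = qualified("cloud_lakehouse", "analytics", "events")
print(federated_join_query(customers, events))
```

The sensitive customer table never has to leave its on-prem system; only the query spans both locations.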

3) Adopt a “multi-everything” approach to data

Adopting a multi-format, multi-cloud data architecture has many advantages. As this paradigm shift occurs and more data variability surfaces, an open standards approach enables your organization to improve flexibility and adopt a data accessibility strategy. To avoid vendor lock-in during this journey, we strongly encourage the adoption of Apache Iceberg as your table format of choice. This allows you to benefit from full control over your data while maintaining rigorous data access and management controls.

Apache Iceberg is fast becoming the de facto open standard for this very reason. In choosing Iceberg, you enable a model where YOU “own your data”. More importantly, you “own options” for that data in the form of compute, storage, scaling, and discovery optimization. 
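
For readers who want to see what choosing Iceberg might look like at the table level, the sketch below generates a `CREATE TABLE` statement using the `format` and `partitioning` table properties of Trino's Iceberg connector. The table name, columns, and the `iceberg_create_table` helper are hypothetical; treat this as an illustration under those assumptions rather than a recommended schema.

```python
def iceberg_create_table(table, columns, partitioning, file_format="PARQUET"):
    """Build a CREATE TABLE statement for a Trino catalog backed by Iceberg.

    `columns` is a list of (name, sql_type) pairs and `partitioning` is a
    list of Iceberg partition transforms such as "day(event_ts)".
    """
    column_sql = ",\n".join(f"    {name} {sql_type}" for name, sql_type in columns)
    partition_sql = ", ".join(f"'{expr}'" for expr in partitioning)
    return (
        f"CREATE TABLE {table} (\n{column_sql}\n)\n"
        "WITH (\n"
        f"    format = '{file_format}',\n"
        f"    partitioning = ARRAY[{partition_sql}]\n"
        ")"
    )


# Hypothetical table: the names and types below are illustrative only.
statement = iceberg_create_table(
    "cloud_lakehouse.analytics.events",
    [
        ("event_id", "BIGINT"),
        ("customer_id", "BIGINT"),
        ("event_ts", "TIMESTAMP(6)"),
        ("payload", "VARCHAR"),
    ],
    partitioning=["day(event_ts)"],
)
print(statement)
```

Because the data files and metadata sit in an open format on your own storage, other Iceberg-compatible engines can read the same table.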

The best way to leverage this openness is by using Starburst. This allows you to run “your compute” and leverage “your storage” for analytical workloads on top of Apache Iceberg while enabling the best market cost and performance to support your organizational growth needs.