AI drives adoption of open data architecture

Published: August 13, 2024

Generative AI (GenAI) continues to reshape industries. With this comes the potential to revolutionize employee productivity, drive innovation, and enhance operational efficiencies using data. 

To help drive this innovation forward, companies require a data stack capable of feeding GenAI models with high-quality data. However, in many organizations, data exists in multiple locations, some on-premises and some in the cloud. 

To address this, an open approach to data architecture is needed. This strategy allows both on-prem and cloud workloads to feed the rising demand for AI, opening new possibilities. 

Case study: JP Morgan and rising adoption of AI

If there was any doubt about the rising appeal of AI, consider that last Friday CNBC reported a transformative initiative by JP Morgan to create an “AI Powered Assistant” that leverages GenAI directly. The technology is based on OpenAI’s GPT model and will be used by 60,000 employees across the organization. 

To do this, it will use an internal corpus of data to feed the AI. The insights gained through this model will help streamline processes that would otherwise require significant manual intervention. 

It is a powerful example of a company leveraging AI in its own way to solve real problems. 

AI reaching regulated industries

In certain ways, JPM’s announcement symbolizes an important turning point in the acceptance of GenAI to help automate processes and enhance employee productivity. 

This is particularly interesting given the highly regulated banking environment, a sector not always known for rapid technological adoption. On the whole, GenAI’s adoption in a generally conservative industry bodes well for its adoption in other industries. 

If JPM is doing it, others will soon follow. 

AI driving team efficiency 

It’s worth remembering why this approach works. JPM has begun this initiative because it promises to help the organization achieve its goals. The use of GenAI to drive employee productivity in research, content summarization, hyper-personalized customer engagement, and customer behavioral analysis is only set to grow. This makes it clear that OpenAI’s models and other LLMs are already solving real problems by harnessing the value of internal data assets in new ways. 

But all of this relies on access to data. 

To keep driving this forward, access to the data that feeds AI models will become increasingly important. The gold rush is on to find a data stack that can serve the AI models that businesses increasingly need. 

AI benefits from an open data architecture 

One of the most important things to consider is an AI data architecture capable of scaling. Without a data architecture that scales with the data feeding the models, AI cannot return valuable insights. This is true for all sizes of data projects, including: 

  • A small corpus of data exposed to a finite audience of employees;
  • A broad dataset exposed to the entire organization for horizontal operational efficiency.

In all cases, an open, scalable data architecture benefits GenAI by unlocking the total value of an organization’s data.

Starburst helps build your AI data pipeline

Starburst is a leading end-to-end data analytics platform known for providing fast, scalable, and flexible access to distributed data. We believe in granting our customers data optionality and the ability to own their data the way they want. 

Towards an open data lakehouse architecture

In practice, this is achieved through an open data architecture. Starburst empowers our customers to choose compute and storage resources based on their needs. 

Many organizations, particularly those operating in highly regulated fields, will require the flexibility and scale to run GenAI and AI workloads across on-prem, multi-cloud, and multi-modal ecosystems. This approach requires significant “horsepower,” including GPUs, CPUs, and extensive model training. There is also a need for fine-grained security controls and the means to democratize analytical workloads that span various distributed datasets. Collectively, this allows valuable data inputs to reach their respective lines of business, so that each can meet its business needs. 

Open data stack benefits AI

At Starburst, we believe strongly in open standards and open architecture. In particular, the combination of the open-source Trino engine running on the open-source Apache Iceberg table format allows organizations to escape the traditional Enterprise Data Warehouse (EDW) model. Using an open data stack avoids dependence on a single, centralized source of truth, which often becomes unsustainable as data grows.
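As a rough illustration of what Trino on Iceberg looks like in practice, the sketch below builds a Trino CREATE TABLE statement for an Iceberg table. The catalog, schema, table, and column names (lake, analytics, events, and so on) are hypothetical and not taken from this article. The format and partitioning table properties are those of the Trino Iceberg connector as we understand it, so check them against the documentation for your Trino version.

```python
def iceberg_create_table_sql(catalog: str, schema: str, table: str) -> str:
    """Build a Trino CREATE TABLE statement for an Iceberg table.

    All identifiers and columns are hypothetical examples.
    """
    return (
        f"CREATE TABLE {catalog}.{schema}.{table} (\n"
        "    event_id BIGINT,\n"
        "    event_time TIMESTAMP(6),\n"
        "    payload VARCHAR\n"
        ")\n"
        "WITH (\n"
        "    format = 'PARQUET',\n"                       # open columnar file format
        "    partitioning = ARRAY['day(event_time)']\n"   # Iceberg partition transform
        ")"
    )


# Hypothetical catalog "lake" is assumed to be configured with the Iceberg connector.
create_sql = iceberg_create_table_sql("lake", "analytics", "events")
print(create_sql)
```

Because the table is stored in an open format, any engine that reads Iceberg can work with the same files, which is the point of escaping a single-vendor warehouse.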

As your organization pivots to a data-first strategy, using an open data stack helps to drive growth. As you scale to support more sophisticated LLMs and GenAI initiatives, these considerations become even more important. Growing AI initiatives require extensive compute, GPUs, and all the V’s: volume of data, velocity of data, and variability of data types. 

Starburst helps serve all of these goals. 

3 steps you should take when adopting an open data architecture for AI

Consider a data-first strategy across your value chain, including these three best practices.

1) Augment your EDW ecosystem with an open data lakehouse 

Consider augmenting your conventional Enterprise Data Warehouse model with an open data lakehouse. This approach eliminates the need to move all of your data into a ‘single source of truth’.

In practice, this means building the lakehouse on the storage of your choice, whether AWS S3, Google Cloud Storage, or Azure Blob Storage, optionally in combination with modernized object storage like Dell ECS or MinIO. This approach allows you to combine the scalability of a data lake with the structured query capabilities of EDWs to manage the volume, variety, and variability of data as it scales. This augmented strategy also pivots your organization to a more decentralized approach similar to that found in data mesh architectures. It lets you designate ‘data domain experts’ to manage and curate datasets for specific lines of business. This democratizes data access, yielding many organizational benefits. 
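To make the storage choice concrete, here is a minimal sketch that renders a Trino catalog properties file for an Iceberg catalog backed by S3-compatible object storage. The hostnames, port numbers, and region are placeholders we made up for illustration. The property names reflect our understanding of recent Trino releases (native S3 file system support and the Iceberg connector) and should be verified against the documentation for the version you run.

```python
def render_catalog_properties(props: dict[str, str]) -> str:
    """Render key=value lines in the style of a Trino catalog .properties file."""
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


# Hypothetical endpoints; property names are assumptions to check per Trino version.
iceberg_on_object_storage = {
    "connector.name": "iceberg",
    "iceberg.catalog.type": "hive_metastore",
    "hive.metastore.uri": "thrift://metastore.example.internal:9083",
    "fs.native-s3.enabled": "true",
    "s3.endpoint": "https://objects.example.internal:9000",  # S3-compatible store
    "s3.region": "us-east-1",
    "s3.path-style-access": "true",  # commonly needed for non-AWS S3-compatible stores
}

properties_text = render_catalog_properties(iceberg_on_object_storage)
print(properties_text)
```

Pointing the same catalog definition at a different endpoint is what lets the storage layer change without rewriting the queries that sit on top of it.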

2) Accelerate GenAI insights by enabling data accessibility 

This point is particularly relevant as you scale your GenAI and AI-analytical workloads across various lines of business in your value chain. Companies are quickly realizing that not all data is readily available in a single, central EDW. Often, data cannot ‘sit in the cloud’ for regulatory, compliance, or economic reasons, and must instead remain on-prem or in various databases. In each of these cases, Starburst offers a solution. Whether it’s due to security protocols, PII, privacy or data sovereignty needs, more companies are considering repatriating certain datasets back on-prem and adopting an open data lakehouse for their analytical workload needs. 

As this paradigm shift occurs and more data variability surfaces, an open standards approach gives your organization more flexibility to pivot while remaining operationally efficient and cost conscious, as you adopt ‘data accessibility’ rather than data migration as your strategy. To avoid vendor lock-in during this journey, we strongly encourage the adoption of an Apache Iceberg architecture. This allows you to benefit from its open, high-performance table format while maintaining rigorous data access and management controls.
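The idea of data accessibility rather than data migration can be sketched as a single federated query that joins a table left in an on-prem database with a table in the lakehouse. The example below only builds the SQL text. The catalog, schema, table, and column names are hypothetical, and it assumes both catalogs are already configured in Trino, which addresses tables as catalog.schema.table.

```python
def federated_join_sql(onprem_table: str, lake_table: str) -> str:
    """Build a Trino query joining an on-prem table with a lakehouse table.

    Table and column names are hypothetical; data stays where it lives
    and is combined at query time.
    """
    return (
        "SELECT c.customer_id, c.segment, SUM(t.amount) AS total_amount\n"
        f"FROM {onprem_table} AS c\n"
        f"JOIN {lake_table} AS t ON c.customer_id = t.customer_id\n"
        "GROUP BY c.customer_id, c.segment"
    )


federated_sql = federated_join_sql(
    "onprem_postgres.crm.customers",  # hypothetical on-prem database catalog
    "lake.analytics.transactions",    # hypothetical Iceberg lakehouse catalog
)
print(federated_sql)
```

A result set like this could then feed a GenAI pipeline without first copying the regulated on-prem table into the cloud.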

3) Adopt a “multi-everything” approach to data

Adopting a multi-format, multi-cloud, and multi-modal data architecture has many advantages. Apache Iceberg is fast becoming the de facto open standard for this very reason. In choosing Iceberg, you enable a model where YOU “own your data” and, more importantly, “own options” for that data in the form of compute, storage, and scaling optimization. The best way to use this openness is by using Starburst. This allows you to run “your compute” and leverage “your storage” for analytical workloads on top of Apache Iceberg while giving you the best market cost and performance to support your organizational growth needs.
