Data Federation
Data federation benefits
What is data federation? To start, it’s best to understand how companies currently store data. It is not unusual for companies to have hundreds of data repositories. As companies grow and evolve, their storage infrastructure naturally becomes more heterogeneous, and company data becomes more difficult to access. Some of the challenges of integrating fragmented enterprise data landscapes include:
Proprietary query formulations: Vendors implement query tools specific to their solutions. Even when vendors support SQL, each uses its own dialect, so data access requires users to understand how every data source expects queries to look.
Proprietary extensions: Vendors offer extensions that add capabilities or improve performance — but only within the context of their solution. Again, data engineers must understand how a data source’s vendor-specific extensions impact data extraction.
Semantic variations: When organizational domains implement data systems, they make design decisions that make sense at the time and within their context. Data users trying to access this data do not have that perspective. As a result, source-to-source variations in semantics, formats, and other data properties complicate cross-domain data integration.
A federated data architecture resolves these challenges by masking inconsistencies within an abstraction layer. Users can run queries within this virtual layer without worrying about technologies or the structure of source data. This virtualization of an enterprise’s data infrastructure leads to five core benefits of data federation:
1. Real-time access through federated data
Traditionally, data analytics took time. Users needed help from data teams to overcome their company’s fragmentation challenges. Data engineers had to develop extract, transform, and load (ETL) or extract, load, and transform (ELT) pipelines to copy source data, prepare it, and load it into a new dataset.
This time-consuming development process extended the span between business questions and insights, slowing the company’s decision-making. The proliferation of data warehouses is symptomatic of companies’ struggles to shorten time to insight.
Data federation empowers end users with real-time access to data across the organization. Leaving data at the source and creating a virtual data consumption layer makes pipelines and interim datasets unnecessary. Users no longer need help from data teams. They can run queries themselves using the analytical and business intelligence tools they already know.
Democratized, real-time access speeds time to insight and makes decision-making more effective.
2. Data integration
Machine learning algorithms and artificial intelligence applications can yield the most profound business insights. However, data scientists can only produce innovative data products with ready access to reliable, high-quality data. When domains and proprietary systems silo data, it takes enormous effort from data teams to extract, cleanse, and prepare the large datasets data scientists require.
By unifying the organization’s disparate data sources behind a data consumption layer, data federation streamlines the integration of large datasets. Data scientists can quickly run queries as they iteratively explore subsets of the company’s vast data stores. With a better understanding of the data landscape, they can then present engineers with more refined requirements for integrating far larger datasets.
3. Data federation reduces costs
Fragmented data infrastructures make data analytics more expensive. Companies must invest in extra storage capacity to support interim datasets and new databases. Data warehouses promise to consolidate the data that matters, but the old data sources always seem to stay.
Less visible, but just as important, companies must accept lower productivity from their data teams. Developing and maintaining data pipelines is time-consuming and limits how accessible data teams are to the rest of the organization.
Data federation reduces these costs. Leaving data at the source avoids the dedicated databases and the proliferation of data warehouses that drive escalating storage costs.
Additional savings arrive indirectly when federation frees data teams from less productive tasks. They no longer need to maintain sprawling inventories of ETL and ELT pipelines. Data democratization means engineers are not distracted by simple query requests. As a result, data teams have more time to support complex projects that can drive business innovation.
4. Data federation increases scalability
One reason for the escalating costs of big data analytics is the just-in-case investment companies must make to ensure storage and compute capacity are there when analysts need them. Underutilized capacity ties up cash the company could allocate to more productive uses.
Data federation leverages cloud economics to decouple storage and compute. IT departments can plan for steady growth in storage capacity and develop a data infrastructure with the optimal balance of performance, productivity, and cost.
Rather than over-investing to manage variations in compute demand, data federation lets companies scale on-demand compute capacity.
5. Flexibility through federated data
Fragmented data infrastructures are brittle and resistant to change. For instance, any glitch in a data migration project could disrupt operations for days. The reason for this inflexibility is the way data use cases are inextricably linked to data infrastructure. Companies design data products around how each source stores and structures data. A change at the source ripples through these dependencies in unanticipated ways.
By abstracting sources within a data consumption layer, federation eliminates these dependencies. Changes at the source happen transparently to business users.
For example, most users will never know when a migration project moves data from an on-premises system to the cloud. One day, their queries in the federated consumption layer pull data from the old system. The next day, their queries pull data from the new system.
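To make this concrete, here is a minimal sketch in Trino-style SQL, assuming a hypothetical catalog named sales exposed by the federation layer. The query references the catalog, not the underlying system, so repointing the catalog from the on-premises database to the cloud warehouse leaves the query untouched.

```sql
-- "sales" is a hypothetical catalog name exposed by the federation layer.
-- Whether it points at the on-premises system or the cloud warehouse,
-- the user's query stays exactly the same.
SELECT region, sum(order_total) AS revenue
FROM sales.public.orders
WHERE order_date >= DATE '2024-01-01'
GROUP BY region;
```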
The differences between data federation and a data lake
Data federation and data lakes are different solutions to similar challenges. They both make data more accessible for analysis and exploration but do it in different ways.
Data federation does not move or copy the raw data. Instead, it focuses on virtualizing multiple data sources and providing a unified view through its abstracted consumption layer.
Data lakes ingest large volumes of raw data to support analysis and exploration. However, data lakes do not necessarily replace the original sources. They become another element in an enterprise’s growing storage infrastructure.
The differences between data federation and virtualization
Although the terms may seem interchangeable, federation and virtualization are not identical. Federated data requires virtualization, but virtualized data is not necessarily federated.
Data virtualization is an overarching concept that encompasses federation and other data management capabilities. Virtualization abstracts the complexity of an underlying source or sources to simplify access.
Data federation is specifically the virtualization of multiple data sources. Creating a data consumption layer makes pulling data from different locations within the same query easier.
Related reading: Data Federation and Data Virtualization Never Worked in the Past But Now it’s Different
Data federation examples
Starburst is a data federation solution that virtualizes your company’s disparate data sources within a single point of access. Seamless integration with each source and advanced query optimizations compress time to insight and optimize your data infrastructure.
Here are five game-changing features of data federation with Starburst:
1. Integration of Disparate Data Sources
Starburst erases the silos that separate your users from your data by offering connectors to more than fifty enterprise-class relational databases, data warehouses, data lakes, cloud storage platforms, and other data systems.
With seamless access to every source, your engineers can explore datasets without the time-consuming moves that sap productivity and undermine security. Data engineers can quickly lay the groundwork early in a new project to reduce the risk of more expensive changes later.
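As a rough illustration, assuming hypothetical catalogs named lake and warehouse configured against two of those connectors, an engineer could browse every connected source from a single SQL session:

```sql
-- Hypothetical catalog names; each catalog is one configured connector
-- (a data lake, a warehouse, a relational database, and so on).
SHOW CATALOGS;                       -- list every connected source
SHOW SCHEMAS FROM lake;              -- browse a data lake catalog
SHOW TABLES FROM warehouse.finance;  -- browse a schema in a warehouse catalog
```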
2. Querying Across Multiple Sources
Starburst’s virtualized data consumption layer gives business intelligence analysts and other users direct access to every data source. They use the tools they already know to write SQL-based federated queries that consolidate high-quality data from multiple sources.
Democratizing real-time data access lets analysts generate business insights faster to help executives make more informed decisions.
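Here is a sketch of what such a federated query might look like in Trino-style SQL, assuming hypothetical catalogs crm (a relational database) and lake (object-storage tables):

```sql
-- Joining a hypothetical relational catalog ("crm") with a hypothetical
-- data lake catalog ("lake") in a single federated query.
SELECT c.customer_name,
       sum(o.order_total) AS lifetime_value
FROM crm.public.customers AS c
JOIN lake.sales.orders AS o
  ON o.customer_id = c.customer_id
GROUP BY c.customer_name
ORDER BY lifetime_value DESC
LIMIT 20;
```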
3. Query Optimization and Performance
Powered by the open-source Trino query engine, Starburst delivers enhanced performance features that supercharge your queries:
- Dynamic filtering: reduces loads on networks and data sources.
- Query pushdown: pushes queries or parts of queries down into the data source for better performance.
- Cached views: speed up access to frequently viewed data.
- Cost-based optimization: chooses the most efficient join enumerations and distributions for each query.
Taken together, these and other features in the Starburst distributed analytics platform deliver a performant and cost-effective means for unifying your data infrastructure.
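As an illustration, a standard EXPLAIN statement shows how the engine plans a federated query; with pushdown and dynamic filtering, eligible work runs inside the sources and join inputs are pruned at run time. The catalog and table names below are hypothetical, and the exact plan depends on the connectors involved.

```sql
-- Hypothetical catalogs and tables. With query pushdown, the filter on
-- customer segment can execute inside the source database; dynamic
-- filtering then prunes the orders scanned on the probe side of the join.
EXPLAIN
SELECT o.order_id, o.order_total
FROM lake.sales.orders AS o
JOIN crm.public.customers AS c
  ON o.customer_id = c.customer_id
WHERE c.segment = 'enterprise';
```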
4. Data Security and Data Governance
While Starburst democratizes access to data across your organization, the platform’s security and governance capabilities ensure that access is authorized.
Multiple authentication options combined with role-based and attribute-based authorization policies limit users to the data their jobs require. Fine-grained controls let you manage access at the table, row, and column levels.
Since Starburst’s federated platform leaves data at the source, you avoid data duplication security risks. End-to-end encryption protects all data in transit.
Data logging and real-time monitoring improve compliance and enforcement of your data governance policies.
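As a rough sketch, access policies are often expressed with standard SQL-style grants like the ones below; the statements that actually apply depend on the access-control system you configure, and fine-grained row- and column-level policies may be managed through the platform rather than SQL. Role, user, and table names are hypothetical.

```sql
-- Hypothetical role, user, and table names; whether these statements apply
-- depends on the configured access-control system.
CREATE ROLE analyst;
GRANT SELECT ON warehouse.finance.invoices TO ROLE analyst;
GRANT analyst TO USER alice;
```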
5. Scalability
Starburst evolves with your data workloads as they scale from gigabytes to petabytes.
Autoscaling and graceful shutdown capabilities allow you to manage clusters without impacting queries.
Fault-tolerant execution ensures cluster failures do not impact long-running workloads.