How does data federation work?
Evan Smith
Technical Content Manager
Starburst Data
Erin Rosas
Curriculum Developer
Starburst
For data to be valuable, it has to be useful, and that means it needs to drive business insights. But in the modern data landscape, data often resides in multiple locations. For example, an organization might use a data warehouse for one use case and a data lake for another. Sometimes these choices follow divisions in the organizational structure, with different departments each creating their own data source.
This creates a siloing problem. In such a scenario, gaining insights becomes both complicated and costly, often requiring data to be moved from one system to another to create a single source of truth. But realizing this source of truth can be an unending task: many businesses either commit limitless resources to it or never succeed at all. Think of Sisyphus rolling his rock up the hill for eternity.
Solving data silo issues
Solving this problem is what data federation is all about. Federation unlocks the value of your data by connecting multiple data sources so that they can be queried together. This approach gives businesses more options and more flexibility. Using federation, your organization no longer needs to move data unnecessarily to a central source of truth. Instead, you can focus on creating insights and driving value.
How do you connect to disparate data sources?
Federation is most useful when it offers many options, so that no matter where your data lives, it can be combined with data in other sources. Starburst's connector ecosystem includes more than 50 connectors, allowing connections to both cloud and on-premises data sources.
This range includes many enhanced proprietary connectors, which further expand the options available. Overall, federation lowers costs, increases convenience, and improves versatility.
Who uses data federation?
Any data professional who manages or queries data from multiple sources benefits from data federation. This includes:
- Data managers (e.g., data engineers and data architects), who create catalogs to connect to their organization's data sources.
- Data consumers (e.g., data scientists and data analysts), who write queries to federate data across data sources.
How does data federation work?
The Trino SQL query engine, on which Starburst is built, uses connectors to communicate with many data sources simultaneously, processing and joining data from disparate sources as needed to complete a query.
Supporting this, our connector ecosystem is broad, and we’re continuously adding and improving connectors.
We connect to many types of data sources, including NoSQL stores like Elasticsearch and MongoDB and relational databases like PostgreSQL. Additionally, we simplify data lake analytics by supporting all major table formats, including Iceberg and Delta Lake, stored in Amazon S3, Azure Blob Storage, and Google Cloud Storage.
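To make the engine-plus-connectors model concrete, here is a toy sketch in Python. It is not Trino's connector SPI (which is a Java interface), and it does no distribution, pushdown, or optimization. The class names FederatedEngine, SQLiteConnector, and DocumentStoreConnector, and the catalog names crm and shop, are all hypothetical. Each connector only knows how to scan its own source; the engine routes each catalog.table reference to the right connector and joins the rows with an in-memory hash join.

```python
import sqlite3
from typing import Dict, Iterable, List, Protocol

Row = Dict[str, object]


class Connector(Protocol):
    """Hypothetical connector interface: scan one table from one source."""

    def scan(self, table: str) -> Iterable[Row]: ...


class DocumentStoreConnector:
    """Stand-in for a NoSQL source: collections of dict 'documents' in memory."""

    def __init__(self, collections: Dict[str, List[Row]]):
        self.collections = collections

    def scan(self, table: str) -> Iterable[Row]:
        for doc in self.collections[table]:
            yield dict(doc)  # copy so the engine never mutates the source


class SQLiteConnector:
    """Stand-in for a relational source such as PostgreSQL."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def scan(self, table: str) -> Iterable[Row]:
        cur = self.conn.execute(f"SELECT * FROM {table}")  # sketch: table name is trusted
        columns = [d[0] for d in cur.description]
        for values in cur:
            yield dict(zip(columns, values))


class FederatedEngine:
    """Hypothetical engine: routes 'catalog.table' names to connectors and joins rows."""

    def __init__(self) -> None:
        self.catalogs: Dict[str, Connector] = {}

    def register_catalog(self, name: str, connector: Connector) -> None:
        self.catalogs[name] = connector

    def scan(self, qualified_name: str) -> Iterable[Row]:
        catalog, table = qualified_name.split(".", 1)
        return self.catalogs[catalog].scan(table)

    def join(self, left: str, right: str, left_key: str, right_key: str) -> List[Row]:
        """Inner hash join: build a lookup on the right input, probe it with the left."""
        build: Dict[object, List[Row]] = {}
        for row in self.scan(right):
            build.setdefault(row[right_key], []).append(row)
        joined = []
        for lrow in self.scan(left):
            for rrow in build.get(lrow[left_key], []):
                joined.append({**lrow, **rrow})
        return joined


# A relational source (customers) and a document source (orders).
crm_db = sqlite3.connect(":memory:")
crm_db.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
crm_db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])

engine = FederatedEngine()
engine.register_catalog("crm", SQLiteConnector(crm_db))
engine.register_catalog("shop", DocumentStoreConnector({
    "orders": [
        {"order_id": 10, "customer_id": 1, "total": 25.0},
        {"order_id": 11, "customer_id": 2, "total": 40.0},
        {"order_id": 12, "customer_id": 1, "total": 5.0},
    ]
}))

rows = engine.join("shop.orders", "crm.customers", "customer_id", "customer_id")
for r in rows:
    print(r["order_id"], r["name"], r["total"])
# 10 Ada 25.0
# 11 Grace 40.0
# 12 Ada 5.0
```

A real engine would push filters and projections down into each source where possible and spread the join across workers; this sketch pulls every row into one process, which is only reasonable for tiny illustrative data.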
The following image displays some of the connectors included in our connector ecosystem.
How do I federate data with Starburst platforms?
Federation is easy with Starburst Galaxy. To get started, create catalogs to connect to the data sources you'd like to include.
Next, join tables from different data sources in the same way you would join tables from the same data source, referring to each table by its fully qualified name in the form catalog.schema.table.
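In Galaxy, as in Trino, a table is addressed as catalog.schema.table, so a cross-source join reads like any other join. As a rough local analogy only, SQLite's ATTACH DATABASE lets one connection see several independent database files, each addressed by a schema name. This is an assumption-laden stand-in, not Starburst Galaxy: both sources here are SQLite files, whereas Galaxy catalogs point at different systems and add a catalog level to the name. The file names sales.db and crm.db, the helper create_source, and the table layouts are hypothetical.

```python
import os
import sqlite3
import tempfile


def create_source(path, ddl, insert_sql, rows):
    """Hypothetical helper: create a standalone SQLite file acting as one data source."""
    conn = sqlite3.connect(path)
    with conn:  # commits on success, rolls back on error
        conn.execute(ddl)
        conn.executemany(insert_sql, rows)
    conn.close()


# Each attached database is referenced by its schema name (sales, crm),
# loosely mirroring how a Galaxy/Trino query names catalog.schema.table.
QUERY = """
    SELECT c.name, SUM(o.total) AS lifetime_value
    FROM sales.orders AS o
    JOIN crm.customers AS c ON o.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name
"""

with tempfile.TemporaryDirectory() as workdir:
    sales_path = os.path.join(workdir, "sales.db")  # hypothetical "sales" source
    crm_path = os.path.join(workdir, "crm.db")      # hypothetical "crm" source

    create_source(sales_path,
                  "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, total REAL)",
                  "INSERT INTO orders VALUES (?, ?, ?)",
                  [(10, 1, 25.0), (11, 2, 40.0), (12, 1, 5.0)])
    create_source(crm_path,
                  "CREATE TABLE customers (customer_id INTEGER, name TEXT)",
                  "INSERT INTO customers VALUES (?, ?)",
                  [(1, "Ada"), (2, "Grace")])

    # One session that can see both sources without copying data between them.
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ? AS sales", (sales_path,))
    conn.execute("ATTACH DATABASE ? AS crm", (crm_path,))
    results = conn.execute(QUERY).fetchall()
    conn.close()

print(results)  # [('Ada', 30.0), ('Grace', 40.0)]
```

The point of the analogy is that the join itself needs no special syntax: once both sources are visible to the same session, qualifying each table name is enough.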
The following video walks through federation in more detail using a sample dataset. You can use the same dataset with Starburst Galaxy.
Want to try federating data for yourself?
Starburst Academy has you covered. We’ve got several hands-on labs and tutorials to get you up and running quickly with federation.
Tutorial: Federate multiple data sources
Practice federating data in Starburst Galaxy and using some of the other features available
Course: Federate data with a simple query
Set up Starburst Galaxy and federate data with a simple query.