We recently posted the YouTube video series below. It is part of the FREE, on-demand Starburst Academy course Exploring Data Pipelines.
If you’re a data engineer tasked with building and managing data pipelines, Starburst Galaxy enables you to build a data pipeline workflow using modern data lakes and SQL. This approach offers both simplicity and power. What might have required a complex, user-defined function (UDF) in Python in other systems can be accomplished with the accessibility and universality of SQL alongside the ease and cost-effectiveness of the data lake.
Modern data lake architecture
In this video tutorial series, Starburst Academy’s Lester Martin walks you through the steps needed to set up a modern data lake. To do this, we will construct a three-part modern data lake (aka data lakehouse) architecture comprising the Land, Structure, and Consume layers using Starburst Galaxy and SQL. This architecture is rapidly becoming the standard for modern data lakes and lakehouses built around open table formats like Iceberg, Delta Lake, and Hudi.
Get your own Starburst Galaxy account
For this tutorial we will be using the BlueBikes dataset, which is free and publicly available. In fact, using the Starburst Galaxy free trial you can follow along with each of the steps in this tutorial and create your own Land, Structure, and Consume layers in your very own modern data lake.
Let’s get going!
1. Assessing the requirements
Let’s get started with the BlueBikes dataset. This first video will show you how to download the dataset and access it using your own Starburst Galaxy cluster.
2. Creating the land layer
Now that you’re up and running in Starburst Galaxy, it’s time to create the first layer of the three-part modern data lake architecture, the Land layer. This layer receives raw data from the source and serves as the basis for future transformations as the data moves through the next two layers.
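As a rough sketch of what a Land layer can look like in Trino SQL, the statements below define a schema and an external table over raw BlueBikes CSV files. The catalog name, schema name, S3 location, and column list here are illustrative assumptions, not the exact objects used in the videos.

```sql
-- Hypothetical Land-layer objects; adjust the catalog, schema, bucket,
-- and columns to match your own Starburst Galaxy setup and BlueBikes files.
CREATE SCHEMA IF NOT EXISTS mycatalog.bluebikes_land
WITH (location = 's3://my-bucket/bluebikes/land/');

-- The Land-layer table reads the raw CSV files as-is; with CSV format
-- every column is varchar, and typing is deferred to the Structure layer.
CREATE TABLE mycatalog.bluebikes_land.trips_raw (
    ride_id            varchar,
    started_at         varchar,
    ended_at           varchar,
    start_station_name varchar,
    end_station_name   varchar
)
WITH (
    format = 'CSV',
    skip_header_line_count = 1,
    external_location = 's3://my-bucket/bluebikes/land/trips/'
);
```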
3. Creating the structure layer
With the Land layer complete, it’s time to set up the second layer in the three-part modern data lake architecture, the Structure layer. This layer is built by transforming the data in the Land layer, and all of that work can be accomplished using Starburst Galaxy and SQL. When complete, the Structure layer becomes the new source of truth for the dataset.
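Building on the hypothetical Land-layer sketch above, a Structure-layer table might be created with a single CREATE TABLE AS statement that casts the raw varchar columns into proper types and writes an Iceberg table. Again, the names, location, and casts are assumptions for illustration.

```sql
-- Hypothetical Structure-layer table in an Iceberg catalog.
CREATE SCHEMA IF NOT EXISTS mycatalog.bluebikes_structure;

CREATE TABLE mycatalog.bluebikes_structure.trips
WITH (format = 'PARQUET')
AS
SELECT
    ride_id,
    CAST(started_at AS timestamp(6)) AS started_at,
    CAST(ended_at   AS timestamp(6)) AS ended_at,
    start_station_name,
    end_station_name
FROM mycatalog.bluebikes_land.trips_raw
WHERE ride_id IS NOT NULL;  -- drop malformed rows as part of the transformation
```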
4. Creating the consume layer
Now that you’ve constructed the Structure layer, you only have one step left, the Consume layer. This final layer makes the data available to queries and BI tools, and constructing it completes the three-part modern data lake architecture.
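A Consume-layer object is often just a view or an aggregated table shaped for reporting. The sketch below shows one possible view over the hypothetical Structure-layer table; the aggregation and all object names are illustrative assumptions.

```sql
-- Hypothetical Consume-layer view for BI tools.
CREATE SCHEMA IF NOT EXISTS mycatalog.bluebikes_consume;

CREATE OR REPLACE VIEW mycatalog.bluebikes_consume.daily_trip_counts AS
SELECT
    date_trunc('day', started_at) AS trip_date,
    start_station_name,
    count(*)                      AS trip_count
FROM mycatalog.bluebikes_structure.trips
GROUP BY date_trunc('day', started_at), start_station_name;
```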
5. Automation with Starburst and dbt
We’re not done yet! You’ve constructed all three layers needed for a modern data lake using Starburst Galaxy, but there’s one more trick up our sleeve: automation. Starburst Galaxy lets you execute powerful data engineering workloads using SQL, and its integration with dbt Cloud lets you wrap up all of that work and run it on a schedule. In real-world workflows this is a powerful strategy for achieving greater efficiency in data engineering.
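To give a feel for what that looks like, here is a minimal sketch of a dbt model that could rebuild the Structure-layer table on each run. The model name, source definition, and column list are assumptions; in dbt Cloud you would then schedule a job that runs this model against your Starburst Galaxy connection.

```sql
-- models/structure/trips.sql (hypothetical dbt model)
-- Rebuilds the Structure-layer table from the Land-layer source; assumes a
-- 'bluebikes_land' source with a 'trips_raw' table is declared in sources.yml.
{{ config(materialized='table') }}

select
    ride_id,
    cast(started_at as timestamp(6)) as started_at,
    cast(ended_at   as timestamp(6)) as ended_at,
    start_station_name,
    end_station_name
from {{ source('bluebikes_land', 'trips_raw') }}
where ride_id is not null
```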
Excited? Learn how to automate with Starburst Galaxy and dbt Cloud.
What are some next steps you can take?
Below are three ways you can continue your journey to accelerate data access at your company:
1. Start your own free trial of Starburst Galaxy.
2. Automate the Icehouse: Our fully-managed open lakehouse platform
3. Follow us on YouTube, LinkedIn, and X (Twitter).