Configure a Starburst Galaxy data lake catalog and schema

1. Tutorial overview

Last Updated: 2024-03-28

Background

Many of the Starburst tutorials require access to a writable data source to successfully complete all of the exercises. This tutorial will walk you through the steps to set up an environment that can be used for those tutorials.

To build the environment, you will create the following:

An Amazon S3 catalog that connects your Starburst Galaxy account to an Amazon S3 bucket.
A schema in the S3 catalog.

Scope of tutorial

In this tutorial, you will learn how to configure a catalog in Starburst Galaxy that connects to Amazon S3 object storage.

An S3 bucket with read/write permissions has been set up for you to use. It contains one table made up of CSV files with sample payment transaction data.

Learning objectives

Once you've completed this tutorial, you will be able to:

Configure an Amazon S3 catalog using Starburst Galaxy.
Create a new schema in Starburst Galaxy.

Prerequisites

You need a Starburst Galaxy account to complete this tutorial. Please see Starburst Galaxy: Getting started for instructions on setting up a free account.

About Starburst tutorials

Starburst tutorials are designed to get you up and running quickly by providing bite-sized, hands-on educational resources. Each tutorial explores a single feature or topic through a series of guided, step-by-step instructions.

As you navigate through the tutorial you should follow along using your own Starburst Galaxy account. This will help consolidate the learning process by mixing theory and practice.

2. Sign into Starburst Galaxy and set Admin role

Background

You're going to begin by signing in to Starburst Galaxy and setting your role, which prepares you to connect the Amazon S3 data source.

This is a quick step, but an important one.

Step 1: Sign into Starburst Galaxy

Sign in to Starburst Galaxy in the usual way. If you have not already set up an account, follow the instructions in Starburst Galaxy: Getting started.

Step 2: Set your role

Your current role is listed in the top right-hand corner of the screen.

Check your role to ensure that it is set to accountadmin.
If it is set to anything else, use the drop-down menu to select the correct role.

3. Create new Amazon S3 catalog

Background

Adding a new Amazon S3 catalog follows the same process as adding other data sources in Starburst Galaxy. This is one of the main ways that Starburst Galaxy is used to connect to data lakes.

The steps below will show you how to start the process of configuring a new catalog.

Step 1: Create a new catalog

Create a new catalog for the Amazon S3 data source.

In the left-hand navigation bar, click Data > Catalogs.
Click the Create catalog button.

Step 2: Select Amazon S3 data source

Starburst Galaxy allows the creation of catalogs for a number of different data sources. In this case, you are going to create a new catalog in the Amazon S3 category.

Click the Amazon S3 tile.

Step 3: Input name and description

The catalog needs both a name and description. This ensures that you can find it later.

In the Catalog name field, enter tmp_cat. This name stands for "temporary catalog," and will serve as a reminder that this catalog is not for long-term use.
In the Description field, input a meaningful description.

Example: Delete this catalog after use (data removed daily, but metastore still thinks it exists)

Scroll down to continue the configuration process.

4. Amazon S3 authentication

Background

When you connect Starburst Galaxy to a new data source, you must authenticate. This helps ensure that you are connecting to the right data source and that you have the appropriate permissions.

Step 1: Choosing an authentication method

Starburst Galaxy allows you to configure several different authentication methods when creating a new catalog. This lets you connect to data sources of different types. In this particular example, we will use the AWS access key option.

Select AWS access key.
In the AWS access key for S3 field, enter AKIAYUW62MUV5WTUWTPY
In the AWS secret key for S3 field, enter zhkzRydOWqLrtBcajgbvc0qGZ7w8W6rtPBK4y7zl

5. Connect to metastore

Background

Starburst Galaxy uses a metastore to keep track of the location of your data when it is added to the data lake, in this case to Amazon S3.

You have three options when choosing a metastore. For the purposes of this tutorial, we will be using the Starburst Galaxy metastore.

Step 1: Select the metastore

Starburst Galaxy includes its own metastore, which can be used to easily store metadata. Using this option is often the simplest metadata management solution.

The choice of metastore is completely decoupled from the choice of storage option, allowing you to mix and match.

Select Starburst Galaxy.
In the Default S3 bucket name field, enter starburst-tutorials.
In the Default directory name field, enter projects.
Select Allow creating external tables.

This will allow you to create external tables outside of the default S3 bucket.

Select Allow writing to external tables.

This will allow you to write data into external tables outside of the default S3 bucket.
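
As a rough illustration of how the default bucket and default directory relate to each other, the sketch below composes an S3 URI from a bucket name, a directory name, and a schema name. The path layout and the `default_schema_location` helper are assumptions made for illustration only. This tutorial states only that a folder named after the schema is created inside the bucket; Starburst Galaxy decides the actual location.

```python
def default_schema_location(bucket: str, directory: str, schema: str) -> str:
    """Return a hypothetical S3 URI for a schema folder.

    Assumption: the schema folder sits under the default directory of the
    default bucket. This layout is illustrative, not taken from Galaxy docs.
    """
    # Drop stray slashes so "/projects/" and "projects" behave the same.
    parts = [part.strip("/") for part in (directory, schema)]
    path = "/".join(part for part in parts if part)
    return f"s3://{bucket.strip('/')}/{path}"


if __name__ == "__main__":
    # Values from this tutorial, plus a placeholder schema name.
    print(default_schema_location(
        "starburst-tutorials", "projects", "tmp_first_last_postalcode"
    ))
```

Under this assumed layout, every schema created in the catalog would get its own folder below the projects directory.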

6. Select table format

Background

Table formats control how data files and their metadata are organized into tables. Options include modern open table formats such as Iceberg and Delta Lake, and older table formats such as Hive.

Step 1: Select the default table format

If you are planning to complete the tutorial Migrate Hive tables to Apache Iceberg with Starburst Galaxy, you must select Hive as the default table format. Otherwise, it is your choice.

Select the default table format.

7. Test connection and connect catalog

Background

Every new catalog connection includes a test before you connect it. This helps to ensure that you have input the correct credentials and allows you to quickly fix any problems before actually connecting.

Step 1: Test and Connect

You're almost there! Time to test the connection and then complete the process of creating your new Amazon S3 catalog.

Click the Test connection button.
Confirm that you see the message "Hooray! You can now add this catalog to a cluster."
Click the Connect catalog button.

8. Configure access controls

Background

Starburst Galaxy lets you configure access controls for each catalog. The most important choice is whether to grant write access or restrict the catalog to read-only access.

Step 1: Save access controls

We need write access for our catalog, so we will leave the access controls as they are.

Click the Save access controls button.

Step 2: Add catalog to cluster

You are going to add the new catalog to the cluster you created in the Starburst Galaxy: Getting started tutorial.

Expand the Select clusters dropdown menu, and select the aws-us-east-1-free cluster.
Click the Add to cluster button.
Click the Do this later button on the pop-up confirmation window. This will bypass schema discovery, which is not necessary at this time.

9. Create new schema

Background

With Starburst Galaxy, it's easy to create a new schema directly from the query editor.

Step 1: Navigate to Query editor

Hover over the left-hand navigation menu to expand it.
Select Query > Query editor.

Step 2: Create schema

When you create a schema, Starburst Galaxy creates a folder with the schema name inside the Amazon S3 bucket. The folder name must be unique, which is why this step requires a naming convention for the schema.

Copy and paste the following SQL in the query editor.
Replace first with your first name.
Replace last with your last name.
Replace postalcode with your postal code. If you prefer not to use your postal code, any five numbers will work.
Click the Run (limit 1000) button to run the SQL.

CREATE SCHEMA tmp_cat.tmp_first_last_postalcode;
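
If you would rather generate the statement than edit it by hand, the naming convention can be sketched in a few lines of Python. This is a minimal sketch, not part of Starburst Galaxy: `build_schema_statement` and `_clean` are hypothetical helpers, and the normalization rules (lowercasing, replacing anything other than letters and digits with an underscore) are an assumption chosen so the result stays a simple unquoted identifier.

```python
import re


def _clean(part: str) -> str:
    """Lowercase a name part and collapse runs of other characters to '_'."""
    cleaned = re.sub(r"[^a-z0-9]+", "_", part.strip().lower()).strip("_")
    if not cleaned:
        raise ValueError(f"name part {part!r} has no usable characters")
    return cleaned


def build_schema_statement(
    first: str, last: str, postalcode: str, catalog: str = "tmp_cat"
) -> str:
    """Build the CREATE SCHEMA statement for tmp_<first>_<last>_<postalcode>."""
    schema = "_".join(["tmp", _clean(first), _clean(last), _clean(postalcode)])
    return f"CREATE SCHEMA {catalog}.{schema};"


if __name__ == "__main__":
    # "Jane", "Doe", and "12345" are placeholder values.
    print(build_schema_statement("Jane", "Doe", "12345"))
```

With the placeholder values shown, the script prints `CREATE SCHEMA tmp_cat.tmp_jane_doe_12345;`, which you can paste into the query editor and run as described above.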

10. Tutorial wrap-up

Tutorial complete

Congratulations! You have reached the end of this tutorial, and the end of this stage of your journey.

You're all set! Now you can use your new catalog and schema in our other tutorials.

Continuous learning

At Starburst, we believe in continuous learning. This tutorial provides the foundation for further training available on this platform, and you can return to it as many times as you like. Future tutorials will make use of the concepts used here.

Next steps

Starburst has lots of other tutorials to help you get up and running quickly. Each one breaks down an individual problem and guides you to a solution using a step-by-step approach to learning.

Tutorials available

Visit the Tutorials section to view the full list of tutorials and keep moving forward on your journey!
