×

Tutorial: Using Starburst Galaxy’s materialized views with Apache Iceberg

Last Updated: June 17, 2024

Get Started with Starburst Galaxy today

Sign up for Starburst Galaxy and get up to 500 credits free.

Sign up today

Materialized views have become available in Starburst Galaxy for catalogs using Great Lakes connectivity. For folks who are NOT already using Starburst Galaxy — come sign up  it’s FREE — especially if you want to exercise the content in this blog post. To test this all out you will need to connect to one of the cloud object stores, too. Feel to look at my querying data in the cloud post if you need a jump start on that.

Please check out training opportunities from Starburst Academy where an enhanced version of this information will soon be available. And yes, enough with all of that, let’s DO SOMETHING!

Define Table and Query

Create a couple of schemas in a catalog aligned to your cloud object store.

CREATE SCHEMA mycloud.mv_storage;
CREATE SCHEMA mycloud.mvtest;
USE mycloud.mvtest;

Create an Apache Iceberg table and populate it.

CREATE TABLE dune_characters (
  id integer,
  name varchar(55),
  notes varchar(255)
)
WITH (type = 'iceberg');

INSERT INTO dune_characters 
  (id, name, notes)
VALUES
  (101, 'Leto', 'Ruler of House Atreides'),
  (102, 'Jessica', 'Concubine of the Duke');

Run a very simple query to base a materialized view on.

SELECT name, notes 
  FROM dune_characters;

Create/Load the Materialized View

Create a materialized view with the previous defining query.

CREATE MATERIALIZED VIEW mat_view
WITH (storage_schema = 'mv_storage')
AS SELECT name, notes 
     FROM dune_characters;

The WITH clause is optional, but the default behavior for the Iceberg connector writes the underlying “storage table” to the same schema. In practice, it makes sense to locate the storage table somewhere else (if only to limit consumer confusion).

If you queried mat_view now the results would be as expected, but the query would actually access the table in the defining query. The following command will (re)build the underlying storage table.

REFRESH MATERIALIZED VIEW mat_view;

You can now see the dune_characters table and the mat_view materialized view in the mvtest schema as well as the storage table located in the mv_storage schema.

Running a query on the materialized view returns the expected results.

SELECT * FROM mat_view;

To fully verify that the storage table was accessed instead of the original table, click on Query details from the screen shot above and then the Advanced tab under Query ID heading on the page that renders.

As you can see above, the storage table that starts with st_ from the mv_storage schema is being accessed instead of the dune_characters table from the defining query.

What about Starburst Enterprise?

Great news!! This all works on Starburst Enterprise, too, as called out in the Starburst Iceberg connector!

This tutorial was originally published by Lester Martin.

What are some next steps you can take?

Below are three ways you can continue your journey to accelerate data access at your company

  1. 1

    Schedule a demo with us to see Starburst Galaxy in action.

  2. 2

    Automate the Icehouse: Our fully-managed open lakehouse platform

  3. 3

    Follow us on YouTube, LinkedIn, and X(Twitter).

Start Free with
Starburst Galaxy

Up to $500 in usage credits included

  • Query your data lake fast with Starburst's best-in-class MPP SQL query engine
  • Get up and running in less than 5 minutes
  • Easily deploy clusters in AWS, Azure and Google Cloud
For more deployment options:
Download Starburst Enterprise

Please fill in all required fields and ensure you are using a valid email address.

s