Tutorial: Using Starburst Galaxy’s materialized views with Apache Iceberg

Share

Materialized views have become available in Starburst Galaxy for catalogs using Great Lakes connectivity. For folks who are NOT already using Starburst Galaxy — come sign up  it’s FREE — especially if you want to exercise the content in this blog post. To test this all out you will need to connect to one of the cloud object stores, too. Feel to look at my querying data in the cloud post if you need a jump start on that.

Please check out training opportunities from Starburst Academy where an enhanced version of this information will soon be available. And yes, enough with all of that, let’s DO SOMETHING!

Define Table and Query

Create a couple of schemas in a catalog aligned to your cloud object store.

CREATE SCHEMA mycloud.mv_storage;
CREATE SCHEMA mycloud.mvtest;
USE mycloud.mvtest;

Create an Apache Iceberg table and populate it.

CREATE TABLE dune_characters (
  id integer,
  name varchar(55),
  notes varchar(255)
)
WITH (type = 'iceberg');

INSERT INTO dune_characters 
  (id, name, notes)
VALUES
  (101, 'Leto', 'Ruler of House Atreides'),
  (102, 'Jessica', 'Concubine of the Duke');

Run a very simple query to base a materialized view on.

SELECT name, notes 
  FROM dune_characters;

Create/Load the Materialized View

Create a materialized view with the previous defining query.

CREATE MATERIALIZED VIEW mat_view
WITH (storage_schema = 'mv_storage')
AS SELECT name, notes 
     FROM dune_characters;

The WITH clause is optional, but the default behavior for the Iceberg connector writes the underlying “storage table” to the same schema. In practice, it makes sense to locate the storage table somewhere else (if only to limit consumer confusion).

If you queried mat_view now the results would be as expected, but the query would actually access the table in the defining query. The following command will (re)build the underlying storage table.

REFRESH MATERIALIZED VIEW mat_view;

You can now see the dune_characters table and the mat_view materialized view in the mvtest schema as well as the storage table located in the mv_storage schema.

Running a query on the materialized view returns the expected results.

SELECT * FROM mat_view;

To fully verify that the storage table was accessed instead of the original table, click on Query details from the screen shot above and then the Advanced tab under Query ID heading on the page that renders.

As you can see above, the storage table that starts with st_ from the mv_storage schema is being accessed instead of the dune_characters table from the defining query.

What about Starburst Enterprise?

Great news!! This all works on Starburst Enterprise, too, as called out in the Starburst Iceberg connector!

This tutorial was originally published by Lester Martin.