Redefine tables within object storage connectors like Iceberg and Delta Lake without any downtime

  • Michiel De Smet

    Michiel De Smet

    Software Engineer

    Starburst

  • Damian Owsianny

    Damian Owsianny

    Software Engineer

    Starburst

Share

Since Trino 431 release, support for CREATE [OR REPLACE] TABLE statements was added. The relevant issue is available here.

What is a CREATE [OR REPLACE] TABLE statement?

This new capability allows you to redefine tables within object storage connectors like Iceberg and Delta Lake. It enables atomic table replacements, ensuring that concurrent queries always access a valid version of the table. This guarantees seamless transitions and uninterrupted data availability.

Why is CREATE [OR REPLACE] TABLE important?

  • Continuous Availability: BI tools and ETL processes won’t be impacted by updating tables as all tables continue to exist without any downtime
  • Historical Data Retention: Table history is preserved for auditing purposes and easy rollback. to previous versions if needed.
  • Enhanced Reliability: Transactions are managed atomically, providing consistency and reliability during table updates.

Let’s take a closer look below

Table History

The CREATE [OR REPLACE] TABLE statement allows specific object storage connectors (currently Iceberg and Delta Lake) to set a new version of the table with totally different data and structure while retaining the history of the table. 

Atomic table replacement

This change leverages these specific object storage connectors’ support for transactions (Iceberg and Delta), permitting atomic table replacements. This means that any queries concurrent with the CREATE OR REPLACE TABLE operation use either the old or new table version.

Let’s go through an example (we are using Iceberg here, because we believe Iceberg is the table format that is the most easy to use and evolve). When creating a new table or replacing an existing one, the operation is performed atomically. This means your data remains consistent, and users can query the old or new table version seamlessly.

CREATE TABLE Example

Creating a new table is simple:

CREATE TABLE my_table AS SELECT 'test' a, DOUBLE '38.5' b;

Now let’s overwrite this table with new data and a totally different table definition:

CREATE OR REPLACE TABLE my_table AS SELECT BIGINT '-42' a, DOUBLE '38.5' c;

The last statement will atomically replace the table definition with the new table definition without impacting reads on the first version.

We can now check the versions (snapshots in Iceberg lingo) of this table by executing the following query:

SELECT * FROM "my_table$snapshots" ORDER BY committed_at DESC

Let’s use the snapshot id from the first version to retrieve the data as of that time (SELECT 'test' a, DOUBLE '38.5' b).

SELECT * FROM my_table FOR VERSION AS OF <snapshot_id>

If you like this old version better than the current one, you can easily roll back to the earlier version by executing the following statement, which will atomically replace the current state with the old state:

CALL <catalog_name>.system.rollback_to_snapshot('<schema_name>', 'my_table', <snapshot_id>)

Voila, now the data in my_table is rolled back to the desired state.

How is CREATE [OR REPLACE] TABLE statement  relevant to  dbt?

One tool which especially benefits from implementing CREATE [OR REPLACE] TABLE is the Trino dbt adapter – dbt-trino.

During the dbt run command, SQL queries representing dbt models are executed against Trino. If a table representing a certain dbt model already exists, particular actions are taken depending on materialization type and model configuration.

To take advantage of CREATE [OR REPLACE] TABLE in Trino, we added a new mode to table materialization in dbt-trino 1.7.1 – replace.

Until now, there were two modes available in table materialization for handling existing tables:

  • `rename` — Creates an intermediate table, renames the target table to the backup one, and renames the intermediate table to the target one.

In case of error during CREATE TABLE operation we still have the original table. Any reads on the older table will fail when the table is dropped. There is a short period of time when the table is not available. Table history is also lost, as we are creating a new table.

  • `drop` — Drops and re-creates a table.

Table is unavailable until CREATE TABLE finishes. This is unsafe operation, as after dropping a table we could encounter an error during table re-creation, and end up with a missing data.

`replace` is superior to both of the old modes, as it safely replaces a table ‘in place’ without additional overhead. The table is always ready to read, without downtime, and we retain its history.

Documentation about `on_table_exists` modes is available here 

To better visualize the time when a table is available vs. unavailable, let’s look at the chart below:

Fig. 1 Availability time by on_table_exist mode, for model defined as select * from tpch.sf10.orders. For `replace`, the table is available during the entire operation. For `drop`, the table is unavailable during the entire operation. For `rename`, there is a very short time period during table rename that the table is unavailable. When data is deleted from the dropped table, it will also impact any ongoing reads on that table.

Code example of table materialization and `rename` mode usage

Let’s see the difference between `rename` (default) and `replace` (recommended if connector supports it) mode.

First, let’s see how it works with the default mode, ‘rename’

 {{
  config(
    materialized = 'table',
    on_table_exists = 'rename'
  )
}}

SELECT * FROM {{ ref('some_sample_seed') }}

Next, let’s see which queries are executed by running `dbt run` after the table has been created:

-- creates new table with __dbt_tmp suffix
create table "iceberg"."default"."cortas_example__dbt_tmp"
...

-- renames old table - adds __dbt_backup suffix
alter table "iceberg"."default"."cortas_example" rename to "iceberg"."default"."cortas_example__dbt_backup"

-- renames new table - removes suffix
alter table "iceberg"."default"."cortas_example__dbt_tmp" rename to "iceberg"."default"."cortas_example"

-- drops old table
drop table if exists "iceberg"."default"."cortas_example__dbt_backup"

Now change `on_table_exists` config to ‘replace’:

{{
  config(
    materialized = 'table',
    on_table_exists = 'replace'
  )
}}

SELECT * FROM {{ ref('some_sample_seed') }}

and execute `dbt run` once again. As we can see, only one CREATE [OR REPLACE] TABLE query was executed instead of 4 queries:

-- one CREATE [OR REPLACE] TABLE query
create or replace table "iceberg"."default"."cortas_example"

If you would like to learn more about dbt-trino, there is a blog titled ‘Building a Lakehouse with dbt and Trino‘  authored by Michiel De Smet and Przemek Denkiewicz. It is a great resource for those looking to familiarize themselves with dbt-trino, providing step-by-step examples that you can try on your own. We highly recommend it if you are new to dbt-trino or haven’t yet explored this blog!

Optimize your data lake with Iceberg and Trino

Explore the future of data lakes through our comprehensive technical how-to whitepaper. 

Learn more