×

Get Started with Starburst Galaxy today

Sign up for Starburst Galaxy and get up to 500 credits free.

Sign up today

This post is part of the Iceberg blog series. Read the entire series:

  1. Introduction to Apache Iceberg in Trino
  2. Iceberg Partitioning and Performance Optimizations in Trino
  3. Apache Iceberg DML (update/delete/merge) & Maintenance in Trino
  4. Apache Iceberg Schema Evolution in Trino
  5. Apache Iceberg Time Travel & Rollbacks in Trino
  6. Automated maintenance for Apache Iceberg tables in Starburst Galaxy
  7. Improving performance with Iceberg sorted tables
  8. Hive vs. Iceberg: Choosing the best table format for your analytics workload

Schema evolution simply means the modification of tables as business rules and source systems are modified over time. Trino’s Iceberg connector supports different modifications to tables including the table name itself, column and partition changes.

Much like a database, you perform “alters” to Iceberg tables to modify their structure. Since Iceberg is just a table format which contains metadata about a table, modifying the table is rather trivial.

Table Changes

Rename a table

alter table customer_iceberg rename to customer_iceberg_new;

Note: this is a change to the table name in the metastore and no changes will be made in the storage. So, the location s3://<bucket>/customer_iceberg will remain the same.

Column Changes

Adding a column:

alter table customer_iceberg add column tier varchar(1);

Rename a column:

alter table customer_iceberg rename column address to fulladdress;

Partition Changes

Often times a table is initially partitioned by a column or set of columns only later it’s discovered this may not be optimal. With Iceberg, you can modify the partition columns at any time.

For example, initially this table is partitioned by month:

create table orders_iceberg
with (partitioning=ARRAY['month(orderdate)']) as
select * from tpch.sf1.orders;

After reviewing query a patterns, it’s determined that partitioning by day would perform better as a majority of queries are filter by certain days. A simple alter table statement as seen below will modify the partitioning on this table from month to day:

alter table orders_iceberg SET PROPERTIES partitioning = ARRAY['day(orderdate)'];

After new data is inserted into the table, you will see a change in the data directory where the table data is stored:

Notice the orderdate_month is now orderdate_day. Note that queries that filter by day will partition prune at the partition day level but the existing monthly partitions will still need to be searched. If you would like to have the entire table partitioned by day then you could recreate the table using a CTAS (create table as) from the existing table.

Example to create a new table partitioned by day from the existing table:

create table orders_iceberg_new with (partitioning=ARRAY['day(orderdate)']) (as select * from orders_iceberg)

Schema evolution in Trino’s Iceberg connector is very powerful and easy to use. These types of functions were not available in Hive and database veterans will be very happy to see them added to the data lake landscape.

What are some next steps you can take?

Below are three ways you can continue your journey to accelerate data access at your company

  1. 1

    Schedule a demo with us to see Starburst Galaxy in action.

  2. 2

    Automate the Icehouse: Our fully-managed open lakehouse platform

  3. 3

    Follow us on YouTube, LinkedIn, and X(Twitter).

Start Free with
Starburst Galaxy

Up to $500 in usage credits included

  • Query your data lake fast with Starburst's best-in-class MPP SQL query engine
  • Get up and running in less than 5 minutes
  • Easily deploy clusters in AWS, Azure and Google Cloud
For more deployment options:
Download Starburst Enterprise

Please fill in all required fields and ensure you are using a valid email address.

s