Alter table execute optimize

mcbenzing · July 8, 2024, 9:20am

I create partitioned Iceberg tables and try to use compaction via the optimize command. However, I can’t really understand how the optimization works yet. Does it only change the way the data is stored if I have changed the partition key before? In the Iceberg Connector documentation it says:

All files with a size below the optional parameter file_size_threshold (default value for the threshold is 100MB) will be merged

But if I never altered the partitioning, files from different partitions will not be merged no matter how small they are, right?

I also read this blog post:
well-designed partitions help with iceberg compaction (call them ice cubes) - Lester Martin (l11n)

And I was wondering if this means that one should never use execute optimize on partitions that are not yet static. Or should one just avoid compacting data that might be changed at the time of compaction?

Thanks in advance!

mcbenzing · July 8, 2024, 11:21am

I did work some more with partitions and got even more confused. I created an Iceberg table with partitions. Then I altered the partitioning and used compaction. However the query plan of a query, that should be using partition pruning, shows a normal filterPredicate.

The SQL-Statements I executed are the following:

CREATE TABLE train_maintenance ( 
maintenance_id INT, 
train_id INT, 
maintenance_type VARCHAR(50), 
maintenance_date DATE, 
cost DECIMAL(10, 2),
location VARCHAR(50) )
WITH (
    partitioning = ARRAY[ 'location', 'maintenance_type']
);

INSERT INTO train_maintenance (maintenance_id, train_id, maintenance_type, maintenance_date, cost, location) 
VALUES 
(1, 101, 'Engine Check', DATE '2023-01-15', 500.00, 'Berlin Maintenance Facility'), 
(2, 102, 'Brake Inspection', DATE '2023-01-20', 300.00, 'Munich Depot'), 
(3, 101, 'Wheel Alignment', DATE '2023-02-10', 200.00, 'Berlin Maintenance Facility'), 
(4, 103, 'Engine Check', DATE '2023-02-25', 550.00, 'Hamburg Repair Yard'), 
(5, 104, 'Brake Inspection', DATE '2023-03-05', 320.00, 'Frankfurt Maintenance Hub'), 
(6, 105, 'Engine Check', DATE '2023-03-15', 510.00, 'Stuttgart Service Center'), 
(7, 101, 'Wheel Alignment', DATE '2023-04-10', 210.00, 'Berlin Maintenance Facility'), 
(8, 102, 'Brake Inspection', DATE '2023-04-20', 310.00, 'Munich Depot'), 
(9, 103, 'Engine Check', DATE '2023-05-25', 560.00, 'Hamburg Repair Yard'), 
(10, 104, 'Wheel Alignment', DATE '2023-06-05', 220.00, 'Frankfurt Maintenance Hub');

ALTER TABLE train_maintenance
SET PROPERTIES partitioning = ARRAY['month(maintenance_date)'];

INSERT INTO train_maintenance (maintenance_id, train_id, maintenance_type, maintenance_date, cost, location) 
VALUES 
(11, 106, 'Engine Check', DATE '2023-06-15', 520.00, 'Berlin Maintenance Facility'), 
(12, 107, 'Brake Inspection', DATE '2023-07-01', 330.00, 'Munich Depot'), 
(13, 108, 'Wheel Alignment', DATE '2023-07-10', 230.00, 'Berlin Maintenance Facility'), 
(14, 109, 'Engine Check', DATE '2023-07-25', 540.00, 'Hamburg Repair Yard'), 
(15, 110, 'Brake Inspection', DATE '2023-08-05', 340.00, 'Frankfurt Maintenance Hub'), 
(16, 111, 'Engine Check', DATE '2023-08-15', 530.00, 'Stuttgart Service Center'), 
(17, 112, 'Wheel Alignment', DATE '2023-09-10', 240.00, 'Berlin Maintenance Facility'), 
(18, 113, 'Brake Inspection', DATE '2023-09-20', 350.00, 'Munich Depot'), 
(19, 114, 'Engine Check', DATE '2023-10-05', 550.00, 'Hamburg Repair Yard'), 
(20, 115, 'Wheel Alignment', DATE '2023-10-15', 250.00, 'Frankfurt Maintenance Hub');

ALTER TABLE train_maintenance EXECUTE optimize;

SELECT * FROM "train_maintenance$partitions";

When i do

select * from train_maintenance where maintenance_date >= DATE '2023-10-01';

the query plan looks like this:

Why is that?

Moreover I was wondering if there is a way to change the partitioning from
ARRAY[ 'month(maintenance_date)', 'maintenance_type'] to ARRAY[ 'month(maintenance_date)', 'location'] without using a CTA, as this always throws the following error:

Cannot add duplicate partition field null=month(ref(name=“maintenance_date”)), conflicts with 1000: maintenance_date_month: month(4)

Last but not least I would love to know if there are guidlines of when to use sorted_by with partitioning.

lester · July 10, 2024, 4:45pm

You absolutely can run OPTIMIZE on a partition that isn’t yet static. I was mostly showing that IF your partition strategy does allow for the data in a partition to eventually go fully (or mostly) static that OPTIMIZE would be a great thing to run at that time.

In general, the OPTIMIZE is simply going to look at files that fall below the threshold size. Those are candidates to be rewritten/compacted along with the contents of other files (below the threshold). The OPTMIZE can be run against the entire table or just a single partition. The docs at Iceberg connector — Trino 451 Documentation show you can even add a WHERE clause to focus very specifically on only certain data (example shows using a date column), but this will simply find files that are backing the predicate results and if they are below the threshold they will then be candidates for compaction/rewritten to fewer, larger, files.

If it wasn’t clear, you are not 100% controlling EXACTLY what will happen with this compaction routine. You are just letting Trino have a chance to rewrite many smaller files into fewer larger files.

lester · July 10, 2024, 4:57pm

There are multiple things happening at once in your scenario above. As you might already know, changing the partitioning strategy with an ALTER statement does NOT rewrite anything. Then, firing the OPTIMIZE may rewrite some of your files, but even if so, no guarantee it will tackle everything. Querying the $files metadata table before and after every snapshot can help you understand things better.

Additionally, this is a very small set of data and would be probably better to experiment by creating larger tables by pulling in data from the TPC* catalogs as a possibility.

sort_by actually doesn’t fully work like you might think it does. basically, it isn’t doing a total order sort across all the data being (re)written at once. As the docs at Iceberg connector — Trino 451 Documentation says “data is sorted during writes within each file based on the specified array of one or more columns”. For more details (benefits and negatives) of this I encourage you to join the Trino slack where there is several threads on this topic.

lester · July 10, 2024, 5:41pm

As for the problems you had changing the partitioning’s 2nd column to a different value, my quick testing did not produce this problem. I ran the following w/o any issues.

create table lineitem_ice
with (type='iceberg', partitioning=array['month(commitdate)', 'shipinstruct'])
as select * from tpch.sf1.lineitem;

alter table lineitem_ice
set properties partitioning=array['month(commitdate)', 'shipmode'];

insert into lineitem_ice
select * from tpch.tiny.lineitem
where commitdate >= date('1992-02-01')
and commitdate < date('1992-03-01');

ALTER TABLE lineitem_ice EXECUTE optimize
where commitdate >= date('1995-01-01');