I did work some more with partitions and got even more confused. I created an Iceberg table with partitions. Then I altered the partitioning and used compaction. However the query plan of a query, that should be using partition pruning, shows a normal filterPredicate.
The SQL-Statements I executed are the following:
CREATE TABLE train_maintenance (
maintenance_id INT,
train_id INT,
maintenance_type VARCHAR(50),
maintenance_date DATE,
cost DECIMAL(10, 2),
location VARCHAR(50) )
WITH (
partitioning = ARRAY[ 'location', 'maintenance_type']
);
INSERT INTO train_maintenance (maintenance_id, train_id, maintenance_type, maintenance_date, cost, location)
VALUES
(1, 101, 'Engine Check', DATE '2023-01-15', 500.00, 'Berlin Maintenance Facility'),
(2, 102, 'Brake Inspection', DATE '2023-01-20', 300.00, 'Munich Depot'),
(3, 101, 'Wheel Alignment', DATE '2023-02-10', 200.00, 'Berlin Maintenance Facility'),
(4, 103, 'Engine Check', DATE '2023-02-25', 550.00, 'Hamburg Repair Yard'),
(5, 104, 'Brake Inspection', DATE '2023-03-05', 320.00, 'Frankfurt Maintenance Hub'),
(6, 105, 'Engine Check', DATE '2023-03-15', 510.00, 'Stuttgart Service Center'),
(7, 101, 'Wheel Alignment', DATE '2023-04-10', 210.00, 'Berlin Maintenance Facility'),
(8, 102, 'Brake Inspection', DATE '2023-04-20', 310.00, 'Munich Depot'),
(9, 103, 'Engine Check', DATE '2023-05-25', 560.00, 'Hamburg Repair Yard'),
(10, 104, 'Wheel Alignment', DATE '2023-06-05', 220.00, 'Frankfurt Maintenance Hub');
ALTER TABLE train_maintenance
SET PROPERTIES partitioning = ARRAY['month(maintenance_date)'];
INSERT INTO train_maintenance (maintenance_id, train_id, maintenance_type, maintenance_date, cost, location)
VALUES
(11, 106, 'Engine Check', DATE '2023-06-15', 520.00, 'Berlin Maintenance Facility'),
(12, 107, 'Brake Inspection', DATE '2023-07-01', 330.00, 'Munich Depot'),
(13, 108, 'Wheel Alignment', DATE '2023-07-10', 230.00, 'Berlin Maintenance Facility'),
(14, 109, 'Engine Check', DATE '2023-07-25', 540.00, 'Hamburg Repair Yard'),
(15, 110, 'Brake Inspection', DATE '2023-08-05', 340.00, 'Frankfurt Maintenance Hub'),
(16, 111, 'Engine Check', DATE '2023-08-15', 530.00, 'Stuttgart Service Center'),
(17, 112, 'Wheel Alignment', DATE '2023-09-10', 240.00, 'Berlin Maintenance Facility'),
(18, 113, 'Brake Inspection', DATE '2023-09-20', 350.00, 'Munich Depot'),
(19, 114, 'Engine Check', DATE '2023-10-05', 550.00, 'Hamburg Repair Yard'),
(20, 115, 'Wheel Alignment', DATE '2023-10-15', 250.00, 'Frankfurt Maintenance Hub');
ALTER TABLE train_maintenance EXECUTE optimize;
SELECT * FROM "train_maintenance$partitions";
When i do
select * from train_maintenance where maintenance_date >= DATE '2023-10-01';
the query plan looks like this:
Why is that?
Moreover I was wondering if there is a way to change the partitioning from
ARRAY[ 'month(maintenance_date)', 'maintenance_type']
to ARRAY[ 'month(maintenance_date)', 'location']
without using a CTA, as this always throws the following error:
Cannot add duplicate partition field null=month(ref(name=“maintenance_date”)), conflicts with 1000: maintenance_date_month: month(4)
Last but not least I would love to know if there are guidlines of when to use sorted_by with partitioning.