Is anyone doing big ETL in Trino and rewriting to the same partition? If so, how do you do it? We've created an Airflow operator that writes (INSERT INTO) to temp tables, then we copy the Parquet files to the production table's S3 path and delete the old files. It works, but it's not the best solution. Is anyone doing something better, maybe inserting into Iceberg or something else? Would love to hear.
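For reference, here is a rough sketch of that temp-table + S3 file-swap pattern, assuming the Trino Python client and boto3. This is not the actual operator from the message above; the catalog, table names, paths, and the CTAS standing in for the temp-table insert are all illustrative.

```python
import boto3
import trino

# 1. Write the new data for the partition into a temp location via Trino.
conn = trino.dbapi.connect(host="trino.example.com", port=8080, user="etl",
                           catalog="hive", schema="staging")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE staging.events_tmp
    WITH (format = 'PARQUET', external_location = 's3://bucket/tmp/events/dt=2021-12-01/')
    AS SELECT * FROM source.events WHERE dt = DATE '2021-12-01'
""")
cur.fetchall()  # consume the row-count result so the statement finishes before touching S3

# 2. Swap files at the S3 level: delete the old partition files, copy in the new ones.
s3 = boto3.client("s3")
bucket = "bucket"
prod_prefix = "warehouse/events/dt=2021-12-01/"
tmp_prefix = "tmp/events/dt=2021-12-01/"

for obj in s3.list_objects_v2(Bucket=bucket, Prefix=prod_prefix).get("Contents", []):
    s3.delete_object(Bucket=bucket, Key=obj["Key"])

for obj in s3.list_objects_v2(Bucket=bucket, Prefix=tmp_prefix).get("Contents", []):
    dest_key = prod_prefix + obj["Key"].split("/")[-1]
    s3.copy_object(Bucket=bucket,
                   CopySource={"Bucket": bucket, "Key": obj["Key"]},
                   Key=dest_key)
```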
To find these I literally just searched for messages from @Alec Heifetz that mention “insert”. He’s been working on solving this problem for some time. What you’re doing seems to be the current state of the art, though I’m sure Alec has some tips on optimizing it. There is a lot of promise in the Iceberg format, especially now that v2 is GA. Trino has primarily been focused on OLAP, and OLTP-style use cases are currently more of an afterthought, since we want to make sure we are the best OLAP option. That said, I think there needs to be a lot more adoption of those use cases in Trino on Iceberg for us to gain wider adoption. I am amplifying this at Starburst, but we’ve mostly been focused on OLAP so far. There have been substantial contributions from @Jack Ye in this area, though. Hopefully the UPDATE/MERGE features land soon to make Iceberg a more likely option for you.
Most of my knowledge is in those threads, but a recent learning is that Iceberg insertion in Trino is still pretty immature. There’s not as much flexibility in file sizing and partitioning as in Hive (for example, scale writers doesn’t work, and I’m not sure the max-file-size parameter does either), and there are still some major issues around concurrency: right now only one job at a time can insert into an Iceberg table, which is a big bummer compared to Hive, where you do sometimes need multiple concurrent inserts to maximize throughput. I think most Iceberg + Trino users do their writes through Spark and use Trino for reads, which explains why the ecosystem has developed the way it has.
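A minimal sketch of that write-with-Spark, read-with-Trino split, assuming a Spark session already configured with an Iceberg catalog; the catalog, schema, and table names here are made up.

```python
from pyspark.sql import SparkSession
import trino

# --- Write side: Spark appends into the Iceberg table ---
spark = SparkSession.builder.appName("iceberg-writer").getOrCreate()

df = spark.read.parquet("s3://bucket/incoming/events/dt=2021-12-01/")

# Iceberg commits a new snapshot atomically; the partition layout comes from the table spec.
df.writeTo("iceberg_cat.analytics.events").append()

# --- Read side: Trino queries the same table through its Iceberg connector ---
conn = trino.dbapi.connect(host="trino.example.com", port=8080, user="analyst",
                           catalog="iceberg", schema="analytics")
cur = conn.cursor()
cur.execute("SELECT count(*) FROM events WHERE dt = DATE '2021-12-01'")
print(cur.fetchone())
```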
Oh, and also be careful when writing more than 50 partitions at once, because preferred write partitioning might kick in, which you may or may not want (it basically changes the write algorithm so that there is only one writer per partition, which probably leads to better file sizes but potentially much lower throughput as well).
No idea if that applies to Iceberg or not (to clarify, my last point was about Hive writes).
That might be desirable or not, depending on the data sizes.
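If you want to control that behavior explicitly, recent Trino versions expose it as a session property. A hedged sketch with the Trino Python client follows; exact property names, defaults, and the 50-partition threshold depend on the Trino version in use, and the table names are illustrative.

```python
import trino

# use_preferred_write_partitioning toggles the one-writer-per-partition behavior
# described above; the 50-partition threshold comes from the related
# preferred-write-partitioning-min-number-of-partitions setting (version-dependent).
conn = trino.dbapi.connect(
    host="trino.example.com", port=8080, user="etl",
    catalog="hive", schema="warehouse",
    session_properties={
        # Keep every worker writing to every partition: more throughput,
        # potentially more (and smaller) files per partition.
        "use_preferred_write_partitioning": "false",
    },
)
cur = conn.cursor()
cur.execute("INSERT INTO events SELECT * FROM staging.events_tmp")
cur.fetchall()
```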
FWIW I filed “Preferred write partitioning should consider number of nodes and expected data size” (trinodb/trino issue #10756) with potential improvements. cc @sopel39
@Alec Heifetz when sharing an experience like the above, would you like to set it in context and mention the table data sizes you’re working with?