I am struggeling with the optimize function of the hive connector. It keeps saying that executing OPTIMIZE is not supported by Hive tables.
I created a table with the following statement:
create table tmp_cat_hive.hive_schema.t1
with ( partitioned_by= ARRAY['c_birth_month'] )
AS
SELECT c_first_name, c_last_name, c_birth_month
FROM sample_data_tpc.sf1.customer;
Then I did :
SET SESSION tmp_cat_hive.non_transactional_optimize_enabled=true;
ALTER TABLE tmp_cat_hive.hive_schema.t1 EXECUTE optimize;
However i still get:
Executing OPTIMIZE on Hive tables is not supported
Any idea about what I could be doing wrong?
I do know that this is the case in Starburst Galaxy for Hive tables. I also know that many folks do not recommend using optimize
with the Hive connector on SEP, but I’d need to spin up a cluster to double-check if it still is operational (I think it is).
On Galaxy, this will work for Iceberg tables.
I use Starburst Galaxy, so you don’t need to start a cluster to check this. The Starburst Galaxy documentation doesn’t have much information about the connectors, so I looked in the Enterprise documentation. Sometimes it’s hard to tell what will work in Galaxy and what won’t.
Anyways. Thanks for your help!
1 Like
I do agree with that strategy (leveraging the formal connector docs) AND with things don’t always work the same. I do know the Docs team are always working hard to make things better. It would be nice to know of “things that the Trino/SEP connectors do that just don’t work with Galaxy”.
As a shout out example to the Docs team, they have doc’d Starburst | Iceberg table format showing how to migrate from Hive to Iceberg which is different than how you do it in the SEP Iceberg connector.
In fairness, I guess that’s actually what they are already trying to do. Show what can be done as apposed to what can’t be done. Of course, everything can always be improved and I assure you the Docs team is about continued improvement. Good folks there!!
1 Like