Create table with contents in HDFS

ghuang · April 8, 2024, 8:29pm

Hello

Sorry for these naive questions. First time using trino but seems more difficult than using presto from EMR. I basically have large amount of data under hdfs in format like /user/hadoop/uuid_1/2024/03/01/01/ many parquet files. It is partitioned by year/month/day/hour, as the directory suggests. I already created hive external table on it and added partitions (presto cli can query it very well). Now I try to use JDBC/trino to query it, but I find that the tables are not set up, so I even dont know how to query it from trino cli.

If I created hive table from hive cli, will that table already show up under trino cli? I can not find it in hive catalog.

So, I create schema like

create schema hive.schemaName with (location='/user/hadoop/');

I notice external_location does not work here, should I use external_location to suggest I dont need to insert data?

Then I run

create table hive.schemaName."uuid_1" (//detailed_schema_omitted) WITH (FORMAT='PARQUET');

But if I do select * from hive.schemaName."uuid_1" limit 1 it does not show anything.

How can I easily solve this?

From my test, whenever I create a table within 1 schema, trino will create a subdirectory under the location of the schema. So, in above example I created table with name hive.schemaName.“uuid_1”, where uuid_1 is already the bucket name with contents.

I assume external_location should be used when creating schema, is that right?

Thanks.

simpligility · April 10, 2024, 4:46pm

Presto and Trino are separate projects and manage metadata in the HMS separately. If you want to access the data in Trino you need to run the statements in Trino, not Presto. When you write the create table statement you need to make sure you get the partitioning, external location and bucketing correct and aligned with whatever the data is using.

Also you might want to look at using system.register_partition