Do I need to set up Hive in order for the Hive connector to work?
No, unlike some other connectors that require their backing system, you do not need a running Hive installation to use it. You do need either a Hive Metastore or a replacement such as AWS Glue, both of which are compatible with the Hive metadata model. You can also connect to cloud object storage and do not have to limit yourself to HDFS as the data storage layer. In essence, the Hive connector is what you use in Trino for reading data from object storage that is organized according to the rules laid out by Hive, without using the Hive runtime code.
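As a minimal sketch, a catalog properties file for the connector only needs to point at a metastore; the hostname, port, and region below are placeholders, not values from this thread:
connector.name=hive
hive.metastore.uri=thrift://metastore.example.com:9083
Or, using AWS Glue in place of a Hive Metastore service:
connector.name=hive
hive.metastore=glue
hive.metastore.glue.region=us-east-1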
To find out more, read this blog:
Hi, great tutorial, everything works fine!
I have tried to replicate the experience using Tebi (https://tebi.io/) instead of MinIO, with a single bucket named “aly01”, but with this DDL:
CREATE SCHEMA tebi.tebitpcds
WITH (location = 's3a://aly01/tpcds.db') ;
I get “Query 20220316_082507_00001_mzfi2 failed: Got exception: java.io.FileNotFoundException Bucket aly01 does not exist”
Obviously, I’m able to access the same bucket with the same credentials from Dremio or Denodo.
This is my tebi.properties:
connector.name=hive
hive.metastore.uri=thrift://<hive-metastore>:<port>
hive.metastore.username=<metastore-username>
hive.s3.aws-access-key=<tebi-key>
hive.s3.aws-secret-key=<tebi-secret>
hive.s3.endpoint=https://s3.tebi.io
hive.s3.path-style-access=true
hive.s3select-pushdown.enabled=true
hive.s3.ssl.enabled=true
hive.allow-drop-table=true
hive.max-partitions-per-writers=100
hive.storage-format=PARQUET
Any suggestions? Thanks.
Hi @Capppo,
Just to be sure, you did create the aly01 bucket in Tebi before running the query? Also make sure any auth required by Tebi is satisfied and that permissions are set correctly.
Yes, of course, but in the meantime I found where the problem is.
The S3 credentials and endpoint are also defined in the metastore-site.xml file of the Hive Metastore Service (HMS), because that is where HMS saves its managed tables, and it also checks that the given bucket exists when a new schema is defined.
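For illustration, the relevant entries in metastore-site.xml look roughly like the following, assuming the hadoop-aws fs.s3a settings are used; the credential values are placeholders mirroring the tebi.properties above:
<property>
  <name>fs.s3a.endpoint</name>
  <value>https://s3.tebi.io</value>
</property>
<property>
  <name>fs.s3a.access.key</name>
  <value><tebi-key></value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value><tebi-secret></value>
</property>
<property>
  <name>fs.s3a.path.style.access</name>
  <value>true</value>
</property>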
For all these (logical) reasons, it is not possible to use the same HMS with different S3 providers (or at least I have not found out how).
Indeed, I started three different HMS containers, configured with three different S3 providers (MinIO, Scaleway, and Tebi), all using the same PostgreSQL instance but with three different metastore databases, and everything works fine.
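To illustrate the setup, each HMS instance points at its own database on the shared PostgreSQL server through the JDO connection URL in its metastore-site.xml; the hostname and the database name metastore_tebi below are hypothetical:
<!-- each HMS container gets its own metastore database on the shared PostgreSQL instance -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:postgresql://postgres:5432/metastore_tebi</value>
</property>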