Connector documentation says:
You can have as many catalogs as you need, so if you have additional Hive clusters, simply add another properties file to etc/catalog with a different name, making sure it ends in .properties. For example, if you name the property file sales.properties, Trino creates a catalog named sales using the configured connector.
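To illustrate what the documentation describes, a minimal catalog file could look like the following sketch. The filename and metastore address are hypothetical; the point is that the catalog name is derived solely from the filename:

```properties
# etc/catalog/sales.properties
# The filename minus the .properties suffix becomes the catalog name: "sales".
connector.name=hive
# Hypothetical metastore address used only for this example
hive.metastore.uri=thrift://example-metastore:9083
```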
This does not seem to be true: by default, all my catalogs behave as if they had the same “hive” name. Are there some parameters to set, or is this a bug?
Can you share the following?
- a directory listing of etc/catalog, e.g. ls -R etc/catalog
- the content of the catalog config files that you see showing up as hive
- output of SHOW CATALOGS
I’m unable to reproduce this, but having more information would help clarify what is going on.
I have copied below the complete output you asked for, plus a query against the Postgres DB used by the Hive metastore service. As you can see, the catalog behaves as “hive” rather than as “minio” and “tebi” as expected. So listing the schemas for either the “minio” or the “tebi” catalog in Trino shows both sets of schemas, which is wrong.
alydb.properties jmx.properties memory.properties minio.properties mysql.properties postgresql.properties tebi.properties tpcds.properties tpch.properties
[trino@e94548729edd catalog]$ cat minio.properties
connector.name=hive
hive.metastore.uri=thrift://hive:9083
hive.metastore.username=metastore
hive.s3.aws-access-key=******
hive.s3.aws-secret-key=******
hive.s3.endpoint=http://minio:9009
hive.s3.path-style-access=true
hive.s3select-pushdown.enabled=true
hive.s3.ssl.enabled=true
hive.allow-drop-table=true
hive.max-partitions-per-writers=100
hive.storage-format=PARQUET
[trino@e94548729edd catalog]$ cat tebi.properties
connector.name=hive
hive.metastore.uri=thrift://hive:9083
hive.metastore.username=metastore
hive.s3.aws-access-key=*********
hive.s3.aws-secret-key=**********
hive.s3.endpoint=https://s3.tebi.io
hive.s3.path-style-access=true
hive.s3select-pushdown.enabled=true
hive.s3.ssl.enabled=true
hive.allow-drop-table=true
hive.max-partitions-per-writers=100
hive.storage-format=PARQUET
trino> show catalogs;
Catalog
------------
alydb
jmx
memory
minio
mysql
postgresql
system
tebi
tpcds
tpch
(10 rows)
Query 20220317_142635_00000_9vqde, FINISHED, 1 node
Splits: 7 total, 7 done (100.00%)
1.09 [0 rows, 0B] [0 rows/s, 0B/s]
trino> show schemas from minio;
Schema
--------------------
default
information_schema
tpcds_minio
tpcds_tebi
(4 rows)
Query 20220317_142931_00004_9vqde, FINISHED, 1 node
Splits: 7 total, 7 done (100.00%)
0.22 [4 rows, 66B] [17 rows/s, 296B/s]
trino> show schemas from tebi;
Schema
--------------------
default
information_schema
tpcds_minio
tpcds_tebi
(4 rows)
Query 20220317_142944_00005_9vqde, FINISHED, 1 node
Splits: 7 total, 7 done (100.00%)
0.23 [4 rows, 66B] [17 rows/s, 289B/s]
From postgresql hive metastore:
SELECT "DB_ID", "DB_LOCATION_URI", "NAME", "OWNER_NAME",
"OWNER_TYPE", "CTLG_NAME" FROM public."DBS";
 DB_ID | DB_LOCATION_URI                          | NAME        | OWNER_NAME | OWNER_TYPE | CTLG_NAME
-------+------------------------------------------+-------------+------------+------------+-----------
     1 | file:/user/hive/warehouse                | default     | public     | ROLE       | hive
     3 | file:/user/hive/warehouse/tpcds_minio.db | tpcds_minio | admin      | USER       | hive
     4 | file:/user/hive/warehouse/tpcds_tebi.db  | tpcds_tebi  | admin      | USER       | hive
Interesting use case.
It seems you are sharing both the Hive metastore service and its backing database between these two catalogs. Only the storage provider/location differs.
I don’t think this can be supported, since the Hive metastore service simply replies with all the schemas and tables it can find in the metastore DB; it doesn’t know that there are logically two different Hive “catalogs”.
I do expect things to mostly work, but they may also fail in surprising ways. Also, it seems you cannot provide multiple storage endpoints (fs.s3a.endpoint) in the metastore-site.xml of a single HMS instance.
You might want to use two different metastore services, each pointing to a different database, for the use case you are trying to target.
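As a rough sketch of that setup, assuming two hypothetical HMS host names (hms-minio and hms-tebi), each backed by its own metastore database, the two catalog files would then point at different metastore URIs instead of sharing one:

```properties
# etc/catalog/minio.properties — points at an HMS dedicated to the MinIO data
# (host names here are assumptions for illustration, not tested values)
connector.name=hive
hive.metastore.uri=thrift://hms-minio:9083

# etc/catalog/tebi.properties — points at a second HMS with its own backing DB
connector.name=hive
hive.metastore.uri=thrift://hms-tebi:9083
```

With separate metastore databases, each HMS only knows about its own schemas, so SHOW SCHEMAS FROM minio and SHOW SCHEMAS FROM tebi would return distinct results.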