Connector documentation says:
You can have as many catalogs as you need, so if you have additional Hive clusters, simply add another properties file to etc/catalog with a different name, making sure it ends in .properties. For example, if you name the property file sales.properties, Trino creates a catalog named sales using the configured connector.
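To illustrate what the documentation describes, a minimal catalog file could look like the following sketch. The filename and metastore address are hypothetical; the point is that the catalog name is derived solely from the filename:

```properties
# etc/catalog/sales.properties
# The filename minus the .properties suffix becomes the catalog name: "sales".
connector.name=hive
# Hypothetical metastore address used only for this example
hive.metastore.uri=thrift://example-metastore:9083
```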
This does not seem to be true: by default, all my catalogs behave as if they had the same “hive” name. Are there some parameters to set, or is this a bug?
Can you share the following?
- a directory listing of etc/catalog, e.g. ls -R etc/catalog
- the content of the catalog config files that you see showing up as hive
- output of SHOW CATALOGS
I’m unable to reproduce this, but having more information would help clarify what is going on.
I have copied below the complete output you asked for, plus a query against the Postgres DB used by the Hive metastore service. As you can see, the catalog behaves as “hive” rather than as “minio” and “tebi” as expected. So listing the schemas for either the “minio” or the “tebi” catalog in Trino shows both sets of schemas, which is wrong.
alydb.properties jmx.properties memory.properties minio.properties mysql.properties postgresql.properties tebi.properties tpcds.properties tpch.properties
[trino@e94548729edd catalog]$ cat minio.properties
connector.name=hive
hive.metastore.uri=thrift://hive:9083
hive.metastore.username=metastore
hive.s3.aws-access-key=******
hive.s3.aws-secret-key=******
hive.s3.endpoint=http://minio:9009
hive.s3.path-style-access=true
hive.s3select-pushdown.enabled=true
hive.s3.ssl.enabled=true
hive.allow-drop-table=true
hive.max-partitions-per-writers=100
hive.storage-format=PARQUET
[trino@e94548729edd catalog]$ cat tebi.properties
connector.name=hive
hive.metastore.uri=thrift://hive:9083
hive.metastore.username=metastore
hive.s3.aws-access-key=*********
hive.s3.aws-secret-key=**********
hive.s3.endpoint=https://s3.tebi.io
hive.s3.path-style-access=true
hive.s3select-pushdown.enabled=true
hive.s3.ssl.enabled=true
hive.allow-drop-table=true
hive.max-partitions-per-writers=100
hive.storage-format=PARQUET
trino> show catalogs;
Catalog
------------
alydb
jmx
memory
minio
mysql
postgresql
system
tebi
tpcds
tpch
(10 rows)
Query 20220317_142635_00000_9vqde, FINISHED, 1 node
Splits: 7 total, 7 done (100.00%)
1.09 [0 rows, 0B] [0 rows/s, 0B/s]
trino> show schemas from minio;
Schema
--------------------
default
information_schema
tpcds_minio
tpcds_tebi
(4 rows)
Query 20220317_142931_00004_9vqde, FINISHED, 1 node
Splits: 7 total, 7 done (100.00%)
0.22 [4 rows, 66B] [17 rows/s, 296B/s]
trino> show schemas from tebi;
Schema
--------------------
default
information_schema
tpcds_minio
tpcds_tebi
(4 rows)
Query 20220317_142944_00005_9vqde, FINISHED, 1 node
Splits: 7 total, 7 done (100.00%)
0.23 [4 rows, 66B] [17 rows/s, 289B/s]
From postgresql hive metastore:
SELECT "DB_ID", "DB_LOCATION_URI", "NAME", "OWNER_NAME",
"OWNER_TYPE", "CTLG_NAME" FROM public."DBS";
 DB_ID | DB_LOCATION_URI                          | NAME        | OWNER_NAME | OWNER_TYPE | CTLG_NAME
-------+------------------------------------------+-------------+------------+------------+-----------
     1 | file:/user/hive/warehouse                | default     | public     | ROLE       | hive
     3 | file:/user/hive/warehouse/tpcds_minio.db | tpcds_minio | admin      | USER       | hive
     4 | file:/user/hive/warehouse/tpcds_tebi.db  | tpcds_tebi  | admin      | USER       | hive
Interesting use case.
It seems you are sharing both the Hive metastore service and its backing database between these two catalogs. Only the storage provider/location differs.
I don’t think this can be supported, since the Hive metastore service simply replies with all the schemas and tables it can find in the metastore DB; it doesn’t know that there are logically two different Hive “catalogs”.
I do expect things to mostly work, but they may also fail in surprising ways. Also, it seems you cannot provide multiple storage endpoints (fs.s3a.endpoint) in the metastore-site.xml of a single HMS instance.
You might want to use two different metastore services, each pointing to a different database, for the use case you are trying to target.
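As a rough sketch of that setup, assuming two hypothetical HMS host names (hms-minio and hms-tebi), each backed by its own metastore database, the two catalog files would then point at different metastore URIs instead of sharing one:

```properties
# etc/catalog/minio.properties — points at an HMS dedicated to the MinIO data
# (host names here are assumptions for illustration, not tested values)
connector.name=hive
hive.metastore.uri=thrift://hms-minio:9083

# etc/catalog/tebi.properties — points at a second HMS with its own backing DB
connector.name=hive
hive.metastore.uri=thrift://hms-tebi:9083
```

With separate metastore databases, each HMS only knows about its own schemas, so SHOW SCHEMAS FROM minio and SHOW SCHEMAS FROM tebi would return distinct results.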