What’s the difference between location and external_location?

When creating a Hive table in Trino, what is the difference between external_location and location? If I have to create an external table, I have to use external_location, right? What is the difference between these two?
According to this example: Hive connector — Trino 422 Documentation

Trying to decide between these two when setting up INSERT OVERWRITE here: Can I INSERT OVERWRITE data on an unpartitioned table in Trino?

Tables created with location are managed tables. You have full control over them, from their creation to modification.
Tables created with external_location are tables created by third-party systems; we mostly just access them for reads. I would encourage you to use location in your case.
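
For comparison, a table declared over data that another system already wrote would use external_location. Roughly like this (just a sketch; the table name and path are made-up placeholders):

-- External table: Trino only registers the existing files;
-- DROP TABLE removes the metadata, not the underlying data
CREATE TABLE events_ext (
    event_id varchar,
    payload varchar
)
WITH (
    format = 'CSV',
    external_location = 'hdfs://hdfs-hostname:8020/data/events'
);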

Great. Just one more concern/question. The reason I wanted to create an external table is to persist data in HDFS.

  1. So if I create a managed table and execute DROP, it will remove all the inserted data, right?
  2. If I create a table using location, it will create a managed table; can I still store it in CSV format, or does it have to be ORC?


The reason for creating an external table is to persist data in HDFS.

This just depends on the location URL:

  • hdfs:// - will access the configured HDFS
  • s3a:// - will access the configured S3

etc.
So with both external_location and location you can use any of those. It's just a matter of whether Trino manages this data or an external system does.

So in your case you would use:
location='hdfs://hdfs-hostname:8020/user/hive/cards_ext'

Got it. One question that remained: if I create a table with location and execute a DROP command on it, it will remove the data, right, since the table is managed by Trino? And the table format can be anything, right? CSV, JSON, etc.

CREATE TABLE cards_ext (
    id varchar,
    uid varchar,
    credit_card_number varchar,
    credit_card_expiry_date varchar,
    credit_card_type varchar,
    dt varchar
)
WITH (
    format = 'CSV',
    partitioned_by = ARRAY['dt'],
    location = 'hdfs://hdfs-hostname:8020/user/hive/cards_ext'
);

Yes, you can use any format supported by the Hive connector, and DROP will physically remove the data from HDFS.
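
For example, assuming cards_ext above ends up as a managed table, dropping it also deletes its files under the table location:

-- Managed table: dropping it removes both the metadata
-- and the underlying files in HDFS
DROP TABLE cards_ext;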

Got it :pray:. Thank you so much. I will use location and try again.

I just tried creating with the location property, but I get the error

Query 20211105_105949_00319_2mbb4 failed: Catalog 'hive' does not support table property 'location'

How did you create the schema?

Please look at this test

It's using a managed table. You need to create the schema with a location and then use CREATE TABLE without any location property.
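
Roughly like this (a sketch, assuming the catalog is called hive; the schema name and HDFS path are just placeholders):

-- The schema carries the HDFS location
CREATE SCHEMA hive.cards
WITH (location = 'hdfs://hdfs-hostname:8020/user/hive/cards');

-- The managed table inherits its storage location from the schema,
-- so no location property is needed on the table itself
CREATE TABLE hive.cards.cards_ext (
    id varchar,
    uid varchar,
    credit_card_number varchar,
    credit_card_expiry_date varchar,
    credit_card_type varchar,
    dt varchar
)
WITH (
    format = 'CSV',
    partitioned_by = ARRAY['dt']
);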


Thanks, I will try again and let you know. Thanks for the follow-up, I really appreciate it :pray:. I really want to use Trino for data ingestion, so I'm looking forward to getting help :smile:

Hi @aakashnand

Were you able to achieve the external table creation from Trino for Hive? I am also kind of stuck on this and wanted to check if you had any breakthrough.

Thanks in advance!!

Vijay