How to use Bucketed_by property in a Materialized View

How to use Bucketed_by property in a Materialized View . I know there is an article on this.

But with below SQL DDL Statement does not work

CREATE MATERIALIZED VIEW “jagadishgaglani”.“discovered_schema”.“myjagadishs3bucketmvptn”
COMMENT ‘PRODUCTS with DATE RANGE’
WITH (
partitioning = ARRAY[‘_date’],
bucketed_by = ARRAY[‘products’], bucket_count = 10 ,
storage_schema = ‘discovered_schema’
) AS
SELECT
products
, max(cc_open_date) MAX_CC_OPEN_DATE
FROM
“jagadishgaglani”.“discovered_schema”.“myjagadishs3bucket”
GROUP BY products

Gives me message “Catalog ‘jagadishgaglani’ materialized view property ‘bucketed_by’ does not exist”

I see partitioning in your DDL which tells me you are using either Starburst Galaxy (which creates Iceberg MVs only) or the Starburst Enterprise Iceberg connector. Hive uses partition_by, so clearly an Iceberg MV.

The gotcha you are facing is that Iceberg doesn’t handle bucketing the same way. Take a peek around 37:41 in the video at https://www.starburst.io/resources/hive-to-iceberg-to-migrate-or-not-to-migrate/ to see some more details.

RTM link >> Iceberg connector — Trino 449 Documentation

That doc page shows that Iceberg implements bucketing in the construct of partitioning and leverages a transform function called bucket(). Basically, does the same thing, but creates a folder with the bucket number in it that all bucketed values end up being written into (instead file names with the bucket number).

CHANGE

partitioning = ARRAY[‘_date’],
bucketed_by = ARRAY[‘products’], bucket_count = 10

TO

partitioning = ARRAY['_date', 'bucket(products, 10)']

Am using trial version of Starburst Galaxy - Just to learn about Starburst Product in General

Will try your suggestion and see if it works

Thank you Lester . It worked.

1 Like