Introducing support for UNLOAD in Starburst Galaxy

Streamline data management through seamless file writing without table creation in Starburst Galaxy

April 8, 2024

Ahmed Niyaz
Product Manager
Starburst
Yuya Ebihara
Software Engineer
Starburst

Ahmed Niyaz
Product Manager
Starburst
Yuya Ebihara
Software Engineer
Starburst

More deployment options

Request Enterprise trial license key →

In today’s data-driven landscape, data professionals are tasked with efficiently managing and processing vast amounts of data. One recurring challenge in this process involves the constraints around outputting query results in formats that seamlessly integrate with downstream applications.

Organizations often grapple with inefficient mechanisms for compressing these query results, leading to inflated storage costs and compromised performance in data processing pipelines. Similarly, data science teams encounter complexities in directly feeding query outputs into machine learning models without time-consuming table creation processes.

Tackling these obstacles head-on, Starburst is excited to introduce the UNLOAD table function—a transformative solution offering flexibility in output formats, compression options, and direct data feeds for downstream consumption.

UNLOAD table function

The UNLOAD table function is a pre-built table function within the system schema, designed to efficiently handle data output operations. By default, the function is deactivated through access control, requiring the assignment of necessary privileges to execute the UNLOAD function and granting location privileges for the output destination.

The input parameter can accept either a table name or a SELECT query. The separator parameter is applicable only when the format argument is set to CSV or TEXTFILE.

See documentation for further details.

SELECT * FROM TABLE(system.unload(
input => TABLE(...) [PARTITION BY col (, ...)], 
location => '?', 
format => '?'
[, compression => '?']
[, separator => '?']
[, header => true|false]
))

Consider writing files as text file format with `|` delimiter and gzip compressed:

This query generates a list of file paths, row counts, and partition columns. The PARTITION BY clause influences the directory structure of the results. When omitted, the function generates files within the designated directory without creating subdirectories. In this example, the file structure mirrors the format of a Hive table, not Iceberg or Delta Lake.

Decompressing and Examining Results

Let’s decompress a gzip file and examine the results under the orderdate=1992-01-11/orderstatus=F partition. The partition names and values are exclusively present in the directory names. The generated files do not contain the partition names and values.

Limitations

The UNLOAD table function allows writing in CSV or TEXTFILE formats, directly to all supported storages such as AWS S3, Azure Data Lake Storage, and Google Cloud Storage. It’s important to note that we are currently in the experimental stage with this functionality. Presently, the function produces a solitary file for either an entire table or a specific partition and does not accommodate batch cluster types. Each format has its own set of constraints. For example, the CSV format exclusively supports VARCHAR columns, and Avro files do not permit special characters in the column names.

To learn more about the UNLOAD table function, see our documentation.

Manage Consent Preferences

Essential/Strictly Necessary Cookies

Required

These cookies are essential in order to enable you to move around the website and use its features, such as accessing secure areas of the website.

Analytical/Performance Cookies

These are analytics cookies that allow us to collect information about how visitors use a website, for instance which pages visitors go to most often, and if they get error messages from web pages.

Functional/Preference Cookies

These cookies allow our website to properly function and in particular will allow you to use its more personal features.

Targeting/Advertising Cookies

These cookies are used by third parties to build a profile of your interests and show you relevant adverts on other sites.

Starburst’s mission is to free our customers to see the invisible and achieve the impossible