Data products in Starburst Galaxy

Data products (a new Galaxy feature) look interesting. How should I use them? It looks like a schema is a product, so why don't customers use the schema directly? How can data products benefit us?

Data products allow data consumers to quickly find and use generally available, high-quality data. The feature lets data teams build on data set curation work done before adopting Galaxy. It also lets them use Trino to create new federated data sets with little or no data movement, so they can quickly provide these high-value data sets to their organization.

Currently, a data product is a schema that has had business metadata added and has been flagged as a data product. With this feature, data teams can direct data consumers first to the Data products view, which lists data that has been curated and determined to provide business value. Instead of navigating through various data sources and their schemas across the entire data catalog, data consumers can quickly identify and start using high-quality, generally available data from a single pane of glass. Users have a consistent way of using the data, whether within Galaxy or with third-party clients of their choice. As the number of these data sets increases, the time saved on data discovery grows with it.
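To make the "schema plus business metadata" idea concrete, here is a minimal Python sketch. It is an illustrative model only, not Galaxy's API: the class names, the metadata fields (summary, owner, tags), the find_products helper, and the catalog and schema names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Schema:
    """A plain schema: just a location in the catalog (hypothetical model)."""
    catalog: str
    name: str


@dataclass
class DataProduct:
    """A schema plus business metadata (field names are assumptions)."""
    schema: Schema
    summary: str
    owner: str
    tags: List[str] = field(default_factory=list)

    @property
    def qualified_name(self) -> str:
        return f"{self.schema.catalog}.{self.schema.name}"


def find_products(products: List[DataProduct], keyword: str) -> List[DataProduct]:
    """Return products whose summary or tags mention the keyword (case-insensitive)."""
    kw = keyword.lower()
    return [
        p for p in products
        if kw in p.summary.lower() or any(kw == t.lower() for t in p.tags)
    ]


# Usage: consumers search one curated list instead of browsing every schema.
products = [
    DataProduct(Schema("lake", "sales_product"),
                "Curated sales orders joined with customers",
                "sales-data-team", ["sales", "orders"]),
    DataProduct(Schema("lake", "hr_product"),
                "Headcount by department",
                "people-analytics", ["hr"]),
]
hits = find_products(products, "sales")
print([p.qualified_name for p in hits])
```

The point of the sketch is that the underlying schema is unchanged. What the data product adds is the descriptive layer that makes the schema findable from one place.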

A further benefit of increased discoverability is a reduction in the creation of ad hoc data sets. With a single location to check, data consumers can be told that ad hoc data sets and data requests should be created only if the existing data products do not meet their needs.

Data products can be created via two workflows:

  • Promoting an existing schema to a data product and applying business metadata to it
  • Creating a new data product from a single-source or federated query

The output of both workflows is the same, and which workflow was used is transparent to the data consumer.
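As a rough illustration of the second workflow, the sketch below composes the kind of federated Trino query that could back a data set in a data product. It only builds the SQL text. All catalog, schema, table, and column names are hypothetical, and whether a given data product is backed by a view like this depends on how it is set up in Galaxy.

```python
# Hypothetical fully qualified names (catalog.schema.object); none come from Galaxy itself.
TARGET_VIEW = "lake.sales_product.customer_orders"
ORDERS_TABLE = "postgres_prod.public.orders"   # assumed operational database catalog
CUSTOMERS_TABLE = "lake.crm.customers"         # assumed data lake catalog


def build_federated_view(target: str, orders: str, customers: str) -> str:
    """Return a Trino CREATE VIEW statement joining tables from two catalogs.

    The data stays in the source systems; Trino joins it at query time.
    """
    return (
        f"CREATE OR REPLACE VIEW {target} AS\n"
        f"SELECT c.customer_id, c.region, o.order_id, o.total\n"
        f"FROM {orders} AS o\n"
        f"JOIN {customers} AS c ON o.customer_id = c.customer_id"
    )


sql = build_federated_view(TARGET_VIEW, ORDERS_TABLE, CUSTOMERS_TABLE)
print(sql)
```

A consumer querying lake.sales_product.customer_orders would not need to know that two source systems sit behind it, which matches the point that the workflow used is transparent to them.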

Note that data product access control uses the RBAC / ABAC policies of the underlying schema and data sets. This helps ensure proper security, which can be configured immediately after the data product is created.