The new Starburst Enterprise 423-e LTS release provides Starburst customers with new capabilities alongside more advanced connectivity, improved performance, and enhanced security. As always, this major release combines features that have been contributed back to the open source Trino project with new features created for Starburst Enterprise customers. The quarterly LTS release is the best opportunity for existing customers to upgrade their clusters and take advantage of all the new enhancements, and for new prospects to start their journey with Starburst Enterprise.
Lakehouse
Starburst Enterprise 423-e LTS introduces noteworthy features centered around Apache Iceberg, Delta Lake, and Hive, offering users enhanced capabilities and streamlined data management. For Iceberg, this release extends the existing migrate() table procedure to support external tables, which eases the transition between data lakes and warehouses. New support for Nessie catalogs adds version control capabilities, giving users more options for managing their data lake.
Turning to Delta Lake, the 423-e LTS includes some great new enhancements. The new table_changes() function enables users to delve into row-level changes between different versions of Delta Lake tables, elevating data lineage tracking. The capability to perform INSERT, UPDATE, DELETE, and MERGE operations with name and id column mapping empowers users with versatile data manipulation tools. The addition of dereference pushdown optimizes nested field access, significantly boosting query performance and refining data exploration.
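As a rough illustration of how a table_changes() call might be issued, the following sketch builds the SQL text in Python. The argument names (schema_name, table_name, since_version) and the system schema follow our understanding of the Trino Delta Lake connector and should be checked against the release documentation. The catalog, schema, and table names are hypothetical placeholders, and qualifying the function with the catalog name is also an assumption.

```python
def table_changes_query(catalog: str, schema: str, table: str, since_version: int) -> str:
    """Build a SELECT over the Delta Lake table_changes() table function.

    Assumption: the function lives in the catalog's `system` schema and takes
    the named arguments shown below. Verify against your release docs.
    """
    if since_version < 0:
        raise ValueError("since_version must be zero or greater")

    def literal(value: str) -> str:
        # Escape single quotes so the value is a valid SQL string literal.
        return "'" + value.replace("'", "''") + "'"

    return (
        "SELECT * FROM TABLE(\n"
        f"  {catalog}.system.table_changes(\n"
        f"    schema_name => {literal(schema)},\n"
        f"    table_name => {literal(table)},\n"
        f"    since_version => {since_version}\n"
        "  )\n"
        ")"
    )


# Hypothetical names: a catalog called `delta`, schema `sales`, table `orders`.
query = table_changes_query("delta", "sales", "orders", since_version=3)
print(query)
```

The resulting statement can be submitted through any Trino-compatible client. Rows changed after the given version are the ones a lineage or audit job would then inspect.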
In the Hive ecosystem, Starburst Enterprise 423-e LTS offers a range of additional improvements. Users can now set custom table properties beyond the predefined ones, providing greater flexibility in data organization. Queries against the built-in information_schema.tables table now return the names of tables and views across schemas faster, improving access to metadata.
Performance
Starburst Enterprise 423-e LTS introduces a range of performance features designed to optimize data processing workflows. Among the highlights is support for bloom filter index statistics in Parquet files for Iceberg, building on the capability introduced for the Hive connector in a previous release. This addition lets Iceberg use these index statistics for more efficient data retrieval and query execution. The Parquet writer has also been upgraded and now orders columns by size within row groups. With this arrangement, the reader can retrieve smaller columns with fewer filesystem requests, which improves query performance and reduces data access overhead.
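To see why ordering columns by size can reduce filesystem requests, consider the following simplified sketch. It is not the Parquet writer's actual implementation. It assumes a hypothetical reader that merges byte ranges into one request when the gap between them is small (the max_gap parameter), and the column names and sizes are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnChunk:
    name: str
    size: int  # bytes occupied by this column within the row group


def layout(chunks):
    """Assign consecutive byte ranges to the chunks in the given order."""
    offsets = {}
    position = 0
    for chunk in chunks:
        offsets[chunk.name] = (position, position + chunk.size)
        position += chunk.size
    return offsets


def count_requests(offsets, wanted, max_gap):
    """Count range reads for the wanted columns, merging two ranges into
    one request when the gap between them is at most max_gap bytes."""
    requests = 0
    previous_end = None
    for start, end in sorted(offsets[name] for name in wanted):
        if previous_end is None or start - previous_end > max_gap:
            requests += 1
        previous_end = end
    return requests


# Invented row group: three small columns interleaved with two large ones.
chunks = [
    ColumnChunk("id", 100),
    ColumnChunk("payload", 50_000),
    ColumnChunk("flag", 50),
    ColumnChunk("description", 80_000),
    ColumnChunk("status", 200),
]
wanted = ["id", "flag", "status"]

schema_order = layout(chunks)
size_order = layout(sorted(chunks, key=lambda chunk: chunk.size))

requests_schema_order = count_requests(schema_order, wanted, max_gap=1_000)
requests_size_order = count_requests(size_order, wanted, max_gap=1_000)
print(requests_schema_order, requests_size_order)
```

In this toy layout, the small columns end up adjacent once they are sorted by size, so a query that touches only them can be served by a single merged read instead of one read per column.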
In terms of platform support, Starburst Warp Speed now extends its compatibility to M6gd instances, offering cost-effective alternatives that are Graviton2-based. This expanded support provides users with more options to choose from based on their performance and cost requirements. Additionally, the release adds support for materialized views on Iceberg, further enriching the analytical capabilities of the platform.
Security
The integration of SCIM (System for Cross-domain Identity Management) support, now in public preview for Azure AD and Okta in Starburst Enterprise, introduces a crucial security feature that enhances user access management and authorization processes. By leveraging the SCIM protocol, the platform can seamlessly synchronize users and groups from external identity providers, streamlining the onboarding process and ensuring consistent access control across your infrastructure and within Starburst Enterprise. This, in turn, lays the foundation for critical downstream capabilities like enforcing granular access control rules for these users and groups on your data sources. SCIM synchronization in Starburst Enterprise not only streamlines access provisioning but also contributes to a more robust and efficient data security framework.
Furthermore, the 423-e LTS release introduces critical security updates related to AWS Lake Formation (in public preview). With the addition of DDL and DML write support for schemas, tables, and views, customers gain the ability to perform essential operations such as creating, altering, dropping, inserting, deleting, and updating data structures and records. The official support for Lake Formation-Tag policies on both read and write operations adds an extra layer of security for S3 buckets, catering to a broader range of use cases with granular control over data access and modification. Additionally, the inclusion of resource link support (read only) contributes to comprehensive security measures by providing customers with enhanced control over their data and resources.
Connectivity
Managed statistics
Initially introduced in our 407-e LTS, managed statistics has proven to be a pivotal feature within Starburst Enterprise. It collects and retains essential table and column statistics, particularly for data sources with limited or absent native statistical capabilities. With these statistics available, the cost-based optimizer can build better-informed query plans, which speeds up queries against these data sources.
In our ongoing commitment to enhancing performance, we’ve expanded the reach of managed statistics to a broader range of connectors. With the recent 423-e LTS release, we’re excited to announce that managed statistics are now available as a public preview for the following connectors: Greenplum, DB2, Netezza, Redshift, Synapse, and Stargate.
For further insights into managed statistics, we encourage you to explore our comprehensive documentation or watch it in action in our recent webinar.
Parallel Snowflake connector
The new parallel Snowflake connector offers compelling benefits for users seeking efficient cross-system data operations. With the ability to query and create tables within an external Snowflake database, this connector facilitates data integration across diverse systems, such as Snowflake and Hive, or even between different instances of Snowflake itself. The parallel Snowflake connector stands out for its speed. It is now the recommended connector for Snowflake connectivity, with considerable performance improvement over the JDBC and distributed connectors. Based on internal benchmarks, the parallel Snowflake connector is 1223% faster than the JDBC connector and 60% faster than the distributed connector on select queries.
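To put the benchmark percentages in more familiar terms, the short sketch below converts "N% faster" into a speedup multiple and a relative runtime. It assumes that "faster" means relative throughput, so that N% faster corresponds to a factor of 1 + N/100. The benchmark methodology is not described here, so treat the conversion as illustrative only.

```python
def speedup_factor(percent_faster: float) -> float:
    """Convert 'N% faster' into a throughput multiple (assumed reading)."""
    return 1.0 + percent_faster / 100.0


def relative_runtime(percent_faster: float) -> float:
    """Fraction of the baseline runtime implied by the speedup factor."""
    return 1.0 / speedup_factor(percent_faster)


# Percentages quoted in the text for the parallel Snowflake connector.
for baseline, percent in (("JDBC connector", 1223), ("distributed connector", 60)):
    factor = speedup_factor(percent)
    runtime = relative_runtime(percent)
    print(f"vs {baseline}: {factor:.2f}x throughput, {runtime:.1%} of baseline runtime")
```

Under this reading, the same query would take a small fraction of the JDBC connector's runtime and somewhat under two thirds of the distributed connector's runtime.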
Storage
The Starburst Enterprise 423-e LTS release brings significant advancements in storage capabilities, catering to diverse needs and enabling seamless data management across various platforms. The introduction of support for Ceph storage, Dell ECS, and ObjectScale as object storage backends offers enhanced flexibility and options for users. Notably, Ceph emerges as a robust storage platform supported by Starburst Enterprise for connectors like Hive, Iceberg, and Delta Lake. This expanded storage support empowers users with the ability to efficiently manage and access data from different sources, formats, and platforms.
The collaboration between Dell Technologies and Starburst further elevates the multi-cloud data analytics solution, simplifying the consumption of data by abstracting complexities and enabling the deployment of a thoroughly tested architecture. This integration not only facilitates intelligent data movement based on usage patterns but also empowers users to prevent vendor lock-in and scale efficiently by separating compute and storage.
Moreover, the addition of Dell ECS and ObjectScale as supported storage platforms offers users more avenues to harness the power of Starburst Enterprise. This empowers data teams to swiftly deploy tested architectures on-premises, optimize data movement based on usage, and take advantage of the latest Dell ECS innovations. Ultimately, these advancements in storage support bolster the overall data management landscape, providing users with the tools needed to simplify data consumption, enhance governance, and unlock the full potential of their data assets across the enterprise.
And more!
The 423-e LTS includes all of the enterprise enhancements described above and more, as well as all improvements added to the open source Trino project up to Trino 423. To read the full list of enhancements and changes in this release, see the 423-e LTS release notes.