Today, we announced a range of new capabilities that further our best-in-class price-performance for lakehouse SQL analytics in Starburst Galaxy, our Trino-based open hybrid lakehouse platform.
Just like in Formula 1 (F1) racing, every enhancement that simplifies operations and improves the car's speed and efficiency improves the chances of winning. As the industry shifts to using the best SQL engine for the job, it's even more important for lakehouse platforms to push performance and efficiency limits the way F1 teams do for modern analytics.
We are excited to announce three key performance enhancements: the general availability of enhanced autoscaling and the private preview of both Next Gen Caching and User Role Based Routing.
Enhanced autoscaling
Since its launch, Starburst Galaxy has automatically scaled compute resources to ensure optimal performance for varying workloads. Today, we launched several enhancements to cluster autoscaling in Starburst Galaxy, which are now generally available.
Previously, a cluster would autoscale up to the maximum number of workers only if CPU utilization met or exceeded 60%. This approach was limited because, in some cases, scaling wouldn't happen at all if overall CPU utilization was low. The new logic considers both CPU load and the estimated CPU time needed for current and pending queries. This means our autoscaler now looks at not only current work but also planned work.
We've also added the ability to reactivate draining worker nodes, so a worker in the Shutting Down phase can immediately rejoin the cluster as an Active worker. This helps customers better control their billing and usage and strikes a balance between aggressive scaling and price-performance.
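To make the scaling change concrete, here is a minimal sketch of a decision function that sizes a cluster from both current CPU load and the estimated CPU time of pending work. All names, thresholds, and the sizing formula are illustrative assumptions, not Galaxy's actual implementation:

```python
# Hypothetical sketch: size the cluster for running work PLUS estimated
# pending work, instead of waiting for a fixed CPU-utilization threshold.
# Every name and formula here is illustrative, not Galaxy's internal logic.

def desired_workers(active_workers: int,
                    cpu_load: float,
                    pending_cpu_seconds: float,
                    per_worker_cpu_seconds: float,
                    max_workers: int) -> int:
    """Return a target worker count for current plus planned work."""
    # Workers needed to absorb the estimated CPU time of queued queries.
    needed_for_backlog = pending_cpu_seconds / per_worker_cpu_seconds
    # Fraction of the current fleet that is actually busy.
    busy = active_workers * cpu_load
    # Never scale below what is already running; cap at the configured max.
    target = max(active_workers, round(busy + needed_for_backlog))
    return min(max(target, 1), max_workers)

# Even at only 40% CPU, a backlog of estimated work triggers scaling:
print(desired_workers(active_workers=4, cpu_load=0.4,
                      pending_cpu_seconds=600,
                      per_worker_cpu_seconds=100,
                      max_workers=10))  # 8
```

Note how the backlog term lets the cluster grow even when utilization sits well below the legacy 60% threshold, matching the behavior described above.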
The new autoscaling behavior is illustrated here in Galaxy:
- CPU usage peaks at around 40% in this example
- You can see that additional workers are added without the CPU consumption hitting the legacy 60% threshold
The new autoscaler is generally available for all Starburst Galaxy users and applies to Standard, FTE, and Accelerated execution modes.
Next Gen Caching
We are also excited to introduce the private preview of Next Gen Caching, which results from Warp Speed’s deeper integration into the Trino engine. This allows the system to cache intermediate subquery results directly on SSD storage. While Warp Speed previously focused on indexing and caching raw data from object storage to improve data access performance, Next Gen Caching focuses on caching subquery results. This eliminates the need to recompute complex subqueries across recurring or similar queries, significantly reducing processing time.
By leveraging this deeper integration, Warp Speed not only enhances performance but also removes many of the SQL limitations that existed before. With efficient handling of intermediate results, Warp Speed can now process more complex query patterns without hitting traditional bottlenecks.
Next Gen Caching is especially well-suited for queries generated by semantic layers in BI environments and dbt. Semantic layer queries often follow repeatable SQL patterns, where subqueries serve as a baseline, with specific business metrics and filters applied on top. This repetition makes these subqueries perfect candidates for caching.
*Based on TPC-DS performance testing
By automatically persisting these intermediate subquery results on SSDs, Next Gen Caching drastically improves the performance of BI-driven queries. Since these cached subqueries can be reused across different reports and dashboards, query latency is significantly reduced, leading to faster and more responsive analytics.
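The core idea can be sketched as a cache keyed by the subquery itself: the first execution computes and persists the result, and later queries that contain the same subquery reuse it. This is only an illustrative model, with hypothetical names, not Warp Speed's actual mechanism (which persists results on SSD inside the engine):

```python
# Illustrative model of intermediate-result caching, NOT Warp Speed's actual
# implementation: results are keyed by normalized subquery text, so repeated
# semantic-layer subqueries are served from cache instead of recomputed.
import hashlib

class SubqueryCache:
    def __init__(self):
        self._store = {}  # digest -> cached result rows
        self.hits = 0
        self.misses = 0

    def _key(self, sql: str) -> str:
        # Normalize whitespace and case so trivially identical subqueries match.
        normalized = " ".join(sql.lower().split())
        return hashlib.sha256(normalized.encode()).hexdigest()

    def run(self, sql: str, compute):
        key = self._key(sql)
        if key in self._store:
            self.hits += 1
            return self._store[key]          # reuse cached result
        self.misses += 1
        result = compute()                   # expensive subquery execution
        self._store[key] = result            # persist (on SSD, in the real system)
        return result

# A semantic layer re-issues the same baseline subquery for two dashboards:
baseline = "SELECT region, SUM(sales) FROM orders GROUP BY region"
cache = SubqueryCache()
cache.run(baseline, lambda: [("EMEA", 100), ("AMER", 250)])  # computed once
cache.run(baseline, lambda: [("EMEA", 100), ("AMER", 250)])  # cache hit
print(cache.hits, cache.misses)  # 1 1
```

The second dashboard's query skips the subquery computation entirely, which is why repeatable BI and dbt patterns benefit the most.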
To join the private preview, email Guy Raz for availability.
User Role Based Routing
Lastly, we are announcing the private preview of User Role Based Routing. This feature streamlines query routing based on the user's role in the organization. Previously in Galaxy, it was up to the end user to ensure queries were routed to the correct cluster. This approach was error-prone and required significant effort from our users.
With routing enabled, each Galaxy account gets a single endpoint for dispatching queries. Queries are first routed to the dispatcher geographically closest to the user, then evaluated against the user-defined Routing Rules and dispatched to the cluster from the first rule that matches.
With User Role Based Routing, users can send all queries to a single URL, and Galaxy routes them based on the user's role. This satisfies the use case where an organization wants to route query traffic to the appropriate cluster and catalog for its different teams of data analysts. This enhancement minimizes human intervention, delivers more predictable price-performance and a more seamless experience for end users, and lays the foundation for a suite of smart routing use cases moving forward.
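The first-match evaluation described above can be sketched in a few lines. The rule shape, role names, and cluster names below are hypothetical illustrations, not Galaxy's actual configuration format:

```python
# Hypothetical sketch of role-based routing: rules are evaluated in order and
# the query is dispatched to the cluster from the first matching rule.
# Rule format, roles, and cluster names are illustrative assumptions.

ROUTING_RULES = [
    {"role": "finance_analyst", "cluster": "finance-cluster"},
    {"role": "data_scientist",  "cluster": "ml-cluster"},
    {"role": "*",               "cluster": "default-cluster"},  # catch-all
]

def route(user_role: str, rules=ROUTING_RULES) -> str:
    """Return the cluster for the first rule matching the user's role."""
    for rule in rules:
        if rule["role"] in (user_role, "*"):
            return rule["cluster"]
    raise ValueError(f"no routing rule matches role {user_role!r}")

print(route("finance_analyst"))    # finance-cluster
print(route("marketing_analyst"))  # default-cluster (catch-all rule)
```

Because every user targets the same endpoint, adding or reordering rules changes routing for whole teams without any client-side reconfiguration.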
To join the private preview, email Bo Myers for availability.
Conclusion
These enhancements drastically improve the performance, scalability, and simplicity of running powerful analytics across all your data in and around your lake with Starburst Galaxy. Starburst already offers the leading SQL engine for Apache Iceberg; we are excited to see how our customers push the limits of their lakehouse analytics with today’s announcements. To experience the benefits of Starburst Galaxy, start your free trial today.