Introducing Enhancements to Starburst Galaxy’s Autoscaler

  • Bo Myers, Senior Product Manager, Starburst

  • Tyler Shapiro, Product Marketing Manager, Starburst

Since its launch, Starburst Galaxy has automatically scaled compute resources for customers so that performance holds up as workloads vary.

Today we launched several enhancements for cluster autoscaling in Starburst Galaxy, now available in private preview. Key metrics now factored into the autoscaling process include:

  • CPU load
  • Estimated runtime
  • Queue length
  • Insights from completed queries

These improvements aim to address a wider range of workloads and provision resources more efficiently, resulting in faster query execution times without increasing costs. In this post, we look at these enhancements in more detail and review the results of a real-world customer test comparing the legacy and enhanced autoscalers.

A proactive approach to autoscaling 

Previously, if CPU utilization reached or exceeded 60%, the cluster would scale up to its maximum number of allowed workers. However, because automatic resource scaling was triggered solely based on CPU utilization, workloads constrained by other factors didn’t receive additional resources as quickly as needed, if at all.
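
To make the legacy rule concrete, here is a minimal Python sketch of a CPU-only trigger as described above. The 60% threshold and the jump to the maximum worker count come from this post; the function and parameter names are hypothetical, and this is not Galaxy's actual implementation.

```python
# Illustrative sketch of the legacy rule described in this post.
# Names are hypothetical; only the 60% threshold is taken from the text.
LEGACY_CPU_THRESHOLD = 0.60


def legacy_target_workers(cpu_utilization: float,
                          current_workers: int,
                          max_workers: int) -> int:
    """Return the worker count a CPU-only trigger would ask for."""
    if cpu_utilization >= LEGACY_CPU_THRESHOLD:
        # CPU at or above the threshold: scale straight to the maximum.
        return max_workers
    # Otherwise nothing changes, even if queries are queuing up.
    return current_workers
```

With this rule, a cluster sitting at 40% CPU stays at its current size no matter how long its queue is, which is the gap the enhancements target.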

The image below shows the new proactive autoscaling behavior in Galaxy. In the top graph, CPU usage peaks at around 40%. At that peak, the bottom graph shows additional workers being added even though CPU consumption never reaches the legacy 60% threshold.

Smarter resource allocation

When workloads require additional capacity but don’t receive it due to constraints on resources other than CPU or delays in activating the additional resources, customers encounter slow query response times or failures. To mitigate these issues, a common practice is to overcommit compute resources, leading to increased costs. 

With the improved autoscaler, Galaxy estimates computation time based on a broader set of metrics, enabling it to serve a wider range of workloads effectively. Additionally, the autoscaling decision is now made earlier and more quickly, typically within two minutes, compared to at least four minutes previously for large queries. These enhancements are designed to deliver faster query execution and reduce the need to adjust resources manually.
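
This post does not describe the enhanced autoscaler's decision logic, so the following Python sketch only illustrates the general idea of combining several signals. The class, the function, and the queue and runtime thresholds are all hypothetical. Only the list of signals (CPU load, queue length, estimated runtime) and the 60% CPU figure come from the text, and insights from completed queries are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class ClusterSignals:
    """Hypothetical snapshot of the metrics named in this post."""
    cpu_load: float             # fraction of CPU in use, 0.0 to 1.0
    queued_queries: int         # queries waiting to start
    estimated_runtime_s: float  # estimated time to finish current work


def should_scale_up(signals: ClusterSignals,
                    cpu_threshold: float = 0.60,      # legacy figure from the post
                    max_queue: int = 5,               # assumed, not from Starburst
                    max_runtime_s: float = 120.0      # assumed, not from Starburst
                    ) -> bool:
    """Scale up when any one signal indicates the cluster is falling behind."""
    return (
        signals.cpu_load >= cpu_threshold
        or signals.queued_queries > max_queue
        or signals.estimated_runtime_s > max_runtime_s
    )


# A queue-bound workload: CPU is only at 40%, but twelve queries are waiting.
queue_bound = ClusterSignals(cpu_load=0.40, queued_queries=12, estimated_runtime_s=30.0)
print(should_scale_up(queue_bound))  # True, triggered by queue length alone
```

A CPU-only rule would leave this cluster unchanged, while a rule that also looks at queue length or estimated runtime can react before CPU ever becomes the bottleneck.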

The results are in

In a recent customer test, we evaluated the performance of Starburst Galaxy’s enhanced autoscaler against the legacy autoscaler across various workload sizes. With schema sf10000, query execution time fell from 5.86 to 4.52 minutes. With the larger sf100000 schema, the improvement was even more pronounced, with execution time dropping from 24.35 to 13.80 minutes.
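
To put those timings in relative terms, the short Python snippet below computes the percentage reduction in runtime for each schema. The timings are the ones quoted above; the helper name is just for illustration.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100.0


# (legacy minutes, enhanced minutes) as reported in the customer test.
results = {
    "sf10000": (5.86, 4.52),
    "sf100000": (24.35, 13.80),
}

for schema, (legacy, enhanced) in results.items():
    reduction = percent_reduction(legacy, enhanced)
    print(f"{schema}: {reduction:.1f}% shorter runtime")

# sf10000: 22.9% shorter runtime
# sf100000: 43.3% shorter runtime
```

In other words, the enhanced autoscaler cut runtime by roughly a quarter on the smaller schema and by more than 40% on the larger one.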

Getting started with autoscaling in Galaxy is as simple as creating your free account today.