Introducing Autoscaling in Starburst Galaxy

Jack Klamer

Software Engineer, Starburst


Have you ever gotten lost in Home Depot? Be honest. You pass mindlessly through the rows of 30-foot-high shelves, which contain a myriad of tools that your college degree taught you absolutely nothing about. Let me be the first to admit that I have – sorry to publicly shame you, Dad. But, in my defense, there are a lot of options to choose from, and it feels nearly impossible to tell ahead of time exactly which size of tool you’ll need. And, in most cases, there are multiple gadgets that will do the exact same trick. Yes, you can finish putting that IKEA furniture together with what you got in the box, but sometimes it’s faster (and a lot more fun) to trade the Allen wrench for a power drill and see where the night takes you.

The same concept applies to digital tools. Who among us hasn’t contemplated hosting the ole personal blog on Kubernetes? However, I think we can all agree that most of our content does not warrant this implementation. And as anyone who has ever paid to host a whole cluster themselves knows, it is prudent to think carefully about sizing the tool to the need. Just like with regular tooling, the right size is difficult to know ahead of time and sometimes impossible to determine! One of the most attractive parts of a tool like Kubernetes is that it allows you to autoscale your digital assets to fit the job at hand. While that may not be necessary to host the crowd of people who arrive at your personal blog (after getting lost on the way to Stack Overflow, of course), we at Starburst believe that autoscaling with Kubernetes is one key to establishing a flexible technical ecosystem. Therefore, we are excited to discuss the implementation of cluster autoscaling in our cloud-based digital tool, Starburst Galaxy.

Motivations

Here at Starburst, we’re in the business of creating digital tools that sit on the metaphorical shelf that is the modern data ecosystem. Building on Trino, a best-in-class query engine, we have a committed team working hard to provide you with the best analytics solution possible and to help you implement this solution either on-premises or in the cloud. As part of our fully integrated cloud solution, Starburst Galaxy, we’ve created a way to bring you even more customization by enabling cluster autoscaling. We can now dynamically and automatically size this digital tool to fit the needs of your active SQL queries. This new feature makes an already flexible tool even more so, allowing the cluster to grow or shrink to the size needed to accommodate whatever jobs are running.

Because almost no two Trino runtime environments are the same, it’s important that we give you, the customers, even more control over right-sizing this digital tool to fit your individual needs. Along with cluster autoscaling, we’re also rolling out the ability to create and manage your own cluster configurations so that you’re not spending a dime more than you want or waiting longer than you want for your jobs to finish.

Bells and Whistles

By reducing the overhead of actively managing a complex system, you can consciously choose to meet your own needs, such as minimizing cost or scaling your running jobs extensively. With the addition of cluster autoscaling, account admins can now create customized infrastructure for their team in under a minute by selecting a range of workers for each cluster configuration and assigning that configuration to the corresponding clusters.

Starburst Galaxy Autoscaling

While we, the Starburst engineering team, have determined that the ideal wait for a query to finish is, at most, the time it takes to go get another coffee, it is important that account admins have the option to decide this for themselves by either scaling up for a faster runtime or scaling down for cost savings. In general, we suggest setting the low end of your cluster range to the fewest machines you need to run your biggest queries, and the high end to as many machines as you’re willing to pay for in exchange for speed.
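To make the range idea concrete, here is a minimal Python sketch. It is not Galaxy’s API or its scaling logic: `WorkerRange`, `min_workers`, `max_workers`, and `clamp` are hypothetical names invented for illustration, and the rule that a range needs at least one worker is an assumption of the sketch. It only shows how a configured range bounds whatever worker count a workload might call for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerRange:
    """Hypothetical worker range for a cluster configuration (not a Galaxy API)."""

    min_workers: int  # fewest workers that can still run your biggest queries
    max_workers: int  # most workers you are willing to pay for

    def __post_init__(self) -> None:
        # Assumption of this sketch: a range needs at least one worker.
        if self.min_workers < 1:
            raise ValueError("min_workers must be at least 1")
        if self.max_workers < self.min_workers:
            raise ValueError("max_workers must be >= min_workers")

    def clamp(self, desired_workers: int) -> int:
        """Keep a desired worker count inside the configured range."""
        return max(self.min_workers, min(desired_workers, self.max_workers))


if __name__ == "__main__":
    cost_conscious = WorkerRange(min_workers=2, max_workers=8)
    for desired in (1, 5, 20):
        print(desired, "->", cost_conscious.clamp(desired))
```

A narrow range keeps spending predictable, while a wide range trades a higher ceiling on cost for faster runtimes under load.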

Like other features in Galaxy, cluster autoscaling is designed to be simple, effective, and technically sound. Under the hood we use Kubernetes (a much better use for it than hosting your personal blog), which allows us to implement this functionality on all of the major cloud providers. In addition, changing your cluster configuration no longer needs to force an update. When it’s possible to get Kubernetes to make your cluster the right size without disruption or the added cost of a blue-green deployment, we’ll make it happen. Add cluster autoscaling to your clusters and tell your superior that you increased data analysis capabilities or decreased cost, and don’t forget to mention that it all occurred in one day without any downtime! We won’t snitch; these updates were a team effort.

Cluster autoscaling works hand-in-hand with our auto-suspend feature, which saves money by suspending a cluster after it has seen no activity for a user-set amount of time. Together, these two features can generate new cost savings for each customer. Creating cluster configurations that engage the power of autoscaling, such as running a smaller cluster that grows with increased usage, has the potential to be far more cost-effective than running large clusters that stand at the ready.
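As a rough illustration of why this combination can save money, the sketch below compares worker-hours for a fixed-size cluster against an autoscaled cluster that suspends when idle. Everything in it is an assumption made for the example: the hourly demand figures are invented, the function names are hypothetical, and the model ignores scale-up lag, the auto-suspend idle timeout, and real pricing. It is not a description of how Galaxy bills or scales.

```python
from typing import Sequence


def fixed_worker_hours(hourly_demand: Sequence[int], cluster_size: int) -> int:
    """Worker-hours for a cluster that always runs at one fixed size."""
    return cluster_size * len(hourly_demand)


def autoscaled_worker_hours(
    hourly_demand: Sequence[int], min_workers: int, max_workers: int
) -> int:
    """Worker-hours for a cluster that follows demand within a range.

    Simplifying assumption: an hour with zero demand counts as suspended
    (zero workers); otherwise demand is clamped to [min_workers, max_workers].
    """
    total = 0
    for demand in hourly_demand:
        if demand == 0:
            continue  # treated as auto-suspended for the whole hour
        total += max(min_workers, min(demand, max_workers))
    return total


if __name__ == "__main__":
    # Invented demand: workers needed in each of eight consecutive hours.
    demand = [0, 0, 2, 6, 10, 4, 0, 0]
    print("fixed at 10 workers:", fixed_worker_hours(demand, 10))
    print("autoscaled 2 to 10 :", autoscaled_worker_hours(demand, 2, 10))
```

With these made-up numbers, the fixed cluster accrues 80 worker-hours and the autoscaled one 22. The size of the gap depends entirely on how bursty the real workload is.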

Looking Toward the Future

Enabling cluster autoscaling opens up new possibilities to further enhance Starburst Galaxy as a product. In the near future, we hope to transition customers to an autoscaling cluster that never suspends yet stays at the lowest cost of ownership for an always-available cluster. We also have an exciting opportunity to become proactive with our scaling choices for clusters that use Trino’s new fault-tolerant execution. This mode allows us to observe the query plan in action and shape the cluster to the resources needed. This is just the beginning of how we think about autoscaling as automation based around your priorities.

Lastly, looking toward the future: we have absolutely no idea. At Starburst, we’re product led and on a mission, which means we will cast this feature into the sea if we think we can find a better way to get you the best data tooling for your job. But, while there are data tasks of every shape and size, and data teams with different priorities, there will be a need to find the right tool for the job. And ultimately we’re trying to do what Home Depot can’t: make that easy.

Experiment with cluster autoscaling and Starburst Galaxy today by signing up for a free trial and claiming your $500 of free credits.