Solving capacity management problems for Trino clusters, and how Starburst Galaxy makes it easy

Cole Bowden, Trino Release Engineer, Starburst

Capacity management for Trino clusters is a complicated problem. Trino is powerful, but using that power effectively requires a detailed understanding of your cluster. To manage your capacity accurately, you first need to understand three things:

  1. How much capacity you usually need
  2. How much capacity you will need
  3. How much capacity you might need

In effect, this requires an understanding of the past, the present, and several probable futures. Trino scales up to handle massive workloads, and if you know what you’re doing, it can manage pretty much anything you throw at it. However, zeroing in on the right capacity for your clusters is a difficult, manual process that requires a lot of trial and error. A properly sized Trino cluster should be large enough to avoid failures and long query queue times, but small enough that you’re not wasting money on capacity you don’t need.

In this blog, I’ll dive into how to find that Goldilocks zone and explore how Starburst Galaxy and its Icehouse architecture can solve this problem for you.

Starburst Galaxy makes autoscaling easy

One of the main goals of Starburst Galaxy is to abstract away the difficulties of Trino capacity management. By providing autoscaling, auto-suspend, and auto-shutdown functionality, your Galaxy clusters scale up to meet demand when needed, then shrink back down and shut off when they’re not being used. Trino users rarely submit a workload that stays constant throughout the day, week, or month, and autoscaling adjusts your clusters for this reality: it adds more servers and compute power when you need them, and takes them away at off-peak hours when a smaller cluster can manage just fine.

Auto-suspend and auto-shutdown take this one step further. They ensure you never have provisioned servers sitting idle overnight or on holidays when none of your analysts are submitting queries. For large-scale operations, this can mean massive savings on compute; for small operations, you don’t have to worry about servers sitting idle because your sole analyst hasn’t run a query in three days.

How to properly (manually) size Trino clusters

The “joy” of capacity management and cluster sizing is that every workload is different. There’s no single, one-size-fits-all solution, because your data and analytics workloads aren’t the same as anyone else’s. Trino is powerful, but managing that power takes time and effort. There’s a bit of a playbook that can help you get started, though, so let’s walk through it.

1) Creating a default cluster

As a starting point, a reasonable default for a medium-sized cluster is the following configuration:

  • 32 GB of RAM on your coordinator
  • 3-5 workers with 64 GB of RAM each

To create this kind of cluster on your own, follow the Trino documentation for deploying and configuring Trino. Once your cluster is up and running and connected to your data, test that capacity by running complex queries that represent the toughest work you expect the cluster to handle. Include joins among your largest tables, aggregations, and window functions, and try to give Trino a workout. This will require some SQL know-how, but if you already have existing analytics workloads, you can inspect the longest-running queries to see how your cluster handles them.
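For orientation, here’s a minimal sketch of the two configuration files that define that topology, following the layout from the Trino deployment docs. The hostname is a placeholder assumption, and heap sizing lives in etc/jvm.config, which we’ll sketch in the common-mistakes section below.

```
# etc/config.properties on the coordinator
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
discovery.uri=http://coordinator.example.com:8080

# etc/config.properties on each worker
coordinator=false
http-server.http.port=8080
discovery.uri=http://coordinator.example.com:8080
```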

2) Testing capacity management

Testing for optimal capacity is an iterative process. If Trino runs too slowly or fails on complex queries, adjust your cluster size upwards. If it breezes through everything, adjust it downwards. You can scale vertically with more or less powerful servers, or horizontally by adding or removing worker nodes.

With Trino, it’s generally better to have a smaller number of powerful servers with a lot of memory, because every worker node requires some memory overhead. A server with 16 GB of memory wastes a much higher percentage of its capacity on overhead than a server with 128 GB: if the operating system and JVM overhead consume, say, around 4 GB per node, that’s a quarter of the small server but only about 3% of the large one. Still, depending on cost and the resources you have available, do what makes sense for you, make your adjustments, and then run your gauntlet of queries again to check how the cluster performs.

3) Repeat, repeat, repeat

After this, it’s time to try again! Trial and error really is the name of the game.

If you’re adjusting upwards, things running smoothly at speeds you’re comfortable with is a good sign you’ve found a workable capacity. If you’re adjusting downwards, you know you’ve gone a step too far once queries start failing. After this, try running a batch of concurrent queries of various sizes (some large, some small – try to represent a real-world workload accurately). This tests your coordinator’s ability to handle many users submitting queries at once. If you expect dozens of queries per minute, throw that load at the coordinator, see whether it can handle the workloads you’re anticipating, and scale the coordinator server up or down from there, just as you did with your worker nodes.
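If you’d rather script that concurrency test than recruit a dozen analysts, here’s a minimal sketch using the Trino Python client (pip install trino). The host, catalog, schema, and query mix are placeholder assumptions – substitute your own tables and realistic query shapes.

```python
# Minimal concurrency smoke test against a Trino coordinator.
from concurrent.futures import ThreadPoolExecutor

from trino.dbapi import connect

# A mix of small and heavier queries, repeated to simulate many users at once.
QUERIES = [
    "SELECT count(*) FROM orders",                            # small
    "SELECT region, sum(total) FROM orders GROUP BY region",  # heavier
] * 10

def run(sql):
    conn = connect(host="coordinator.example.com", port=8080,
                   user="load-test", catalog="hive", schema="analytics")
    cur = conn.cursor()
    cur.execute(sql)
    cur.fetchall()  # drain the results so the query runs to completion
    return sql

# Submit everything at once, then inspect queue times in the Trino web UI.
with ThreadPoolExecutor(max_workers=20) as pool:
    for finished in pool.map(run, QUERIES):
        print("finished:", finished)
```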

Repeat this process until you are satisfied.

Common Trino mistakes

Of course, there’s more to configuring and setting up Trino than just finding the right cluster capacity. A lot more could be said on this front, but since this blog is primarily focused on capacity, I’ll keep it brief and stick to the fundamentals of setting up a cluster:

  1. Generally, you don’t want to allocate more than 70% of a server’s memory to the Trino JVM; the operating system and other processes need headroom to run without any risk of OOM errors (see the jvm.config sketch after this list).
  2. You also don’t need much disk storage on servers: 50 GB to handle the operating system, the Trino installation, and logs should be enough. However, if you’re running a cluster with fault-tolerant execution and exchange spooling to the local filesystem, you’ll want a lot of disk storage. We’ll talk more about that below.
  3. You should be willing to configure and change Trino properties to tune the cluster to your use case, but don’t go overboard. Many of the defaults are set the way they are for a reason. Don’t be afraid to change a value that seems too high or too low for your liking, but if you’re not 100% sure what a property does, run tests before and after the change to see its impact on query performance.
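For point 1, here’s a hedged sketch of what the heap setting might look like in etc/jvm.config on a worker with 64 GB of RAM. The flags mirror the template in the Trino docs; the exact heap size is an assumption to tune for your servers:

```
# etc/jvm.config – give the Trino JVM roughly 70% of a 64 GB server
-server
-Xmx44G
-XX:+UseG1GC
-XX:+ExplicitGCInvokesConcurrent
-XX:+ExitOnOutOfMemoryError
```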

Using a Starburst Galaxy Icehouse architecture

Now is a good time to emphasize that while running tons of queries and trial-and-error sizing is a lot of work, autoscaling with a Galaxy Icehouse does it for you. When you hire more analysts, or your data set grows and your former cluster size isn’t cutting it, Galaxy autoscaling can provision more worker nodes when you need them, without you needing to re-evaluate capacity from scratch or touch a thing. You only get charged for what you use, and you don’t get charged for what you don’t use when you’ve got a week-long company holiday.

Managing multiple Trino clusters yourself

Another common area of difficulty is managing multiple Trino clusters. It’s becoming increasingly common to run several clusters, each with its own responsibilities. A common architecture might include:

  • At least one cluster for analytics
  • Another cluster for fault-tolerant execution and ETL jobs

In practice, multiple analytics clusters may be necessary if you’re operating at a larger scale and you don’t want long-running complex queries to crowd out quick queries that should take seconds. With more than one cluster, you can route traffic to different clusters to alleviate this issue.

Using Trino for ETL

As far as those ETL jobs are concerned, Trino has both “standard” and “fault-tolerant” cluster modes that provide different functionality. Standard Trino is the well-known part of the engine, built for ad-hoc analytics and fast SQL analysis at scale. With fault-tolerant execution enabled, you instead trade off some performance for improved reliability, which lets you use Trino as an engine for batch ingestion and ETL workloads. This is a cluster-level configuration option, so you’ll need to decide up front, as you provision your cluster, whether it runs in standard or fault-tolerant mode. With fault tolerance enabled, you’ll also need to configure storage for intermediate data to spool into during query processing – either local disk or cloud storage.
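As a rough sketch, enabling fault-tolerant execution takes a retry policy in config.properties plus an exchange manager that defines where intermediate data spools. The S3 bucket name below is a placeholder assumption:

```
# etc/config.properties – retry individual tasks instead of failing the
# whole query, the recommended policy for large batch and ETL workloads
retry-policy=TASK

# etc/exchange-manager.properties – where spooled exchange data lives
exchange-manager.name=filesystem
exchange.base-directories=s3://my-trino-exchange-bucket
```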

This pattern of using both cluster types works well because instead of grappling with different tools, SQL dialects, and architectures for different jobs, it’s much less effort to set up and maintain multiple Trino instances that do it all for you. No more queries that run fine in Spark but need to be migrated to run in Trino, or vice versa – it all works the same, and it makes life easier for your users.

But you’ll still need to set up and configure these clusters, then handle routing between them by also deploying Trino Gateway, a proxy and load balancer designed for exactly this job. Once you reach a larger scale with three or more Trino clusters, you can improve the reliability of your entire Trino deployment by upgrading clusters one at a time while the others stay online, eliminating downtime during maintenance.
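For reference, registering a backend cluster with Trino Gateway can be done through its admin API. This sketch is based on the example in the Gateway’s documentation at the time of writing; the hostnames and routing group are assumptions, and the exact endpoint and payload may differ across Gateway versions:

```
# Register an analytics cluster with a running Trino Gateway instance
curl -X POST "http://gateway.example.com:8080/entity?entityType=GATEWAY_BACKEND" \
     -H "Content-Type: application/json" \
     -d '{"name": "trino-analytics-1",
          "proxyTo": "http://trino-analytics-1.example.com:8080",
          "active": true,
          "routingGroup": "adhoc"}'
```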

Cluster management with Starburst Galaxy

At the risk of sounding like a broken record: with Galaxy, you don’t have to worry about any of that. You can provision a new cluster with the click of a button, and all you have to do is choose the cluster type. It comes with the features mentioned above:

  • Autoscaling
  • Auto-suspend 
  • Auto-shutdown

Configuring fault-tolerant execution is a simple matter of enabling it on the cluster. There’s no need to worry about the Trino Gateway, as your clusters are all accessible from within your Galaxy account, and you can create as many as you need, tailoring them to suit specific workloads.

Starburst Warp Speed

To make things even better, you can enable Warp Speed on your clusters for smart indexing and caching. If you have repetitive workloads or particularly popular datasets that get queried again and again, Warp Speed can dramatically speed up those queries, and you save on compute costs and API calls by reading from the less costly cache.

Divide workloads by cluster 

Because cluster management is such a breeze, you can create many different clusters, each tuned to handle a specific workload. A fault-tolerant cluster can remain in the hands of your data engineering team, while a read-only Warp Speed cluster that queries a small table of business-critical information can be put into the hands of anyone who needs lightning-fast insights. Costs stay low, because any unused clusters suspend or shut down. The end result is easy customization and configuration, with none of the trial-and-error headache of scaling and maintaining a Trino deployment yourself.

Hopefully this helped shed some light on how to scale and size your Trino clusters for production workloads. Go provision those servers and have fun with the trial and error… or, if you want to make life easy, check out Starburst Galaxy.