Prevent one bad query from hogging all Trino resources

Dear Trino Community,
We have a Trino cluster with 5 nodes, each with 11.7GB heap memory and 7 cores. Now that more people are using it, we would like to ensure that resources are shared fairly and that one bad query cannot hog all resources and block everything else from running in parallel.
Resource group restrictions are only enforced when a query is admitted, so once a bad query is running, the restrictions apply to the queries that follow it but not to the one causing the problem.
The bottleneck seems to be that almost any query can drive CPU utilisation to 100%, leaving too little for other queries to start.
To deal with this we have reduced task.max-worker-threads, which seems to help a bit: CPU usage drops from 100% to 50%, leaving some room for other queries to grab resources.
Does anyone here have other recommendations for how we could make our environment a bit fairer?
Thank you,
Magda

The query.max-memory-per-node property (see "Resource management properties" in the Trino 422 documentation) usually helps a lot, as its default setting doesn't let a single query use more than roughly a third of a given worker's memory at any time. It might be worth checking what that property is set to on your cluster.

In a k8s install of Starburst Enterprise, there's a maxConcurrentQueries property (defaults to 3) that is used to derive query.max-memory-per-node with the following formula:

[(jvmHeapSize - jvmHeapHeadroom) / maxConcurrentQueries]
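To get a feel for what that formula would give on the cluster from the question, here is a rough Python sketch. The 11.7GB heap comes from the original post and maxConcurrentQueries=3 is the default mentioned above; the headroom of 30% of the heap is only an assumption for illustration, so substitute whatever jvmHeapHeadroom your install actually uses. The function name is made up for this example.

```python
def per_query_memory_gb(heap_gb: float, headroom_gb: float,
                        max_concurrent_queries: int) -> float:
    """Apply (jvmHeapSize - jvmHeapHeadroom) / maxConcurrentQueries."""
    if max_concurrent_queries < 1:
        raise ValueError("max_concurrent_queries must be at least 1")
    if headroom_gb >= heap_gb:
        raise ValueError("headroom must be smaller than the heap")
    return (heap_gb - headroom_gb) / max_concurrent_queries


HEAP_GB = 11.7            # per-node heap from the original post
HEADROOM_FRACTION = 0.3   # ASSUMPTION: headroom is 30% of the heap
MAX_CONCURRENT = 3        # default maxConcurrentQueries mentioned above

headroom_gb = HEAP_GB * HEADROOM_FRACTION
limit_gb = per_query_memory_gb(HEAP_GB, headroom_gb, MAX_CONCURRENT)

# Print the value in the form it would take in a properties file.
print(f"query.max-memory-per-node={limit_gb:.2f}GB")
```

Under those assumptions each query would be capped at about 2.73GB per worker, so three memory-heavy queries could run side by side without one of them claiming the whole heap. Note that this only bounds memory; it does not by itself limit CPU usage.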