Are there any good OSS cluster analysis tools?

We write a lot of tools (microservices, React UIs, workload analyzers, etc.) to understand our clusters. Some of this is about performance, but we also go to great lengths to track individual user activity.

E.g. “at 3:12 PM we spiked to 7,000 CPUs and had 36 active users, but this one user in particular was doing something bad and causing poor behavior by using almost infinite memory; let’s lock them out to keep things stable”.
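For illustration, here is a minimal sketch of the kind of per-user check described in that example. It assumes you already have periodic snapshots of running queries (for example, polled from Trino's `system.runtime.queries` table or from your own collectors). The `QuerySnapshot` record, its field names, and the 50% memory-share threshold are all hypothetical and exist only to show the aggregation. Mapping real Trino data onto them is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class QuerySnapshot:
    # Hypothetical record shape; these field names are illustrative, not a Trino API.
    user: str
    state: str
    cpu_cores: float   # cores attributed to this query at snapshot time (assumed metric)
    memory_bytes: int  # memory held by this query at snapshot time (assumed metric)


def summarize_by_user(snapshots):
    """Aggregate RUNNING queries per user into total cores, memory, and query count."""
    totals = defaultdict(lambda: {"cpu_cores": 0.0, "memory_bytes": 0, "queries": 0})
    for q in snapshots:
        if q.state != "RUNNING":
            continue  # only running queries count toward current load
        t = totals[q.user]
        t["cpu_cores"] += q.cpu_cores
        t["memory_bytes"] += q.memory_bytes
        t["queries"] += 1
    return dict(totals)


def flag_heavy_users(summary, memory_share_threshold=0.5):
    """Return users whose share of total running-query memory exceeds the threshold."""
    total_memory = sum(t["memory_bytes"] for t in summary.values())
    if total_memory == 0:
        return []
    return sorted(
        user
        for user, t in summary.items()
        if t["memory_bytes"] / total_memory > memory_share_threshold
    )


snaps = [
    QuerySnapshot("alice", "RUNNING", 120.0, 8 * 2**30),
    QuerySnapshot("bob", "RUNNING", 40.0, 200 * 2**30),
    QuerySnapshot("bob", "QUEUED", 0.0, 0),
    QuerySnapshot("carol", "FINISHED", 0.0, 0),
]
summary = summarize_by_user(snaps)
print(len(summary))               # 2 users with running queries
print(flag_heavy_users(summary))  # ['bob']
```

Counting "active users" as users with at least one running query is one possible definition; a real tool might also count queued queries or use a time window.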

Do any open source options exist, outside of core Trino, to help with this? I'd rather not put more effort into one-off solutions if community solutions already exist.

I haven't tried this myself, but you can check out this resource:

https://varada.io/blog/presto/presto-workload-analyzer-tips-ib/
