Performance in default configuration

Hello,

I was hoping to test the performance characteristics of Starburst. I ran a simple query on the test data:

select count(*) from tpch.sf100.customer;

It took 11 seconds on 15 million rows, and it did not even complete for larger tables such as tpch.sf100.orders. This is far too slow for our use case, and I had read that Trino/Starburst was capable of much better performance. Is there a configuration change I can make to speed it up?

Thank you!

Hi there! What type of cluster configuration did you use? Also, can you confirm whether this was Starburst Enterprise or Starburst Galaxy? We have multiple ways to tune those clusters to make those results faster. If you want to query your data lake really fast, you should move up to an accelerated cluster and increase your cluster size. We have people who can help you tune your clusters to your needs. Send me an email at monica.miller@starburstdata.com and I can find someone to help.

The tpch connector is not suitable for performance testing on its own. It is a data generator that can be used to create test data in other storage systems. For example, you could run CREATE TABLE AS SELECT statements to insert the data into an object storage system that is configured with a catalog that uses the Iceberg or Delta Lake connector.
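A minimal sketch of that step, assuming a catalog named `lake` that is configured with the Iceberg connector and can write to object storage. The catalog name, schema name, and bucket path are hypothetical placeholders; depending on your metastore setup, CREATE SCHEMA may not need the location property at all.

```sql
-- Hypothetical catalog "lake", assumed to use the Iceberg connector.
-- The location below is a placeholder; some metastore setups do not require it.
CREATE SCHEMA IF NOT EXISTS lake.tpch_sf100
WITH (location = 's3://example-bucket/tpch_sf100/');

-- Materialize the generated TPC-H data once, so later test queries
-- read real files from object storage instead of the generator.
CREATE TABLE lake.tpch_sf100.customer AS
SELECT * FROM tpch.sf100.customer;

CREATE TABLE lake.tpch_sf100.orders AS
SELECT * FROM tpch.sf100.orders;
```

At scale factor 100, orders has 150 million rows, so writing it takes a while; running the same statements against tpch.sf1 first is a quicker way to check that the catalog and storage are set up correctly.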

Once you have that data, you want to make sure you have table statistics available and that your setup is appropriate for the planned workload in terms of expected users, number of queries, type of queries, and so on.
This also means that the queries you run as part of your testing should be realistic for your use cases. Ideally, you actually test with your own data as well.
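For the statistics part, Trino offers `SHOW STATS` to inspect what the cost-based optimizer sees and `ANALYZE` to collect statistics. A sketch, assuming the TPC-H tables were copied into a hypothetical `lake.tpch_sf100` schema on a catalog whose connector supports `ANALYZE` (Iceberg and Delta Lake do). Some connectors already collect statistics on write, so checking first avoids unnecessary work.

```sql
-- Hypothetical table, assumed to be a copy of tpch.sf100.orders.
-- Columns showing NULL for distinct values or data size lack statistics.
SHOW STATS FOR lake.tpch_sf100.orders;

-- Collect or refresh table and column statistics.
ANALYZE lake.tpch_sf100.orders;
ANALYZE lake.tpch_sf100.customer;

-- Confirm the optimizer now has estimates to plan with.
SHOW STATS FOR lake.tpch_sf100.orders;
```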

I encourage you to try implementing some of these aspects and to ask more questions here, but also to reach out to our sales and support team if you are interested in running an actual proof of value project for your company. You will see that with realistic use cases and data, Trino and Starburst performance is excellent.
