Benchmark testing is a valuable tool for evaluating and optimizing the performance of systems and components so that they meet the desired requirements.
SQL benchmark testing involves measuring the performance of a database system. The purpose of SQL benchmark testing is to evaluate the database's performance in terms of speed, scalability, and reliability. This type of testing can help identify bottlenecks, optimize database configurations, and make informed decisions about hardware and software upgrades.
SQL benchmark testing typically involves running a series of queries or transactions against the database and measuring key performance metrics, such as response time, throughput, and resource utilization. These tests can be performed using standardized benchmarks, such as TPC-DS or TPC-H, or custom benchmarks designed to simulate specific real-world workload patterns.
The results of SQL benchmark testing can help database administrators and developers optimize database performance by tuning areas such as indexing, query optimization, and hardware configuration. Additionally, benchmark testing can be used to compare the cost and performance of different database systems or versions to determine which one best meets the needs of a particular application.
It's vital to understand that evaluating performance and cost in isolation is insufficient. While the speed of raw query execution is significant, it must be balanced against the associated expenses.
For instance, consider running tests on two different architectures or platforms that seemingly utilize the same resources. Test A completes in 60 minutes, while Test B finishes in 54 minutes, making it 10% faster than Test A. Initially, one might lean towards favoring the setup (and possibly vendor) used for Test B. However, what if Test A costs $50 and Test B costs $100? This scenario prompts a reassessment.
For a third test, double the size of the architecture used in Test A so that both tests are aligned on cost. Surprisingly, Test A now completes in 30 minutes for $100. Consequently, the conclusion shifts, indicating that the solution employed in Test A is actually the superior choice in terms of value for money.
Further optimization can be explored by scaling Test A up to 150% of its initial size, resulting in a cost of $75 and a runtime of 45 minutes, showcasing improved performance compared to Test B at a reduced cost.
It's important to note that performance doesn't always scale linearly, as demonstrated in the example above. This underscores the critical importance of benchmark testing, as understanding the actual cost and performance of a solution significantly impacts your business's bottom line.
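One simple way to make this kind of comparison concrete is to combine runtime and cost into a single score. The short SQL sketch below uses the hypothetical figures from the example above and multiplies runtime by cost (lower is better). It is ordinary SQL you could run on any Trino-based cluster, and the scoring formula is just one illustrative choice, not a prescribed metric.

  -- Hypothetical figures from the example above; the score (runtime x cost) is
  -- one simple, illustrative way to compare price-performance (lower is better).
  SELECT
    test_name,
    runtime_minutes,
    cost_usd,
    runtime_minutes * cost_usd AS price_performance_score
  FROM (
    VALUES
      ('Test A (1x)',   60,  50.00),
      ('Test B',        54, 100.00),
      ('Test A (2x)',   30, 100.00),
      ('Test A (1.5x)', 45,  75.00)
  ) AS t (test_name, runtime_minutes, cost_usd)
  ORDER BY price_performance_score;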
Beginning your benchmark testing with a standard test harness like TPC-DS isn't necessarily a misguided approach. However, it's unlikely that your own environment will mirror a standard benchmark precisely. More often than not, a solution that excelled with a standard test harness like TPC-DS underperformed when tested against real-world data sources and SQL, often losing to lower-cost alternatives.
Hence, it's advisable to initiate testing with your own data sources and SQL queries. Alternatively, at the very least, ensure that your testing doesn't conclude after solely relying on a standardized benchmark like TPC-DS.
In this tutorial, the JMX file for the final read test can be easily customized by substituting the TPC-DS SQL with your own SQL queries that utilize tables from your own data sources. This approach eliminates the preparatory work required to deploy the TPC-DS tables. Should you opt for this method, simply bypass steps 1 - 3 outlined in the "Run JMX scripts" section, which involve creating, populating, and optimizing the TPC-DS tables.
If you're simply conducting testing to acquaint yourself with the process, you don't necessarily need to adhere strictly to all the best practices outlined below. These practices are specifically recommended for those conducting benchmark testing as part of an evaluation for a new business solution.
Keep in mind the importance of balancing cost and performance. If one vendor can't complete a test with the same compute resources as another but costs half as much, it's advisable to scale up the less expensive vendor's environment to match costs and observe the outcomes. Ultimately, what matters is the cost-effectiveness of completing the SQL workload within the required timeframe.
This tutorial provides all of the required code as a series of JMX files. To use them, you will need to edit the code to make it unique to your environment.
Once you've completed this tutorial, you will be able to:

- Install JMeter and the Trino JDBC driver
- Create and configure Starburst Galaxy clusters and an Amazon S3 catalog for benchmark testing
- Customize the provided JMX files for your environment
- Run JMX scripts in JMeter to create, populate, and optimize tables and to execute a concurrent benchmark test
Your first task is to install JMeter on your machine. You'll also need to download the Trino JDBC driver and move its associated files to the JMeter /lib directory.
You can download the binaries from the Apache JMeter website.
The Trino JDBC driver is required to allow communication between JMeter and your Starburst Galaxy clusters.
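For reference, connection strings used with this driver follow the standard Trino JDBC URL format, which looks something like the line below. The host name is a placeholder that you will replace with your own cluster host later in this tutorial:

  jdbc:trino://your-fte-cluster-host:443?SSL=true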
Move the *.jar file to the JMeter /lib directory.

Now that you have JMeter installed and ready to go, it's time to set up your Starburst Galaxy environment. To prepare for testing, you'll need to create two clusters, one with fault-tolerant execution mode enabled and one with Warp Speed enabled. This section walks through the steps to configure a fault-tolerant execution mode cluster, which will be used to create tables during testing.
Sign into Starburst Galaxy in the usual way. If you have not already set up an account, you can do that here.
Starburst Galaxy separates users by role. Your current role is listed in the top right-hand corner of the screen.
Creating a cluster in Starburst Galaxy will require access to a role with appropriate privileges. Today, you'll be using the accountadmin role.
New clusters can be created from the Clusters section of Starburst Galaxy.
You're going to start by giving the new cluster a name. This should be something descriptive of its use and location. We suggest jmeter-testing, but you are free to choose a different name if you'd like.

Now it's time to choose which catalogs to include in the cluster. For now, you only need to add the tpcds catalog. You'll also need to choose the Cloud provider region.

Next, you'll need to add the remaining cluster details to set up the new cluster. We recommend using the following configuration.
The second cluster you create will be used to run the benchmark tests. You'll find that the configuration is almost identical to the steps you just completed, with the exception of the Execution mode setting.
Again, the cluster name should be something descriptive of its use and location. We suggest jmeter-testing-warpspeed, but you are free to choose a different name if you'd like.

Now, it's time to choose which catalogs (data sources) to include in the cluster. For now, you only need to add the tpcds catalog. You also need to choose the Cloud provider region.

Next, you'll need to add the remaining cluster details to set up the new cluster. We recommend using the following configuration.
We will have you deploy an Accelerated cluster, which allows the test to benefit from Smart Indexing and Caching. Today, most vendors have some type of SSD caching enabled by default that cannot be disabled.
Now it's time to configure a catalog in Starburst Galaxy that connects to your Amazon S3 bucket. As part of the benchmark testing process, the scripts you run will create and populate tables within this catalog.
Create a new catalog for your Amazon S3 data source.
Starburst Galaxy allows the creation of catalogs for a number of different data sources. In this case, you are going to create a new catalog in the Amazon S3 category.
We recommend using the name benchmark for your catalog. This catalog name is hard-coded into the provided JMX files, and you will have to edit them if you use a different name.

Starburst Galaxy allows you to configure several different authentication methods when creating a new catalog. For this tutorial, we recommend using a cross-account IAM role, which is considered a best practice by AWS.
Starburst Galaxy provides three metastore options for Amazon S3 catalogs (Starburst Galaxy, AWS Glue, and Hive). For this tutorial, we will use the Starburst Galaxy metastore, as it removes the burden of configuring and managing a separate metastore service. However, if you prefer to use Glue, you may do so.
Note: The scripts in this tutorial will create external, or unmanaged, tables, so the default directory will not be used.
Starburst recommends setting Apache Iceberg as the default table format so that any table created without an explicit format uses Iceberg.
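For reference, once this catalog exists, a schema with an explicit S3 location and a table created without specifying a format might look something like the sketch below. The bucket path and schema name are placeholders, not values from the provided scripts, and the JMX files handle all of this for you:

  -- Placeholder bucket path and schema name; the provided JMX scripts contain the real statements.
  CREATE SCHEMA benchmark.tpcds_sf1000
  WITH (location = 's3://your-bucket/benchmark/tpcds_sf1000');

  -- With Iceberg set as the catalog's default table format, no format needs to be
  -- specified here; the table is created as an Iceberg table.
  CREATE TABLE benchmark.tpcds_sf1000.example_table (
    id   bigint,
    name varchar
  );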
You're almost there! Time to test the connection and then complete the process of creating your new Amazon S3 catalog.
You can leave the default access controls.
You need to add your catalog to both clusters you created earlier in this tutorial.
There are four JMX files for you to download, provided in a zipped folder here. The first three files will create, populate, and optimize tables in Starburst Galaxy, while the fourth will run the benchmark tests.
The tables created by the JMX scripts pull data from the tpcds catalog. We chose to create our own tables rather than querying the tpcds catalog directly because it is bad practice to query a data generator. The results will not be valid if you do.

As you look through the files, you may notice that there are several CAST operations. This is because our data generator for tpcds was built for the Hive table format, and we are using the Iceberg table format for this test. The CAST operations will ensure that Iceberg-supported data types are used.
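To make the idea concrete, the pattern looks roughly like the sketch below. The schema name and column selection are placeholders rather than the exact script contents, and the scripts themselves create the tables first and populate them in a separate step:

  -- Illustrative sketch only; the provided JMX scripts contain the full statements.
  -- TPC-DS char(n) columns are cast to varchar because Iceberg has no char type.
  CREATE TABLE benchmark.tpcds_sf1000.customer AS
  SELECT
    c_customer_sk,
    CAST(c_customer_id AS varchar) AS c_customer_id,
    CAST(c_first_name  AS varchar) AS c_first_name,
    CAST(c_last_name   AS varchar) AS c_last_name
  FROM tpcds.sf1000.customer;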
Now, it's time to edit the provided files to make them unique to your environment. In each file:

- Find /your-path/ and replace it with the path on your machine where you would like the benchmark results files stored.
- Find your-galaxy-username and replace it with your Starburst Galaxy username (e.g., kyle.payne@starburst.io).
- Find your-galaxy-password and replace it with your Starburst Galaxy password.
- Find your-fte-cluster-host and replace it with your FTE cluster host name.
- Find your-warpspeed-cluster-host and replace it with your Warp Speed cluster host name.

Now that you've updated the JMX files to make them unique to your environment, it's time to run them in JMeter. These instructions use the JMeter GUI, but you can also use a terminal to run the scripts if you prefer.
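If you'd rather use the terminal, JMeter's non-GUI mode can run a test plan directly. A minimal example, assuming the jmeter binary is on your PATH and using an arbitrary results file name, looks like this:

  jmeter -n -t 01-Galaxy-Large-FTE-CREATE-TPCDS-sf1000_01.jmx -l 01-create-results.jtl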
The first JMX script will create the schema and tables needed for your tests. Launch JMeter from the /bin folder, then open and run 01-Galaxy-Large-FTE-CREATE-TPCDS-sf1000_01.jmx.
The second JMX script will populate the tables you just created with data from the tpcds.sf1000 schema. Open and run 02-Galaxy-Large-FTE-INSERT-INTO-TPCDS-sf1000_01.jmx.
The third JMX script will optimize your tables. Open and run 03-Galaxy-Large-FTE-OPTIMIZE-TPCDS-sf1000_01.jmx.
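For context, table optimization in Trino-based systems is typically expressed with statements along these lines; the table name below is a placeholder, and the 03 script contains the actual statements:

  -- Compacts the many small files written during the insert step into larger files.
  ALTER TABLE benchmark.tpcds_sf1000.customer EXECUTE optimize;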
The final JMX script will simulate a true enterprise environment, where multiple users are issuing SQL at the same time. This test also ensures that you understand the benefit Warp Speed provides in an enterprise environment. Open and run 04-Galaxy-Large-WarpSpeed-TPCDS-Iceberg-sf1000_01.jmx.

Congratulations! You have reached the end of this tutorial and have successfully used JMeter and Starburst Galaxy to run benchmark testing.
Starburst has many other tutorials to help you get up and running quickly. Each one breaks down an individual problem and guides you to a solution using a step-by-step approach to learning.
Visit the Tutorials section to view the full list of tutorials, and keep moving forward on your journey!