Benchmark testing is a valuable tool for evaluating and optimizing the performance of systems and components so that they meet the desired requirements.
SQL benchmark testing involves measuring the performance of a database system. The purpose of SQL benchmark testing is to evaluate the database's performance in terms of speed, scalability, and reliability. This type of testing can help identify bottlenecks, optimize database configurations, and make informed decisions about hardware and software upgrades.
SQL benchmark testing typically involves running a series of queries or transactions against the database and measuring key performance metrics, such as response time, throughput, and resource utilization. These tests can be performed using standardized benchmarks, such as TPC-DS or TPC-H, or custom benchmarks designed to simulate specific real-world workload patterns.
The results of SQL benchmark testing can help database administrators and developers optimize database performance by tuning areas such as indexing, query optimization, and hardware configuration. Additionally, benchmark testing can be used to compare the cost and performance of different database systems or versions to determine which one best meets the needs of a particular application.
It's vital to understand that evaluating performance and cost in isolation is insufficient. While the speed of raw query execution is significant, it must be balanced against the associated expenses.
For instance, consider running tests on two different architectures or platforms that seemingly utilize the same resources. Test A completes in 60 minutes, while Test B finishes in 54 minutes, making it 10% faster than Test A. Initially, one might lean towards favoring the setup (and possibly vendor) used for Test B. However, what if Test A costs $50 and Test B costs $100? This scenario prompts a reassessment.
For a third test, double the size of the architecture used in Test A so that both tests are aligned on cost. Surprisingly, Test A now completes in 30 minutes for $100. Consequently, the conclusion shifts, indicating that the solution employed in Test A is actually the superior choice in terms of value for money.
Further optimization can be explored by scaling Test A up to 150% of its initial size, resulting in a cost of $75 and a runtime of 45 minutes, showcasing improved performance compared to Test B at a reduced cost.
It's important to note that performance doesn't always scale linearly, as demonstrated in the example above. This underscores the critical importance of benchmark testing, as understanding the actual cost and performance of a solution significantly impacts your business's bottom line.
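One simple way to make this kind of comparison concrete is to combine runtime and cost into a single score. The short SQL sketch below uses the hypothetical figures from the example above and multiplies runtime by cost (lower is better). It is ordinary SQL you could run on any Trino-based cluster, and the scoring formula is just one illustrative choice, not a prescribed metric.

  -- Hypothetical figures from the example above; the score (runtime x cost) is
  -- one simple, illustrative way to compare price-performance (lower is better).
  SELECT
    test_name,
    runtime_minutes,
    cost_usd,
    runtime_minutes * cost_usd AS price_performance_score
  FROM (
    VALUES
      ('Test A (1x)',   60,  50.00),
      ('Test B',        54, 100.00),
      ('Test A (2x)',   30, 100.00),
      ('Test A (1.5x)', 45,  75.00)
  ) AS t (test_name, runtime_minutes, cost_usd)
  ORDER BY price_performance_score;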
Beginning your benchmark testing with a standard test harness like TPC-DS isn't necessarily a misguided approach. However, it's unlikely that your own environment will mirror a standard benchmark precisely. More often than not, a solution that excelled with a standard test harness like TPC-DS underperformed when tested against real-world data sources and SQL, often losing to lower-cost alternatives.
Hence, it's advisable to initiate testing with your own data sources and SQL queries. Alternatively, at the very least, ensure that your testing doesn't conclude after solely relying on a standardized benchmark like TPC-DS.
In this tutorial, the JMX file for the final read test can be easily customized by substituting the TPC-DS SQL with your own SQL queries that utilize tables from your own data sources. This approach eliminates the preparatory work required to deploy the TPC-DS tables. Should you opt for this method, simply bypass steps 1 - 3 outlined in the "Run JMX scripts" section, which involve creating, populating, and optimizing the TPC-DS tables.
If you're simply conducting testing to acquaint yourself with the process, you don't necessarily need to adhere strictly to all the best practices outlined below. These practices are specifically recommended for those conducting benchmark testing as part of an evaluation for a new business solution.
Keep in mind the importance of balancing cost and performance. If one vendor can't complete a test with the same compute resources as another but costs half as much, it's advisable to scale up the less expensive vendor's environment to match costs and observe the outcomes. Ultimately, what matters is the cost-effectiveness of completing the SQL workload within the required timeframe.
This tutorial provides all of the required code as a series of JMX files. To use them, you will need to edit the code to make it unique to your environment.
Once you've completed this tutorial, you will be able to:

- Install JMeter and the Trino JDBC driver
- Create and configure Starburst Galaxy clusters and an Amazon S3 catalog for benchmark testing
- Customize the provided JMX files for your environment
- Run JMX scripts in JMeter to create, populate, and optimize tables and to execute a concurrent benchmark test
Your first task is to install JMeter on your machine. You'll also need to download the Trino JDBC driver and move its associated files to the JMeter /lib directory.
You can download the binaries from the Apache JMeter website.
The Trino JDBC driver is required to allow communication between JMeter and your Starburst Galaxy clusters.
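For reference, connection strings used with this driver follow the standard Trino JDBC URL format, which looks something like the line below. The host name is a placeholder that you will replace with your own cluster host later in this tutorial:

  jdbc:trino://your-fte-cluster-host:443?SSL=true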
Move the *.jar file to the JMeter /lib directory.

Now that you have JMeter installed and ready to go, it's time to set up your Starburst Galaxy environment. To prepare for testing, you'll need to create two clusters, one with fault-tolerant execution mode enabled and one with Warp Speed enabled. This section walks through the steps to configure a fault-tolerant execution mode cluster, which will be used to create tables during testing.
Sign into Starburst Galaxy in the usual way. If you have not already set up an account, you can do that here.
Starburst Galaxy separates users by role. Your current role is listed in the top right-hand corner of the screen.
Creating a cluster in Starburst Galaxy will require access to a role with appropriate privileges. Today, you'll be using the accountadmin role.
New clusters can be created from the Clusters section of Starburst Galaxy.
You're going to start by giving the new cluster a name. This should be something descriptive of its use and location. We suggest jmeter-testing, but you are free to choose a different name if you'd like.

Now it's time to choose which catalogs to include in the cluster. For now, you only need to add the tpcds catalog. You'll also need to choose the Cloud provider region.

Next, you'll need to add the remaining cluster details to set up the new cluster. We recommend using the following configuration.
The second cluster you create will be used to run the benchmark tests. You'll find that the configuration is almost identical to the steps you just completed, with the exception of the Execution mode setting.
Again, the cluster name should be something descriptive of its use and location. We suggest jmeter-testing-warpspeed, but you are free to choose a different name if you'd like.

Now, it's time to choose which catalogs (data sources) to include in the cluster. For now, you only need to add the tpcds catalog. You also need to choose the Cloud provider region.

Next, you'll need to add the remaining cluster details to set up the new cluster. We recommend using the following configuration.
We will have you deploy an Accelerated cluster, which allows the test to benefit from Smart Indexing and Caching. Today, most vendors have some type of SSD caching enabled by default that cannot be disabled.
Now it's time to configure a catalog in Starburst Galaxy that connects to your Amazon S3 bucket. As part of the benchmark testing process, the scripts you run will create and populate tables within this catalog.
Create a new catalog for your Amazon S3 data source.
Starburst Galaxy allows the creation of catalogs for a number of different data sources. In this case, you are going to create a new catalog in the Amazon S3 category.
We recommend using the name benchmark for your catalog. This catalog name is hard-coded into the provided JMX files, and you will have to edit them if you use a different name.

Starburst Galaxy allows you to configure several different authentication methods when creating a new catalog. For this tutorial, we recommend using a cross-account IAM role, which is considered a best practice by AWS.
Starburst Galaxy provides three metastore options for Amazon S3 catalogs (Starburst Galaxy, AWS Glue, and Hive). For this tutorial, we will use the Starburst Galaxy metastore, as it removes the burden of configuring and managing a separate metastore service. However, if you prefer to use Glue, you may do so.
Note: The scripts in this tutorial will create external, or unmanaged, tables, so the default directory will not be used.
Starburst recommends setting Apache Iceberg as the default table format so that any table created without an explicit format uses Iceberg.
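For reference, once this catalog exists, a schema with an explicit S3 location and a table created without specifying a format might look something like the sketch below. The bucket path and schema name are placeholders, not values from the provided scripts, and the JMX files handle all of this for you:

  -- Placeholder bucket path and schema name; the provided JMX scripts contain the real statements.
  CREATE SCHEMA benchmark.tpcds_sf1000
  WITH (location = 's3://your-bucket/benchmark/tpcds_sf1000');

  -- With Iceberg set as the catalog's default table format, no format needs to be
  -- specified here; the table is created as an Iceberg table.
  CREATE TABLE benchmark.tpcds_sf1000.example_table (
    id   bigint,
    name varchar
  );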
You're almost there! Time to test the connection and then complete the process of creating your new Amazon S3 catalog.
You can leave the default access controls.
You need to add your catalog to both clusters you created earlier in this tutorial.
There are four JMX files for you to download, provided in a zipped folder here. The first three files will create, populate, and optimize tables in Starburst Galaxy, while the fourth will run the benchmark tests.
The tables created by the JMX scripts pull data from the tpcds catalog. We chose to create our own tables rather than querying the tpcds catalog directly because it is bad practice to query a data generator. The results will not be valid if you do.

As you look through the files, you may notice that there are several CAST operations. This is because our data generator for tpcds was built for the Hive table format, and we are using the Iceberg table format for this test. The CAST operations will ensure that Iceberg-supported data types are used.
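To make the idea concrete, the pattern looks roughly like the sketch below. The schema name and column selection are placeholders rather than the exact script contents, and the scripts themselves create the tables first and populate them in a separate step:

  -- Illustrative sketch only; the provided JMX scripts contain the full statements.
  -- TPC-DS char(n) columns are cast to varchar because Iceberg has no char type.
  CREATE TABLE benchmark.tpcds_sf1000.customer AS
  SELECT
    c_customer_sk,
    CAST(c_customer_id AS varchar) AS c_customer_id,
    CAST(c_first_name  AS varchar) AS c_first_name,
    CAST(c_last_name   AS varchar) AS c_last_name
  FROM tpcds.sf1000.customer;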
Now, it's time to edit the provided files to make them unique to your environment. In each file:

- Find /your-path/ and replace it with the path on your machine where you would like the benchmark results files stored.
- Find your-galaxy-username and replace it with your Starburst Galaxy username (e.g., kyle.payne@starburst.io).
- Find your-galaxy-password and replace it with your Starburst Galaxy password.
- Find your-fte-cluster-host and replace it with your FTE cluster host name.
- Find your-warpspeed-cluster-host and replace it with your Warp Speed cluster host name.

Now that you've updated the JMX files to make them unique to your environment, it's time to run them in JMeter. These instructions use the JMeter GUI, but you can also use a terminal to run the scripts if you prefer.
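If you'd rather use the terminal, JMeter's non-GUI mode can run a test plan directly. A minimal example, assuming the jmeter binary is on your PATH and using an arbitrary results file name, looks like this:

  jmeter -n -t 01-Galaxy-Large-FTE-CREATE-TPCDS-sf1000_01.jmx -l 01-create-results.jtl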
The first JMX script will create the schema and tables needed for your tests. Launch JMeter from the /bin folder, then open and run 01-Galaxy-Large-FTE-CREATE-TPCDS-sf1000_01.jmx.
The second JMX script will populate the tables you just created with data from the tpcds.sf1000 schema. Open and run 02-Galaxy-Large-FTE-INSERT-INTO-TPCDS-sf1000_01.jmx.
The third JMX script will optimize your tables. Open and run 03-Galaxy-Large-FTE-OPTIMIZE-TPCDS-sf1000_01.jmx.
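For context, table optimization in Trino-based systems is typically expressed with statements along these lines; the table name below is a placeholder, and the 03 script contains the actual statements:

  -- Compacts the many small files written during the insert step into larger files.
  ALTER TABLE benchmark.tpcds_sf1000.customer EXECUTE optimize;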
The final JMX script will simulate a true enterprise environment, where multiple users are issuing SQL at the same time. This test also ensures that you understand the benefit Warp Speed provides in an enterprise environment. Open and run 04-Galaxy-Large-WarpSpeed-TPCDS-Iceberg-sf1000_01.jmx.

Congratulations! You have reached the end of this tutorial and have successfully used JMeter and Starburst Galaxy to run benchmark testing.
Starburst has many other tutorials to help you get up and running quickly. Each one breaks down an individual problem and guides you to a solution using a step-by-step approach to learning.
Visit the Tutorials section to view the full list of tutorials, and keep moving forward on your journey!