Background

Benchmark testing is a valuable tool for evaluating and optimizing the performance of systems and components to meet the desired performance requirements.

SQL benchmark testing

SQL benchmark testing involves measuring the performance of a database system. The purpose of SQL benchmark testing is to evaluate the database's performance in terms of speed, scalability, and reliability. This type of testing can help identify bottlenecks, optimize database configurations, and make informed decisions about hardware and software upgrades.

SQL benchmark testing typically involves running a series of queries or transactions against the database and measuring key performance metrics, such as response time, throughput, and resource utilization. These tests can be performed using standardized benchmarks, such as TPC-DS or TPC-H, or custom benchmarks designed to simulate specific real-world workload patterns.

SQL benchmark testing results

The results of SQL benchmark testing can help database administrators and developers optimize database performance by tuning parameters such as indexes, query optimization, and hardware configurations. Additionally, benchmark testing can be used to compare the cost and performance of different database systems or versions to determine which one best meets the needs of a particular application.

Understanding costs and performance results

It's vital to understand that evaluating performance and cost in isolation is insufficient. While the speed of raw query execution is significant, it must be balanced against the associated expenses.

For instance, consider running tests on two different architectures or platforms that seemingly utilize the same resources. Test A completes in 60 minutes, while Test B finishes in 54 minutes, making it 10% faster than Test A. Initially, one might lean towards favoring the setup (and possibly vendor) used for Test B. However, what if Test A costs $50 and Test B costs $100? This scenario prompts a reassessment.

Conducting a third test involves doubling the size of the architecture used in Test A to align both tests based on cost. Surprisingly, Test A now completes in 30 minutes for $100. Consequently, the conclusion shifts, indicating that the solution employed in Test A is actually the superior choice in terms of value for money.

Further optimization can be explored by scaling Test A up to 150% of its initial size, resulting in a cost of $75 and a runtime of 45 minutes, showcasing improved performance compared to Test B at a reduced cost.

It's important to note that performance doesn't always scale linearly, as demonstrated in the example above. This underscores the critical importance of benchmark testing, as understanding the actual cost and performance of a solution significantly impacts your business's bottom line.

Testing with your own SQL and own data sources

Beginning your benchmark testing with a standard test harness like TPC-DS isn't necessarily a misguided approach. However, it's unlikely that your own environment will mirror a standard benchmark precisely. More often than not, when a solution excelled with a standard test harness like TPC-DS, it underperformed when tested against real-world data sources and SQL, often at a lower cost.

Hence, it's advisable to initiate testing with your own data sources and SQL queries. Alternatively, at the very least, ensure that your testing doesn't conclude after solely relying on a standardized benchmark like TPC-DS.

In this tutorial, the JMX file for the final read test can be easily customized by substituting the TPC-DS SQL with your own SQL queries that utilize tables from your own data sources. This approach eliminates the preparatory work required to deploy the TPC-DS tables. Should you opt for this method, simply bypass steps 1 - 3 outlined in the "Run JMX scripts" section, which involve creating, populating, and optimizing the TPC-DS tables.

Benchmark testing best practices

If you're simply conducting testing to acquaint yourself with the process, you don't necessarily need to adhere strictly to all the best practices outlined below. These practices are specifically recommended for those conducting benchmark testing as part of an evaluation for a new business solution.

This underscores the importance of balancing cost and performance. If one vendor can't complete a test with the same compute resources as another but offers half the cost, it's advisable to scale up the environment of the less expensive vendor to match costs and observe the outcomes. Ultimately, what matters is the cost-effectiveness of completing the SQL workload within the required timeframe.

Scope of tutorial

This tutorial provides all of the required code as a series of JMX files. To use it, you will need to edit the code to make it unique to your environment.

Learning objectives

Once you've completed this tutorial, you will be able to:

Prerequisites

Background

Your first task is to install JMeter on your machine. You'll also need to download the Trino JDBC driver and move its associated files to the JMeter /lib directory.

Step 1: Download JMeter

You can download the binaries from the Apache JMeter website.

Step 2: Download Trino JDBC driver

The Trino JDBC driver is required to allow communication between JMeter and your Starburst Galaxy clusters.

Background

Now that you have JMeter installed and ready to go, it's time to set up your Starburst Galaxy environment. To prepare for testing, you'll need to create two clusters, one with fault-tolerant execution mode enabled and one with Warp Speed enabled. This section walks through the steps to configure a fault-tolerant execution mode cluster, which will be used to create tables during testing.

Step 1: Sign into Starburst Galaxy

Sign into Starburst Galaxy in the usual way. If you have not already set up an account, you can do that here.

Step 2: Set your role

Starburst Galaxy separates users by role. Your current role is listed in the top right-hand corner of the screen.

Creating a cluster in Starburst Galaxy will require access to a role with appropriate privileges. Today, you'll be using the accountadmin role.

Step 3: Create cluster

New clusters can be created from the Clusters section of Starburst Galaxy.

Step 4: Name the cluster

You're going to start by giving the new cluster a name. This should be something descriptive of its use and location.

Step 5: Add catalogs to new cluster

Now it's time to choose which catalogs to include in the cluster. For now, you only need to add the tpcds catalog. You'll also need to choose the Cloud provider region.

Step 6: Add cluster details

Next, you'll need to add the remaining cluster details to set up the new cluster. We recommend using the following configuration.

Background

The second cluster you create will be used to run the benchmark tests. You'll find that the configuration is almost identical to the steps you just completed, with the exception of the Execution mode setting.

Step 1: Create cluster

Step 2: Name the cluster

Again, the cluster name should be something descriptive of its use and location.

Step 3: Add catalogs to new cluster

Now, it's time to choose which catalogs (data sources) to include in the cluster. For now, you only need to add the tpcds catalog. You also need to choose the Cloud provider region.

Step 4: Add cluster details

Next, you'll need to add the remaining cluster details to set up the new cluster. We recommend using the following configuration.

We will have you deploy an Accelerated cluster which allows the test to benefit from Smart Index and Caching. Today most vendors have some type of SSD caching enabled by default which cannot be disabled.

Background

Now it's time to configure a catalog in Starburst Galaxy that connects to your Amazon S3 bucket. As part of the benchmark testing process, the scripts you run will create and populate tables within this catalog.

Step 1: Create new catalog

Create a new catalog for your Amazon S3 data source.

Step 2: Select Amazon S3 data source

Starburst Galaxy allows the creation of catalogs for a number of different data sources. In this case, you are going to create a new catalog in the Amazon S3 category.

Step 3: Input name and description

We recommend using the name benchmark for your catalog. This catalog name is hard coded into the provided JMX files, and you will have to edit them if you use a different name.

Step 4: Choose authentication method

Starburst Galaxy allows you to configure several different authentication methods when creating a new catalog. For this tutorial, we recommend using a cross account IAM role which is considered a best practice by AWS.

Step 5: Select the metastore

Starburst Galaxy provides three metastore options for Amazon S3 catalogs (Starburst Galaxy, Amazon Glue, and Hive). For this tutorial, we will use the Starburst Galaxy metastore, as it removes the burden of configuring and managing a separate metastore service. However, if you prefer to use Glue, you may do so.

Note: The scripts in this tutorial will create external, or unmanaged, tables, so the default directory will not be used.

Step 6: Select the default table format

Starburst recommends setting Apache Iceberg as the default table format to ensure that all tables will use the Iceberg format when you create them without specifying the format.

Step 7: Test and connect

You're almost there! Time to test the connection and then complete the process of creating your new Amazon S3 catalog.

Step 8: Save access controls

You can leave the default access controls.

Step 9: Add catalog to clusters

You need to add your catalog to both clusters you created earlier in this tutorial.

Background

There are four JMX files for you to download, provided in a zipped folder here. The first three files will create, populate, and optimize tables in Starburst Galaxy, while the fourth will run the benchmark tests.

The tables created by the JMX scripts are pulling data from the tpcds catalog. We chose to create our own tables rather than querying the tpcds catalog directly because it is bad practice to query a data generator. The results will not be valid if you do.

As you look through the files, you may notice that there are several CAST operations. This is because our data generator for tpcds was built for the Hive table format, and we are using the Iceberg table format for this test. The CAST operations will ensure that Iceberg-supported data types are used.

Step 1: Download JMX files

Step 2: Update JMX files

Now, it's time to edit the provided files to make them unique to your environment.

Background

Now that you've updated the JMX files to make them unique to your environment, it's time to run them in JMeter. These instructions use the JMeter GUI, but you can also use a terminal to run the scripts if you prefer.

Step 1: Create tables in Starburst Galaxy

The first JMX script will create the schema and tables needed for your tests.

Step 2: Populate tables

The second JMX script will populate the tables you just created with data from the tpcds.sf1000 schema.

Step 3: Optimize tables

The third JMX script will optimize your tables.

Step 4: Run benchmark tests

The final JMX script will simulate a true enterprise environment, where multiple users are issuing SQL at the same time. This test also ensures that you understand the benefit Warp Speed provides in an enterprise environment.

Tutorial complete

Congratulations! You have reached the end of this tutorial and have successfully used JMeter and Starburst Galaxy to run benchmark testing.

Next steps

Starburst has many other tutorials to help you get up and running quickly. Each one breaks down an individual problem and guides you to a solution using a step-by-step approach to learning.

Other Tutorials

Visit the Tutorials section to view the full list of tutorials, and keep moving forward on your journey!

Start Free with
Starburst Galaxy

Up to $500 in usage credits included

  • Query your data lake fast with Starburst's best-in-class MPP SQL query engine
  • Get up and running in less than 5 minutes
  • Easily deploy clusters in AWS, Azure and Google Cloud
For more deployment options:
Download Starburst Enterprise

Please fill in all required fields and ensure you are using a valid email address.