Customers who want a single fast, easy-to-use engine for both interactive queries and longer-running data pipeline queries now have one: fault tolerant execution (FTE) mode, now in Public Preview for Starburst Enterprise. FTE mode lets you reliably run much larger data pipeline queries, save costs by running non-latency-sensitive queries on much smaller clusters, and execute more queries concurrently. With FTE, less is actually more!

We soft-launched this feature first in the Trino engine in May 2022, and since then we have seen adoption by customers across industries for a wide variety of use cases. These use cases include:

  • Building large rollup tables
  • Preparing datasets for machine learning models
  • Wrangling data that feeds into data applications

The following principles stood out to us as critical components of the user experience:

Extremely Fast

Deliver an order of magnitude gain in query performance, enabling you to do analytics at TB++ scale at the speed of thought. Trino (formerly known as PrestoSQL) achieves its speed by prioritizing in-memory execution, along with many other performance enhancements such as highly optimized ORC/Parquet readers, columnar reads, predicate pushdown, and lazy reads, to name a few. If Trino isn’t already fast enough, Starburst Enterprise makes things even faster with the Parquet accelerator.

In the test below, we generated a 10 TB dataset with the TPC-H lineitem generator and compared the performance of open source Spark on EC2 hardware with that of Trino in Starburst Enterprise.

 

As demonstrated below, fault tolerant execution mode enables long-running and memory-intensive queries to complete. Because most ETL deployments run queries concurrently, we designed the system to perform best under highly concurrent settings (see the results with a concurrency factor of 4).

Total TPC-H Runtime, EMR-Spark vs. SEP Batch

 

Figure 1. TPC-H Scale Factor 10,000 (10TB data schema) benchmark runtime

 

The data above indicates that the TPC-H ETL workload typically completes more quickly using FTE and Starburst Enterprise together vs. Spark. Higher job concurrency widens the performance benefit as well. The chart below shows an increasing advantage using FTE as more concurrent ETL jobs are executed:

ETL Jobs - Percent Performance Increase, SEP+FTE vs. EMR Spark

 

Figure 2. Percent Performance Increase, Starburst Enterprise+FTE vs. Spark 

 

Reliable Under Load

The iterative approach to task execution lets you run long-running and memory-intensive queries successfully. Many of these queries would fail otherwise.
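
As a rough illustration of why retrying at the task level helps, the toy Python sketch below contrasts a run where any task failure fails the whole query with a run where only the failed task is retried and the results of finished tasks are kept. It is not Trino or Starburst code: the task functions, the `TaskFailed` exception, and the `max_attempts` parameter are all hypothetical names chosen for this example.

```python
from typing import Callable, Dict, List


class TaskFailed(Exception):
    """Raised by a task attempt that fails (for example, a lost worker)."""


def make_flaky_task(task_id: int, failures_before_success: int) -> Callable[[], int]:
    """Build a hypothetical task that fails a fixed number of times, then succeeds."""
    remaining = {"failures": failures_before_success}

    def run() -> int:
        if remaining["failures"] > 0:
            remaining["failures"] -= 1
            raise TaskFailed(f"task {task_id} failed")
        return task_id * 10  # stand-in for the task's real output

    return run


def run_without_task_retries(tasks: List[Callable[[], int]]) -> List[int]:
    """A single task failure propagates and fails the whole query."""
    return [task() for task in tasks]


def run_with_task_retries(tasks: List[Callable[[], int]], max_attempts: int = 4) -> List[int]:
    """Retry only the failed task; outputs of finished tasks are kept."""
    kept: Dict[int, int] = {}  # stand-in for saved intermediate results
    for index, task in enumerate(tasks):
        for attempt in range(1, max_attempts + 1):
            try:
                kept[index] = task()
                break
            except TaskFailed:
                if attempt == max_attempts:
                    raise  # give up only after the last allowed attempt
    return [kept[i] for i in range(len(tasks))]


def build_tasks() -> List[Callable[[], int]]:
    # Task 2 fails twice before succeeding; the others succeed immediately.
    return [make_flaky_task(i, 2 if i == 2 else 0) for i in range(4)]


try:
    run_without_task_retries(build_tasks())
    outcome = "completed"
except TaskFailed:
    outcome = "failed"

results = run_with_task_retries(build_tasks())
print(outcome, results)
```

In this sketch the first run fails as soon as task 2 fails, while the second run completes because only task 2 is repeated.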

In the test below, we generated a 10 TB dataset with the TPC-H lineitem generator (scale factor 10,000) and compared fault tolerant execution (FTE) mode with the original execution mode on fifteen m5.8xlarge workers. As demonstrated below, FTE mode completes long-running and memory-intensive queries on which the original execution mode errored out.

Fault Tolerant vs. Streaming Execution on ETL Workloads

Figure 3. FTE reliability vs. Streaming mode

Easy to use

Data engineers’ time is valuable, so we focus on letting data engineers write business logic at the speed of thought:

  • Data engineers can use the same standard SQL dialect for both interactive and ETL analytics: engineers no longer need to learn different SQL dialects depending on the size of the job.
  • Data engineers can iteratively test SQL queries as they develop complex data pipelines because the coordinator is always up and waiting for a query, unlike the lazily started coordinators of many other engines, which take longer to boot up.
  • Starburst focuses on delivering debuggable error messages so that an engineer can quickly pinpoint issues.
  • Engineers can deliver query federation over 50+ connectors so that it is no longer necessary to move the data.
  • With the Starburst Enterprise ecosystem, we offer Data Products for data engineers to discover data assets and write pipelines quickly.
  • Starburst Enterprise also provides query plan analysis so that data engineers can quickly debug issues, and access controls, audit logs, and query history to ensure regulatory compliance.
  • We offer Starburst Enterprise integrations with popular ELT tools like Airflow and dbt so that you can use the tools you love.

 

Balance cost savings, latency, and auto-scaling

Smaller clusters: Thanks to FTE’s iterative execution, you can now trade query runtime for compute cost and save on hardware by running queries on a much smaller cluster than before. Most organizations have a collection of queries that are not time-sensitive, so running them on “large” clusters is not cost-effective. FTE lets these queries run longer, and still complete successfully, on a smaller hardware footprint.

To test this, we ran the query below on a 1TB data schema and were able to complete it on hardware with a quarter of the memory.

Sample Query on 2TB data schema using hardware with 4x less memory

 

Figure 4. Sample Query

Original execution mode:

Nodes          | Results
2 m5.8xlarge   | Query exceeded per-node memory limit of 61.60GB
4 m5.8xlarge   | Query exceeded per-node memory limit of 61.60GB
8 m5.8xlarge   | Completed 2m 10s

 

Fault tolerant execution mode:

Nodes          | Results
2 m5.8xlarge   | Completed 13m 50s
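
To make the trade-off concrete, the short Python sketch below uses only the node counts and runtimes reported in the two result tables above. It computes how much smaller the fault tolerant cluster is and how much longer the query runs in exchange. The `Run` class and variable names are hypothetical and exist only for this calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One successful run from the result tables (m5.8xlarge nodes)."""
    mode: str
    nodes: int
    runtime_seconds: int


# Smallest cluster on which each mode completed the sample query.
original = Run("original", nodes=8, runtime_seconds=2 * 60 + 10)
fault_tolerant = Run("fault tolerant", nodes=2, runtime_seconds=13 * 60 + 50)

# How many times smaller the fault tolerant cluster is.
cluster_size_ratio = original.nodes / fault_tolerant.nodes

# How many times longer the query takes on that smaller cluster.
runtime_ratio = fault_tolerant.runtime_seconds / original.runtime_seconds

print(f"Cluster is {cluster_size_ratio:.0f}x smaller")
print(f"Query runs {runtime_ratio:.2f}x longer")
```

For this one query, the cluster is 4x smaller and the runtime is a little over 6x longer, which is the kind of exchange that suits queries that are not time-sensitive.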

 

Get Started

We are very excited to be sharing the Public Preview of the Fault Tolerant Execution mode in Starburst Enterprise. We believe this feature will benefit our customers greatly, allowing you to harness the dynamic power of our solution. We hope you will engage with our new feature and realize the many benefits of using FTE!

We strongly recommend setting up a separate fault tolerant cluster to avoid significant latency penalties for short-running queries. Contact your SA today to get a fault tolerant cluster set up and tuned!

To learn more about Fault Tolerant Execution, you can start with the documentation!