GigaOm TCO report: Starburst data lakehouse enables 3x faster time to insight at half the cost
In a new report, Cloud Data Warehouse vs. Cloud Data Lakehouse: A Snowflake vs. Starburst TCO and Performance Comparison, GigaOm, a leading technology research firm, compared the cost, time, and effort required to adopt a Snowflake cloud data warehouse with those required to adopt a data lakehouse powered by Starburst. The report concluded that a Starburst lakehouse architecture can achieve superior price-performance and significantly faster time to insight at a much lower total cost of ownership (TCO).
In this blog post, we take a deeper look at these results and explain how Starburst's approach can help data teams achieve more at a lower total cost. First, the headline results:
- Starburst and Snowflake achieve nearly identical cost-performance on standard BI workloads
- Starburst achieved 5x better price-performance vs. Snowflake on complex and semi-structured workloads
- A Starburst lakehouse architecture is up to 67% faster to set up compared to migrating data into Snowflake
- Over a 3-year period, Starburst's TCO is up to 55% lower than that of operating Snowflake on the same dataset
The setup: The research report aims to replicate a typical data migration project involving existing cloud and on-prem data sources
The test was designed to simulate an actual migration project that an enterprise might undertake. The migration process was broken down into four distinct phases: Planning, Migration, Path-to-Production, and Post-Migration. Each phase was further broken down into individual tasks, which were measured and documented for each scenario in the report.
The hypothetical enterprise performing the migration has several data sources. A traditional Oracle data warehouse represents a legacy on-premises OLAP system; data is piped into it from upstream transactional systems that represent traditional channels such as brick-and-mortar sales. Alongside the legacy OLAP system, two cloud data sources were included: a cloud Postgres database holding OLTP data from ecommerce sales, and JSON data stored in S3 representing weblog data that can be used to analyze customer lifecycles.
For the Snowflake test scenario, per Snowflake requirements, all data must first be moved to the cloud and into Snowflake before any queries can be run. After the initial lift-and-shift migration, ongoing ETL pipelines must be maintained to continually ingest data into Snowflake for analytical use.
For the Starburst test scenario, several options were considered. Because Starburst can query data where it resides, the report evaluated various combinations of lift-and-shift migration and ongoing federation. The lowest-TCO option migrated the cloud sources to an Iceberg data lake and used ongoing federation to query the on-prem Oracle source.
Result 1: Starburst and Snowflake achieve nearly identical cost-performance on standard BI workloads
Despite significant interest in data lakes and lakehouses within the data engineering community in recent years, and the rise of open source table formats such as Apache Iceberg, Delta Lake, and Apache Hudi, many data engineers and architects remain hesitant about the performance and manageability of data lakes. Some have battle scars from the Hadoop era to justify that caution: legacy data lakes were seen as slow, cumbersome, and difficult to manage.
The GigaOm report shows that a modern data lake powered by Starburst is just as performant and cost-effective as a cloud data warehouse for standard BI queries, as measured by the industry-standard TPC-DS benchmark.
To achieve similar query performance on the benchmark across both systems, the following setup was used:
Setting | Snowflake | Starburst |
Cloud | AWS | AWS |
Storage | Native Tables | S3 Iceberg |
Performance Features | Clustering | Cache Service, Warp Speed |
Pricing Tier | Enterprise | EC2 3-year Reserved Instances |
Cluster Size/Instance Type | Large | 10 nodes of m6gd.8xlarge (320 vCPU) |
Using list prices for both Starburst software and AWS infrastructure, the two tests returned nearly identical query response times at similar cost:
TPC-DS | Criteria | Snowflake | Starburst |
Single-User Stream Execution Time | < 900 sec. | 707 sec. | 751 sec. |
Geometric Mean | < 5 sec. | 3.81 sec. | 4.23 sec. |
Single-User Price-Per-Performance ($/hour × Execution Time / 3,600 sec/hr) | – | $4.71 | $4.88 |
20-User Execution Time | – | 9,366 sec. | 9,686 sec. |
20-User Price-Per-Performance ($/hour × Execution Time / 3,600 sec/hr) | – | $62.44 | $62.94 |
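The price-per-performance figures follow from the formula in the table: hourly cost multiplied by execution time, converted from seconds to hours. The sketch below, assuming that reading of the formula, backs out an implied hourly rate for each platform from the 20-user row and checks that it reproduces the single-user figure. The function names are illustrative, and the hourly rates are derived from the published numbers rather than taken from any price list.

```python
SECONDS_PER_HOUR = 3_600


def price_per_performance(rate_per_hour: float, exec_seconds: float) -> float:
    """Dollar cost of one run: $/hour x execution seconds / 3,600 sec/hr."""
    return rate_per_hour * exec_seconds / SECONDS_PER_HOUR


def implied_rate(price_perf: float, exec_seconds: float) -> float:
    """Invert the formula to recover the hourly rate behind a published figure."""
    return price_perf * SECONDS_PER_HOUR / exec_seconds


# Published TPC-DS figures from the table:
# platform -> (20-user seconds, 20-user $, single-user seconds, single-user $)
RESULTS = {
    "Snowflake": (9_366, 62.44, 707, 4.71),
    "Starburst": (9_686, 62.94, 751, 4.88),
}


def check(results=RESULTS) -> dict:
    """Derive each hourly rate from the 20-user row, then predict the single-user cost."""
    out = {}
    for platform, (t20, p20, t1, p1) in results.items():
        rate = implied_rate(p20, t20)
        out[platform] = (rate, price_per_performance(rate, t1), p1)
    return out


if __name__ == "__main__":
    for platform, (rate, predicted, published) in check().items():
        print(f"{platform}: ~${rate:.2f}/hour, predicted ${predicted:.2f}, published ${published:.2f}")
```

Both predictions round to the published single-user figures ($4.71 and $4.88), with implied rates of about $24.00 and $23.39 per hour. The 20-user row is used as the starting point because its longer run time makes the two-decimal rounding of the dollar figure matter less.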
Based on this test, the report concludes that price-performance on traditional BI warehouse workloads (as represented by TPC-DS) is essentially identical between Starburst and Snowflake.
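The Geometric Mean row summarizes per-query response times as the n-th root of their product, so a single slow query moves it far less than it moves total stream time. The sketch below applies Python's `statistics.geometric_mean` to two made-up sets of query times (hypothetical values, not from the report) that differ only in one long-running query.

```python
import statistics

# Hypothetical per-query response times in seconds (illustrative only, not report data).
# The two runs are identical except for one long-running query.
run_a = [2.0, 2.0, 2.0, 2.0, 100.0]
run_b = [2.0, 2.0, 2.0, 2.0, 20.0]


def compare(a, b):
    """Return (total-time ratio, geometric-mean ratio) of run a versus run b."""
    total_ratio = sum(a) / sum(b)
    geo_ratio = statistics.geometric_mean(a) / statistics.geometric_mean(b)
    return total_ratio, geo_ratio


if __name__ == "__main__":
    total_ratio, geo_ratio = compare(run_a, run_b)
    print(f"total time ratio: {total_ratio:.2f}x, geometric mean ratio: {geo_ratio:.2f}x")
```

Here a 5x difference on one query shows up as roughly 3.9x in total time but only about 1.4x in the geometric mean, which is why the two metrics can tell different stories about the same run.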
Result 2: Starburst achieved 5x better price-performance vs. Snowflake on complex and semi-structured workloads
Beyond BI workloads, the GigaOm report goes on to test more specialized queries involving semi-structured data. These tests were designed to be representative of typical complex workloads, such as customer analytics, log analytics, clickstream analytics, and security analytics, that often need to incorporate data outside of a data warehouse.
For testing purposes, GigaOm simulated website log data in JSON format. For the comparison, the web data was loaded into Snowflake as a VARIANT column and stored in Amazon S3 for Starburst to query. The web data could then be analyzed on its own for user traffic patterns such as page views, or joined with the main datasets to perform customer 360 or conversion analysis. Several of the benchmark queries were tailored to join the JSON web data with either or both of the other sources.
Here, the ability to analyze semi-structured data without building and maintaining ingestion pipelines adds significant value to the Starburst option.
TPC-DS 1 TB + JSON 1 TB | Criteria | Snowflake | Starburst |
Single-User Stream Execution Time | < 900 sec. | 3,686 sec. | 814 sec. |
Geometric Mean | < 5 sec. | 4.97 sec. | 4.63 sec. |
Single-User Price-Per-Performance ($/hour × Execution Time / 3,600 sec/hr) | – | $24.57 | $5.29 |
In the JSON web data test, Starburst completed the single-user stream more than 4x faster than Snowflake (814 vs. 3,686 seconds) and delivered nearly 5x better price-performance ($5.29 vs. $24.57).
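The multiples for this test can be recomputed directly from the table. A minimal sketch using only the published values, comparing total stream time, geometric mean, and price-performance (lower is better in every row). The large gap between the total-time ratio and the geometric-mean ratio is consistent with a few long-running queries accounting for much of Snowflake's stream time, although this post does not break out per-query results.

```python
# Published TPC-DS 1 TB + JSON 1 TB results: metric -> (Snowflake, Starburst).
RESULTS = {
    "single-user stream time (sec)": (3_686, 814),
    "geometric mean (sec)": (4.97, 4.63),
    "single-user price-per-performance ($)": (24.57, 5.29),
}


def advantage(snowflake: float, starburst: float) -> float:
    """How many times lower Starburst's value is than Snowflake's."""
    return snowflake / starburst


if __name__ == "__main__":
    for metric, (snow, star) in RESULTS.items():
        print(f"{metric}: {advantage(snow, star):.2f}x")
```

This works out to about 4.53x on total stream time, 1.07x on geometric mean, and 4.64x on price-performance.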
Result 3: A Starburst lakehouse architecture is up to 67% faster to set up than migrating data into Snowflake
With any data migration, project effort and business disruption are often two major concerns.
The sheer effort required to move vast amounts of data can be overwhelming. This includes ensuring data integrity, managing dependencies between applications, and addressing potential compatibility issues. Additionally, businesses must ensure that their teams are adequately trained to handle new cloud technologies, which can involve a steep learning curve.
Major migrations can take multiple years, and downtime along the way can disrupt business operations, so it is crucial to plan them in phases. Moreover, unforeseen challenges, such as data discrepancies or integration hiccups, can extend the migration timeline and cause additional negative business impact.
The GigaOm field test revealed that a Starburst lakehouse architecture requires between 47% and 67% less migration effort compared to migrating data into Snowflake.
In particular, the report found that Starburst’s data federation capability made a significant impact by accelerating some integration tasks and eliminating others altogether.
For a typical data migration project, Starburst reduces overall migration time by up to 67%, which directly translates to a reduction in time-to-insight, allowing businesses to make faster, data-driven decisions.
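A percentage reduction in time converts to an "N times faster" multiple as 1 / (1 - reduction), which is one way to connect the 67% figure with the "3x faster time to insight" in the headline. A small sketch, assuming time to insight scales with migration time as the paragraph above describes; the function name is illustrative.

```python
def speedup_from_reduction(reduction: float) -> float:
    """Convert a fractional time reduction (0.67 for 67%) into an 'N times faster' multiple."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return 1 / (1 - reduction)


if __name__ == "__main__":
    # The report's range: 47% to 67% less migration effort.
    for pct in (0.47, 0.67):
        print(f"{pct:.0%} less time -> {speedup_from_reduction(pct):.2f}x faster")
```

The upper end of the range corresponds to roughly 3x faster, and the lower end to roughly 1.9x.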
Result 4: Over a 3-year period, Starburst's total cost of ownership (TCO) is up to 55% lower than that of operating Snowflake on the same dataset
Assuming either Starburst or Snowflake meets the performance and architectural requirements, total cost of ownership can be the deciding factor in a long-term technology investment. The GigaOm report summarizes its findings in a TCO comparison that includes the following components:
1. Cost of infrastructure (including software, compute, and storage)
- For equivalent query performance, infrastructure costs were nearly identical between the Snowflake and Starburst implementations, at approximately $4,000 per week
2. Cost of data migration (including data transfer and labor costs)
- Data migration for Snowflake is complex, time-consuming, and requires expert engineers, which leads to significantly higher costs.
- Snowflake migrations cost nearly 4x as much as a Starburst lakehouse migration ($860k vs. $220k, respectively)
3. Cost of ongoing maintenance post-migration (including ETL pipelines, reliability, quality, and testing)
- Snowflake requires significantly greater ongoing maintenance post-migration, primarily in system administration and quality assurance.
- GigaOm estimates the cost of Snowflake post-migration to be approximately 3x that of Starburst ($1.9M vs $650k, respectively)
4. Opportunity cost during migration (including planning, designing, implementing, and maintaining the system)
- Importantly, one of the most significant costs of any potentially disruptive change is the opportunity cost incurred over the project's lifetime. This typically takes the form of a period of system unavailability or instability, during which analytics jobs cannot be completed, are inaccurate, or run slowly.
- For the Snowflake migration, the lost analytic opportunity was 20.8 weeks. For Starburst, it was 25% of Snowflake's, or 5.2 weeks.
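The multiples in the list above can be checked against the dollar and week figures it gives. A minimal sketch using only those published values; it does not attempt to reproduce the full 3-year TCO total, since this post does not state the dollar value the report assigns to lost analytic time.

```python
# Published component figures from the GigaOm TCO comparison: name -> (Snowflake, Starburst).
COMPONENTS = {
    "data migration ($)": (860_000, 220_000),
    "post-migration maintenance ($)": (1_900_000, 650_000),
    "lost analytic opportunity (weeks)": (20.8, 5.2),
}


def ratio(snowflake: float, starburst: float) -> float:
    """Snowflake's figure expressed as a multiple of Starburst's."""
    return snowflake / starburst


if __name__ == "__main__":
    for name, (snow, star) in COMPONENTS.items():
        r = ratio(snow, star)
        print(f"{name}: Snowflake is {r:.2f}x Starburst (Starburst is {1 / r:.0%} of Snowflake)")
```

The results (about 3.91x, 2.92x, and 4.00x) line up with the "nearly 4x", "approximately 3x", and "25%" figures in the list.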
In a 3-year TCO comparison, Starburst costs 55% less than Snowflake
Overall, the GigaOm report further solidifies Starburst's position as the leading data lake analytics platform, delivering similar or better query performance at a fraction of the cost of traditional data centralization architectures. Traditional data warehousing models require adherence to a rigid data architecture that involves significant data movement and duplication, leading to data lock-in and unpredictably high costs.
Conclusion
Starburst’s Modern Data Lake approach delivers warehouse-like query performance and capability directly on the data lake.
- Similar or better query cost-performance on a truly open platform
- Our innovation doesn't stop at improving price-performance and TCO; we are innovating on behalf of data teams everywhere to make it easier and faster to access, manage, secure, analyze, and share data.
Read the full report
GigaOm’s exclusive Snowflake vs. Starburst performance and TCO analysis