Reduce Snowflake compute costs with Starburst Galaxy

  • Evan Smith, Technical Content Manager, Starburst Data

  • Ahmed Niyaz, Product Manager, Starburst

The promise of Snowflake has always been the same: all the power of a data warehouse with the infinite scalability of the cloud. We could call this the “data warehouse in the cloud” model.

In recent years, Snowflake has extended its offering to include data lakes and even data lakehouses in the form of Iceberg managed tables. This shift sees Snowflake join the growing chorus of other companies, including Starburst, adopting Iceberg as a disruptive, epochal technology.

Snowflake is expensive

Snowflake pairs the power of a data warehouse with the scalability of the cloud, and Iceberg managed tables now bring data lakes and data lakehouses into the same platform. Despite these advancements, Snowflake’s significant compute cost remains a major concern.

Starburst lowers compute costs and enhances openness for Snowflake users

All this talk of rising costs poses one fundamental question. Is there a way to lower compute costs for Snowflake workloads while retaining ease of use? You’ve come to the right place. Disrupting the data industry in favor of openness is what Starburst is all about, and if you’re a Snowflake user suffering from high compute costs, we’ve got you covered.

Enter the Starburst Galaxy Snowflake Catalog Metastore, designed to help Snowflake users reduce compute costs while adding features like data federation. Our new connector allows Snowflake users to leverage their data without incurring high compute costs, bringing openness and flexibility to their data architecture.

Unpacking the Snowflake metastore

Currently, Snowflake allows Iceberg tables to be created in a customer-managed AWS S3 bucket. Importantly, the Snowflake catalog metastore only applies to customer-managed storage, meaning that users of this approach still pay for the compute used to query that data.

The savings come from offloading that compute from Snowflake to Starburst.

The Snowflake catalog metastore registers all Iceberg metadata added to a Snowflake data lakehouse in Snowflake’s proprietary metastore. This approach will be familiar to Databricks users accustomed to that ecosystem’s Unity Catalog. The Snowflake metastore plays a similar role for data held in the Snowflake ecosystem.

The example below helps illustrate how this works for most Snowflake users. 

How it works

In short, Snowflake creates Iceberg tables in customer-managed AWS S3 buckets and tracks them in the Snowflake metastore, and users still pay for the compute that queries those tables. By offloading that compute from Snowflake to Starburst, users can access their data at a significantly lower cost while maintaining the benefits of Snowflake’s ecosystem.

Example

Imagine a company using Snowflake with 70% traditional data warehouse tables and 30% Iceberg managed tables. They are locked into Snowflake’s ecosystem and facing high costs. With Starburst Galaxy, they can query the same Iceberg tables at a lower cost, retaining their Snowflake ecosystem while saving money. This allows them to perform the same queries with reduced compute costs and explore additional data sources seamlessly.
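To make the arithmetic behind this example concrete, here is a minimal sketch in Python. It is only an illustration: the spend figure and the relative price are hypothetical placeholders, not Starburst or Snowflake pricing. It also assumes that the share of compute spent on Iceberg tables matches the 30% share of tables, which will not hold for every workload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadMix:
    """Hypothetical inputs for a back-of-the-envelope estimate."""
    monthly_compute_spend: float  # current total compute spend, any currency
    iceberg_share: float          # fraction of compute spent on Iceberg tables (0 to 1)
    relative_price: float         # assumed offloaded cost / current cost for the same queries


def estimate_monthly_spend(mix: WorkloadMix) -> float:
    """Estimate spend after offloading only the Iceberg portion of the workload."""
    if not 0.0 <= mix.iceberg_share <= 1.0:
        raise ValueError("iceberg_share must be between 0 and 1")
    if mix.relative_price < 0.0:
        raise ValueError("relative_price must be non-negative")
    # Warehouse tables stay where they are, so their cost is unchanged.
    warehouse_part = mix.monthly_compute_spend * (1.0 - mix.iceberg_share)
    # Only the Iceberg portion is repriced at the assumed ratio.
    iceberg_part = mix.monthly_compute_spend * mix.iceberg_share * mix.relative_price
    return warehouse_part + iceberg_part


def estimate_monthly_savings(mix: WorkloadMix) -> float:
    """Difference between current spend and the estimated spend after offloading."""
    return mix.monthly_compute_spend - estimate_monthly_spend(mix)


if __name__ == "__main__":
    # Placeholder numbers chosen only to show the calculation.
    mix = WorkloadMix(monthly_compute_spend=10_000.0, iceberg_share=0.30, relative_price=0.5)
    print(f"estimated savings: {estimate_monthly_savings(mix):.2f}")
```

The point of the sketch is that savings scale with both the Iceberg share of compute and the price ratio, so a real estimate needs measured values for each.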

How Starburst saves Snowflake users money

Unlike data warehouses, data lakes and data lakehouses use cost-effective cloud object storage (like AWS S3) and handle structured, semi-structured, and unstructured data without expensive ETL processes. This includes data lakehouses built on Apache Iceberg.

However, unlike data lakes based around Apache Hive, data lakehouses collect additional metadata that allows them to update, insert, and delete data in a way that’s more like traditional data warehouse workloads. This is how data lakehouses have claimed their spot at the forefront of the data world.

Snowflake is no exception, and its Iceberg managed tables allow Snowflake users to realize some cost savings over data warehouses. But staying inside the Snowflake ecosystem doesn’t unlock all of the savings. It’s still a closed environment and that comes with costs.

Snowflake metastore, Starburst compute

The solution is openness: swapping out the Snowflake query engine for Starburst Galaxy. Simply put, Snowflake is not the cheapest way to process an Apache Iceberg workload. This is certainly true for data not held in Snowflake, but it’s also true for data held inside the Snowflake ecosystem.

Users tied to the Snowflake ecosystem can continue using Snowflake Iceberg tables while offloading the compute to Starburst. This means that Snowflake users can access the same Iceberg data through Starburst Galaxy at considerably lower compute costs than with Snowflake.

Open data architecture 

This new approach allows users to process data with Starburst Galaxy, avoiding Snowflake’s high compute costs. Users can still perform the same queries but at a reduced cost, benefiting from an open data architecture. This openness can directly improve a company’s bottom line by making a like-for-like swap in compute when accessing the same data source.

Add data sources using data federation

Starburst Galaxy’s connector not only reduces compute costs but also enables data federation, connecting to various data sources through a rich ecosystem of connectors. This means Snowflake users can easily connect additional data sources, offloading certain workloads to less expensive alternatives. This flexibility allows users to choose the right storage and processing options for their specific needs.
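As a rough illustration of what a federated query can look like, the Python sketch below builds a single SQL statement that joins an Iceberg table registered in the Snowflake metastore with a table from a second source. It relies on Trino's `catalog.schema.table` naming. The catalog, schema, table, and column names (`snowflake_iceberg`, `postgres_crm`, and so on) are hypothetical, and the helper functions are illustrative rather than part of any Starburst API.

```python
import re
from typing import Sequence

# Accept only plain identifiers so that names cannot inject SQL.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(name: str) -> str:
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified Trino table name: catalog.schema.table."""
    return ".".join(_check(part) for part in (catalog, schema, table))


def build_federated_join(
    left_table: str,
    right_table: str,
    join_key: str,
    left_columns: Sequence[str],
    right_columns: Sequence[str],
) -> str:
    """Build one query that joins tables living in two different catalogs."""
    _check(join_key)
    select_list = ", ".join(
        [f"l.{_check(c)}" for c in left_columns]
        + [f"r.{_check(c)}" for c in right_columns]
    )
    return (
        f"SELECT {select_list}\n"
        f"FROM {left_table} AS l\n"
        f"JOIN {right_table} AS r ON l.{join_key} = r.{join_key}"
    )


if __name__ == "__main__":
    # Hypothetical catalogs: one backed by the Snowflake metastore, one by another source.
    orders = qualified_name("snowflake_iceberg", "sales", "orders")
    customers = qualified_name("postgres_crm", "public", "customers")
    print(build_federated_join(orders, customers, "customer_id", ["order_id"], ["region"]))
```

The resulting string could be submitted through any Trino-compatible client. The sketch only shows that both sources are addressed in one statement, with the catalog prefix selecting the source.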

Starburst Galaxy Icehouse architecture powered by Trino

Starburst Galaxy is powered by Trino, an open source query engine originally developed by Facebook to deliver flexible performance with clusters designed for data at any scale. Some of the largest, most complex data on the planet is currently processed by Trino and many users accustomed to other systems report being amazed by the speed and 

Accessing your Snowflake data also opens up a world of possibilities to explore Starburst’s own Apache Iceberg tables. In fact, Iceberg was originally developed by Netflix with Trino in mind. The two technologies are naturally intertwined. We call this combination of technologies the data Icehouse architecture, and we think it’s going to remake big data as we know it.

Just like everything else with Starburst, you can approach an Icehouse architecture at your own speed. You’re free to experiment, explore, and optimize your workloads in whatever way makes sense for you, including employing an Icehouse implementation. This total freedom is what really makes Starburst special, and we’re so happy that we’re able to offer compute and federation options to Snowflake data lakehouse users.

Join us

Powered by Trino, Starburst Galaxy offers unparalleled performance and flexibility. It allows Snowflake users to explore Starburst’s own Apache Iceberg tables, leading to the innovative data Icehouse architecture. This combination of Trino and Iceberg, a table format originally developed by Netflix with Trino in mind, is set to transform the big data landscape.

Excited for the open data revolution? Start free today and sign up for our managed Icehouse private preview program.