Scale beyond Snowflake with Starburst

Discover how Starburst empowers teams outgrowing Snowflake to quickly support new workloads in a fast, cost-efficient manner that works alongside their existing processes.

Snowflake challenges encountered at scale

Initially, Snowflake meets modern user expectations with features designed for high performance, ease of use, and guaranteed reliability. But as usage scales, costs skyrocket and team productivity suffers.

Data teams slowed down

Majority of time spent on administrative tasks – tuning clusters and optimizing queries to manage costs

Backlog increases

The list of new projects and business asks grows as the overall productivity of the data team slows down

Commits burned through

Actual costs outpace estimates, placing team budgets under scrutiny

Leverage the Starburst data lakehouse alongside Snowflake to increase productivity and decrease costs

Starburst empowers data teams to quickly support new workloads in a cost-efficient manner by letting teams performantly query data in object storage. With Starburst running alongside your cloud data warehouse, you can choose which workloads to push into your data warehouse, adding openness and flexibility to your data architecture while reducing overall costs by 50-75%.

Starburst is a High Performer in Data Warehousing

Don’t take our word for it. Starburst is named #1 for Quality of Support and Product Direction in G2’s Enterprise Grid Report based on real customer reviews. Additionally, customers said Starburst beat out Snowflake in all of these categories: 

  • Ease of Use
  • Product Direction
  • Data Integration
  • Quality of Support
  • Data Visualization
  • Hadoop Integration
  • WYSIWYG Report Design

Simplicity

Going beyond platform governance and management capabilities, Starburst empowers data teams with easy-to-use functionality that increases productivity without adding complexity. It allows teams to use a range of existing investments in just a few clicks. It helps to break down data silos and build federated data products from distributed data sets to support use cases and scale self-service usage and adoption across the organization.

Starburst Galaxy vs. Snowflake

  • Native data security
  • Automated cluster management
  • Built-in real-time usage monitoring
  • Internal data product sharing and marketplace
  • GenAI text-to-SQL *
  • Built-in natural language processing *
  • 24x7 Customer Support
  • Automated data optimization *
  • Federated Data Products
  • Automated data maintenance
  • Simple pricing
  • Easy to get started

    Comparison based on publicly available information as of July 1, 2024.

    * In preview. Contact us to learn more.

    Access

    True data access empowers data teams with the ability to use all their data, no matter where it lives, across data lakes, data warehouses, and databases while having confidence in security and governance controls. Most cloud data warehouses require you to dump your source data into a lake before ingesting a portion of that data into the warehouse. Starburst eliminates the need for those extra pipelines by running directly on top of your object storage.

    Starburst Galaxy vs. Snowflake

    • Data Observability
    • Row filters and column masking
    • Time-based access control policies
    • SOC 2 Type 2 compliance and ISO 27001 certified
    • Private Link for AWS, Azure, and Google *
    • Near real-time data ingestion of streaming data *
    • On-premise data federation
    • RBAC
    • ABAC
    • Dynamic catalog
    • Cross-region analytics without data movement
    • Performant cross-cloud analytics without data movement
    • Universal search and schema discovery

    Comparison based on publicly available information as of July 1, 2024.

    * In preview. Contact us to learn more.

    Scalability

    Today’s data teams need to manage performance and costs. Internet scale matters in an internet-powered world but not every workload needs the highest levels of power and performance – especially as costs go up at a faster rate than performance. Starburst saves customers anywhere from 50% to 75% on their cloud bills by helping customers transform and query data on their data lake without needing to push it all into a data warehouse.

    Starburst Galaxy vs. Snowflake

    • Interactive query performance
    • Consistently execute long-running batch queries
    • Fault Tolerant Execution
    • Materialized views
    • Customizable scaling for cost and performance optimization
    • Price/performant SQL query engine**
    • Smart indexing and caching (results and query) above baseline results and subquery caching
    • Autoscaling by adding/removing incremental nodes

    Comparison based on publicly available information as of July 1, 2024.

    ** See Concurrency Labs Benchmark report

    Optionality

    Open file and table formats are table stakes in providing optionality. Starburst helps you optimize your architecture (and your budget) by providing you with the ability to make deliberate decisions about where to run workloads (on the lake or in a warehouse) without added complexity. Optionality also means that your SQL scripts are easily transferable to new data architectures and platforms when you need them to move without requiring massive undertakings in conversion and rewriting.

    Starburst Galaxy vs. Snowflake

    • Runs on multiple clouds
    • Supports popular open file formats
    • Supports Python
    • Supports first and third-party data catalogs
    • Standard ANSI SQL
    • OSS MPP SQL query engine
    • Supports Iceberg, Delta Lake, Hudi, and Hive table formats
    • Supports Apache Ranger

    Comparison based on publicly available information as of July 1, 2024.

    Value across industries

    Retail & CPG

    BestSecret first implemented open source Trino to combat rising Snowflake costs, then turned to Starburst’s enterprise-grade solution.

    By deploying Starburst, BestSecret reduced costs by 70% and achieved a decentralized, zero ELT approach with the ability to federate across sources and analyze the data where it sits.

    Telecommunications

    Comcast built a hybrid analytics platform, powered by Starburst and Trino, to provide end users easy access to datasets across data warehouses, NoSQL databases, and data lakes.

    The platform pulls in 250-300 TB of data daily, enabling real-time data exploration while reducing data warehouse spend, ETL, and labor costs.

    Financial Services

    doxo’s process of joining data from disparate sources into its data warehouse was time-consuming and labor-intensive.

    By using Starburst as an abstraction layer, analysts are able to quickly query data in multiple databases and warehouses simultaneously without ETL, simplifying millions of daily transactions.

    Some additional exploration

    What is Starburst?

    Starburst Galaxy is a price-performant, multi-cloud open data lakehouse powered by Trino, a leading open-source distributed MPP SQL query engine. Starburst Galaxy is used for interactive ad-hoc analytics, long-running workloads like batch and ETL/ELT, streaming, and automated data maintenance. It offers high scalability and query completion rates even as data volume (into the petabytes), query volume, and query complexity increase. Starburst runs federated queries across data lakes, cloud data warehouses, on-premise databases, and relational database management systems like PostgreSQL and MySQL. Galaxy also supports enhanced fault-tolerant execution, smart indexing and caching, Data Products, machine learning (PyStarburst), and universal search and schema discovery, while truly separating compute and storage between Starburst and your cloud data lake.
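To make the idea of a federated query concrete, here is a minimal sketch in Python that only builds the SQL text. Trino addresses tables with three-part names (catalog.schema.table), which is what lets one statement join sources that live in different systems. The catalog, schema, table, and column names below are hypothetical and chosen only for illustration.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a Trino-style three-part table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def federated_join_sql(lake_table: str, db_table: str) -> str:
    """Build a single SQL statement joining a lake table with a database table."""
    return (
        "SELECT c.customer_id, c.region, SUM(o.amount) AS total_amount\n"
        f"FROM {lake_table} AS o\n"
        f"JOIN {db_table} AS c ON o.customer_id = c.customer_id\n"
        "GROUP BY c.customer_id, c.region"
    )


# Hypothetical catalogs: "lake" for object storage, "postgresql" for a database.
orders = qualified("lake", "sales", "orders")
customers = qualified("postgresql", "public", "customers")
query = federated_join_sql(orders, customers)
print(query)
```

The resulting statement could be submitted through any Trino-compatible client; the point of the sketch is only that both sources appear in one query, with no intermediate copy step.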

    What is Snowflake?

    Snowflake is a cloud data warehouse delivered as the Snowflake Data Cloud. It runs on Amazon Web Services, Microsoft Azure, and Google Cloud. Snowflake offers a fully managed solution with no hardware or software to install, configure, or manage. Similar to Starburst, Snowflake powers multiple data workloads, including data warehousing, data engineering, AI and machine learning, data applications, and cybersecurity, across multiple cloud providers and regions from anywhere in the organization. Snowflake also separates compute and storage, but both are managed, billed, and executed within the Snowflake platform.

    What does the Snowflake data cloud do?

    At its core, Snowflake’s data cloud architecture is a data warehouse optimized for the cloud. It moves and transforms data from object storage (such as Amazon S3) into its proprietary, closed cloud data warehouse. This provides a holistic data cloud for elastic data management and consumption. Snowflake is not just a storage solution but a tool that provides performance, relational querying, security, and governance for data that resides within the confines of its digital walls. It can be used to consolidate structured and semi-structured data and to power transformations, analytics, and reporting using a highly customized SQL dialect.

    In addition to handling structured and semi-structured data, Snowflake recently announced support for unstructured data in a data warehouse architecture. By incorporating unstructured data, Snowflake also aims to expand into data science and machine learning use cases within a data warehouse architecture.

    Is Snowflake a cloud data platform?

    Yes, Snowflake is indeed a cloud data platform. It provides a data warehouse as a service designed for the cloud. This platform allows businesses to store and analyze data using cloud-based hardware and software with both storage and compute within the Snowflake account. With Snowflake, businesses can handle several aspects of data storage, including performance, scalability, and security, once the data is loaded into Snowflake’s proprietary and closed storage.

    Is Snowflake the same as AWS?

    The Snowflake platform is similar to a combination of AWS services but does not provide the full breadth of functionality of all of AWS combined. Consider Snowflake similar to combining Amazon S3, Amazon Redshift, AWS Glue, Amazon EMR, and Amazon Athena, along with several other security, governance, and management services, in a highly closed and proprietary platform.

    How does Snowflake work?

    Snowflake operates a cloud data warehouse architecture. At the heart of this are virtual warehouses, which are essentially clusters of compute resources. Getting data into Snowflake involves multiple steps: it’s typically recommended to first stage your data in a cloud data lake and then ingest it into Snowflake. Once your data is in Snowflake’s proprietary storage, you can transform the data within the platform for your analytical needs.

    Within the Snowflake cloud data warehouse, compute resources are separate from storage, meaning you can scale them independently based on your needs. This separation ensures that large queries won’t slow down smaller, more urgent ones and vice versa. It also means that you only pay for the compute resources you use. When queries require more nodes for processing, instead of adding more nodes to the existing cluster, Snowflake opts to add another cluster, doubling the total compute resources whether or not all the resources are needed.
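The difference between scaling by whole clusters and scaling by individual nodes can be illustrated with a little arithmetic. The sketch below is a toy model under stated assumptions (a fixed cluster size and a workload that needs a known number of nodes); it is not a description of any vendor's actual scheduler or billing, and the function names are hypothetical.

```python
import math


def nodes_whole_clusters(required_nodes: int, cluster_size: int) -> int:
    """Nodes provisioned when capacity can only grow one full cluster at a time."""
    clusters = max(1, math.ceil(required_nodes / cluster_size))
    return clusters * cluster_size


def nodes_incremental(required_nodes: int, cluster_size: int) -> int:
    """Nodes provisioned when single nodes can be added to the base cluster."""
    return max(required_nodes, cluster_size)


# Assumed example: an 8-node cluster and a workload that needs 9 nodes.
required, size = 9, 8
by_cluster = nodes_whole_clusters(required, size)
by_node = nodes_incremental(required, size)
print(f"whole clusters: {by_cluster} nodes, incremental: {by_node} nodes")
print(f"idle nodes when scaling by cluster: {by_cluster - required}")
```

With these assumed numbers, scaling by whole clusters provisions 16 nodes against 9 for incremental scaling, leaving 7 nodes idle. The gap shrinks to zero whenever the requirement happens to be an exact multiple of the cluster size.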

    Data outside of Snowflake can be queried using external tables, unmanaged Iceberg tables, and Snowflake’s proprietary managed Iceberg tables.

    Similar to Starburst, the Snowflake architecture is designed to be secure, fast, and easy to use.

    What are the benefits of the Snowflake data cloud?

    The Snowflake data cloud offers its users many benefits that align with the approach taken by Starburst.io. For Snowflake customers, the ability to handle all types of data in one place simplifies data management and allows for more comprehensive analytics, a benefit that is also central to Starburst’s approach.

    Metadata handling is another significant advantage of Snowflake, with automatic collection and management of metadata making it easier for users to understand and use their data effectively. Similarly, Starburst also emphasizes the importance of metadata in its approach.

    The on-demand nature of Snowflake’s compute resources is a major advantage, allowing users to scale up or down instantly based on their needs. This mirrors Starburst’s emphasis on flexibility and cost-effectiveness.

    Finally, Snowflake’s proprietary architecture allows for high levels of concurrency, meaning multiple queries can be run simultaneously without affecting each other’s performance. This is a capability that is also highlighted in Starburst’s approach, with its platform built for analyzing large amounts of distributed data with high concurrency using an open-source foundation.

    In summary, both Snowflake and Starburst.io offer powerful solutions for data management with many similar benefits, including handling all types of data, effective metadata management, on-demand resources, and high levels of concurrency.

    What challenges exist with the Snowflake data cloud for SQL workloads?

    While Snowflake’s data cloud offers many benefits, it also presents certain challenges for SQL workloads. First, getting started with Snowflake is difficult and time-consuming. It requires that all your data first be ingested into its closed, proprietary storage, which can take years. Furthermore, general Snowflake best practices have you first move your data into a cloud data lake staging environment before it is ingested into Snowflake. This means you’re paying for data movement twice.

    For Snowflake customers, one of the key challenges is building effective data pipelines. The scalability of Snowflake can lead to the ingestion of excessive amounts of data, which can increase storage costs and potentially degrade the quality of the data.

    Another challenge is related to workload management. Snowflake’s multi-cluster architecture allows for high levels of concurrency, but it can also lead to increased complexity in managing and optimizing workloads. For instance, joining large tables of data directly from raw to presentation layers can cause workloads to run for hours and add significant costs to the warehouse.

    Next, while Snowflake provides a lot of automation, it still requires some manual intervention. For example, decisions around whether to use Extract, Transform, Load (ETL) or Extract, Load, Transform (ELT) need to be made. Additionally, while Snowflake automates many aspects of data management, it still requires DBA efforts.

    Snowflake customers of all sizes also regularly report that costs exceed forecasts very quickly as data volumes increase, premium features are activated, and inefficient scaling practices take effect, such as doubling clusters for autoscaling rather than adding only the necessary worker nodes to existing clusters.

    Lastly, Snowflake is a highly closed and proprietary platform that is not only difficult to get all your data into, but once in the platform, the proprietary storage and custom SQL make it extremely difficult and costly to offload workloads from Snowflake as data and query volumes grow.

    Is Snowflake ANSI SQL compliant?

    Yes, Snowflake is ANSI SQL compliant. This means that the most common SQL operations are usable within Snowflake, including the statements that data warehousing depends on, such as CREATE, UPDATE, and INSERT. In addition, Snowflake supports a subset of ANSI SQL:1999 and the SQL:2003 analytic extensions. However, although Snowflake is ANSI SQL compliant, it operates its own highly proprietary version of SQL built on top of ANSI standards, with 100+ custom functions, which makes scripts difficult to migrate out of Snowflake as volumes increase.
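As a rough illustration of the portability concern, the following Python sketch scans a SQL script for a few Snowflake functions that are not part of the ANSI standard and suggests a standard equivalent for each. The three functions listed (IFF, NVL, ZEROIFNULL) are only a small, hand-picked sample, and the helper name is hypothetical. The scan is a plain regular expression, so it will also match names inside string literals or comments.

```python
import re

# Small sample of non-ANSI functions and a standard-SQL rewrite hint for each.
PORTABLE_HINTS = {
    "IFF": "CASE WHEN <cond> THEN <a> ELSE <b> END",
    "NVL": "COALESCE(<a>, <b>)",
    "ZEROIFNULL": "COALESCE(<x>, 0)",
}

# Match a listed function name followed by an opening parenthesis.
_PATTERN = re.compile(
    r"\b(" + "|".join(PORTABLE_HINTS) + r")\s*\(", re.IGNORECASE
)


def find_nonportable(sql: str) -> list[tuple[str, str]]:
    """Return (function, hint) pairs in order of first appearance, no repeats."""
    found: list[tuple[str, str]] = []
    for match in _PATTERN.finditer(sql):
        name = match.group(1).upper()
        pair = (name, PORTABLE_HINTS[name])
        if pair not in found:
            found.append(pair)
    return found


script = "SELECT IFF(amount > 0, 'credit', 'debit'), nvl(region, 'n/a') FROM payments"
for name, hint in find_nonportable(script):
    print(f"{name}: consider {hint}")
```

A check like this could run before a migration to estimate how much of a script relies on dialect-specific functions; a real migration would need a proper SQL parser rather than pattern matching.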

    How to do data sharing with Snowflake?

    Data sharing is limited to within the Snowflake ecosystem, which allows you to share selected objects in a database in your account with other Snowflake accounts. Here’s how you can do it:

    1. Create a Share: The provider creates a share of a database in their account and grants access to specific objects in the database. The provider can also share data from multiple databases, as long as these databases belong to the same account.
    2. Add Accounts to the Share: One or more accounts are then added to the share, which can include your own accounts (if you have multiple Snowflake accounts).
    3. Use Reader Accounts: If you want to share with people who don’t have Snowflake accounts, you can use Reader Accounts.
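The first two steps above can be sketched as a short Python helper that generates the corresponding statements. The statement shapes (CREATE SHARE, GRANT ... TO SHARE, ALTER SHARE ... ADD ACCOUNTS) follow Snowflake's share commands as commonly documented, but verify them against the current Snowflake reference before use. The share, database, schema, table, and account names are hypothetical, and identifiers are inserted as-is without quoting or validation.

```python
def share_statements(
    share: str,
    database: str,
    schema: str,
    tables: list[str],
    accounts: list[str],
) -> list[str]:
    """Build the SQL statements for creating a share and adding consumers."""
    statements = [
        # Step 1: create the share and grant access to specific objects.
        f"CREATE SHARE {share};",
        f"GRANT USAGE ON DATABASE {database} TO SHARE {share};",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO SHARE {share};",
    ]
    for table in tables:
        statements.append(
            f"GRANT SELECT ON TABLE {database}.{schema}.{table} TO SHARE {share};"
        )
    # Step 2: add one or more consumer accounts to the share.
    statements.append(f"ALTER SHARE {share} ADD ACCOUNTS = {', '.join(accounts)};")
    return statements


# Hypothetical names, for illustration only.
stmts = share_statements(
    "sales_share", "analytics", "public", ["orders"], ["org1.acct1"]
)
for stmt in stmts:
    print(stmt)
```

Reader accounts (step 3) are provisioned separately and are not covered by this sketch.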

    In addition to the above, Snowflake offers other options for data sharing:

    • Listing: You can offer a listing privately to specific accounts, or publicly on the Snowflake Marketplace.
    • Direct Share: Use a Direct Share to share data with one or more accounts in the same Snowflake region.
    • Data Exchange: If creating listings that you offer privately to specific accounts isn’t an option, you can use a data exchange to share data with a selected group of accounts that you invite.

    These features enable an ecosystem of data sharing and collaboration exclusively within Snowflake. At a high platform cost, they allow for the creation of data applications that leverage shared data for reporting.

    What are the layers of the Snowflake architecture?

    Snowflake cloud data warehouse consists of three core layers:

    1. Database Storage Layer: When data is loaded into Snowflake from your cloud data lake staging environment, Snowflake reorganizes it into a proprietary internal format that is optimized, compressed, and columnar, and stores it in cloud storage. Snowflake controls all aspects of how this data is stored, including the organization, file size, structure, compression, metadata, and statistics.
    2. Query Processing Layer: This layer is responsible for executing SQL queries. Similar to Starburst and its use of open-source Trino, Snowflake processes queries using MPP (massively parallel processing) compute clusters where each node in the cluster stores a portion of the entire data set locally. This approach offers the data management simplicity of a shared-disk architecture, but with the performance and scale benefits of a shared-nothing architecture.
    3. Cloud Services Layer: This layer provides services such as infrastructure management, metadata management, query parsing and optimization, access control, and more. It coordinates and handles all transactions and sessions, ensuring that all operations are secure and ACID-compliant.

    Beyond the three core components, there is also a built-in visualization component and a data marketplace.
