Initially, Snowflake meets modern user expectations with features designed for high performance, ease of use, and guaranteed reliability. But as usage scales, costs skyrocket and team productivity suffers.
The majority of time is spent on administrative tasks: tuning clusters and optimizing queries to manage costs
The list of new projects and business asks grows as the overall productivity of the data team slows down
Actual costs outpace estimates, placing team budgets under scrutiny
Starburst empowers data teams to quickly support new workloads in a cost-efficient manner by letting teams performantly query data in object storage. With Starburst running alongside your cloud data warehouse, you can choose which workloads to push into your data warehouse, adding openness and flexibility to your data architecture while reducing overall costs by 50-75%.
Don’t take our word for it. Starburst is named #1 for Quality of Support and Product Direction in G2’s Enterprise Grid Report based on real customer reviews. Additionally, customers said Starburst beat out Snowflake in all of these categories:
Going beyond platform governance and management capabilities, Starburst empowers data teams with easy-to-use functionality that increases productivity without adding complexity. It allows teams to use a range of existing investments in just a few clicks. It helps to break down data silos and build federated data products from distributed data sets to support use cases and scale self-service usage and adoption across the organization.
Starburst Galaxy vs. Snowflake
Native data security
Automated cluster management
Built-in real-time usage monitoring
Internal data product sharing and marketplace
GenAI text-to-SQL *
Built-in natural language processing * *
24x7 Customer Support
Automated data optimization *
Federated Data Products
Automated data maintenance
Simple pricing
Easy to get started
Comparison based on publicly available information as of July 1, 2024.
* In preview. Contact us to learn more.
True data access empowers data teams with the ability to use all their data, no matter where it lives, across data lakes, data warehouses, and databases while having confidence in security and governance controls. Most cloud data warehouses require you to dump your source data into a lake before ingesting a portion of that data into the warehouse. Starburst eliminates the need for those extra pipelines by running directly on top of your object storage.
Starburst Galaxy vs. Snowflake
Data Observability
Row filters and column masking
Time-based access control policies
SOC 2 Type 2 compliance and ISO 27001 certified
Private Link for AWS, Azure, and Google *
Near real-time data ingestion of streaming data *
On-premise data federation
RBAC
ABAC
Dynamic catalog
Cross region analytics without data movement
Performant Cross-cloud analytics without data movement
Universal search and schema discovery
Today’s data teams need to manage both performance and costs. Internet scale matters in an internet-powered world, but not every workload needs the highest levels of power and performance, especially as costs go up at a faster rate than performance. Starburst saves customers anywhere from 50% to 75% on their cloud bills by helping them transform and query data on their data lake without needing to push it all into a data warehouse.
Starburst Galaxy vs. Snowflake
Interactive query performance
Consistently execute long-running batch queries
Fault Tolerant Execution
Materialized views
Customizable scaling for cost and performance optimization
Price/performant SQL query engine**
Smart indexing and caching (results and query) above baseline results and subquery caching
Autoscaling by adding/removing incremental nodes
** See Concurrency Labs Benchmark report
Open file and table formats are table stakes in providing optionality. Starburst helps you optimize your architecture (and your budget) by providing you with the ability to make deliberate decisions about where to run workloads (on the lake or in a warehouse) without added complexity. Optionality also means that your SQL scripts are easily transferable to new data architectures and platforms when you need them to move without requiring massive undertakings in conversion and rewriting.
Starburst Galaxy vs. Snowflake
Runs on multiple clouds
Supports popular open file formats
Supports Python
Supports first and third-party data catalogs
Standard ANSI SQL
OSS MPP SQL query engine
Supports Iceberg, Delta Lake, Hudi, and Hive table formats
Supports Apache Ranger
BestSecret first implemented open source Trino to combat rising Snowflake costs, then turned to Starburst’s enterprise-grade solution.
By deploying Starburst, BestSecret reduced costs by 70% and achieved a decentralized, zero ELT approach with the ability to federate across sources and analyze the data where it sits.
Comcast built a hybrid analytics platform, powered by Starburst and Trino, to provide end users easy access to datasets across data warehouses, NoSQL databases, and data lakes.
The platform pulls in 250-300 TB of data daily, enabling real-time data exploration while reducing data warehouse spend, ETL, and labor costs.
doxo’s process of joining data from disparate sources to their data warehouse was time-consuming and labor intensive.
By using Starburst as an abstraction layer, analysts are able to quickly query data in multiple databases and warehouses simultaneously without ETL, simplifying millions of daily transactions.
Starburst Galaxy is a price-performant multi-cloud open data lakehouse powered by Trino, a leading open-source distributed MPP SQL query engine. Starburst Galaxy is used for interactive ad-hoc analytics, long-running workloads like batch and ETL/ELT, streaming, and automated data maintenance, and it offers high scalability and query completion rates even as data volume (petabytes of data), query volume, and query complexity increase. Starburst runs federated queries across data lakes, cloud data warehouses, on-premise databases, and relational database management systems like PostgreSQL and MySQL. Galaxy also supports enhanced fault-tolerant execution, smart indexing and caching, Data Products, machine learning (PyStarburst), and universal search and schema discovery while truly separating compute and storage between Starburst and your cloud data lake.
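As a rough illustration of what a federated query looks like, the sketch below builds a single SQL statement that joins a data lake table with a PostgreSQL table. Trino addresses tables as catalog.schema.table; the catalog, schema, table, and column names here (lake.sales.orders, postgres.public.customers, customer_id, total, email) and the helper federated_join_sql are hypothetical and exist only for this example.

```python
# Minimal sketch: compose a federated join across two catalogs.
# All catalog, schema, table, and column names are hypothetical.

def federated_join_sql(lake_table: str, db_table: str, key: str) -> str:
    """Build one SQL statement joining a lake table with a database table."""
    for name in (lake_table, db_table):
        # Trino-style fully qualified names have three parts: catalog.schema.table
        if name.count(".") != 2:
            raise ValueError(f"expected catalog.schema.table, got {name!r}")
    return (
        f"SELECT o.{key}, o.total, c.email\n"
        f"FROM {lake_table} AS o\n"
        f"JOIN {db_table} AS c ON o.{key} = c.{key}"
    )


query = federated_join_sql(
    "lake.sales.orders", "postgres.public.customers", "customer_id"
)
print(query)
```

The point of the sketch is that both sources appear in one statement, so no pipeline has to copy the PostgreSQL data into the lake before the join runs.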
Snowflake is a data warehouse in the cloud brought to life in the Snowflake Data Cloud. It is built on top of Amazon Web Services and also runs on Microsoft Azure and Google Cloud. Snowflake offers a fully managed solution with no hardware or software to install, configure, or manage. Similar to Starburst, Snowflake powers multiple data workloads, including data warehousing, data engineering, AI and machine learning, data applications, and cybersecurity, across multiple cloud providers and regions from anywhere in the organization. Snowflake also separates compute and storage, but both are managed, billed, and executed within the Snowflake platform.
At its core, Snowflake’s data cloud architecture is a data warehouse that is optimized for a cloud-based architecture. It allows the movement and transformation of data from storage (S3) to its proprietary and closed cloud data warehouse. This provides a holistic data cloud for elastic data management and consumption. Snowflake is not just a storage solution but a tool that provides performance, relational querying, security, and governance for data that resides within the confines of its digital walls. It can be used to consolidate structured and semi-structured data and power transformations, analytics, and reporting using a highly customized SQL dialect.
In addition to handling structured and semi-structured data, Snowflake recently announced support for unstructured data in a data warehouse architecture. By incorporating unstructured data, Snowflake also aims to expand into data science and machine learning use cases within a data warehouse architecture.
Yes, Snowflake is indeed a cloud data platform. It provides a data warehouse as a service designed for the cloud. This platform allows businesses to store and analyze data using cloud-based hardware and software with both storage and compute within the Snowflake account. With Snowflake, businesses can handle several aspects of data storage, including performance, scalability, and security, once the data is loaded into Snowflake’s proprietary and closed storage.
The Snowflake platform is similar to a combination of AWS services but does not provide the full breadth of functionality of all of AWS combined. Consider Snowflake similar to combining Amazon S3, Amazon Redshift, AWS Glue, Amazon EMR, and Amazon Athena, along with several other security, governance, and management services, in a highly closed and proprietary platform.
Snowflake operates a cloud data warehouse architecture. At the heart of this are virtual warehouses, which are essentially clusters of compute resources. Using Snowflake involves multiple steps. It’s typically recommended to first stage your data in a cloud data lake and then ingest it into Snowflake. Once your data is in Snowflake’s proprietary storage, you can transform the data within the platform for your analytical needs.
Within the Snowflake cloud data warehouse, compute resources are separate from storage, meaning you can scale them independently based on your needs. This separation ensures that large queries won’t slow down smaller, more urgent ones and vice versa. It also means that you only pay for the compute resources you use. When queries require more nodes for processing, instead of adding more nodes to the existing cluster, Snowflake opts to add another cluster, doubling the total compute resources whether or not all the resources are needed.
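The difference between scaling by whole clusters and scaling by individual nodes can be sketched with simple arithmetic. This is a simplified model of the behavior described above, not a description of either product's scheduler; the function names and the numbers (a cluster of 8 nodes, a workload needing 9) are assumptions chosen for illustration.

```python
import math

# Simplified model (assumption): capacity is counted in identical nodes.

def nodes_with_cluster_scaling(required_nodes: int, cluster_size: int) -> int:
    """Capacity provisioned when scaling adds whole clusters of a fixed size."""
    clusters = max(1, math.ceil(required_nodes / cluster_size))
    return clusters * cluster_size


def nodes_with_incremental_scaling(required_nodes: int, min_nodes: int) -> int:
    """Capacity provisioned when scaling adds one node at a time."""
    return max(required_nodes, min_nodes)


required = 9       # hypothetical workload: one node more than a single cluster
cluster_size = 8   # hypothetical cluster size

whole = nodes_with_cluster_scaling(required, cluster_size)            # 16
incremental = nodes_with_incremental_scaling(required, cluster_size)  # 9
idle = whole - required                                               # 7

print(f"whole-cluster scaling: {whole} nodes, {idle} idle")
print(f"incremental scaling:   {incremental} nodes")
```

In this model, needing one node beyond the first cluster provisions a second full cluster, which is the "doubling" case the paragraph describes.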
Data outside of Snowflake can be queried using external tables, unmanaged Iceberg tables, and Snowflake's proprietary managed Iceberg tables.
Similar to Starburst, the Snowflake architecture is designed to be secure, fast, and easy to use.
The Snowflake data cloud offers its users many benefits that align with the approach taken by Starburst.io. For Snowflake customers, the ability to handle all types of data in one place simplifies data management and allows for more comprehensive analytics, a benefit that is also central to Starburst’s approach.
Metadata handling is another significant advantage of Snowflake, with automatic collection and management of metadata making it easier for users to understand and use their data effectively. Similarly, Starburst also emphasizes the importance of metadata in its approach.
The on-demand nature of Snowflake’s compute resources is a major advantage, allowing users to scale up or down instantly based on their needs. This mirrors Starburst’s emphasis on flexibility and cost-effectiveness.
Finally, Snowflake’s proprietary architecture allows for high levels of concurrency, meaning multiple queries can be run simultaneously without affecting each other’s performance. This is a capability that is also highlighted in Starburst’s approach, with its platform built for analyzing large amounts of distributed data with high concurrency using an open-source foundation.
In summary, both Snowflake and Starburst.io offer powerful solutions for data management with many similar benefits, including handling all types of data, effective metadata management, on-demand resources, and high levels of concurrency.
While Snowflake’s data cloud offers many benefits, it also presents certain challenges for SQL workloads. First off, getting started with Snowflake is difficult and excessively time-consuming. It requires that all your data first be ingested into its closed and proprietary storage, which can take years. Furthermore, general Snowflake best practices have you first move your data into a cloud data lake staging environment before it gets pulled into Snowflake. This means you’re paying for data movement twice.
For Snowflake customers, one of the key challenges is building effective data pipelines. The scalability of Snowflake can lead to the ingestion of excessive amounts of data, which can increase storage costs and potentially degrade the quality of the data.
Another challenge is related to workload management. Snowflake’s multi-cluster architecture allows for high levels of concurrency, but it can also lead to increased complexity in managing and optimizing workloads. For instance, joining large tables of data directly from raw to presentation layers can cause workloads to run for hours and add significant costs to the warehouse.
Next, while Snowflake provides a lot of automation, it still requires some manual intervention. For example, decisions around whether to use Extract, Transform, Load (ETL) or Extract, Load, Transform (ELT) need to be made. Additionally, while Snowflake automates many aspects of data management, it still requires DBA efforts.
Snowflake customers of all sizes also regularly cite costs exceeding forecasts very quickly as data volume increases, premium features are activated, and inefficient performance enhancement practices take effect, e.g., doubling clusters for autoscaling rather than adding only the necessary number of worker nodes to existing clusters.
Lastly, Snowflake is a highly closed and proprietary platform that is not only difficult to get all your data into, but once in the platform, the proprietary storage and custom SQL make it extremely difficult and costly to offload workloads from Snowflake as data and query volumes grow.
Yes, Snowflake is ANSI SQL compliant. This means that all of the most common operations are usable within Snowflake. Snowflake also supports the statements that data warehousing relies on, like create, update, and insert. In addition, Snowflake supports a subset of ANSI SQL:1999 and the SQL:2003 analytic extensions. However, it’s important to note that although Snowflake is ANSI SQL compliant, it operates its own highly proprietary version of SQL built on top of ANSI standards with 100+ custom functions, making scripts difficult to migrate out of Snowflake as volumes increase.
Data sharing is limited to within the Snowflake ecosystem, which allows you to share selected objects in a database in your account with other Snowflake accounts. Here’s how you can do it:
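A minimal sketch of the provider-side flow, assuming the standard Snowflake share commands (CREATE SHARE, GRANT ... TO SHARE, ALTER SHARE ... ADD ACCOUNTS): create a share, grant it access to the objects, then add consumer accounts. The helper share_statements and every object and account name below are hypothetical; check the Snowflake documentation for the exact syntax and privileges your edition requires.

```python
# Sketch only: generates the SQL statements a provider account would typically run.
# Share, database, schema, table, and account names are hypothetical.

def share_statements(share, database, schema, table, consumer_accounts):
    """Return the ordered statements for sharing one table with other accounts."""
    return [
        # 1. Create an empty share object.
        f"CREATE SHARE {share};",
        # 2. Grant the share access to the database, schema, and table.
        f"GRANT USAGE ON DATABASE {database} TO SHARE {share};",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO SHARE {share};",
        f"GRANT SELECT ON TABLE {database}.{schema}.{table} TO SHARE {share};",
        # 3. Make the share visible to the consumer accounts.
        f"ALTER SHARE {share} ADD ACCOUNTS = {', '.join(consumer_accounts)};",
    ]


stmts = share_statements(
    "sales_share", "sales_db", "public", "orders",
    ["consumer_account_1", "consumer_account_2"],
)
for stmt in stmts:
    print(stmt)
```

Note that every statement targets another Snowflake account, which is the ecosystem limitation the paragraph above refers to.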
In addition to the above, Snowflake offers other options for data sharing:
These features enable an ecosystem of data sharing and collaboration exclusively within Snowflake. At a high platform cost, they allow for the creation of data applications that leverage shared data for reporting.
The Snowflake cloud data warehouse consists of three core layers: database storage, query processing, and cloud services.
Beyond the three core components, there is also a built-in visualization component and data marketplace.
© Starburst Data, Inc. Starburst and Starburst Data are registered trademarks of Starburst Data, Inc. All rights reserved. Presto®, the Presto logo, Delta Lake, and the Delta Lake logo are trademarks of LF Projects, LLC