Starburst vs. Athena

Starburst Galaxy offers up to 10.2x faster SQL at a fraction of the cost while providing a simple, open, and highly scalable end-to-end data and analytics platform to power your open data lakehouse.

What is Starburst Galaxy?

Starburst Galaxy is a price-performant, fully managed, multi-cloud data and analytics platform powered by Trino, a leading open-source distributed MPP SQL query engine. Starburst Galaxy is used for both interactive ad-hoc analytics and long-running workloads like batch and ETL/ELT, and offers high scalability and query completion rates even as the amount of data, query volume, and query complexity increase. The service runs federated queries across data lakes, cloud data warehouses, on-premises databases, and relational data management systems like PostgreSQL and MySQL. Galaxy also supports fault-tolerant execution, smart indexing and caching, Data Products, and universal search and schema discovery.

What is Amazon Athena?

Amazon Athena, available in serverless and dedicated versions, is a query service that analyzes data in Amazon Web Services (primarily Amazon S3) using standard SQL for ad-hoc analytics. Amazon Athena serverless has no infrastructure for customers to manage, and they only pay for queries that run. Amazon Athena was originally built on a fork of Presto (PrestoDB version 0.217), which was released in January 2019.

Starburst is a Leader in Enterprise Big Data Analytics

Don’t take our word for it. Starburst is named #1 for Quality of Support and Ease of Use in G2 Crowd’s Grid Report based on real customer reviews. Additionally, customers said this about Starburst: 

  • 100% of users rated Starburst 4+ stars
  • 100% of users believe Starburst is headed in the right direction
  • 96% of users say Starburst meets their requirements
  • 93% of users would recommend

Simplicity

Going beyond key platform governance and management capabilities, a modern data and analytics platform empowers data teams with easy-to-use functionality that increases productivity without adding complexity. It allows businesses to use a range of existing investments in just a few clicks. It enables data teams to easily discover, create, govern, analyze and share federated data products from distributed data sets across the organization.

Starburst Galaxy vs. Amazon Athena (Serverless), features compared:

  • Automated AWS compute plane set-up
  • Automated data maintenance
  • Multi-cloud platform
  • Built-in data security
  • Data Products
  • Automated cluster management
  • Built-in real-time usage monitoring
  • Built-in query scheduler
  • Built-in Natural Language Processing *
  • Automated data lake optimization
  • Predictable pricing

Comparison based on publicly available information as of July 8, 2024.
* In preview. Contact us to learn more.

Access

True data access empowers organizations with the ability to use all their data, no matter where it lives, across data lakes, data warehouses, and databases while having confidence in security and governance controls. True access is about meeting business needs on time while adhering to regulatory data sovereignty requirements. Your open lakehouse should free your data sources for analytics and AI, not confine them in another way.

Starburst Galaxy vs. Amazon Athena (Serverless), features compared:

  • Cloud data federation
  • On-premise data federation
  • AWS service account
  • Time-based policies
  • RBAC
  • ABAC
  • Column/Row masking
  • SSO via AWS IAM, Okta, Azure AD, and Google
  • Universal Search and schema discovery
  • Uses Trino connectors for federation
  • In platform universal search and schema discovery
  • Optimized first party connectors - parallelism, cached views, dynamic filtering, security, and authentication
  • Query sharing
  • Data Products sharing
  • Data profiling
  • Data lineage
  • Streaming ingest *

Comparison based on publicly available information as of July 8, 2024.

* In preview. Contact us to learn more.

Scalability

Internet scale matters in an internet-powered world, but not every workload needs that power and performance. Your open data lakehouse powers modern data and analytics and puts control of performance and costs in your hands. It ensures high-performance scalability is available at the click of a button or automatically when you need it most while optimizing price-to-performance for all analytics workloads. It also instills confidence that queries will execute as scheduled, even at high concurrency.

Starburst Galaxy vs. Amazon Athena (Serverless), features compared:

  • Works with S3 Express One Zone
  • Ad-hoc and interactive queries
  • Results and repeated subquery caching *
  • High concurrency
  • Control over concurrency and prioritization
  • Fault Tolerant Execution
  • Built-in data catalog
  • Autoscales by adding more nodes per cluster
  • Customizable scaling for cost and performance optimization
  • Consistently executes long-running batch queries
  • Smart indexing and caching
  • Fine-grained resource management

Comparison based on publicly available information as of July 8, 2024.

* In preview. Contact us to learn more.

Optionality

Open file and table formats are table stakes in providing optionality. Your open lakehouse goes beyond the fundamentals to ensure your business has full control over your data by accessing data where it lives across hybrid and multi-cloud data architectures, by allowing choice in cloud providers, security, and BI tools, and ensuring expert Trino support is available if and when your teams need it most.

Starburst Galaxy vs. Amazon Athena (Serverless), features compared:

  • OS Trino query engine
  • Supports popular open file formats
  • Supports Python
  • Supports hybrid and cloud data architectures
  • Supports data catalogs beyond AWS Glue
  • Runs on multiple clouds
  • Expert in-house Trino support
  • Natively run SQL on Iceberg, Delta Lake, Hudi, and Hive table formats
  • In platform capability to migrate Hive to Delta or Iceberg tables

Comparison based on publicly available information as of July 8, 2024.

* In preview. Contact us to learn more.

Free test drive | Watch | Contact us

Access and analyze your data with the elastic scale and high performance your business demands. Take Starburst Galaxy for a free test drive, watch the on-demand demo (no form fill needed), or contact us.

Some additional exploration

What is Amazon Athena Serverless used for?

Amazon Athena is AWS’s analytics engine that allows you to run queries over terabytes and petabytes of data in and around S3. You can use Athena to execute data warehouse-like SQL queries on data in your lake, access data from federated sources, prepare data for machine learning models, build distributed data reconciliation engines, and perform multi-cloud data analysis, although Athena itself runs only on Amazon Web Services.

Is Amazon Athena an ETL tool?

Amazon Athena is not an ETL tool in the traditional sense, but it can be used to simplify ETL data pipelines using its federated SQL queries and user-defined functions. However, it is not uncommon for long-running queries like ETL jobs to fail without warning.

Are Amazon Athena and Amazon S3 the same?

No, they are not the same. Amazon Simple Storage Service (S3) is a cloud storage service in which you store and retrieve data (a cloud data lake). Amazon Athena, on the other hand, is what you would use to run queries against S3 data using standard ANSI SQL.

Is Amazon Athena a SQL Server?

No, Amazon Athena is not a SQL server. It is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. Athena is built on top of Presto, a distributed SQL query engine, and can process large amounts of data in parallel. It supports a wide range of data formats, including CSV, JSON, ORC, Avro, and Parquet.

Can I use Amazon Athena as a data warehouse?

Athena allows you to create a ‘data warehouse’ like experience on Amazon S3. By defining schemas and running queries, you can efficiently organize and retrieve your data. Furthermore, AWS APIs make it possible to visualize the query results in business intelligence tools.

What other AWS services do I need to use in conjunction with Amazon Athena?

Rather than providing the native built-in functionality you would expect of an analytics platform, Amazon Athena requires several other AWS services in order to be used and managed effectively. Here are some of the services that are required to make Athena work effectively.

Amazon S3 – Amazon S3 (Simple Storage Service) is an object storage service that offers scalability, data availability, security, and performance. It allows you to store and retrieve any amount of data from anywhere on the web. You can accomplish these tasks using the simple and intuitive web interface of the AWS Management Console. Data is stored in S3 buckets, which are containers for objects that you store in Amazon S3. S3 is the primary data source for Amazon Athena.

AWS Glue – this is a fully managed, serverless data integration service that makes it easy to prepare and load data for analytics, and its Data Catalog is the primary data catalog for Amazon Athena. Starburst Galaxy lets you choose which data catalogs you use, including Starburst Gravity and AWS Glue.

AWS Lake Formation – this is a fully managed service that makes it easy to build, secure, and manage data lakes. Unlike Starburst Galaxy, which has these security and governance capabilities built in, Amazon Athena requires you to use AWS Lake Formation to define and enforce database, table, and column-level access policies when using Athena queries to read data stored in Amazon S3.

Amazon Redshift – this is AWS’s data warehouse. Similar to Starburst Galaxy, Amazon Athena also allows you to access data within Amazon Redshift.

AWS Lambda – You can use Lambda to execute SQL queries on Amazon Athena. You can create a Lambda function that uses the AWS SDK for Python (Boto3) to execute SQL queries on Amazon Athena. The Lambda function can be triggered by an event, such as an API Gateway invocation, to execute the query and return the query results.
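As a rough illustration of the Lambda pattern described above, here is a minimal sketch of a handler that starts an Athena query from an API Gateway proxy event. The request body fields (`sql`, `database`, `output`) and the optional `athena_client` parameter are assumptions made for this example, not part of any AWS contract. Because Athena runs queries asynchronously, this sketch returns only the query execution ID; returning the results themselves would additionally require polling for completion.

```python
import json


def lambda_handler(event, context, athena_client=None):
    """Start an Athena query from an API Gateway proxy event.

    The JSON body fields "sql", "database", and "output" are hypothetical
    names chosen for this sketch.
    """
    if athena_client is None:
        # Imported lazily so the handler can be exercised locally with a stub.
        import boto3
        athena_client = boto3.client("athena")

    body = json.loads(event.get("body") or "{}")
    missing = [key for key in ("sql", "database", "output") if key not in body]
    if missing:
        return {
            "statusCode": 400,
            "body": json.dumps({"error": f"missing fields: {missing}"}),
        }

    response = athena_client.start_query_execution(
        QueryString=body["sql"],
        QueryExecutionContext={"Database": body["database"]},
        ResultConfiguration={"OutputLocation": body["output"]},
    )
    # Only the execution ID is available at this point; the query is still running.
    return {
        "statusCode": 202,
        "body": json.dumps({"queryExecutionId": response["QueryExecutionId"]}),
    }
```

Lambda calls the handler with only `event` and `context`, so the extra parameter exists purely to allow a stand-in client in tests. A real deployment would also need to restrict which SQL statements callers may submit.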

Amazon DynamoDB – this is a fully managed NoSQL database service. The Amazon Athena DynamoDB connector (also available in the Starburst self-managed software offering) enables Athena to communicate with DynamoDB so that you can execute SQL queries on your tables.

AWS IAM – this is the Identity and Access Management service from AWS. Unlike the built-in capabilities with Starburst Galaxy, Amazon Athena uses AWS IAM policies as the primary means to restrict access to Athena operations. Users can create policies that grant or deny access to specific resources and configure permissions based on user roles or groups.

AWS Command Line Interface (CLI) – You can use the AWS CLI to interact with Amazon Athena. For example, you can use the aws athena start-query-execution command to run a query. You will then need to poll with aws athena get-query-execution until the query is finished. When that is the case, the result of that call will also contain the location of the query result on S3, which you can then download with aws s3 cp.

You cannot save results from the AWS CLI directly, but you can specify a Query Result Location, and Amazon Athena will automatically save a copy of the query results in an Amazon S3 location that you specify. You could then use the AWS CLI to download that results file.
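The start, poll, and download workflow described above can also be scripted. The following is a minimal sketch of the polling step only, assuming a client object that exposes `get_query_execution` the way the Boto3 Athena client does. The function name `wait_for_query` and its `poll_seconds`, `max_polls`, and `sleep` parameters are hypothetical choices for this example.

```python
import time


def wait_for_query(athena_client, query_execution_id,
                   poll_seconds=1.0, max_polls=300, sleep=time.sleep):
    """Poll Athena until the query finishes, then return its S3 result location."""
    for _ in range(max_polls):
        execution = athena_client.get_query_execution(
            QueryExecutionId=query_execution_id
        )["QueryExecution"]
        state = execution["Status"]["State"]
        if state == "SUCCEEDED":
            # The finished execution reports where the result file was written.
            return execution["ResultConfiguration"]["OutputLocation"]
        if state in ("FAILED", "CANCELLED"):
            reason = execution["Status"].get("StateChangeReason", "no reason given")
            raise RuntimeError(f"query {query_execution_id} {state}: {reason}")
        # QUEUED or RUNNING: wait before asking again.
        sleep(poll_seconds)
    raise TimeoutError(f"query {query_execution_id} did not finish in time")
```

With Boto3 installed, the client would typically be created with `boto3.client("athena")`, and the returned `s3://` location could then be downloaded with `aws s3 cp`, as described above.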

Amazon QuickSight – this is AWS’s business intelligence (BI) service. Much as it does with Starburst Galaxy, Amazon QuickSight retrieves data from Athena to visualize the results of Amazon Athena SQL queries.

Amazon EMR (formerly known as Amazon Elastic MapReduce) – this is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data. You would use Amazon EMR and Athena together when building an Apache Iceberg data lake. You can use Amazon EMR Spark to create an Iceberg table and load sample data, then use Athena to query the table, perform schema evolution, and more with the AWS Glue Data Catalog.

Can I create data products using Amazon Athena?

Unlike Starburst, Amazon Athena does not offer capabilities to build, manage, secure, and share curated data sets in the form of federated data products.

What is the difference between Serverless and Dedicated in Amazon Athena?

Amazon Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries you run.

Amazon Athena also recently introduced the ability to provision dedicated capacity for your Athena queries. With provisioned capacity, you can reserve a dedicated set of compute resources to run your queries. This puts the management responsibilities on customers, creating a high risk of wasted resources and rising costs if capacity is poorly managed.

Start for Free with Starburst Galaxy

Up to $500 in usage credits included

  • Query your data lake fast with Starburst's best-in-class MPP SQL query engine
  • Get up and running in less than 5 minutes
  • Easily deploy clusters in AWS, Azure and Google Cloud
For more deployment options:
Download Starburst Enterprise
