A complete comparison of Starburst and EMR

Discover how Starburst and Amazon EMR compare across platform access, scalability, simplicity, and optionality, including real customer reviews and G2 Crowd ratings.

Make your big data analytics easier. Not harder.

What is Starburst Galaxy?

Starburst Galaxy is a price-performant, fully managed, multi-cloud data lake analytics platform powered by Trino, a leading open-source distributed MPP SQL query engine. Starburst Galaxy is used for both interactive ad-hoc analytics and long-running workloads like batch and ETL/ELT, and offers high scalability and query completion rates even as the amount of data, query volume, and query complexity increase. Galaxy runs federated queries across the data lake, cloud data warehouses, on-premises databases, and relational database management systems like PostgreSQL and MySQL. Galaxy also supports a wide range of business-critical capabilities for big data processing and analytics, such as fault-tolerant execution, smart indexing and caching, building, managing, and sharing of Data Products, machine learning (PyStarburst and integration with Ibis), cross-cloud/cross-region analytics, and universal search and schema discovery.
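
To make the idea of a federated query concrete, here is a minimal sketch in Python that builds one SQL statement joining a data lake table with a PostgreSQL table. Trino addresses tables as catalog.schema.table, so one statement can span sources. The catalog, schema, table, and column names below are hypothetical, and the helper function is for illustration only; it is not part of any Starburst or Trino API.

```python
def federated_revenue_query(lake_table: str, postgres_table: str) -> str:
    """Build a Trino SQL statement that joins tables from two catalogs.

    Both arguments are fully qualified names (catalog.schema.table).
    All names used in this example are hypothetical.
    """
    return (
        "SELECT c.region, sum(o.total_price) AS revenue\n"
        f"FROM {lake_table} AS o\n"
        f"JOIN {postgres_table} AS c ON o.customer_id = c.customer_id\n"
        "GROUP BY c.region"
    )


# Hypothetical catalogs: "lake" (object storage) and "postgres" (PostgreSQL).
query = federated_revenue_query("lake.sales.orders", "postgres.public.customers")
print(query)
```

The resulting statement could then be submitted through whichever Trino client you already use.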

What is EMR Trino?

As one of over 200 AWS services, Amazon EMR (formerly known as Elastic MapReduce) is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop, Apache Spark, PrestoSQL, and Trino, on AWS to process and analyze vast amounts of data. Using these frameworks and related open-source projects, you can process data for analytics purposes and business intelligence workloads. Amazon EMR also lets you transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon S3 and Amazon DynamoDB. PrestoSQL was renamed Trino in December 2020. Amazon EMR versions 6.4.0 and later use the name Trino, while earlier release versions use the name PrestoSQL. The Amazon EMR Serverless option runs big data applications on AWS using open-source frameworks, while Amazon EMR Serverless configures, optimizes, secures, and manages the clusters for customers.
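
The naming rule above can be sketched in a few lines of Python. This is an illustration only: it assumes release labels of the form emr-6.4.0, and the function name is hypothetical rather than part of any AWS SDK.

```python
def trino_application_name(release_label: str) -> str:
    """Return the engine name used by a given Amazon EMR release.

    Follows the rule stated above: 6.4.0 and later use "Trino",
    earlier release versions use "PrestoSQL".
    """
    # Assumed label format: "emr-<major>.<minor>.<patch>", e.g. "emr-6.4.0".
    prefix = "emr-"
    version = release_label[len(prefix):] if release_label.startswith(prefix) else release_label
    major, minor, patch = (int(part) for part in version.split("."))
    # Tuple comparison orders versions numerically, so 6.10.0 > 6.4.0.
    return "Trino" if (major, minor, patch) >= (6, 4, 0) else "PrestoSQL"


print(trino_application_name("emr-6.4.0"))
print(trino_application_name("emr-6.3.1"))
```

Comparing version tuples rather than strings avoids treating 6.10.0 as older than 6.4.0.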

Starburst is a Leader in Big Data Processing and Distribution

Don’t take our word for it. Starburst is named #1 for Quality of Support and Ease of Use in G2 Crowd’s Grid Report based on real customer reviews. Additionally, customers said Starburst beat out EMR in all of these categories: 

  • Likelihood to Recommend
  • Product Going in Right Direction
  • Meets Requirements
  • Ease of Admin
  • Ease of Doing Business With
  • Quality of Support
  • Ease of Use
  • Average User Adoption
  • Estimated ROI

Simplicity

Going beyond key platform governance and management capabilities, a modern data analytics platform empowers data teams with easy-to-use functionality that increases productivity without adding complexity. It allows you to use a range of existing investments in just a few clicks. It allows you to build federated data products from distributed data sets to support business use cases and create and scale self-service usage and adoption across the organization.

Starburst Galaxy vs. EMR Trino

  • Query sharing
  • Supports AWS Glue and other Data Catalogs
  • Fully managed SaaS platform
  • Automated AWS compute plane set-up
  • Automated cluster management
  • Multi-cloud platform
  • Built-in data security
  • Built-in real-time usage, monitoring, and reports
  • Built-in data profiling
  • Built-in data lineage
  • Automated upgrades to the latest version of Trino
  • In platform one-click client connectivity
  • Data Products
  • Data Products sharing*
  • GenAI text-to-SQL*
  • Automated data lake optimization

Comparison based on publicly available information as of November 30, 2023

*In preview. Contact us to learn more.

Access

True data access empowers organizations with the ability to use all their data, no matter where it lives, across data lakes, data warehouses, and databases while having confidence in security and governance controls. True access is about meeting business needs on time while adhering to regulatory data sovereignty requirements. Your modern data lake analytics platform/lakehouse should free your data sources for analytics purposes, not confine them in another way.

Starburst Galaxy vs. EMR Trino

  • Cloud and on-premises data federation
  • Built-in end-to-end encryption
  • RBAC/ABAC
  • AWS Service Account
  • AWS Lake Formation
  • Third-party access controls
  • Enhanced connectors for data access
  • Cross-cloud and cross-region analytics
  • In platform universal search and schema discovery
  • SSO via AWS IAM, Okta, Azure AD, and Google
  • Column masking and row-level filters
  • Time-based policies
  • Streaming ingest*

Comparison based on publicly available information as of November 30, 2023

*In preview. Contact us to learn more.

Scalability

Internet scale matters in an internet-powered world, but not every workload needs that power and performance. A modern data lake analytics platform puts the control in your hands, so that high-performance scalability is available at the click of a button or automatically when you need it most. It does this while optimizing price-to-performance for all analytics workloads and maintaining confidence that queries will execute as scheduled.

Starburst Galaxy vs. EMR Trino

  • Ad-hoc and interactive queries
  • Graceful and idle shutdown
  • Consistently execute long-running batch queries
  • Automated scaling for cost and performance optimization
  • Automated resizing of a running cluster
  • Autoscaling by nodes
  • Automated cluster provisioning and sizing
  • Complex expression pushdown on top of OS Trino
  • Enhanced Fault Tolerant Execution (FTE)
  • Smart indexing and caching
  • Materialized Views
  • Parallel Connectors
  • Results and repeated subquery caching*

Comparison based on publicly available information as of November 30, 2023

*In preview. Contact us to learn more.

Optionality

Open file and table formats are table stakes in providing optionality. A modern data lake analytics platform goes beyond the fundamentals to ensure your business has full control over your data by accessing data where it lives, allowing choice in cloud providers, security, and BI tools, and ensuring expert Trino support is available if and when your teams need it most.

Starburst Galaxy vs. EMR Trino

  • Supports popular open table formats (Apache Iceberg, Delta Lake, Apache Hudi, and Apache Hive)
  • Supports popular open file formats
  • OS Trino as query engine
  • Open source Trino Python client
  • Run on multiple clouds
  • Expert in-house Trino support
  • Supports Python Dataframe API*
  • Supports AWS Private Link, Azure Private Link, and Google Cloud Private Service Connect*

Comparison based on publicly available information as of November 30, 2023

*In preview. Contact us to learn more.

Contact us | Watch | Try

Access and analyze your data with the elastic scale and high performance your business demands. Take Starburst Galaxy for a free test drive, watch the on-demand demo (no form fill needed), or contact us.

More resources

Analyst Report

Data Products for Dummies

Unlock the value in your data

Analyst Report

Gartner® Hype Cycle™ for Data Management 2023

Starburst has been recognized as a 2023 Gartner Hype Cycle Sample Vendor

Some additional exploration

Amazon EMR, Amazon Elastic MapReduce, or AWS EMR, which is it?

Formerly known as Amazon Elastic MapReduce, the official name of the service is Amazon EMR.

What are the benefits of EMR?

Amazon EMR offers a wide range of benefits for its customers, including elasticity, simple pricing, and integration with other AWS services such as AWS Data Pipeline, Amazon CloudWatch, Amazon Redshift, Amazon EC2, Amazon VPC, Amazon Kinesis, and more. It also provides APIs to programmatically manage your clusters and the ability to facilitate data transformation (ETL).

What are the challenges with EMR?

While Amazon EMR is a powerful tool, it does come with its own set of challenges:

  • Complexity: Amazon EMR can be complex to set up and manage, especially for users who are not familiar with the Hadoop ecosystem. This complexity can lead to increased time and resources spent on setup and management.
  • Automation: While Amazon EMR does provide some level of automation, it is not sufficient for all use cases. For example, data teams, more often than not, need to write custom scripts or use third-party tools to automate certain tasks that are automated out of the box in platforms like Starburst Galaxy.
  • On-Demand vs. Reserved Instances: Choosing the right instance type can be challenging, and the burden rests with the data team. On-demand instances offer flexibility but can be more expensive, while reserved instances require a longer-term commitment but can be more cost-effective.
  • Resource Intensive: Amazon EMR is resource-intensive, which can lead to increased costs across headcount and unnecessary spending on resource availability if not managed properly. Without built-in capabilities, users need to carefully monitor and manage their resource usage to avoid unnecessary costs.
  • Security: While Amazon EMR does provide robust security features, setting up and managing these features can be complex. Users need to carefully consider their security requirements and ensure they are correctly implemented.
  • Single cloud: The service runs only on AWS, so in multi-cloud environments you have to run multiple big data analytics platforms.
  • Basic support: Customers of EMR Trino only have access to standard AWS support and must rely on the open-source community to support their mission-critical applications.

What is the difference between EMR and Amazon Athena?

Also see how Starburst, which offers better performance, scale, optionality, governance, access, collaboration, sharing, and more, compares to Athena.

Amazon EMR and Amazon Athena are both AWS services that handle big data, but they do so in different ways.

  • Amazon EMR is a managed Apache Hadoop framework that supports other frameworks, including OS Trino and Presto, that process large amounts of data across Amazon EC2 instances. However, it can be more expensive than Athena and Starburst, especially when a cluster is running but not processing any data.
  • Amazon Athena is an interactive query service (also built on top of OS Trino) that is best suited to ad-hoc queries and makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage.

While both services are designed to process big data, EMR provides a platform for open-source frameworks like Trino, Presto, Hadoop, and Apache Spark (each framework has its own version of EMR) and is ideal for complex, long-running jobs. In contrast, Athena is designed for quick, ad-hoc queries directly against data stored in S3. Starburst Galaxy, powered by Trino, is great for both.

What is the cost of EMR?

Amazon EMR pricing is considered simple. You pay a per-second rate for every second you use, with a one-minute minimum. Though the pricing for EMR may seem cost-effective, once you configure the full architecture to add in every other AWS service to set up a fully functional platform, the costs quickly begin to rise. This differs from platforms like Starburst, where the price is inclusive of all the capabilities from compute, access, governance, security, and more.
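
As a rough worked example of the per-second rule with a one-minute minimum, the sketch below computes the Amazon EMR charge for a single run. The hourly rate is a made-up placeholder, not a published AWS price, and the calculation covers only the EMR charge itself, not the other AWS services mentioned above.

```python
def billed_seconds(runtime_seconds: int) -> int:
    """Apply the one-minute minimum to a per-second billed runtime."""
    return max(60, runtime_seconds)


def emr_charge_usd(runtime_seconds: int, hourly_rate_usd: float, instance_count: int = 1) -> float:
    """Estimate the EMR charge for one run.

    hourly_rate_usd is a hypothetical per-instance rate used for illustration.
    """
    per_second_rate = hourly_rate_usd / 3600
    return billed_seconds(runtime_seconds) * per_second_rate * instance_count


# A 45-second run is billed as 60 seconds; a 90-second run is billed as 90.
print(billed_seconds(45), billed_seconds(90))
print(round(emr_charge_usd(45, hourly_rate_usd=0.36), 4))
```

Multiplying by instance_count shows how the same short job costs more as the cluster grows, which is one reason idle or oversized clusters add up.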

What is the difference between EMR and Amazon EC2?

Amazon EC2 (Elastic Compute Cloud) provides the raw compute capacity in the cloud (i.e., virtual machines); EMR is a service built on top of EC2 to process large amounts of data using big data frameworks. Both services only run on AWS.

What is the difference between EMR and Amazon Redshift?

Amazon EMR is a cloud service for big data processing using Amazon EC2 instances. On the other hand, Amazon Redshift is a cloud data warehouse service from AWS. Both services only run on AWS.

How are file systems used in Amazon EMR Trino?

In Amazon EMR, the EMR File System (EMRFS) is an implementation of HDFS that all Amazon EMR clusters use for reading and writing regular files from Amazon EMR directly to Amazon S3. EMRFS provides the convenience of storing persistent data in Amazon S3 for use with Hadoop while also providing features like data encryption.

When it comes to Trino (formerly known as PrestoSQL) in Amazon EMR, it uses its own S3 filesystem for the URI prefixes s3://, s3n://, and s3a://. This allows Trino to read and write tables that are stored in Amazon S3 or S3-compatible systems. This is accomplished by having a table or database location that uses an S3 prefix rather than an HDFS prefix.
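
The distinction between S3 and HDFS prefixes can be illustrated with a small check on a table location. This is a minimal sketch, not code from Trino or Amazon EMR; the function name and the example locations are hypothetical.

```python
from urllib.parse import urlparse

# URI prefixes named above as handled by Trino's S3 filesystem.
S3_SCHEMES = {"s3", "s3n", "s3a"}


def uses_s3_filesystem(location: str) -> bool:
    """Return True if a table or database location uses an S3 prefix."""
    return urlparse(location).scheme.lower() in S3_SCHEMES


print(uses_s3_filesystem("s3://example-bucket/warehouse/orders"))
print(uses_s3_filesystem("hdfs://namenode:8020/warehouse/orders"))
```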

Contact Us to Learn More

We’ll send you a free download of Starburst, and a Starburst expert will reach out to schedule a call.

Start Free with Starburst Galaxy

Up to $500 in usage credits included

  • Query your data lake fast with Starburst's best-in-class MPP SQL query engine
  • Get up and running in less than 5 minutes
  • Easily deploy clusters in AWS, Azure and Google Cloud
For more deployment options:
Download Starburst Enterprise
