Starburst Galaxy is a price-performant, fully managed, multi-cloud data lake analytics platform powered by Trino, a leading open-source distributed MPP SQL query engine. Starburst Galaxy is used both for interactive ad-hoc analytics and for long-running workloads such as batch and ETL/ELT. It maintains high scalability and query completion rates even as data volume, query volume, and query complexity increase. Galaxy runs federated queries across the data lake, cloud data warehouses, on-premises databases, and relational database management systems such as PostgreSQL and MySQL. Galaxy also supports a wide range of business-critical capabilities for big data processing and analytics, including fault-tolerant execution; smart indexing and caching; building, managing, and sharing Data Products; machine learning (PyStarburst and integration with Ibis); cross-cloud and cross-region analytics; and universal search and schema discovery.
As one of over 200 AWS services, Amazon EMR (formerly Elastic MapReduce) is a managed cluster platform that simplifies running big data frameworks such as Apache Hadoop, Apache Spark, PrestoSQL, and Trino on AWS to process and analyze vast amounts of data. Using these frameworks and related open-source projects, you can process data for analytics and business intelligence workloads. Amazon EMR also lets you transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon S3 and Amazon DynamoDB. PrestoSQL was renamed to Trino in December 2020; Amazon EMR release versions 6.4.0 and later use the name Trino, while earlier release versions use the name PrestoSQL. Its serverless option, Amazon EMR Serverless, runs big data applications on AWS using open-source frameworks so that customers do not have to configure, optimize, secure, and manage clusters themselves.
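The naming rule above is a simple version comparison, but it is easy to get wrong if release numbers are compared as strings ("6.10.0" sorts before "6.4.0" lexically). The following is a minimal sketch, assuming release labels written in the form `emr-X.Y.Z` (for example `emr-6.4.0`); the function names are hypothetical, and the code applies only the rule stated above without checking whether a given release actually ships the engine.

```python
import re

# Rule from the paragraph above: release 6.4.0 and later use the name Trino,
# earlier releases use the name PrestoSQL.
TRINO_NAME_MIN_RELEASE = (6, 4, 0)


def parse_release_label(label: str) -> tuple[int, int, int]:
    """Parse a label such as 'emr-6.4.0' into a (major, minor, patch) tuple."""
    match = re.fullmatch(r"emr-(\d+)\.(\d+)\.(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognized release label: {label!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def engine_name(label: str) -> str:
    """Return the name the Trino engine goes by in the given release."""
    # Tuple comparison orders 6.10.0 after 6.4.0, unlike a string comparison.
    if parse_release_label(label) >= TRINO_NAME_MIN_RELEASE:
        return "Trino"
    return "PrestoSQL"


for label in ["emr-6.3.0", "emr-6.4.0", "emr-6.10.0"]:
    print(label, "->", engine_name(label))
```

Parsing into integer tuples is the design choice that matters here: it makes `emr-6.10.0` correctly count as newer than `emr-6.4.0`.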
“We stored all our data in Amazon S3 and initially used EMR for querying. But as time went on, the demand for insights became unsustainable. That’s when we transitioned to Starburst. Now we’re experiencing faster insights, improved security, and the added benefit of 24/7 support. It has truly elevated our data analytics capabilities.”
— Head of Data Analytics, Leading Business Communication Platform
“Starburst resolved data access complexities that Amazon EMR couldn’t and delivered unparalleled performance. With a single platform, we reduced time to analytics, streamlined operations, and achieved significant cost savings.”
— Head of Data Science, Fortune 100 Pharmaceutical Company
“We switched from EMR to Starburst because we needed a solution that would meet our low compliance, high-availability, and auto-scaling requirements. Now we’ve been able to decrease our query response time dramatically, which in turn makes our data analysts and engineers more productive.”
— Director of Data Engineering, American Automotive Company
Don’t take our word for it. Starburst is named #1 for Quality of Support and Ease of Use in G2 Crowd’s Grid Report based on real customer reviews. Additionally, customers said Starburst beat out EMR in all of these categories:
Going beyond key platform governance and management capabilities, a modern data analytics platform empowers data teams with easy-to-use functionality that increases productivity without adding complexity. It allows you to use a range of existing investments in just a few clicks. It allows you to build federated data products from distributed data sets to support business use cases and create and scale self-service usage and adoption across the organization.
Starburst Galaxy vs. EMR Trino
Query sharing
Supports AWS Glue and other data catalogs
Fully managed SaaS platform
Automated AWS compute plane setup
Automated cluster management
Multi-cloud platform
Built-in data security
Built-in real-time usage, monitoring, and reports
Built-in data profiling
Built-in data lineage
Automated upgrades to the latest version of Trino
In-platform one-click client connectivity
Data Products
Data Products sharing*
GenAI text-to-SQL*
Automated data lake optimization
Comparison based on publicly available information as of November 30, 2023
*In preview. Contact us to learn more.
True data access empowers organizations with the ability to use all their data, no matter where it lives, across data lakes, data warehouses, and databases while having confidence in security and governance controls. True access is about meeting business needs on time while adhering to regulatory data sovereignty requirements. Your modern data lake analytics platform/lakehouse should free your data sources for analytics purposes, not confine them in another way.
Starburst Galaxy vs. EMR Trino
Cloud and on-premises data federation
Built-in end-to-end encryption
RBAC/ABAC
AWS Service Account
AWS Lake Formation
Third-party access controls
Enhanced connectors for data access
Cross-cloud and cross-region analytics
In-platform universal search and schema discovery
SSO via AWS IAM, Okta, Azure AD, and Google
Column masking and row-level filters
Time-based policies
Streaming ingest*
Comparison based on publicly available information as of November 30, 2023
*In preview. Contact us to learn more.
Internet scale matters in an internet-powered world, but not every workload needs that power and performance. A modern data lake analytics platform puts you in control: high-performance scalability is available at the click of a button, or automatically when you need it most, while price-to-performance is optimized for all analytics workloads and you can be confident that queries will execute as scheduled.
Starburst Galaxy vs. EMR Trino
Ad-hoc and interactive queries
Graceful and idle shutdown
Consistently execute long-running batch queries
Automated scaling for cost and performance optimization
Automated resizing of a running cluster
Autoscaling by nodes
Automated cluster provisioning and sizing
Complex expression pushdown on top of open-source (OS) Trino
Enhanced Fault Tolerant Execution (FTE)
Smart indexing and caching
Materialized Views
Parallel Connectors
Results and repeated subquery caching*
Comparison based on publicly available information as of November 30, 2023
*In preview. Contact us to learn more.
Open file and table formats are table stakes in providing optionality. A modern data lake analytics platform goes beyond the fundamentals to ensure your business has full control over your data by accessing data where it lives, allowing choice in cloud providers, security, and BI tools, and ensuring expert Trino support is available if and when your teams need it most.
Starburst Galaxy vs. EMR Trino
Supports popular open table formats (Apache Iceberg, Delta Lake, Apache Hudi, and Apache Hive)
Supports popular open file formats
Open-source (OS) Trino as query engine
Open-source Trino Python client
Run on multiple clouds
Expert in-house Trino support
Supports Python DataFrame API*
Supports AWS PrivateLink, Azure Private Link, and Google Cloud Private Service Connect*
Comparison based on publicly available information as of November 30, 2023
* In preview. Contact us to learn more.
Access and analyze your data with the elastic scale and high performance your business demands. Take Starburst Galaxy for a free test drive, watch the on-demand demo (no form fill needed), or contact us.
Starburst has been recognized as a 2023 Gartner Hype Cycle Sample Vendor
The service, formerly known as Amazon Elastic MapReduce, is now officially named Amazon EMR.
Amazon EMR offers a wide range of benefits to its customers, including elasticity; simple pricing; integration with other AWS services such as AWS Data Pipeline, Amazon CloudWatch, Amazon Redshift, Amazon EC2, Amazon VPC, and Amazon Kinesis; APIs for managing clusters programmatically; and support for data transformation (ETL).
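As one illustration of managing clusters through the API, here is a minimal sketch that assumes the AWS SDK for Python (boto3) EMR client, whose `list_clusters` call accepts a `ClusterStates` filter and returns results in pages linked by a `Marker` value. The function name `iter_active_clusters` is hypothetical, and the client is passed in as a parameter so the sketch can be exercised without AWS credentials.

```python
from typing import Any, Iterator

# EMR cluster states in which a cluster is still starting up or running.
ACTIVE_STATES = ["STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING"]


def iter_active_clusters(emr_client: Any) -> Iterator[tuple[str, str, str]]:
    """Yield (cluster_id, name, state) for each active cluster, across all pages.

    `emr_client` is assumed to behave like boto3.client("emr"): list_clusters()
    returns a dict with a "Clusters" list and, when more results exist, a
    "Marker" string to pass back on the next call.
    """
    kwargs: dict[str, Any] = {"ClusterStates": ACTIVE_STATES}
    while True:
        response = emr_client.list_clusters(**kwargs)
        for cluster in response.get("Clusters", []):
            yield cluster["Id"], cluster["Name"], cluster["Status"]["State"]
        marker = response.get("Marker")
        if not marker:
            break  # no further pages
        kwargs["Marker"] = marker
```

With boto3 installed and credentials configured, you could call it as `iter_active_clusters(boto3.client("emr"))`. boto3 also provides a built-in paginator for `list_clusters`; the explicit marker loop is used here only to make the pagination visible.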
While Amazon EMR is a powerful tool, it does come with its own set of challenges:
Also see how Starburst, which offers better performance, scale, optionality, governance, access, collaboration, sharing, and more, compares to Athena.
Amazon EMR and Amazon Athena are both AWS services that handle big data, but they do so in different ways.
While both services are designed to process big data, EMR provides a platform for open-source frameworks such as Trino, Presto, Hadoop, and Apache Spark (each EMR release version bundles specific versions of these frameworks) and is well suited to complex, long-running jobs. In contrast, Athena is designed for quick, ad-hoc queries directly against data stored in S3. Starburst Galaxy, powered by Trino, is well suited to both.
Amazon EMR pricing is considered simple: you pay a per-second rate for the time you use, with a one-minute minimum. Although EMR pricing may seem cost-effective, costs quickly rise once you configure the full architecture and add the other AWS services needed for a fully functional platform. This differs from platforms like Starburst, where the price includes all capabilities, from compute to access, governance, security, and more.
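To make the per-second model concrete, here is a minimal arithmetic sketch. It assumes that the EMR charge is added on top of the EC2 instance charge for each node, and that the same billed duration applies to both; the hourly rates are hypothetical placeholders, not real AWS prices, and the sketch leaves out the other AWS services the paragraph refers to.

```python
SECONDS_PER_HOUR = 3600
MINIMUM_BILLED_SECONDS = 60  # one-minute minimum


def billed_seconds(runtime_seconds: int) -> int:
    """Per-second billing with a one-minute minimum."""
    return max(runtime_seconds, MINIMUM_BILLED_SECONDS)


def cluster_cost(runtime_seconds: int, instances: int,
                 emr_rate_per_hour: float, ec2_rate_per_hour: float) -> float:
    """Cost of running `instances` identical nodes for `runtime_seconds`.

    Both rates are hypothetical; the EMR rate is modeled as an uplift added
    to the EC2 rate of each node.
    """
    hourly_total = (emr_rate_per_hour + ec2_rate_per_hour) * instances
    return hourly_total * billed_seconds(runtime_seconds) / SECONDS_PER_HOUR


# Hypothetical rates: $0.05/h EMR uplift and $0.20/h EC2 per node, 4 nodes.
short_job = cluster_cost(30, 4, 0.05, 0.20)       # billed as 60 seconds
long_job = cluster_cost(2 * 3600, 4, 0.05, 0.20)  # billed as 7,200 seconds
print(f"30-second job: ${short_job:.4f}")
print(f"2-hour job:    ${long_job:.2f}")
```

Note that a 30-second job is billed as a full minute, so very short jobs cost the same as one-minute jobs.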
Amazon EC2 (Elastic Compute Cloud) provides the raw compute capacity in the cloud (i.e., virtual machines); EMR is a service built on top of EC2 to process large amounts of data using big data frameworks. Both services only run on AWS.
Amazon EMR is a cloud service for big data processing using Amazon EC2 instances. On the other hand, Amazon Redshift is a cloud data warehouse service from AWS. Both services only run on AWS.
In Amazon EMR, the EMR File System (EMRFS) is an implementation of HDFS that all Amazon EMR clusters use for reading and writing regular files from Amazon EMR directly to Amazon S3. EMRFS provides the convenience of storing persistent data in Amazon S3 for use with Hadoop while also providing features like data encryption.
Trino (formerly known as PrestoSQL) on Amazon EMR uses its own S3 file system for the URI prefixes s3://, s3n://, and s3a://. This allows Trino to read and write tables stored in Amazon S3 or S3-compatible systems: a table or database only needs a location that uses an S3 prefix rather than an HDFS prefix.
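The location-based routing described above can be sketched in a few lines. This minimal example assumes a Hive-connector catalog, where `CREATE SCHEMA ... WITH (location = '...')` sets a schema's storage location; the catalog, schema, and bucket names are hypothetical. It only builds the SQL string and does not escape identifiers or quotes, so it is meant for trusted input only.

```python
from urllib.parse import urlparse

# URI schemes that, per the paragraph above, Trino on EMR serves with its S3 file system.
TRINO_S3_SCHEMES = {"s3", "s3n", "s3a"}


def uses_s3_filesystem(location: str) -> bool:
    """Return True if a table or schema location uses one of the S3 prefixes."""
    return urlparse(location).scheme in TRINO_S3_SCHEMES


def create_schema_sql(catalog: str, schema: str, location: str) -> str:
    """Build a CREATE SCHEMA statement whose data lives at an S3 location."""
    if not uses_s3_filesystem(location):
        raise ValueError(f"not an S3 location: {location!r}")
    # No escaping is performed: catalog, schema, and location must be trusted.
    return f"CREATE SCHEMA {catalog}.{schema} WITH (location = '{location}')"


print(create_schema_sql("hive", "web", "s3://example-bucket/web/"))
```

A location such as `hdfs://namenode/web/` is rejected, mirroring the distinction between S3 and HDFS prefixes.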
© Starburst Data, Inc. Starburst and Starburst Data are registered trademarks of Starburst Data, Inc. All rights reserved. Presto®, the Presto logo, Delta Lake, and the Delta Lake logo are trademarks of LF Projects, LLC