Starburst Elements: What is Trino?

  • Brian Olsen

    Developer Advocate

    Starburst

Data engineers are struggling to keep up with the demands of their data consumers. Every team has its favorite database, object storage, or other system, and businesses continually push for more complex analysis on new data sources. This leads to a complex and expensive data architecture that requires engineers to move and copy data, ultimately creating a perpetual waiting game for business users and delaying critical decision-making.

Trino is an open source distributed SQL engine for running fast analytic queries against various data sources ranging in size from gigabytes to petabytes. Trino was designed and built from scratch for interactive analytics. It approaches the speed of commercial data warehouses while scaling to the size of very large organizations.

In the fall of 2012, a small team of four Facebook engineers (Martin Traverso, Dain Sundstrom, David Phillips, and Eric Hwang) started working on Presto. By spring 2013, the first version had been successfully rolled out within Facebook, and later that year Facebook open sourced Presto under the Apache License. In 2018, Martin, Dain, and David left Facebook to build an open source community full-time, keeping the Presto name but using the PrestoSQL moniker to distinguish their work from the original project.

In December 2020, PrestoSQL was rebranded as Trino. Trino (formerly PrestoSQL) serves a broad array of companies, in varying stages of cloud adoption, that need faster access to all of their data. Companies like LinkedIn, Lyft, Netflix, GrubHub, Slack, Comcast, FINRA, Condé Nast, Nordstrom, and thousands of others use Trino today.

Trino is not a database with its own storage; rather, it queries data where it lives. With Trino, storage and compute are decoupled and can be scaled independently: Trino is the compute layer, and the underlying data sources are the storage layer. Separating compute and storage also gives you greater cost control, and you can easily connect your team's analytics tool of choice.
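To make the compute/storage split concrete, here is a minimal sketch using the open source `trino` Python client (installed with `pip install trino`), which talks to a Trino coordinator over HTTP. Your tool only ever talks to Trino; Trino reaches the data through the connector configured for a catalog, and tables are addressed as `catalog.schema.table`. The helper names, the default user `analyst`, and any host or table names you pass in are assumptions for illustration, not part of Trino itself.

```python
def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Build a Trino table reference of the form "catalog"."schema"."table".

    The catalog ties the query to a configured connector (the storage side);
    Trino supplies only the compute. Each part is double-quoted as a delimited
    identifier, with embedded double quotes doubled to escape them.
    """
    return ".".join('"' + part.replace('"', '""') + '"' for part in (catalog, schema, table))


def count_rows(host: str, catalog: str, schema: str, table: str,
               user: str = "analyst", port: int = 8080) -> int:
    """Ask a Trino coordinator to count rows in a table that stays in its source."""
    # Imported lazily so qualified_name() works without the client installed.
    from trino.dbapi import connect

    conn = connect(host=host, port=port, user=user)
    try:
        cur = conn.cursor()
        cur.execute(f"SELECT count(*) FROM {qualified_name(catalog, schema, table)}")
        (count,) = cur.fetchone()
        return count
    finally:
        conn.close()
```

For example, `count_rows("trino.example.com", "hive", "web", "page_views")` would send a single `SELECT count(*)` to a hypothetical cluster; no data is copied out of the `hive` catalog's storage to answer it.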

This allows Trino to scale its compute resources for query processing up and down based on analytics demand. There is no need to move your data, to provision compute and storage for the exact needs of the current queries, or to re-provision them regularly as your query needs change.

Trino can speed up queries by scaling its compute cluster dynamically. This lets you match hardware resources to actual demand and therefore reduce cost.
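As a rough intuition for why a bigger cluster means faster queries, consider an idealized model in which a query's work is divided into independent units (Trino calls these splits) that workers process one at a time. This sketch deliberately ignores coordinator overhead, data skew, network transfer, and per-worker concurrency, so it is not how Trino actually schedules work; it only shows the arithmetic of adding workers.

```python
import math


def idealized_wall_time(num_splits: int, workers: int, seconds_per_split: float) -> float:
    """Wall-clock time if each worker handles one split at a time, all splits cost the same,
    and there is no other overhead. A toy model, not Trino's scheduler."""
    if num_splits < 0 or workers <= 0 or seconds_per_split < 0:
        raise ValueError("need num_splits >= 0, workers > 0, seconds_per_split >= 0")
    # Splits are processed in "waves": each wave runs up to `workers` splits in parallel.
    waves = math.ceil(num_splits / workers)
    return waves * seconds_per_split
```

With 1,000 splits at 2 seconds each, `idealized_wall_time(1000, 10, 2.0)` gives 200.0 seconds, and doubling to 20 workers gives 100.0 seconds. Real clusters see diminishing returns once overheads dominate, which is why scaling to demand, rather than simply running the largest possible cluster, is what keeps cost down.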

Whether you need faster query performance on a data lake or to combine data across sources, Trino is the fastest distributed SQL query engine. It can even federate queries across your big data store, your relational store and others. You can query all your data, regardless of where it’s stored in the cloud or on premises. Trino makes it possible for data analysts and data scientists to quickly access their data. The wait is over.
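A federated query is ordinary SQL that names tables from more than one catalog. The sketch below assumes two hypothetical catalogs: `lake`, standing in for object storage exposed through a connector such as Hive, and `crm`, standing in for a relational database such as PostgreSQL. The schema, table, and column names are invented for illustration, and execution again assumes the `trino` Python client.

```python
def federated_revenue_sql(lake_catalog: str = "lake", crm_catalog: str = "crm") -> str:
    """One SQL statement joining data-lake orders with relational customer records.

    Trino reads each side through its own connector; neither source is copied into the other.
    """
    return f"""
SELECT c.region, sum(o.amount) AS revenue
FROM {lake_catalog}.sales.orders AS o
JOIN {crm_catalog}.public.customers AS c
  ON o.customer_id = c.id
GROUP BY c.region
ORDER BY revenue DESC
""".strip()


def run(sql: str, host: str = "localhost", port: int = 8080, user: str = "analyst") -> list:
    """Execute a statement on a Trino coordinator and return all result rows."""
    # The client is imported only when a query is actually sent to a cluster.
    from trino.dbapi import connect

    conn = connect(host=host, port=port, user=user)
    try:
        cur = conn.cursor()
        cur.execute(sql)
        return cur.fetchall()
    finally:
        conn.close()
```

Calling `run(federated_revenue_sql())` against a cluster with both catalogs configured would return revenue per region without first building a pipeline to copy orders and customers into one system.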

With in-person events coming back, we hope you'll join us at our first-ever hybrid Trino Summit on October 13, 2021, at San Francisco's Commonwealth Club. Register here to learn more. The Call for Papers deadline is August 15.

Want to learn more about Trino? Tune into our Trino Community Broadcast every other Thursday. You can catch the next one on July 22 at 11am ET here. Want to join our Trino community? Come chat with us on Slack here.