What’s next for Trino
Lester Martin
Educational Engineer
Starburst
It seems like only yesterday that Trino celebrated its first decade. Born at Facebook to address the need for improved performance and scalability, Trino has become a household name among companies large and small. Nearly twelve years on, the third annual Trino Fest has wrapped, and the project is flourishing. So the natural next question is: what’s next for Trino? While we can’t predict the future, there is no doubt that Trino has cemented itself as a staple of data lakes and data lakehouses alike.
We should talk a bit more about the history, but mainly to help us think about the future, because past advancements can help predict what’s to come. Trino will continue to be a first-class query engine as the data lake analytics community becomes more and more focused on being open for the benefit of users. This open data lake mindset has always been at the heart of Trino.
Major open-source activities
I crawled the Release notes — Trino Documentation page to identify some major features and accomplishments delivered from 2020 onward in open-source Trino. It is quite a list of achievements and improvements. Martin Traverso shared some of these achievements in his Trino Fest keynote, what’s new in Trino this summer, while also mentioning that Trino is on pace to complete its most active development year yet.
- Kubernetes (k8s) deployment added another installation option and provides configuration-based autoscaling – Release 399 (6 Oct 2022)
- Modern table format catalogs allowed users to go beyond Hive for their data lake tables
  - Apache Iceberg – Release 341 (8 Sep 2020)
  - Delta Lake – Release 373 (9 Mar 2022)
  - Apache Hudi – Release 398 (28 Sep 2022)
- Project Tardigrade enabled stage-level checkpointing to allow a cluster to be configured for fault-tolerant execution; ideal for long-running jobs – Release 374 (17 Mar 2022)
- The arrival of Trino Gateway gave us a load balancer, proxy server, and configurable routing gateway for multiple Trino clusters – Trino Gateway 3 (26 Sep 2023)
- SQL routines encouraged users to surface their own custom user-defined functions written in SQL – Release 431 (27 Oct 2023)
- The dynamic catalogs feature allowed catalogs to be added without restarting the cluster – Release 433 (10 Nov 2023)
- Performance improvements became available with the revamp of file system caching – Release 439 (15 Feb 2024)
- Snowflake connector added to access this popular cloud data warehouse – Release 440 (8 Mar 2024)
In addition to the big rocks, a lot of effort has gone into improving performance, specifically in the lakehouse. These advancements were on full display at Trino Fest, where four of the talks focused on modern table formats like Apache Iceberg, Delta Lake, and Hudi, strengthening Trino’s place as the query engine of choice for the data lakehouse. Furthermore, the Icehouse phenomenon continues to be front and center: Amit Gilad from Cloudinary discussed best practices for migrating to Trino and Apache Iceberg.
Starburst and Trino together
The original creators of Trino (Martin Traverso, Dain Sundstrom, David Phillips, and Eric Hwang) work at Starburst and continue to directly participate in the direction and future of Trino, alongside the rest of the community of users, contributors, and maintainers.
Open-source Trino is the backbone of Starburst Galaxy and Starburst Enterprise, which add value-added features focused on performance, scalability, simplicity, and security, as enumerated in our Comparison of Starburst & Trino.
It should be clearly noted that Starburst does not own Trino and is not the only organization working on Trino. The Trino Community is vast and diverse, as all open-source projects aspire to be. Furthermore, Trino guidelines for participants with corporate interests exist and there are other Trino-based offerings in the marketplace.
What’s next for Trino
With the focus on modern table formats in general, Apache Iceberg in particular, and The Icehouse Manifesto: Building an Open Lakehouse (essentially Trino as the compute engine and Apache Iceberg as the table format), Trino will get more attention than ever before. This means more users and more clusters. It also means more engineering effort to sustain that growth.
We here at Starburst do not get to dictate a formal roadmap for Trino itself, but we have the unique ability to contribute toward it based on our heavy involvement in the Trino Community. With the never-ending focus on being a query engine that runs at ludicrous speed and offers the option of accessing many data sources, it is no surprise that many of the upcoming features focus on these two key areas.
Starburst will continue to enhance our own value-added features for competitive reasons, but history shows that this effort also helps Trino itself: many engine-focused improvements in performance, scalability, and reliability ultimately make their way into open source. Additionally, Starburst customers are ultimately Trino users, and that ever-expanding customer base provides more and more validation of our favorite query engine.
For the core engine itself, this author would love to see a native HA coordinator someday. The flexibility for Trino to determine the best number of splits at later stages of the query plan would also be very beneficial. There will always be a focus on improving cost-based optimizations to push the engine to even more ludicrous speeds. Existing connectors will be refined, and new ones will surface as the data technology landscape grows.
I expect to see Trino used more for workloads beyond high-performance querying. With fault-tolerant execution enabled, and given that the vast majority of transformation processing can be done with SQL, I expect to see even more users building their data pipelines to execute in Trino instead of on separate compute frameworks and engines. This consolidation of runtime activities will create a more streamlined infrastructure stack while reducing code complexity and operational costs. This hypothesis is supported by the Executive Homes talk at Trino Fest, where a dedicated data science team of two demonstrated consolidating and managing data using Trino as the backbone of their data infrastructure.
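To make the "pipelines as SQL" idea a little more concrete, here is a minimal Python sketch of what such a pipeline could look like: an ordered list of SQL statements run one after another through a standard DB-API cursor. Everything named here is an assumption for illustration only. The `lake` catalog, its schemas and tables, and the `Step` and `run_pipeline` helpers are hypothetical, and the cursor could come from any DB-API driver (the Trino Python client provides one). Whether a failed task is retried depends on how fault-tolerant execution is configured on the cluster, which this sketch does not cover.

```python
from dataclasses import dataclass
from typing import Any, Protocol, Sequence


class Cursor(Protocol):
    """Minimal slice of the Python DB-API cursor interface used here."""

    def execute(self, operation: str) -> Any: ...
    def fetchall(self) -> Sequence[Any]: ...


@dataclass(frozen=True)
class Step:
    name: str
    sql: str


# Hypothetical catalog, schema, and table names, for illustration only.
PIPELINE = [
    Step(
        "stage_orders",
        "CREATE TABLE lake.staging.orders_clean AS "
        "SELECT order_id, customer_id, CAST(total AS decimal(12, 2)) AS total "
        "FROM lake.raw.orders WHERE total IS NOT NULL",
    ),
    Step(
        "publish_revenue",
        "INSERT INTO lake.marts.customer_revenue "
        "SELECT customer_id, sum(total) AS revenue "
        "FROM lake.staging.orders_clean GROUP BY customer_id",
    ),
]


def run_pipeline(cursor: Cursor, steps: Sequence[Step]) -> list[str]:
    """Run each step in order and return the names of the completed steps.

    An exception from any step propagates, so later steps never run
    against a missing or half-built intermediate table.
    """
    completed = []
    for step in steps:
        cursor.execute(step.sql)
        cursor.fetchall()  # consume the result before moving to the next step
        completed.append(step.name)
    return completed
```

The point of the sketch is the shape rather than the specifics: each transformation is plain SQL, so the orchestration layer shrinks to "run these statements in order," and the heavy lifting stays inside the query engine.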
Recent moves by major cloud data warehouse and lakehouse vendors to embrace Iceberg and offer an open and accessible catalog present the biggest opportunities for Trino to grow in popularity and usage. Allowing customers to store their data in their own data lake and allowing multiple engines to access the same tables is immensely significant. The cost-performance benefits of Trino against other popular processing engines position it for even more growth.
Want to learn more about Trino’s future? Maybe even influence it yourself? Then join the Trino Community and consider taking on one of the project’s roles, and possibly even contributing to the project directly. The power of open source is in the open community. Trino is for all of us.