Watch Trino Summit 2024 on-demand!
Trino Summit 2024 was held on December 11-12. As the biggest Trino event of the year, it brought together engineers, analysts, data scientists, and anyone interested in using or contributing to Trino. Check out all the sessions on-demand and hear from companies such as Netflix, LinkedIn, Prezi, Wise, and more!
Agenda
<h3>Enduring with persistence to reach the summit</h3>
In the keynote, Martin presents the latest and greatest news from the Trino project and the Trino community. With more contributors, more maintainers, and a larger community, we have gotten a lot done since Trino Fest in June. Find out the details from the co-creator of Trino.
<h3>Running Trino at exabyte-scale data warehouse</h3>
Netflix operates more than 15 Trino clusters, efficiently handling more than 10 million queries each month. As the original creator of Apache Iceberg, Netflix has over 1 million Iceberg tables and makes extensive use of the Trino Iceberg connector. In this session we will talk about the operational challenges we faced, internal efficiency improvements, and our experience with upgrading to the latest Trino version.
<h3>Data Lake at Wise powered by Trino and Iceberg</h3>
Wise Plc. (https://wise.com) is on a journey to build a modern data lake using an open-source data stack. Known for its global money transfer services, Wise is in the process of migrating over 50 trillion records, amounting to more than 3 petabytes of raw data, collected from hundreds of services and near-real-time streams into this new open data lake architecture. The new infrastructure supports the ingestion of terabytes of new data and handles millions of queries daily. At the heart of this architecture are Trino and Iceberg, with Trino playing a key role in optimizing Iceberg tables, powering data transformations through dbt, and delivering fast, scalable analytics for ad-hoc querying and dashboarding. Trino's ability to process large-scale data with distributed querying makes it not only performant but also cost-effective, allowing Wise to scale its data operations efficiently without the heavy overhead costs typically associated with traditional data warehouses.
<h3>Using Trino as a Strangler Fig</h3>
This talk will discuss how FanDuel uses Trino to migrate analysts from Redshift to Delta Lake using Martin Fowler's Strangler Fig pattern. Trino slowly took root after initial trials, started replacing parts of the legacy system, and will eventually be a complete replacement, with only a shadow of the original system remaining.
<h3>A Lakehouse that simply works</h3>
With billions of technologies and vendor proposals out there, it's easy to lose track of what truly matters. I would like to show how a simple combination of established, maintained, open source technologies can make a lakehouse that truly works for a company with 150M users.
<h3>Empowering self-serve data analytics with a text-to-SQL assistant at LinkedIn</h3>
Text-to-SQL is a popular application for large language models (LLMs), tasked with generating SQL queries that answer natural language questions. While creating a proof of concept can be straightforward, delivering a robust text-to-SQL solution that operates at enterprise scale with high user adoption presents significant challenges. In this session, we discuss how we developed a text-to-SQL solution, primarily using Trino, for internal data analytics at LinkedIn. We’ll cover the specific technical hurdles we encountered, such as optimizing query accuracy and performance. Additionally, we’ll share strategies for driving user adoption among diverse personas, from data engineers and data scientists to business analysts and non-technical users.
Key takeaways for attendees include:
- Best practices for implementing a text-to-SQL solution at scale
- Techniques for improving query accuracy and performance
- User adoption strategies/modes tailored for different organizational roles
- Unleashing potential through Agents
This session is ideal for data engineers, data scientists, product managers, and anyone involved in data analytics and user experience design.
<h3>How Trino & dbt unleashed many-to-many interoperability at Bazaar</h3>
Learn how Bazaar leveraged the combined power of Trino and dbt to scale their data platform effectively. This talk delves into the strategies and technologies used to enable many-to-many integration, fueling data-driven decision-making across the organization.
<h3>Maximizing cost efficiency in data analytics with Trino and Iceberg</h3>
At Branch, we realized that our existing architecture was not only expensive but also becoming unsustainable as data volumes grew for one of our business units, so we decided to adopt Trino and Apache Iceberg. Our journey of migrating from Apache Druid to Trino and Iceberg taught us that the right combination of tools can transform data analytics, offering the perfect balance between cost savings, performance, and scalability. Learn how we achieved 7-figure savings with a few “compromises”.
<h3>Lessons and news from the AI world for Trino</h3>
The hype and reality of AI have swept through the industry. Join us for an insightful panel discussion as we explore the powerful intersection of AI and Trino. Hear from our expert panelists as they share their extensive experiences and different perspectives.
<h3>Trino for Observability at Intuit</h3>
At Intuit, the Observability Team has initiated a project to implement a secondary storage system with a robust query engine, designed to operate alongside our existing Splunk infrastructure. Data ingestion into this new system, built on Iceberg, clocks in at more than 500 GB daily via a Spark data pipeline formatted specifically for Iceberg. Given that Splunk facilitates scheduled and ad-hoc searches as well as real-time dashboards, our anticipated query load is projected at about 100 queries per second. Consequently, we pivoted to exploring open-source solutions that could be managed on Kubernetes using Argo, offering us complete control over the system. In this session, we will discuss why Trino emerged as the best-suited query engine for managing aggregate, percentile, and regex-based queries, following comparative evaluations with Athena and StarRocks. Moreover, we will detail how we exposed Trino to internal stakeholders. Finally, we will share the balance we sought between query times and cost as we started making Trino queries faster by optimizing the underlying Kubernetes infrastructure.
<h3>Hassle-free dynamic policy enforcement in Trino</h3>
The last few years have witnessed an explosion of data protection regulations (e.g. GDPR, DMA, CCPA) in conjunction with an ever-growing appetite for data usage in large businesses. These conflicting trends have presented significant challenges for businesses to maintain compliance in a fast-changing regulatory landscape. It's crucial for large organizations to ensure data compliance is handled efficiently and on time, as delays can result in costly engineering efforts and increase the risk of non-compliance. LinkedIn implemented a policy enforcement system that enables granular enforcement of data usage policies across LinkedIn’s data lake. These policies encode rules stating which accesses are allowed. Due to requirements of LinkedIn’s data infrastructure, the enforcement of these policies needs to be done in a way that provides expressibility, cross-engine portability, versioning, and rollout agility. These views are then accessed through engines like Trino. In this talk, we will discuss our journey of supporting policy enforcement for Trino use cases, along with our approach to making them compliant in a hassle-free way.
<h3>Empowering HugoBank's digital services through Trino</h3>
I will be going over how my team and I used the power of Trino to digitize and create our delta lake for secure, efficient, and optimized access, allowing our digital bank to deliver a seamless digital experience enhanced by data-driven insights. The talk will also cover Trino's ability to connect to dbt and Apache Ranger to provide data governance and simplified data modeling.
<h3>Optimizing Trino on Kubernetes: Helm chart enhancements for resilience and security</h3>
This session will cover Cardo AI's deployment of Trino on AWS EKS, focusing on Helm Chart development to enable a cost-effective, resilient, and secure setup. Key enhancements to the Trino Helm Chart include NetworkPolicy support for controlled access, options for graceful worker shutdown, and templated configurations for greater flexibility. Attendees will gain insights into how Cardo AI balances performance, cost efficiency with spot instances, and secure access through TLS-terminated Load Balancers, creating an optimized multi-tenant environment for real-time analytics on EKS.
<h3>Virtual view hierarchies with Trino</h3>
Trino custom views are incredibly powerful but are often limited to edge cases like computing virtual columns or enforcing permissions. This talk presents the benefits of using a virtual view hierarchy, where views are layered on top of other views, as the primary way to present Trino data sources to user applications. This pattern is especially relevant for Trino users using multiple connectors or considering Iceberg for cold storage.
<h3>Opening up the Trino Gateway</h3>
Trino Gateway has become a very active subproject in the Trino community. Join the maintainers of the project for an update about new features and improvements. They will also share usage scenarios and known users of Trino Gateway in production and provide a glimpse at planned next steps.
<h3>Wvlet: A new flow-style query language for functional data modeling and interactive analysis</h3>
Wvlet, pronounced as weave-let, is a new flow-style query language for SQL-based database engines such as Trino, DuckDB, and Hive. Wvlet queries, saved as .wv files, offer an intuitive way to describe data processing pipelines compiled into a series of SQL queries. This session will demonstrate how the flow-style query syntax enhances existing SQL engines and facilitates functional data modeling. This approach enables reusable and composable methods for constructing complex data pipelines. Wvlet is open-source and freely available at https://wvlet.org/.
<h3>Securing data pipelines at the storage layer</h3>
AI/ML data pipelines consume data from file systems and object stores. Training data is the “weak link” in the AI/ML pipeline. Each stage has vulnerabilities that impact integrity, traceability, resilience, and security. According to Gartner, “30% of enterprises using AI reported having had a security or privacy breach against their AI environment.” Learn how to protect your data lake, including SQL anomalies and the storage layer, with a cyber storage solution from Superna.
<h3>Empowering Pharmaceutical Drug Launches with Trino-Powered Sales Data Analytics</h3>
In the pharmaceutical industry, 63% of drugs fall short of launch expectations, making data analytics essential for success. This session explores how Trino’s real-time and historical data analysis capabilities empower pharmaceutical firms to enhance decision-making, from anticipating market demand to optimizing marketing strategies. With Trino’s predictive modeling improving demand forecasting accuracy by 20%, companies can better allocate resources, adapt to market shifts, and boost prescription rates by 15-30%, ensuring successful drug launches.
<h3>Connecting to Trino with C# and ADO.NET</h3>
In this session George talks about the new Trino subproject trino-csharp-client. It was recently added to the Trino community and includes support for connecting to Trino in a client application written in C#. In addition, an ADO.NET driver can be used to access Trino from many other programming languages and applications.
Speakers
Abdullah Alkhawatrah
Software Engineer at Wise
Alagappan Maruthappan
Software Engineer at Netflix
Albert Chen
Machine Learning Engineer at LinkedIn
Andrew MacKay
CTO & CSO at Superna
Dain Sundstrom
Co-Creator of Trino and Chief Technology Officer at Starburst
Gaurav Ahlawat
Senior Software Engineer, Data Science at LinkedIn
George Fisher
Senior Software Engineer at Microsoft
Gopi Bhagavathula
Staff Engineer at Branch
Gunther Hagleitner
CEO & Co-founder at Waii
Harpreet Singh
Director of Sales Analytics & Operations at Gilead Sciences
Jaeho Yoo
Analytics Engineering at Naver
Jan Waś
Software engineer at Starburst Data
Manas Bundele
Sr. Software Engineer, Machine Learning at LinkedIn
Manfred Moser
Director, Open Source Engineering
Martin Traverso
Co-Creator of Trino and Chief Technology Officer, Starburst
Mustafa Mirza
Lead Platform Data Engineer at HugoBank
Mustafa Sakalsiz
Founder and CEO at Peaka
Peter Kosztolanyi
Staff Data Engineer at Wise
Pratham Desai
Software Engineer at LinkedIn
Ramanathan Ramu
Senior Software Engineer at LinkedIn
Razi Moosa
Data Analyst at HugoBank
Riya John
Senior Software Engineer at Intuit
Rob Dickinson
VP of Engineering at Graylog
Rong Rong
Software Engineer at CharacterAI
Sebastian Daberdaku
Data Engineering Tech Lead at CardoAI
Shahzad Siddiqi
Engineering Manager (Data Platform & ML) at Bazaar Technologies
Siddique Ahmad
Data Engineer at Bazaar Technologies
Taro L. Saito
Senior Principal Engineer at Treasure Data
Trevor Kennedy
Data Architect at FanDuel
Ujjwal Sharma
Software Engineer 2 at Intuit
Usman Ghani
Data Engineer at Bazaar Technologies
Vincenzo Cassaro
Data Engineer at Prezi
Vishal Jadhav
Software Engineer at Bloomberg LP
Will Morisson
Director, Technical Customer Success at Starburst
William Chang
Co-founder & CTO of Canner