Trino Summit 2023

December 13 - 14, 2023

Virtual

About the event

Trino Summit, held this December 2023, is a virtual event that brings together engineers, data analysts, and anyone else who considers themselves a Trino contributor or user. It is the biggest Trino event of the year, and we're excited to bring together professionals from across big data and analytics so the community can share experiences and insights, make connections, and learn from each other.

Agenda

Enduring with persistence to reach the summit

In the keynote, Martin presents the latest and greatest news from the Trino project and the Trino community. With more contributors, more maintainers, and a larger community, we have gotten a lot done since Trino Fest in June. Find out the details from the co-creator of Trino.

Speakers:

Martin Traverso

Running Trino at exabyte-scale data warehouse

Netflix operates more than 15 Trino clusters, efficiently handling more than 10 million queries each month. As the original creator of Apache Iceberg, Netflix has over 1 million Iceberg tables and makes extensive use of the Trino Iceberg connector. In this session, we will talk about the operational challenges we faced, internal efficiency improvements, and our experience upgrading to the latest Trino version.

Speakers:

Alagappan Maruthappan

Data Lake at Wise powered by Trino and Iceberg

Wise Plc. (https://wise.com) is on a journey to build a modern data lake using an open-source data stack. Known for its global money transfer services, Wise is in the process of migrating over 50 trillion records, amounting to more than 3 petabytes of raw data, collected from hundreds of services and near-real-time streams into this new open data lake architecture. The new infrastructure supports the ingestion of terabytes of new data and handles millions of queries daily. At the heart of this architecture are Trino and Iceberg, with Trino playing a key role in optimizing Iceberg tables, powering data transformations through dbt, and delivering fast, scalable analytics for ad-hoc querying and dashboarding. Trino's ability to process large-scale data with distributed querying makes it not only performant but also cost-effective, allowing Wise to scale its data operations efficiently without the heavy overhead costs typically associated with traditional data warehouses.

Speakers:

Peter Kosztolanyi
Abdullah Alkhawatrah

Using Trino as a Strangler Fig

This talk will discuss how FanDuel uses Trino to migrate analysts from Redshift to Delta Lake using Martin Fowler's Strangler Fig pattern. Trino slowly took root after initial trials, started replacing parts of the legacy system, and will eventually be a complete replacement, leaving only a shadow of the original system.

Speakers:

Trevor Kennedy

A Lakehouse that simply works

With billions of technologies and vendor proposals out there, it's easy to lose track of what truly matters. I would like to show how a simple combination of established, maintained, open source technologies can make a lakehouse that truly works for a company with 150M users.

Speakers:

Vincenzo Cassaro

Empowering self-serve data analytics with a text-to-SQL assistant at LinkedIn

Text-to-SQL is a popular application for large language models (LLMs), tasked with generating SQL queries that answer natural language questions. While creating a proof of concept can be straightforward, delivering a robust text-to-SQL solution that operates at enterprise scale with high user adoption presents significant challenges. In this session, we discuss how we developed a text-to-SQL solution, primarily using Trino, for internal data analytics at LinkedIn. We’ll cover the specific technical hurdles we encountered, such as optimizing query accuracy and performance. Additionally, we’ll share strategies for driving user adoption among diverse personas, from data engineers and data scientists to business analysts and non-technical users.

Key takeaways for attendees include:

- Best practices for implementing a text-to-SQL solution at scale
- Techniques for improving query accuracy and performance
- User adoption strategies/modes tailored for different organizational roles
- Unleashing potential through Agents

This session is ideal for data engineers, data scientists, product managers, and anyone involved in data analytics and user experience design.

Speakers:

Gaurav Ahlawat
Albert Chen
Manas Bundele

How Trino & dbt unleashed many-to-many interoperability at Bazaar

Learn how Bazaar leveraged the combined power of Trino and dbt to scale their data platform effectively. This talk delves into the strategies and technologies used to enable many-to-many integration, fueling data-driven decision-making across the organization.

Speakers:

Shahzad Siddiqi
Siddique Ahmad
Usman Ghani

Maximizing cost efficiency in data analytics with Trino and Iceberg

At Branch, we realized that our existing architecture was not only expensive but also becoming unsustainable as data volumes grew for one of our business units, so we decided to adopt Trino and Apache Iceberg. Our journey of migrating from Apache Druid to Trino and Iceberg taught us that the right combination of tools can transform data analytics, offering the perfect balance between cost savings, performance, and scalability. Learn how we achieved 7-figure savings with a few “compromises”.

Speakers:

Gopi Bhagavathula

Lessons and news from the AI world for Trino

The hype and reality of AI have swept through the industry. Join us for an insightful panel discussion as we explore the powerful intersection of AI and Trino. Hear from our expert panelists as they share their extensive experiences and different perspectives.

Speakers:

Rong Rong
William Chang
Mustafa Sakalsiz
Gunther Hagleitner
Dain Sundstrom
Manfred Moser

Trino for Observability at Intuit

At Intuit, the Observability Team has initiated a project to implement a secondary storage system with a robust query engine, designed to operate alongside our existing Splunk infrastructure. Data ingestion into this new system, Iceberg, clocks in at more than 500 GB daily via a Spark data pipeline formatted specifically for Iceberg. Given that Splunk facilitates scheduled and ad-hoc searches as well as real-time dashboards, our anticipated query load is projected at about 100 queries per second. Consequently, we pivoted to exploring open-source solutions that could be managed on Kubernetes using Argo, offering us complete control over the system. In this session, we will discuss why Trino emerged as the best-suited query engine for managing aggregate, percentile, and regex-based queries, following comparative evaluations with Athena and StarRocks. Moreover, we will detail how we exposed Trino to internal stakeholders. Finally, we will share the balance we sought between query times and cost as we began making Trino queries faster by optimizing the underlying Kubernetes infrastructure.

Speakers:

Ujjwal Sharma
Riya John

Hassle-free dynamic policy enforcement in Trino

The last few years have witnessed an explosion of data protection regulations (e.g. GDPR, DMA, CCPA) in conjunction with an ever-growing appetite for data usage in large businesses. These conflicting trends have presented significant challenges for businesses to maintain compliance in a fast-changing regulatory landscape. It's crucial for large organizations to ensure data compliance is handled efficiently and on time, as delays can result in costly engineering efforts and increase the risk of non-compliance. LinkedIn implemented a policy enforcement system that enables granular enforcement of data usage policies across LinkedIn’s data lake. These policies encode rules stating which accesses are allowed. Due to requirements of LinkedIn’s data infrastructure, the enforcement of these policies needs to be done in a way that provides expressibility, cross-engine portability, versioning, and rollout agility. These views are then accessed through engines like Trino. In this talk, we will discuss our journey of supporting policy enforcement for Trino use cases, along with our approach to making them compliant in a hassle-free way.

Speakers:

Ramanathan Ramu
Pratham Desai

Empowering HugoBank's digital services through Trino

I will go over how my team and I used the power of Trino to digitize and create our delta lake for secure, efficient, and optimized access, allowing our digital bank to deliver a seamless digital experience enhanced by data-driven insights. The talk will also cover Trino's ability to connect to dbt and Apache Ranger to provide data governance and simplified data modeling.

Speakers:

Mustafa Mirza
Razi Moosa

Optimizing Trino on Kubernetes: Helm chart enhancements for resilience and security

This session will cover Cardo AI's deployment of Trino on AWS EKS, focusing on Helm Chart development to enable a cost-effective, resilient, and secure setup. Key enhancements to the Trino Helm Chart include NetworkPolicy support for controlled access, options for graceful worker shutdown, and templated configurations for greater flexibility. Attendees will gain insights into how Cardo AI balances performance, cost efficiency with spot instances, and secure access through TLS-terminated Load Balancers, creating an optimized multi-tenant environment for real-time analytics on EKS.

Speakers:

Sebastian Daberdaku
Jan Waś

Virtual view hierarchies with Trino

Trino custom views are incredibly powerful but are often limited to edge cases like computing virtual columns or enforcing permissions. This talk presents the benefits of using a virtual view hierarchy, where views are layered on top of other views, as the primary way to present Trino datasources to user applications. This pattern is especially relevant for Trino users using multiple connectors or considering Iceberg for cold storage.

Speakers:

Rob Dickinson

Opening up the Trino Gateway

Trino Gateway has become a very active subproject in the Trino community. Join the maintainers of the project for an update about new features and improvements. They will also share usage scenarios and known users of Trino Gateway in production and provide a glimpse at planned next steps.

Speakers:

Manfred Moser
Will Morisson
Jaeho Yoo
Vishal Jadhav

Wvlet: A new flow-style query language for functional data modeling and interactive analysis

Wvlet, pronounced as weave-let, is a new flow-style query language for SQL-based database engines such as Trino, DuckDB, and Hive. Wvlet queries, saved as .wv files, offer an intuitive way to describe data processing pipelines compiled into a series of SQL queries. This session will demonstrate how the flow-style query syntax enhances existing SQL engines and facilitates functional data modeling. This approach enables reusable and composable methods for constructing complex data pipelines. Wvlet is open-source and freely available at https://wvlet.org/.

Speakers:

Taro L. Saito

Securing data pipelines at the storage layer

AI/ML data pipelines consume data from file systems and object stores. Training data is the “weak link” in the AI/ML pipeline. Each stage has vulnerabilities that impact integrity, traceability, resilience, and security. “30% of enterprises using AI reported having had a security or privacy breach against their AI environment.” (Gartner) Learn how to protect your data lake, including SQL anomalies and the storage layer, with a cyber storage solution from Superna.

Speakers:

Andrew MacKay

Empowering Pharmaceutical Drug Launches with Trino-Powered Sales Data Analytics

In the pharmaceutical industry, 63% of drugs fall short of launch expectations, making data analytics essential for success. This session explores how Trino’s real-time and historical data analysis capabilities empower pharmaceutical firms to enhance decision-making, from anticipating market demand to optimizing marketing strategies. With Trino’s predictive modeling improving demand forecasting accuracy by 20%, companies can better allocate resources, adapt to market shifts, and boost prescription rates by 15-30%, ensuring successful drug launches.

Speakers:

Harpreet Singh

Connecting to Trino with C# and ADO.NET

In this session George talks about the new Trino subproject trino-csharp-client. It was recently added to the Trino community and includes support for connecting to Trino in a client application written in C#. In addition, an ADO.NET driver can be used to access Trino from many other programming languages and applications.

Speakers:

George Fisher

Speakers

Abdullah Alkhawatrah

Software Engineer at Wise

Alagappan Maruthappan

Software Engineer at Netflix

Albert Chen

Machine Learning Engineer at LinkedIn

Andrew MacKay

CTO & CSO at Superna

Dain Sundstrom

Co-Creator of Trino and Chief Technology Officer at Starburst

Gaurav Ahlawat

Senior Software Engineer, Data Science at LinkedIn

George Fisher

Senior Software Engineer at Microsoft

Gopi Bhagavathula

Staff Engineer at Branch

Gunther Hagleitner

CEO & Co-founder at Waii

Harpreet Singh

Director of Sales Analytics & Operations at Gilead Sciences

Jaeho Yoo

Analytics Engineering at Naver

Jan Waś

Software engineer at Starburst Data

Manas Bundele

Sr. Software Engineer, Machine Learning at LinkedIn

Manfred Moser

Director, Open Source Engineering

Martin Traverso

Co-Creator of Trino and Chief Technology Officer, Starburst

Mustafa Mirza

Lead Platform Data Engineer at HugoBank

Mustafa Sakalsiz

Founder and CEO at Peaka

Peter Kosztolanyi

Staff Data Engineer at Wise

Pratham Desai

Software Engineer at LinkedIn

Ramanathan Ramu

Senior Software Engineer at LinkedIn

Razi Moosa

Data Analyst at HugoBank

Riya John

Senior Software Engineer at Intuit

Rob Dickinson

VP of Engineering at Graylog

Rong Rong

Software Engineer at CharacterAI

Sebastian Daberdaku

Data Engineering Tech Lead at CardoAI

Shahzad Siddiqi

Engineering Manager (Data Platform & ML) at Bazaar Technologies

Siddique Ahmad

Data Engineer at Bazaar Technologies

Taro L. Saito

Senior Principal Engineer at Treasure Data

Trevor Kennedy

Data Architect at FanDuel

Ujjwal Sharma

Software Engineer 2 at Intuit

Usman Ghani

Data Engineer at Bazaar Technologies

Vincenzo Cassaro

Data Engineer at Prezi

Vishal Jadhav

Software Engineer at Bloomberg LP

Will Morisson

Director, Technical Customer Success at Starburst

William Chang

Co-founder & CTO of Canner

Learn more in the Trino Summit 2023 blog recap

Thank you to our sponsors

Alluxio, Monte Carlo, Coginiti