Mass data load using Trino JDBC driver

Hi,

I’m reading data from Trino using a JDBC connector in a Java application. Is it possible to read the Trino table data en masse (into a dataframe-like data structure) using the Trino JDBC driver? Or am I limited by the semantics of reading one row at a time from the ResultSet?

Thanks,
Sharad

Correct. Using the JDBC driver from your Java app means you need to iterate through the ResultSet row by row, as you would with any other data platform you read from via JDBC.

Nothing prevents you from running multiple queries over multiple JDBC connections (i.e. splitting the request with non-overlapping WHERE clauses), but you then have N independent ResultSets that you’ll have to process independently, likely in separate threads.
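As a minimal sketch of that splitting idea: the helper below generates N non-overlapping BETWEEN predicates over a numeric key range, one per connection/thread. The column name, table, and id range here are illustrative assumptions, not anything Trino-specific; adapt them to your schema.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split one large read into N non-overlapping WHERE clauses.
// The key column and its min/max are assumptions for illustration.
public class QueryPartitioner {

    // Build N non-overlapping predicates that together cover [minId, maxId].
    public static List<String> partitionPredicates(
            String column, long minId, long maxId, int partitions) {
        List<String> predicates = new ArrayList<>();
        long span = maxId - minId + 1;
        for (int i = 0; i < partitions; i++) {
            long lo = minId + (span * i) / partitions;
            long hi = minId + (span * (i + 1)) / partitions - 1;
            predicates.add(column + " BETWEEN " + lo + " AND " + hi);
        }
        return predicates;
    }
}
```

Each predicate would then be appended to the base query (e.g. `"SELECT * FROM orders WHERE " + predicate`, with `orders` a hypothetical table) and executed on its own Connection in its own thread.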

From your Java application you can treat the result set like a dataframe. The JDBC driver abstracts the interactions with Trino’s REST API and fetches more data as needed. If you want, you can iterate through the whole result set, build an in-memory structure in your Java app (provided you have enough memory), and then work with that.
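A sketch of what that in-memory structure could look like: a tiny dataframe-like holder that drains an open ResultSet into column names plus row-major data. The `Frame` class and its method names are my own invention for illustration, and the whole result must fit in heap.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// A minimal dataframe-like holder: column names plus row-major data.
public class Frame {
    private final List<String> columns;
    private final List<Object[]> rows = new ArrayList<>();

    public Frame(List<String> columns) { this.columns = columns; }

    public void addRow(Object... values) { rows.add(values); }

    public int rowCount() { return rows.size(); }

    // Pull one column out as a list.
    public List<Object> column(String name) {
        int idx = columns.indexOf(name);
        List<Object> out = new ArrayList<>();
        for (Object[] row : rows) out.add(row[idx]);
        return out;
    }

    // Drain an open ResultSet into memory; behind rs.next() the Trino JDBC
    // driver keeps fetching further pages from the REST API as needed.
    public static Frame fromResultSet(ResultSet rs) throws SQLException {
        ResultSetMetaData md = rs.getMetaData();
        int n = md.getColumnCount();
        List<String> names = new ArrayList<>();
        for (int c = 1; c <= n; c++) names.add(md.getColumnLabel(c));
        Frame f = new Frame(names);
        while (rs.next()) {
            Object[] row = new Object[n];
            for (int c = 1; c <= n; c++) row[c - 1] = rs.getObject(c);
            f.addRow(row);
        }
        return f;
    }
}
```

Usage would be along the lines of `Frame f = Frame.fromResultSet(stmt.executeQuery("SELECT * FROM orders"))`, with the query being whatever you actually read.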

This behavior is shared by all client tools for Trino and Starburst at the moment. For example, the Python client, and any tool built on top of it, behaves the same way since it goes through the same API.

If you are loading data into a system with Trino, you can always submit the data in batches across multiple queries.
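Batching on the write side can be sketched like this: split the rows to load into fixed-size chunks and submit one statement batch per chunk via the standard JDBC `PreparedStatement.addBatch`/`executeBatch` pattern. The batch size and table/column names would come from your own schema; only the chunking helper here is concrete.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: submit rows in batches rather than one INSERT per row.
public class BatchWriter {

    // Split the rows to load into fixed-size chunks, one chunk per batch.
    public static <T> List<List<T>> chunks(List<T> items, int batchSize) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            out.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return out;
    }

    // For each chunk, the standard JDBC batching pattern applies
    // (table name "t" is a placeholder):
    //   PreparedStatement ps = conn.prepareStatement("INSERT INTO t VALUES (?, ?)");
    //   for (Object[] row : chunk) { ps.setObject(1, row[0]); ps.setObject(2, row[1]); ps.addBatch(); }
    //   ps.executeBatch();
}
```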