Forcing Push Down Join into connected Data source

johnz · June 2, 2022, 4:09pm

The Trino Doc claims that Joins can be pushed down into a connected DB, such as postgres

I have typical OLAP query that does a join, group by on a dimension, and a aggregation of a metric all on the same connected DB. The query returns ~100 rows.

But Trino is breaking up the query and doing the filtering on the connected DB, but then does the join and aggregation in the Trino Server, moving ~100M rows across the network.

Is there a way to “quote” a sub query, to tell the Trino planner just pushdown the query and don’t bother making a sub plan?

I have a star schema, with one huge table (>100M rows) and a dimension table that has static attributes of the huge table.
The dimension table is filtered to create a map, that is joined to the huge table. The result is group by on a dimension and finally some of the metrics from the huge table are aggregated to calculate stats.

bitsondatadev · June 6, 2022, 10:17pm

Weclome @johnz ,

Interesting timing for your question. We’ve recently introduced Polymorphic Table Functions to Trino in version 381.

In version 384, which was just released a few days ago, the query table function was added in PR 12325 to address the very problem you’ve mentioned above that is described in issue 9163.

For a quick example in MySQL:

trino> USE mysql.tiny;
USE mysql.tiny

trino:tiny> SELECT * FROM TABLE(system.query(query => 'SELECT 1 a'));
 a
---
 1
(1 row)

trino:tiny> SELECT * FROM TABLE(system.query(query => 'SELECT @@version'));
 @@version
-----------
 8.0.29
(1 row)

So this will run exactly the command on the underlying database (not exactly a pushdown but a pass-through) and return the results to Trino as a Table. SELECT @@version is MySQL specific syntax that returns MySQL output as a table that now Trino is able to further process.

Just know that this has the trade-off of you managing the underlying syntax yourself.

johnz · June 9, 2022, 8:49pm

Wow, This is exactly what I was looking for! Thanks

I installed version 384, and tried the simple query through Dbeaver and SqlAlcemy and get an error about the system.query function not being registered:

SELECT * FROM TABLE(system.query(query => ‘SELECT 1 a’))

TrinoUserError(type=USER_ERROR, name=FUNCTION_NOT_FOUND, message=“line 1:21: Table function system.query not registered”, query_id=20220609_203804_00037_qnqjw)

How do I register the function?

says:
“The function is always registered, but it is disabled by default by access control.”

bitsondatadev · June 13, 2022, 5:47pm

@johnz,

Did you happen to set the catalog and schema in the session? This is still very new syntax (so new that we haven’t set up the documentation yet) but from what I’ve deduced, is that the query function is registered to different catalogs, and you need to specify the catalog in order for it to know which function to use. Notice I set the session property in my example to set catalog/schema.

trino> USE mysql.tiny;
USE mysql.tiny

johnz · June 14, 2022, 4:22am

Sorry, I missed the command SET catalog.database
I am using JDBC clients and not the command line.

For DBeaver, this simple sequence of commands worked !!!

USE opalpostgres.public;
SELECT * FROM TABLE(system.query(query => 'SELECT 1 a'));

For a complicated query, string constants inside the query need to have two single quotes:

pdate >= ''2022-04-22''

For more complicated queries with joins, group by and aggregation, I get the error on an aggregation column

avg(hits) avg_hits,

java.lang.UnsupportedOperationException: Unsupported type: JdbcTypeHandle{jdbcType=2, jdbcTypeName=numeric, columnSize=Optional[0], decimalDigits=Optional[0]} of column: avg_hits

johnz · June 14, 2022, 11:45am

Could you give an example of an encapsulating query that joins between two quoted queries?
How can the catalog.datebase context be set in two different parts of the encapsulated query?
Is there a context parameter?

masayyed · September 8, 2022, 7:32am

Hi team,

We have integrated apache ranger with trino for user authorization. Now we are facing access denied on trino-table-function.

how can we define ranger policy to allow users to use Trino-table-function?

Regards,
Mahebub Sayyed

johnz · December 6, 2022, 5:04pm

Is there a pointer to documentation for Trino query encapsulation?
Starburst has a blog page on " Introducing Full Query Passthrough For Faster Query Federation"