How do I search nested objects in Elasticsearch from Trino?

This question was originally posted on Stack Overflow.

I’m new to trino and I’m trying to use it to query nested objects in elastic search.

This is my mapping in elasticsearch:

{
"product_index": {
   "mappings": {
      "properties" :{
          "id" : { "type" : "keyword"},
          "name" { "type" : "keyword"},
          "linked_products" :{
               "type": "nested",
               "properties" :{
                   "id" : { "type" : "keyword"}
                }
           }
        }
      }
    }
}

I need to perform a query on the id field under linked_products . what is the syntax in trino to perform a query on the id field? Do I need to use special definitions on the target index mapping in elastic to map the nested section for trino?

Trino maps nested object type to a ROW the same way that it maps a standard object type during a read. The nested designation itself serves no purpose to Trino since it only determines how the object is stored in Elasticsearch.

Assume we push the following document to your index.

curl -X POST "localhost:9200/product_index/_doc?pretty" 
-H 'Content-Type: application/json' -d'
{
  "id": "1",
  "name": "foo",
  "linked_products": {
      "id": "123"
  }
}
'

The way you would read this out in Trino would just be to use the standard ROW syntax.

SELECT 
  id, 
  name, 
  linked_products.id 
FROM elasticsearch.default.product_index;

Result:

|id |name|id |
|---|----|---|
|1  |foo |123|

This is fine and well, but judging from the fact that the name of your nested object is plural, I’ll assume you want to store an array of objects like so.

curl -X POST "localhost:9200/product_index/_doc?pretty" -H 'Content-Type: application/json' -d'
{
  "id": "2",
  "name": "bar",
  "linked_products": [
    {
      "id": "123"
    },
    {
      "id": "456"
    }
  ]
}
'

If you run the same query as above, with the second document inserted, you’ll get the following error.

SQL Error [58]: Query failed (#20210604_202723_00009_nskc4): Expected object for field 'linked_products' of type ROW: [{id=123}, {id=456}] [ArrayList]

This is because, Trino has no way of knowing which fields are arrays from the default Elasticsearch mapping. So to enable querying over this array, you’ll need to follow the instructions in the docs to explicitly identify that field as an Array type in Trino using the _meta field. Here is the command that would be used in this example to indetify linked_products as an ARRAY.

curl --request PUT \
    --url localhost:9200/product_index/_mapping \
    --header 'content-type: application/json' \
    --data '
{
    "_meta": {
        "presto":{
            "linked_products":{
                "isArray":true
            }
        }
    }
}'

Now, you will need to account in the SELECT statement that linked_products is an ARRAY of type ROW. Not all of the indexes will have values, so you should use the index safe element_at function to avoid errors.

SELECT 
  id, 
  name,
  element_at(linked_products, 1).id AS id1, 
  element_at(linked_products, 2).id AS id2 
FROM elasticsearch.default.product_index;

Result:

|id |name|id1|id2 |
|---|----|---|----|
|1  |foo |123|NULL|
|2  |bar |123|456 |