[
  {
    "id_source": "data",
    "title": "data",
    "action": "data",
    "dimension": "data",
    "date": "data",
    "user": {
      "phone": "data"
    }
  },
  {
    "id_source": "data",
    "title": "data",
    "action": "data",
    "dimension": "data",
    "date": "data",
    "user": {
      "phone": "data"
    }
  }
]
Hey @luis.machado,
You can do a lot of this processing directly in SQL. Here's an example:
SELECT parsed[1].user.phone
FROM (
  SELECT CAST(json_parse(json) AS ARRAY(ROW(id_source VARCHAR, title VARCHAR, action VARCHAR, dimension VARCHAR, date VARCHAR, user ROW(phone VARCHAR)))) AS parsed
  FROM (VALUES '[' ||
    '{"id_source":"data","title":"data","action":"data","dimension":"data","date":"data","user":{"phone":"data1"}},' ||
    '{"id_source":"data","title":"data","action":"data","dimension":"data","date":"data","user":{"phone":"data2"}}' ||
    ']'
  ) t(json)
);
This will return “data1”.
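If you want the phone from every array element rather than just the first, a minimal sketch of the same query combined with CROSS JOIN UNNEST (which expands each element's fields into columns; the column aliases, including usr, are just names I picked) would be:

-- expand each array element into its own output row
SELECT usr.phone
FROM (
  SELECT CAST(json_parse(json) AS ARRAY(ROW(id_source VARCHAR, title VARCHAR, action VARCHAR, dimension VARCHAR, date VARCHAR, user ROW(phone VARCHAR)))) AS parsed
  FROM (VALUES '[' ||
    '{"id_source":"data","title":"data","action":"data","dimension":"data","date":"data","user":{"phone":"data1"}},' ||
    '{"id_source":"data","title":"data","action":"data","dimension":"data","date":"data","user":{"phone":"data2"}}' ||
    ']'
  ) t(json)
) AS p
CROSS JOIN UNNEST(p.parsed) AS u(id_source, title, action, dimension, date, usr);

That should give one row per element, i.e. "data1" and "data2".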
While this is fine for prototyping, I would always recommend cleaning the data externally; this type of processing is going to be slow at scale.
Also, if you're asking about reading directly from a JSON file, that's not possible. You would need Trino to parse the entire JSON file and bypass the SerDe, which is not scalable, i.e. it's not a permanent solution or one I would deploy in any production environment.
But it's handy for playing around with the data when you're exploring it like this.