How to see I/O wait time

If you are wondering if Trino has a metric that shows how much CPU time is wasted waiting on I/O operations such as retrieving data from your lake, the short answer is “no, it doesn’t”. Fortunately, there are a couple of heuristics that can help you see if something is I/O heavy (or efficient). Let’s look at the following EXPLAIN ANALYZE output from an initial SOURCE stage.

The first rule of thumb is to compare CPU time with the Scheduled time. When Scheduled is dramatically greater than CPU, it could mean there is significant I/O wait time.

The example above is actually not that far off. With a bunch of small input files, for example, you could see that difference be 10x-100x (or more) greater.

By comparison, the next stage reported values that were much closer to each other as it wasn’t a SOURCE stage which are usually our biggest I/O focus areas.

The second heuristic that can be considered is only present on the SOURCE stages. The Physical input time from the original stage presented indicates the amount of time spent actually retrieving (and maybe converting/parsing) data from the source. NOTE: Not all connectors implement this.

So… as stated above, if Scheduled time is high relative to CPU time it suggests the query was I/O heavy. If Scheduled time is high compared to even Physical input time then either the Physical input time is not being calculated correctly by connector or there’s some other work which makes splits run longer than just reading data.

Starburst UI gives you another metric called ACTIVE (seen below) which can be found in Query overview > Advanced > Execution details.

The tooltip reports defines the metric and it is aggregated for the whole query.

Screenshot 2025-01-08 at 4.47.42 PM

Since this query only had 2 stages and the majority of the work was in the initial SOURCE stage, the 21% shown above is very similar to the percentage of CPU / Schedule time initial discussed. 496.15 / 2420 = 20.5%.

So, while NOT an exact science (and easier to focus just on the SOURCE stage instead of the entire query plan) this example above suggests that 80% of the resources were idly waiting on I/O time.

1 Like