Does anyone know if fault-tolerant execution (FTE) applies to data lake files (Parquet/Delta/ORC) for larger queries, or is it just for database connections (BigQuery etc.)?
FTE is supported for data lake/object storage connectors and RDBMS connectors alike.
In general, small queries will be slower than in standard mode (in our latest test on the TPC-DS sf1000 schema, FTE was around 26% slower). However, for larger queries and write queries, FTE is generally more performant due to more efficient use of memory and more flexible writer scaling.
For long-running queries on clusters that don't have enough memory, FTE helps these queries complete. Yes, that's one of the main benefits of fault-tolerant execution: users don't need to worry about sizing their cluster correctly so that their queries won't run out of memory. For Starburst Enterprise, it can handle intermediate data sizes up to 50 * max-query-per-node. For example, if max-query-per-node is 220GB, it can handle up to 11TB. For Starburst Galaxy, it can handle up to 60TB.
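To make the Starburst Enterprise sizing rule above concrete, here is a minimal Python sketch of the arithmetic. The factor of 50 and the 220GB figure come from the reply above; the function name is hypothetical, and the conversion assumes decimal units (1TB = 1000GB), which is what the 11TB figure implies.

```python
# Rough sizing helper for the "50 * max-query-per-node" rule quoted above.
# The function name is made up for illustration; it is not a Starburst API.

def max_intermediate_data_gb(per_node_limit_gb: float, multiplier: int = 50) -> float:
    """Return the approximate intermediate data ceiling in GB."""
    return multiplier * per_node_limit_gb


limit_gb = max_intermediate_data_gb(220)  # 220GB per node, as in the example
limit_tb = limit_gb / 1000                # assumes decimal units: 1TB = 1000GB

print(f"Approximate ceiling: {limit_gb:.0f}GB ({limit_tb:.0f}TB)")
```

Plugging in your own per-node limit gives a quick estimate of whether a query's intermediate data is likely to fit before you run it.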
Check this page out: Starburst | Enable fault-tolerant execution for queries in SEP