Query Engine
A query engine in action: A user wants to scan data from a table in some database and filter out the results per their requirements. They tell the query engine what they want in simple, semantic terms. It then builds a plan to perform the scan, determines where and how to filter the results, and devises the order of operations to execute the query.
Similar to a Google search, a query engine turns a complicated process of information retrieval into a simple request. However, unlike Google, which pulls information from a pared-down index, query engines go down to the databases themselves. That makes query engines vital tools for running analytics and uncovering insights.
Query Engines and Business Impact
More than just a data tool or yet another IT initiative, query engines are one of today’s most important business tools. The technical specifications — how the query engine understands queries and interacts with data — are important to understand, but the business impact is ultimately what matters. That depends on three features:
- User-friendliness: How simple the query engine is to interact with determines how often users rely on the tool and how extensively they explore the data. Intuitive tools encourage people to dig deeper while confusing tools only inhibit data usage.
- Retrieval size: How much data the query engine can move through the pipeline (and how quickly) determines how much information analytics can incorporate. Query engines that can handle large data volumes (petabytes) bring more insights to light with less time and effort.
- Source variety: How many different sources the query engine can pull data from at once or overall determines how long it takes to get a complete picture from the data. Querying multiple sources at once not only saves time but also ensures nothing goes overlooked.
Query Engine Features
As the primary connection between end users and the data their decision-making processes require, the choice of a query engine matters. There are two critical features for modern query engines: a distributed architecture and SQL.
What is a distributed query engine?
A distributed query engine uses a distributed architecture similar to massively parallel processing (MPP) databases. Trino is an example of a distributed query engine.
When a query arrives, a coordinator (or orchestrator) develops a plan to fulfill it, and distributed “workers” execute that plan. This approach splits data across multiple nodes and ingests and processes it in parallel rather than sequentially. The result is faster and more efficient query execution.
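The coordinator and worker split can be sketched in a few lines of Python. This is only an illustration of the general pattern, not how Trino is implemented: the `ROWS` data, the function names, and the use of threads in place of separate machines are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical (region, amount) rows standing in for a large table.
ROWS = [("east", 10), ("west", 25), ("east", 5),
        ("north", 40), ("west", 15), ("east", 30)]


def split(rows, n_parts):
    """Coordinator step: divide the rows into roughly equal partitions."""
    return [rows[i::n_parts] for i in range(n_parts)]


def worker_scan(partition, min_amount):
    """Worker step: filter one partition and compute a partial sum per region."""
    partial = {}
    for region, amount in partition:
        if amount >= min_amount:
            partial[region] = partial.get(region, 0) + amount
    return partial


def coordinator(rows, min_amount, n_workers=3):
    """Plan the work, fan it out to workers in parallel, then merge the partials."""
    partitions = split(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda p: worker_scan(p, min_amount), partitions))

    merged = {}
    for partial in partials:
        for region, total in partial.items():
            merged[region] = merged.get(region, 0) + total
    return merged


result = coordinator(ROWS, min_amount=10)
print(result)
```

Each worker only sees its own partition, so the filter and partial sums run independently; the coordinator's only job after planning is to merge the partial results into the final answer.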
What is a SQL query engine?
Most traditional databases and data warehouses utilize SQL query engines that follow a SQL standard for the query language. Not all databases follow the same dialect, though. Some SQL dialects are “looser” than others, resulting in unpredictable variations that make it harder to correctly query the data source or get back the desired data.
A SQL query engine like Trino acts as a translation layer, translating SQL queries into an execution plan and distributing this plan among workers. A metadata store holds definitions of tables, functions, and similar objects, which the coordinator uses to interpret the SQL.
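To make the idea of translating SQL into a plan concrete, the sketch below uses SQLite from Python's standard library as a stand-in. SQLite is a single-node embedded engine, not Trino, and the `orders` table and its contents are invented for the example. It shows the same three pieces in miniature: a catalog of table definitions, a plan derived from the SQL text, and the executed result.

```python
import sqlite3

# An in-memory database with a hypothetical "orders" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("east", 10), ("west", 25), ("east", 5)],
)
conn.execute("CREATE INDEX idx_orders_region ON orders (region)")

# The engine's catalog: SQLite keeps table definitions in sqlite_master.
tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()

query = "SELECT id, amount FROM orders WHERE region = ? ORDER BY id"

# Ask the engine how it intends to run the query, without running it.
# The last column of each plan row is a human-readable description of the step.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("east",)).fetchall()
plan_steps = [row[-1] for row in plan]

# Execute the same declarative query.
rows = conn.execute(query, ("east",)).fetchall()

print(tables)
print(plan_steps)
print(rows)
```

The exact wording of the plan steps depends on the SQLite version, so it is best read as an illustration of the planning stage rather than a stable output.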
What are the Benefits of a Query Engine?
Query engines bring a big data strategy to life by serving as the final piece of the data pipeline and facilitating access to abundant data across sources.
Query engines enable teams to:
Streamline data management
A good query engine reduces the need to carefully organize, secure, and architect data, which becomes especially important as data volume and velocity both grow. A good query engine also allows you to query the data where it lives, reducing the number of data pipelines and improving business agility.
Liberate the data team
Instead of asking someone else to provide data, users can run their own queries, including ad-hoc requests to explore whenever they may want. Meanwhile, the data team has more time and resources to focus on improving data engineering or building data products.
Improve decision-making
Query engines improve decision-making by connecting people with more data in less time and without the need for technical expertise. They further improve decision-making by translating data queries into a context that’s familiar and relevant for business purposes.
What are the Challenges of a Query Engine?
The right query engine is a solution to many problems facing data creators and data consumers alike. Therefore, the challenge is trying to orchestrate a data strategy without the aid of a query engine. And when that becomes untenable, the challenge shifts to evaluating, selecting, and implementing the right query engine.
Any query engine will be better than nothing. That said, query engines that limit access to data sources, duplicate or complicate data, fail at federated queries, or increase time to insight simply replace old challenges with new ones.
Why Trino?
Trino, an open-source SQL query engine first developed at Facebook and formerly known as PrestoSQL, solves a common problem with data retrieval: having to go to the storage layer to query a database where it’s stored. Accessing data this way means the query engine has to move large data volumes through the pipeline, which is slow and error-prone.
Trino provides an elegant solution by removing the storage layer from the equation. The query engine sits on top of other query engines, intelligently distributing requests between them before integrating the results to reflect the original query. This query federation unlocks access to new mission-critical insights for businesses.
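A toy version of query federation might look like the following. It assumes two independent SQLite databases standing in for separate source systems; the `crm` and `sales` sources, their tables, and the `federated_totals` function are hypothetical names chosen for the example, and this is not Trino's actual mechanism. The point is the shape of the work: each source answers the part of the question it can, and the federating layer combines the answers.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create an independent in-memory database standing in for one source system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


# Two hypothetical sources that know nothing about each other.
crm = make_source(
    "CREATE TABLE customers (id INTEGER, name TEXT)",
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace"), (3, "Linus")],
)
sales = make_source(
    "CREATE TABLE orders (customer_id INTEGER, amount INTEGER)",
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 50), (2, 20), (1, 30), (3, 5)],
)


def federated_totals(min_total):
    """Answer one question across both sources and integrate the results."""
    # Let the sales source do the aggregation so only small results move.
    totals = sales.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
    ).fetchall()
    # Fetch the lookup data from the CRM source.
    names = dict(crm.execute("SELECT id, name FROM customers").fetchall())
    # Join and filter in the federating layer.
    joined = [
        (names[cid], total)
        for cid, total in totals
        if cid in names and total >= min_total
    ]
    return sorted(joined)


result = federated_totals(20)
print(result)
```

Pushing the `SUM` and `GROUP BY` down to the sales source keeps the data moved between systems small, which is the motivation the section gives for not hauling large volumes through the pipeline.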
Starburst is Enterprise-grade Trino
Trino has the fastest speed in its class plus a community built around it to foster continual improvement — and Starburst is the commercial distribution of Trino. See how Starburst can simplify your information pipeline by scheduling a demo today.