So why use a big data SQL query engine? Well, have you suffered from any of the following problems when processing and analyzing big data via a centralized approach: different query languages, niche systems and specialists, data distributed across various silos, slow queries? I could go on, but you get it, right? The traditional paradigm is cumbersome for many users.
You’re in luck today, my friend: with the right SQL query engine, you don’t need to do what has always been done.
What is a big data SQL query engine?
First, let’s start with what a SQL query engine is not. It’s not a search engine, where users look for information by keyword. With a search engine, after the user types the keyword(s) and presses “enter,” the keywords are processed with an algorithm that retrieves results from a search index.
A SQL query engine, on the other hand, is a way to query vast amounts of data, terabytes and even petabytes, across various data sources. Users can construct queries to select, insert, update and locate data. It’s up to the system to analyze the query against its own structures to retrieve the information. In fact, SQL was designed for exactly this purpose.
Common big data SQL query engine tools
Commonly used SQL query engine tools include: Spark, Impala, Hive, and Trino. Every technological solution has a history, as well as an origin story. Mine is with Trino.
I’m a contributor to Trino, formerly known as PrestoSQL. We created it as a faster alternative to Hive and as a way to quickly query multiple data sources.
Trino is an open-source, distributed SQL query engine. Even though you can use Trino to run SQL queries against data, that doesn’t mean Trino is a database. It uses cluster computing, enabling users to query data from various data sources where it resides, whether it’s on-prem or in the cloud. And because you’re on the Starburst blog, I suspect you might have figured out that Starburst is the commercial distribution of Trino, adding more connectivity (e.g. HDFS, Amazon S3, Snowflake, Salesforce, Oracle, Google, Redshift, Splunk, Kafka, Cloudera, IBM, Teradata and more), security and enterprise-level support. A short detour for a little Starburst trivia: “The Starburst name is a reference to a SQL query called ‘select star,’ which is a search command to bring back everything. The burst piece was a reference that we can retrieve anything really fast.”
I mention this piece of Starburst trivia as a nod to the beginning of this blog: having the right SQL query engine. Many organizations, for whatever reason, take a very circuitous approach: they build on their existing architecture with open source-based massively parallel processing engines (e.g. Trino) and/or use an external query engine to query object storage (a data lake). This creates unnecessary data duplication, overcomplicates the data architecture and increases time-to-insight.
There are certainly merits to every organization’s architecture, but go with the SQL query engine that will simplify and fortify your business, especially in an uncertain economic climate, when getting real-time analytics can make or break a business. For me, this SQL query engine is undoubtedly Trino. You’ll have to find what’s right for you, so do your due diligence.
Use cases for a big data SQL query engine
With so many drivers to monetize data (increasing customer retention, introducing a new line of business, enabling competitive differentiation), a SQL query engine can empower data analysts and business leaders to act as data analytics editors: they can confidently query for precise data sets to make powerful data-driven decisions. And when you’re in need of a new revenue stream, query for unlikely combinations to surface interesting results. Below, I highlight a few use cases that might shift your long-held belief that the single source of truth is the only way to go:
Single point of access to your data
Unlike centralization, a SQL query engine can facilitate access to data wherever it resides. That inherently gives you access to more of the data the business already possesses, so you can draw out hidden, insightful correlations. You don’t have to query different databases separately for information. Experienced power users understand the frustration of being unable to combine data from different systems. In short, fast access to information is power.
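To make the idea of a single point of access concrete, here is a minimal sketch in Python. It is an analogy rather than Trino itself: two separate in-memory SQLite databases, attached under the hypothetical names `crm` and `billing`, stand in for two independent systems, and a single SQL statement joins across them. The table names, column names and data are invented for illustration. In Trino, tables are addressed in a similar spirit as `catalog.schema.table`.

```python
import sqlite3

# One connection acts as the "single point of access".
conn = sqlite3.connect(":memory:")

# Assumption: each attached in-memory database stands in for a separate system.
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS billing")

# Hypothetical tables living in the two different "systems".
conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)
conn.executemany(
    "INSERT INTO billing.invoices VALUES (?, ?)",
    [(1, 120.0), (1, 80.0), (2, 50.0)],
)

# A single query combines data from both sources; no copying between them.
rows = conn.execute(
    """
    SELECT c.name, SUM(i.amount) AS total
    FROM crm.customers AS c
    JOIN billing.invoices AS i ON i.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()

print(rows)  # [('Acme', 200.0), ('Globex', 50.0)]
```

The point of the sketch is the shape of the query: the analyst writes one join, and the engine takes care of where each table actually lives.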
SQL-based access to all your data
For some time now, relational database management systems on their own have been insufficient. Today, so much structured, semi-structured and unstructured data resides in various data systems, each of which requires deep expertise and a learning curve to even scratch the surface of your data. With SQL-based access, connections to all of these systems are made possible.
Semantic layer
The right SQL query engine also encompasses a semantic layer that translates complicated systems, definitions and mathematical equations into a common language and an accessible interface for business users. Just as fashion is described as armor to face the world, there’s an elegance to a semantic abstraction layer. It maps complex data sets into familiar business terminology (revenue, customer, costs) to offer a uniform, coordinated, consolidated view of data across the entire organization.
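One simple way to picture a semantic layer is as a SQL view that renames cryptic physical columns into business terms. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in. The table `t_ord_ln`, its columns and the view `customer_financials` are hypothetical names chosen for illustration, and real semantic layers typically offer much more than a view (governance, metrics definitions, caching).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical raw table with the kind of cryptic names source systems often use.
conn.execute(
    "CREATE TABLE t_ord_ln ("
    "cust_cd TEXT, qty_n INTEGER, unit_prc_amt REAL, unit_cst_amt REAL)"
)
conn.executemany(
    "INSERT INTO t_ord_ln VALUES (?, ?, ?, ?)",
    [("C001", 2, 10.0, 6.0), ("C001", 1, 5.0, 3.0), ("C002", 4, 2.5, 1.0)],
)

# The "semantic layer": a view that exposes business terms
# (customer, revenue, costs) instead of physical column names.
conn.execute(
    """
    CREATE VIEW customer_financials AS
    SELECT cust_cd                   AS customer,
           SUM(qty_n * unit_prc_amt) AS revenue,
           SUM(qty_n * unit_cst_amt) AS costs
    FROM t_ord_ln
    GROUP BY cust_cd
    """
)

# Business users query the familiar vocabulary only.
report = conn.execute(
    "SELECT customer, revenue, costs FROM customer_financials ORDER BY customer"
).fetchall()

print(report)  # [('C001', 25.0, 15.0), ('C002', 10.0, 4.0)]
```

Because the formulas for revenue and costs live in one place, every team that queries the view gets the same definition.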
Faster time-to-insight
If a query that used to take a few days to run can be executed in less than 30 minutes, it shortens an analyst’s or business leader’s time to insight and gives them the freedom to inquire and ask “what if” questions. Here is how this works with a SQL-based massively parallel processing engine: once a query comes in, its work is divided into smaller units, known as “splits,” to get more done in less time. In a large table scan, for example, different splits read different sections of the table in parallel, which is how performance can scale massively.
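The idea of splits can be sketched in a few lines of Python. This is a toy model, not Trino's actual scheduler: the helper names `make_splits`, `scan_split` and `parallel_sum` are hypothetical, the "table" is just a list of numbers, and a thread pool stands in for a cluster of workers. Each split covers one section of the table, workers compute a partial result per split, and the partial results are combined at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def make_splits(n_rows, split_size):
    """Divide row positions 0..n_rows-1 into contiguous, non-overlapping splits."""
    return [
        range(start, min(start + split_size, n_rows))
        for start in range(0, n_rows, split_size)
    ]


def scan_split(table, split):
    """Read one section of the table and return a partial aggregate (a sum)."""
    return sum(table[i] for i in split)


def parallel_sum(table, split_size, workers=4):
    """Scan all splits concurrently, then combine the partial results."""
    splits = make_splits(len(table), split_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda s: scan_split(table, s), splits))
    return sum(partials)


# Toy "table": a single column holding the values 1..1000.
table = list(range(1, 1001))
total = parallel_sum(table, split_size=100)

print(len(make_splits(len(table), 100)))  # 10
print(total)  # 500500
```

Threads in standard Python do not speed up CPU-bound work like this, so the sketch shows only the structure (split, scan in parallel, combine). In a distributed engine the splits are processed on many machines at once, which is where the speedup comes from.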
Support for machine learning and artificial intelligence
SQL-based access to data also presents opportunities to discover more data sets for machine learning (ML) and artificial intelligence (AI) systems. As many ML and AI practitioners know, the algorithms and the recommendations they make are only as good as the data.
Big data SQL query engine enables the freedom to be curious
As we’ve seen with the use cases above, there are many powerful ways we can enlist a SQL query engine like Trino or Starburst to improve business intelligence, make empowered data-driven decisions, and reimagine product and revenue streams. It’s really only the beginning. Let readily available data give you the freedom to be curious.