Online Analytical Processing (OLAP)
Early-stage companies often try to use their online transaction processing (OLTP) systems as analytics platforms, but OLAP systems like data warehouses make analytics more efficient. This guide will help you understand the difference between OLTP and OLAP, why OLAP works better for analytics, and why open data lakehouses are better OLAP solutions.
Online analytical processing vs online transactional systems
Transactional and analytical data processing are so distinct that using one system for both introduces considerable risk. OLTP systems keep the company running, and adding analytical workloads could interfere with their operations.
7 differences between OLTP and OLAP
Exploring the key differences between OLTP and OLAP will help explain why dedicated OLAP systems are better options.
1. Purpose and users
OLTP systems reliably store high volumes of transactional data for fast retrieval and updating. For example, a bank’s financial transactions must be processed quickly, in the right order, with no chance for error — at a rate of thousands of transactions per second. End users accessing OLTP systems need to see data about specific transactions, such as when a salesperson brings up a customer order.
OLAP systems also handle large volumes of data. However, analytics relies on combining multidimensional historical data from multiple sources. For instance, a dashboard might summarize performance metrics for business users by combining the revenue, location, and time dimensions. With more complex queries, like those used in machine learning projects, the number of dimensions can reach into the thousands.
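To make the contrast concrete, here is a minimal sketch of the two query shapes, assuming a hypothetical orders table with order_id, region, order_date, and amount columns:

```sql
-- Typical OLTP lookup: retrieve a single record by its key.
SELECT *
FROM orders
WHERE order_id = 12345;

-- Typical OLAP query: summarize revenue across the location and time
-- dimensions (table and column names are illustrative).
SELECT
    region,
    date_trunc('month', order_date) AS order_month,
    sum(amount) AS total_revenue
FROM orders
GROUP BY region, date_trunc('month', order_date)
ORDER BY order_month, region;
```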
2. Data architecture
OLTP and OLAP systems rely on different architectures to fulfill their respective purposes. Transactional systems use relational databases that must deliver fast read and write speeds and guarantee reliability through ACID compliance.
Data in OLAP systems rarely changes, so read performance takes priority over write speed. These systems store data in multidimensional tables optimized to make queries more efficient.
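One common way to model those dimensions in a relational engine is a star schema: a central fact table of measures keyed to dimension tables that analysts slice and group by. The sketch below uses purely illustrative names and types:

```sql
-- Illustrative star schema: one fact table of measures surrounded by
-- dimension tables used for slicing and grouping.
CREATE TABLE dim_date  (date_key INTEGER, calendar_date DATE, month INTEGER, year INTEGER);
CREATE TABLE dim_store (store_key INTEGER, city VARCHAR, region VARCHAR);

CREATE TABLE fact_sales (
    date_key   INTEGER,        -- references dim_date
    store_key  INTEGER,        -- references dim_store
    units_sold INTEGER,
    revenue    DECIMAL(12, 2)
);
```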
3. Data sources
Financial, inventory, activity logging, and other transactional systems are separate applications with dedicated relational databases. This focus makes OLTP systems more performant and efficient in real-time use cases.
OLAP systems, like a data warehouse, source data from systems across the enterprise. By becoming a single source of analytical truth, they try to give decision-makers holistic views of the business.
4. Data formatting
One way OLTP systems optimize performance is by storing transaction data in rows, which keeps a record’s fields close together in physical storage and makes retrieval faster.
Analytical queries process large amounts of data for a specific field. Doing this with row-based files would require an excessive number of reads. That’s why OLAP systems use columnar formats, which keep all the data for a field close together.
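As an illustration, a lakehouse table can be declared in a columnar file format such as Parquet. The sketch below assumes a Trino-style engine with a Hive- or Iceberg-backed catalog; the catalog, schema, and table names are assumptions:

```sql
-- Store the table in Parquet, a columnar file format (names are illustrative;
-- the 'format' property depends on the connector in use).
CREATE TABLE lakehouse.analytics.sales
WITH (format = 'PARQUET')
AS SELECT * FROM lakehouse.staging.raw_sales;

-- An analytical query that touches only two columns; columnar storage lets
-- the engine skip reading every other field in each row.
SELECT region, sum(revenue) AS total_revenue
FROM lakehouse.analytics.sales
GROUP BY region;
```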
5. Response times
An individual transaction involves only a small amount of data. However, transactional systems operate at such large scale that any delay quickly becomes unacceptable, which is why OLTP systems store and update data in milliseconds.
Response times aren’t quite as urgent for OLAP systems. An overnight processing run will not impact a big data project’s months-long schedule. Even when sourcing data from real-time systems, an OLAP database can take seconds or minutes to deliver results without anyone noticing.
6. Storage efficiency
Despite the amount of data they handle, OLTP systems have relatively light storage requirements. Their focus on real-time applications means that after a certain amount of time, the data isn’t needed anymore. OLTP systems will either flush old data or transfer it to an archive, often an OLAP database.
On the other hand, OLAP systems are storage intensive since they keep data from many sources for extended periods. As a result, OLAP storage strategies depend on efficient file formats, compression, and cost-effective scalability.
7. Why OLTP can’t handle analytics at scale
Getting up and running is a new company’s top priority, so its data stack consists primarily of transactional systems for inventory, financial, and administrative functions. It’s only natural to want to leverage this investment for data analytics.
As we’ve seen, there are many ways OLTP does not align with analytics use cases. Forcing a transactional system to perform more complex, multidimensional analysis risks undermining its performance and reliability.
Adding OLAP to the data stack is a more robust, performant, and capable approach. Data teams can stop forcing disparate OLTP systems to work together. Instead, they can simply build pipelines feeding data from each transactional database into a centralized analytics platform.
Online analytical processing (OLAP) use cases
Among OLAP’s strengths is its flexibility. Its data may take years to accumulate or arrive in near real time. Some users will have deep analytical expertise, while others will depend on dashboards. With the right design, OLAP tools can support everyday decision-making and help navigate an ever-changing business landscape. This flexibility makes OLAP ideal for varied use cases, including:
Data mining and machine learning
Data scientists develop complex queries to mine multidimensional datasets and train machine learning algorithms. OLAP systems have the performance and response times to make this iterative process practical at petabyte scales.
Risk analysis and management
Effective risk management depends on fast responses to emerging risks. OLTP systems feed financial transaction data or network activity logs to an OLAP system where near real-time workflows identify unusual behavior.
Inventory and supply chain management
Inventory and supply chains generate data from webs of internal and third-party systems. Consolidating this data in a central OLAP system gives organizations a single source of truth for managing inventory.
Demand forecasting
Traditional forecasting techniques cannot keep pace with modern selling practices. OLAP systems centralize sales data across channels, geographies, and product lines, allowing companies to base forecasts on current customer demand rather than historical trends alone.
360-degree customer analysis
Data silos prevent companies from managing the customer experience holistically. Success in industries as diverse as healthcare and e-commerce depends on unifying customer data within OLAP systems.
Business intelligence, reporting, and dashboards
Data-driven business cultures rely upon insights produced by OLAP systems. A single source for business data lets business intelligence analysts produce ad hoc reports, communicate insights through visualization tools, and develop dashboards to empower decision-makers.
Types of OLAP: What are ROLAP, HOLAP, and MOLAP?
Analytics platforms adopt one of three OLAP architectures: Multidimensional OLAP (MOLAP), Relational OLAP (ROLAP), or Hybrid OLAP (HOLAP).
MOLAP
MOLAP systems store data in multidimensional arrays, or data cubes, optimized for storage efficiency and query performance. ETL pipelines execute aggregation and roll-up processes to create hypercubes, collections of data processed to answer standard business questions. These multidimensional databases offer fast query speeds and efficient storage. However, their reliance on preprocessed data makes it harder to answer novel questions.
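MOLAP cubes are normally built by dedicated engines, but the underlying roll-up idea can be sketched in ANSI SQL with GROUP BY CUBE. The sales table and its columns below are hypothetical:

```sql
-- Materialize aggregates for every combination of three dimensions,
-- roughly what a MOLAP hypercube precomputes (all names are illustrative).
CREATE TABLE sales_cube AS
SELECT
    region,
    product_line,
    order_year,
    sum(revenue) AS total_revenue,
    count(*)     AS order_count
FROM sales
GROUP BY CUBE (region, product_line, order_year);
```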
ROLAP
ROLAP systems store data in relational tables with schema designs that make canned pipelines and data cubes unnecessary. These relational databases store more detailed data than MOLAP’s data cubes, with data processing happening at query runtime. This approach is more scalable and flexible than MOLAP. However, ROLAP systems are resource-intensive: processing detailed data at query time consumes more compute, and each new analysis requires its own query development.
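In a ROLAP system, the same kind of aggregate is computed directly from the detailed relational tables when the query runs. A minimal sketch, again with hypothetical names:

```sql
-- Aggregates computed at query time from detailed rows rather than read
-- from a prebuilt cube.
SELECT
    region,
    product_line,
    sum(revenue) AS total_revenue
FROM sales
GROUP BY ROLLUP (region, product_line);
```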
HOLAP
The hybrid approach combines MOLAP’s speed with ROLAP’s flexibility and detailed data. However, HOLAP implementations require careful planning and data modeling to decide which data to store in MOLAP and which in ROLAP.
Is a data lakehouse suitable as an OLTP and OLAP system at the same time?
Not really. Data lakehouses support analysis services and use an OLAP structure that cannot provide the fast read/write performance of an OLTP system. The strength of a data lakehouse is its use of open-source file and table formats to manage data stored in cost-effective object storage services like Microsoft Azure’s Blob Storage.
An open-source query engine like Trino leverages these formats to provide the analytics capabilities of a data warehouse using ANSI-standard SQL. In addition, Trino’s federation capabilities let users build queries that access data stores beyond the data lakehouse, including transactional OLTP systems.
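As a hedged sketch of that federation, a single query might join lakehouse history with a table served by a live OLTP database. The catalog, schema, and table names here (iceberg, postgresql, and so on) are assumptions:

```sql
-- Join historical lakehouse sales with customer records in an operational
-- database, all in one federated query (names are illustrative).
SELECT
    c.customer_name,
    sum(s.revenue) AS lifetime_revenue
FROM iceberg.analytics.sales AS s
JOIN postgresql.public.customers AS c
    ON s.customer_id = c.customer_id
GROUP BY c.customer_name
ORDER BY lifetime_revenue DESC
LIMIT 10;
```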
ELT and SQL queries
Starburst’s Trino-based open data lakehouse platform unifies enterprise data sources to provide a single point of access for engineers and data consumers alike. One of Starburst’s benefits is a significant reduction in the number and complexity of ETL pipelines compared to traditional OLAP databases.
Support for ANSI SQL means data scientists and analysts can build their own interactive queries without engineering support.
ETL pipelines aren’t necessary to feed dashboards and other data products. Instead, data products follow extract, load, transform (ELT) workflows that transform data at runtime.
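A minimal ELT sketch might land raw data unchanged and defer the transformation to a view that runs at query time. All of the names below are illustrative:

```sql
-- Load: copy operational rows into the lakehouse without transforming them.
CREATE TABLE lakehouse.raw.orders AS
SELECT * FROM postgresql.public.orders;

-- Transform at query time: a view aggregates the raw rows on demand.
CREATE VIEW lakehouse.analytics.daily_revenue AS
SELECT
    date_trunc('day', order_ts) AS order_day,
    sum(amount) AS revenue
FROM lakehouse.raw.orders
GROUP BY date_trunc('day', order_ts);
```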
Ingestion pipelines become much simpler since the data lakehouse stores untransformed raw data. Starburst further simplifies data management by letting rarely used data remain at the source without sacrificing accessibility.
By replacing OLAP data warehousing solutions’ dependence on ETL pipelines with ELT workflows and SQL-based self-service usage models, Starburst open data lakehouses streamline data management and free data teams to support their organization’s data-driven decision-making.