Data must be structured before it is moved into a data warehouse.
Let’s examine how data in data warehouses is typically structured, and how this structure differs from how data is structured in transactional databases. More specifically, we’ll explore schemas used in analytical databases, focusing on how schema influences database function.
Collections of data that are organized for rapid retrieval are known as databases.
In relational databases, data is organized into a schema. Think of a schema as being similar to a blueprint. It defines both the structure of the data within the database and its relation to other data. The data within a schema is organized into tables that are connected with keys.
Relational databases can use a wide variety of schemas, and the type they use determines the functions that the database performs well. Normalized schemas used in transactional systems are best for writing data from real-time transactions.
In contrast, other schemas, such as those used in analytical systems like data warehouses, are best for storing and reading large amounts of data for analysis.
These different schemas allow for a huge array of possibilities for database design, but it is critical to understand that systems optimized for processing transactions will not function well for analysis, and vice versa.
Next, we’ll explore schemas commonly used in online analytical processing(OLAP) databases, particularly in dimensional data warehouses.
Dimensional databases or OLAP databases
Business analysts need to use data that has been optimized for analysis, which means retrieving a huge number of records quickly and efficiently. However, these databases typically don’t need to be updated frequently because business analysts are working with aggregated data. Therefore, unlike in transactional databases, avoiding data redundancy is not required.
One way to efficiently store data in analytical databases is to structure the data following a dimensional database model. This storage structure differs from a normalized database design by intentionally denormalizing data so that related information is grouped in tables, even when that means tables contain dependencies or redundancies.
The tables in dimensional models include fact tables and dimension tables. Fact tables contain aggregated measurements and metric data that summarize the data a business has collected in a variety of ways. Essentially, they provide analysts with easy access to facts about the business. For example, if a business had provided a place for customers to subscribe to an email list, a fact table might store information on the number of subscribers per month and year, as well as the methods by which they subscribed. These aggregations might help data analysts determine that customers are more likely to subscribe to the email list in July than in January, which could help guide advertising efforts.
In contrast, dimension tables describe the facts contained in the fact table by providing context about those facts. Continuing the email list example from above, the dimension tables might group customers in specific ways, or provide customer location or the website pages through which customers subscribed to the email list.
In data warehouses that use a dimensional model, data is stored in both dimension and fact tables. Storing data in these tables makes it easy for business analysts to query data that is relevant to business performance and look at historical trends because the data is already stored in an aggregated manner. Dimension tables have keys that link them to fact tables so they can provide the context around the facts. Fact tables contain the majority of the data in data warehouse schemas, so they occupy most of the storage space in the data warehouse.
Related reading: 3 schemas used in data warehouses
What are some next steps you can take?
Below are three ways you can continue your journey to accelerate data access at your company
- 1
- 2
Automate the Icehouse: Our fully-managed open lakehouse platform
- 3
Follow us on YouTube, LinkedIn, and X(Twitter).