Many companies choose to use data warehouses, either on their own or alongside other data solutions. Whether a data warehouse is the best solution depends on the specifics of the use case.
Some things to consider when using a data warehouse:
- How much your company wants to invest in infrastructure
- The types of questions your company needs to answer and the business processes it needs to support, and how much those may change in the future
- Where your company is in its data analysis journey
- What data needs to be in a central repository
What are the benefits of data warehouses?
- Unlike data mining, data warehouses allow data consumers to quickly and efficiently access data after it has been loaded in.
- The data in data warehouses can be queried by end users of many different skill levels because it is structured in a pre-defined schema.
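To illustrate why a pre-defined schema makes data easy to query, here is a minimal sketch. It uses an in-memory SQLite database as a stand-in for a data warehouse; the `sales` table, its columns, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# An in-memory SQLite database stands in for a data warehouse (assumption).
conn = sqlite3.connect(":memory:")

# The schema is defined up front, before any data is loaded.
conn.execute(
    """
    CREATE TABLE sales (
        order_id INTEGER PRIMARY KEY,
        region   TEXT NOT NULL,
        amount   REAL NOT NULL
    )
    """
)

# Hypothetical rows that have already been cleaned and structured.
conn.executemany(
    "INSERT INTO sales (order_id, region, amount) VALUES (?, ?, ?)",
    [(1, "EMEA", 120.0), (2, "EMEA", 80.0), (3, "APAC", 200.0)],
)

# Because column names and types are known in advance,
# a short, declarative query is enough to answer a business question.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('APAC', 200.0), ('EMEA', 200.0)]
```

Anyone who knows the table layout can write or reuse a query like this without first working out how the raw data was shaped.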
What are the challenges of data warehouses?
1. The data in data warehouses must be structured
To achieve this, the data must be processed before it can be loaded into the data warehouse. This processing can be both time- and resource-intensive.
2. Data warehouses typically hold historical data
Over time, this can lead to data warehouses becoming so large that storage costs become too high to justify. As a result, older historical data may be discarded even though it might still have some value.
3. Data warehouses must be designed before they are built
This means that they do not adapt easily to new use cases that arise after they are created.
4. Single source of truth
A data warehouse is often intended to serve as an organization's single source of truth. However, there are always new sources of data. Because a data warehouse is schema-on-write, adding new data to it requires significant effort. This constant tension between new data sources coming in and the effort needed to add them means a data warehouse rarely achieves true “single source of truth” status.
5. Data warehouses do not work well with all data types
For example, video content, audio content, and data contained in document form are not amenable to data warehouse storage.
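Several of the challenges above come from the schema-on-write approach. The following is a minimal sketch of what that means in practice, again assuming an in-memory SQLite database as a stand-in for a warehouse. The `customers` table, the `to_row` helper, and the sample records are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema-on-write: the table layout is fixed before any data arrives.
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, email TEXT NOT NULL)"
)

def to_row(raw):
    """Shape a raw record to fit the fixed schema, or raise ValueError."""
    email = raw.get("email", "").strip().lower()
    if "@" not in email:
        raise ValueError(f"bad email in record: {raw!r}")
    return (int(raw["id"]), email)

# Hypothetical raw input from a source system.
raw_records = [
    {"id": "1", "email": " Ada@Example.com "},
    {"id": "2", "email": "not-an-email"},
    {"id": "3", "email": "grace@example.com", "loyalty_tier": "gold"},
]

loaded, rejected = 0, 0
for raw in raw_records:
    try:
        # Every record is processed before it is loaded.
        conn.execute("INSERT INTO customers VALUES (?, ?)", to_row(raw))
        loaded += 1
    except ValueError:
        rejected += 1

# The loyalty_tier value in record 3 was dropped: the schema has no column for it.
columns_before = [col[1] for col in conn.execute("PRAGMA table_info(customers)")]

# Supporting the new attribute means changing the schema first. In a real
# warehouse the load logic and existing rows would also need attention.
conn.execute("ALTER TABLE customers ADD COLUMN loyalty_tier TEXT")
columns_after = [col[1] for col in conn.execute("PRAGMA table_info(customers)")]

print(loaded, rejected)   # 2 1
print(columns_before)     # ['customer_id', 'email']
print(columns_after)      # ['customer_id', 'email', 'loyalty_tier']
```

The sketch shows the trade-off in miniature: loaded data is clean and predictable, but anything the schema did not anticipate is either rejected or lost until the design is changed.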
Related reading: Unstructured data
Types of data stored in a data warehouse
Some types of data lend themselves well to storage within a data warehouse. For example, financial transaction data, operational data, customer relationship data, and enterprise resource planning data are typically stored in a data warehouse.
However, organizations don’t typically store all of the data they collect in a data warehouse. To do so would be cost-prohibitive in terms of both volume and database administrator bandwidth.
Social media data, documents, and sensor data are some examples of unstructured data that might not be stored in data warehouses because they cannot be easily consolidated or structured. Data of this type is typically handled by other technologies, such as data lakes or data lakehouses, that do not restructure data before it is stored.
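By contrast, storing data without restructuring it first can be sketched as follows. This is only an illustration of the idea, not how any particular data lake product works: a Python list of raw JSON strings stands in for lake storage, and the `store_raw` and `read_temperatures` helpers and the sample payloads are hypothetical.

```python
import json

# A list of raw JSON strings stands in for files in a data lake (assumption).
raw_store = []

def store_raw(payload: str) -> None:
    """Keep the payload exactly as received, with no validation or restructuring."""
    raw_store.append(payload)

# Records of different shapes can be stored side by side.
store_raw('{"sensor": "t-1", "temp_c": 21.5}')
store_raw('{"user": "ada", "post": "Loving the new release!"}')
store_raw('{"sensor": "t-2", "temp_c": 19.0, "humidity": 0.4}')

def read_temperatures(store):
    """Apply structure only at read time: keep records that carry a temperature."""
    readings = []
    for line in store:
        record = json.loads(line)
        if "temp_c" in record:
            readings.append((record["sensor"], record["temp_c"]))
    return readings

temps = read_temperatures(raw_store)
print(temps)  # [('t-1', 21.5), ('t-2', 19.0)]
```

Nothing is rejected or dropped at write time, so the work of interpreting the data moves to whoever reads it.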
Some organizations use a data warehouse as their only analytical data repository. In these organizations, data analysts would only have access to data stored in a data warehouse. This could be limiting because data warehouses might not store all of the data the organization collects.
Whether this is a problem depends on the questions that the organization needs to answer. If new questions need to be answered or new data becomes available, it can be difficult to adjust the data warehouse. In that case, the organization might consider using a data lake alongside its data warehouse, or using a data lakehouse, to improve the lifecycle of its data.
Related reading: Open source data warehouse