Cloud Data Warehouse
Each data warehouse implementation style has advantages and disadvantages.
Choosing an implementation requires a company to weigh a number of factors against each other.
Although the factors that matter most differ from one organization to the next, common considerations include speed, control, scalability, reliability, security and governance, and cost.
Factors that influence on-premise and cloud data warehouses
Factor | Why it matters |
Speed | Requires a balance between setup time and time-to-insights once setup is complete |
Control | Decisions about setup, implementation, and access |
Scalability | The needs of a business will change over time, seasonally, and as a company grows |
Reliability | Backing up and accessing data is critical — how often will maintenance be required? |
Security and governance | Handling data, especially personally identifiable data, requires care and compliance with regulations |
Cost | Whether you pay to implement your data warehouse upfront (CapEx) or as-you-go (OpEx) depends on your implementation and affects your company’s balance sheet and tax strategy |
Cloud data warehouse vs On-premise data warehouse
We outline some of the advantages and disadvantages of on-prem data warehouse and cloud data warehouse installations.
The comparison looks at the six specific factors listed above. Can you see any points where one installation type or another might be more advantageous in your organization?
Factor | On-premise data warehouse | Cloud data warehouse |
Speed | Quicker to obtain insights if a company is in one location | Quicker setup time because no hardware needs to be installed and fewer team members need training. Quicker to obtain insights if a company is spread out and needs to transfer data among locations |
Control | Company has complete control | Some decisions are left to the cloud vendor and may not be adjustable |
Scalability | No advantages over cloud data warehouses. | Easier to scale up and down because no hardware is required |
Reliability | Depends on your team | Depends on the cloud provider. Some level of built-in backups or disaster recovery |
Security and governance | With a strong data access policy, on-prem is most secure. Some legal or contractual requirements may not allow the use of cloud providers | Cloud vendors offer security guarantees and can restrict employee access |
Cost | Avoid annual costs from cloud vendors. May be cheaper over time if resource procurements are carefully managed. No cost to query your own data, and your data belongs to you | Lower upfront costs because you don’t need to pay for infrastructure. Potentially lower ongoing employee costs because you don’t need to hire employees with the skills to maintain and administer on-prem data warehouses. Allows you to scale storage and server usage up and down when needed, which can lower costs if done correctly. No need to buy hardware that is used only during peak times |
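To make the cost trade-off concrete, here is a minimal sketch of a break-even calculation between an upfront (CapEx) purchase and a pay-as-you-go (OpEx) subscription. Every figure and function name below is a hypothetical assumption chosen for illustration, not real vendor pricing, and the model ignores factors such as depreciation, staffing changes, and usage spikes.

```python
def cumulative_on_prem(months, upfront, monthly_ops):
    """Total on-prem spend after `months`: one-time purchase plus running costs."""
    return upfront + monthly_ops * months


def cumulative_cloud(months, monthly_fee):
    """Total cloud spend after `months` of a flat pay-as-you-go fee."""
    return monthly_fee * months


def break_even_month(upfront, monthly_ops, monthly_fee, horizon=120):
    """Return the first month where on-prem is no more expensive than cloud.

    Returns None if that never happens within `horizon` months.
    """
    for month in range(1, horizon + 1):
        on_prem = cumulative_on_prem(month, upfront, monthly_ops)
        cloud = cumulative_cloud(month, monthly_fee)
        if on_prem <= cloud:
            return month
    return None


# Hypothetical figures, for illustration only.
UPFRONT_HARDWARE = 240_000   # one-time on-prem purchase
ON_PREM_MONTHLY = 4_000      # power, maintenance, administration
CLOUD_MONTHLY = 12_000       # flat cloud subscription

month = break_even_month(UPFRONT_HARDWARE, ON_PREM_MONTHLY, CLOUD_MONTHLY)
print(f"On-prem catches up with cloud in month {month}")
```

With these assumed numbers, on-prem only becomes the cheaper option after two and a half years, which is why the planning horizon and how carefully hardware purchases are managed matter so much in the comparison.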
Using a data warehouse with a data lake
Data lakes are another type of data storage. They can hold non-standard and unstructured data without transforming it first, which helps with data integration. One challenge of storing data in multiple types of storage is combining it when both will be used in an analysis.
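The aggregation challenge can be sketched in a few lines. The example below is a toy illustration, not a real warehouse or lake setup: an in-memory SQLite table stands in for the warehouse, a block of raw JSON lines stands in for the lake, and all table, field, and function names are hypothetical.

```python
import json
import sqlite3

# Stand-in "warehouse": structured rows with a fixed schema.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT)"
)
warehouse.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "EMEA"), (2, "APAC")]
)

# Stand-in "lake": raw JSON lines stored as-is, with inconsistent fields.
RAW_EVENTS = """\
{"customer_id": 1, "event": "click", "meta": {"page": "/pricing"}}
{"customer_id": 2, "event": "click"}
{"customer_id": 1, "event": "purchase", "amount": 40.0}
"""


def events_per_region(conn, raw_lines):
    """Count lake events per warehouse region by joining on customer_id."""
    regions = dict(conn.execute("SELECT customer_id, region FROM customers"))
    counts = {}
    for line in raw_lines.splitlines():
        if not line.strip():
            continue  # skip blank lines in the raw file
        event = json.loads(line)
        region = regions.get(event["customer_id"], "unknown")
        counts[region] = counts.get(region, 0) + 1
    return counts


result = events_per_region(warehouse, RAW_EVENTS)
print(result)
```

Even in this small case the join logic lives in application code rather than in a single query, which is the kind of friction that grows as the number of storage systems increases.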
How Starburst helps
Solutions like Starburst help immensely with disparate data sets and improve the lives of data scientists and data engineers. Starburst works in conjunction with both data warehouses and data lakes to help you query data, using SQL, from any location. This helps data consumers and business users overcome barriers and get more accurate data-driven insights on a dashboard.