Icehouse - the next era of the data lakehouse
Leader in Big Data Processing & Distribution
Highest User Adoption for Enterprise Big Data Analytics
What is an Icehouse: the newest Lakehouse Architecture
An Icehouse is a specific type of data lakehouse that uses Trino as the SQL query engine and Apache Iceberg as the table format. At Starburst, we believe that an Icehouse is the only data architecture that gives teams a familiar warehouse-like experience on a truly open foundation. Starburst is happy to introduce the “Starburst Icehouse,” which adds automations to Starburst Galaxy that make it easier for teams to build and manage an Icehouse architecture. These include Icehouse-specific features such as data ingestion, data governance, data management, and automatic capacity management.
The Next Era of the End-to-End Data Lakehouse
Data Warehouse vs. Data Lakehouse
Understand the benefits of an Icehouse and see how you can get warehouse-like performance at a lower cost.
Data Warehouse | Data Lakehouse
TBs-PBs | PBs+
Structured | All (Structured, Semi-Structured, Unstructured)
High | High
High | High
$$$ | $$ (Separation of Storage & Compute)
No | Yes
Business Intelligence & Reporting; Workloads | Business Intelligence & Reporting; Data Applications; Data Science; Machine Learning
The Benefits of Adopting an Icehouse Architecture Built With Trino & Apache Iceberg
Optimized for Big Data Analytics
Improve the scalability and responsiveness of your architecture with the lakehouse that’s proven at petabyte scale
Get Speed Without Increased Costs
Achieve data warehouse performance with a more scalable architecture without the added costs
Leverage Cutting Edge Innovation
Adopt the technology that revolutionized Netflix, Apple, Shopify, and Stripe
Comparing Icehouse Data Tables
Why Apache Iceberg is the best table format vs. Databricks Delta Lake or Apache Hive
Apache Iceberg | Delta Lake | Apache Hive
Full | Full | Only w/Hive ACID
Parquet, ORC, Avro | Parquet | Parquet, ORC, Avro
Full | Limited (Only supports adds/reorders of columns) | Limited (No guarantees of correctness)
Yes | No | No
Yes | Yes | No
Yes | Yes | No
Yes | No | Yes
Growing | Growing | Established
Interoperable | Tight integration with Databricks | Interoperable
General purpose data lakehouses | Optimized for Databricks data lakehouses | General purpose data lake w/limited DML support
Create the Hyperscale architecture you have always dreamed of
Data warehousing solutions simply can’t scale to big data
- Storage: Separation of storage and compute that supports independent scaling
- Processing: Trino is a massively parallel processing (MPP) engine that supports high concurrency
- Table Format: Iceberg is built for cloud storage, with decoupled metadata that supports large tables
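
To illustrate why decoupled metadata helps with large tables, here is a minimal Python sketch. It assumes a simplified model in which the table metadata records minimum and maximum values per data file. The `DataFile` class, the `plan_scan` function, and the file paths are hypothetical names chosen for illustration; they are not Iceberg or Trino APIs. The point is that the planner can choose which files to read from metadata alone, without listing or opening objects in cloud storage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    """Hypothetical metadata entry: one data file plus column statistics."""
    path: str
    min_order_date: str  # ISO dates compare correctly as plain strings
    max_order_date: str


def plan_scan(files, start, end):
    """Return the paths of files whose date range overlaps [start, end]."""
    return [
        f.path
        for f in files
        if f.max_order_date >= start and f.min_order_date <= end
    ]


# Illustrative metadata for three files; the bucket and paths are made up.
metadata = [
    DataFile("s3://example-bucket/orders/a.parquet", "2024-01-01", "2024-01-31"),
    DataFile("s3://example-bucket/orders/b.parquet", "2024-02-01", "2024-02-29"),
    DataFile("s3://example-bucket/orders/c.parquet", "2024-03-01", "2024-03-31"),
]

# Only the February file can contain matching rows, so only it is scanned.
to_read = plan_scan(metadata, "2024-02-10", "2024-02-20")
print(to_read)
```

In this toy model, the cost of planning depends on the size of the metadata rather than on the number of objects in storage, which is the general idea behind keeping metadata separate from the data files.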
Achieve industry-leading price performance for SQL workloads
- Performance: Achieve the same performance as your data warehouse while optimizing your spend
- Costs: Expect 4X cost savings over time
- Risk: Keep your data on an open foundation and avoid the lock-in of a data warehouse
Use a familiar SQL interface
The same SQL interface you’re used to working with
- Support: Run the DML statements your tables need, such as INSERT, UPDATE, DELETE, and MERGE
- Compliance: ACID-compliant transactions, so that concurrent reads and writes stay consistent
- Schema Evolution: Built-in schema evolution lets you modify table schemas without disruption
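
As a rough illustration of how schema evolution can avoid disruption, here is a minimal Python sketch. It is a toy model, not Iceberg's implementation: it assumes that each column is tracked by a stable field ID (an approach described in the Iceberg documentation) and that data files are never rewritten. The `Table` class and its methods are hypothetical names used only for this example.

```python
class Table:
    """Toy table: the schema maps stable field IDs to column names."""

    def __init__(self, columns):
        self.schema = {i + 1: name for i, name in enumerate(columns)}
        self.next_id = len(columns) + 1
        self.files = []  # each "data file" is a list of {field_id: value} rows

    def append(self, rows):
        """Write rows (keyed by current column name) as a new immutable file."""
        ids = {name: fid for fid, name in self.schema.items()}
        self.files.append([{ids[k]: v for k, v in row.items()} for row in rows])

    def add_column(self, name):
        """Add a column; this only changes metadata."""
        self.schema[self.next_id] = name
        self.next_id += 1

    def rename_column(self, old, new):
        """Rename a column; existing files are not touched."""
        fid = next(f for f, n in self.schema.items() if n == old)
        self.schema[fid] = new

    def scan(self):
        """Read all files through the current schema; missing fields are None."""
        return [
            {name: row.get(fid) for fid, name in self.schema.items()}
            for data_file in self.files
            for row in data_file
        ]


t = Table(["id", "amt"])
t.append([{"id": 1, "amt": 9.5}])
t.rename_column("amt", "amount")   # metadata-only rename
t.add_column("currency")           # metadata-only add
t.append([{"id": 2, "amount": 3.0, "currency": "EUR"}])
rows = t.scan()
print(rows)
```

Because old files are resolved by field ID rather than by column name, the renamed column still reads correctly from the first file, and the newly added column simply reads as empty for rows written before it existed.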
The perfect data architecture without the hassle
All on a fully managed platform with end-to-end data pipeline support from ingestion to data sharing
- Ingestion: Unreliable data ingestion compromises data accuracy and complicates analysis, which ultimately makes the data hard to trust.
- Governance: Simplify data governance. With the right architecture, you can eliminate the need to integrate a separate governance system.
- Analyze: Run fast SQL queries on Iceberg tables with advanced performance optimization tools