Migrating from Hive to Iceberg
The challenge with Apache Hive
Built as an abstraction layer over Hadoop, Apache Hive was not designed with modern cloud object storage in mind. This leads to growing performance and cost problems as data volume increases.
Consider migrating to Apache Iceberg if you’re experiencing the following:
Performance Impacts
Query latency increases with data volume because of the overhead of listing files in object storage, and DML operations require rewriting entire partitions or files.
Increasing Storage Costs
Retrieval costs with Apache Hive grow with data volume because every query must list files in object storage, and schema changes require expensive rewrite operations.
Data Inconsistency
Hive is not ACID-compliant by default, and Hive ACID tables are not universally supported by major query engines.
Common Use Cases for Apache Iceberg
Apache Iceberg was designed with modern cloud infrastructure in mind and therefore performs better and costs less than Apache Hive. It is also designed for huge tables and can be used in production environments where a single table contains petabytes of data.
Mission-Critical Apps
Apache Iceberg is designed for high performance on object storage. Its snapshot-based metadata model lets the engine plan queries from table metadata, removing the need for costly file listing.
Cost Reduction
Optimize cloud costs with a table format that is designed for modern object storage. Apache Iceberg’s granular partitioning minimizes data scanned and allows DML operations at the row level.
Collaborative Workflows
Enable collaborative data workflows with a shared, consistent view of the data, guaranteed by ACID-compliant versioning in Apache Iceberg.
Historical Analysis
Perform historical or root-cause analysis for auditing and trend detection with time travel in Apache Iceberg (see the sketch below).
Regulatory Compliance
Easily modify data in object storage to meet GDPR or other compliance requirements with row-level DML operations, also shown in the sketch below.
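To make the last two use cases concrete, here is a minimal PySpark sketch of time travel and a row-level delete. It assumes a Spark session configured with the Iceberg runtime and an Iceberg catalog named "lake"; the schema, table, and column names are placeholders, not part of any real deployment.

```python
from pyspark.sql import SparkSession

# Minimal sketch, assuming Spark is configured with the Iceberg runtime and
# an Iceberg catalog named "lake"; schema/table/column names are placeholders.
spark = SparkSession.builder.appName("iceberg-use-cases").getOrCreate()

# Time travel: query the table as it existed at an earlier point in time.
spark.sql("""
    SELECT order_id, status
    FROM lake.sales.orders
    TIMESTAMP AS OF '2024-01-01 00:00:00'
""").show()

# Row-level DML: remove one customer's rows, e.g. for a GDPR erasure request,
# without rewriting the whole table by hand.
spark.sql("DELETE FROM lake.sales.orders WHERE customer_id = 42")
```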
How to migrate from Apache Hive to Apache Iceberg
Step One: Identify Tables to Migrate
One of the most common challenges is deciding which tables to migrate and which to leave in Apache Hive. Weigh the costs and benefits of migrating each workload; a good starting point is highly partitioned tables that serve latency-sensitive queries, since they suffer most from Hive's file-listing overhead.
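One quick way to find heavily partitioned candidates is to rank Hive tables by partition count. A minimal PySpark sketch, assuming a Hive-enabled Spark session that can reach your metastore; the database name "sales" is a placeholder.

```python
from pyspark.sql import SparkSession

# Minimal sketch: rank Hive tables in one database by partition count to find
# migration candidates. "sales" is a placeholder database; enableHiveSupport()
# assumes the session can reach your Hive metastore.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

tables = [row.tableName for row in spark.sql("SHOW TABLES IN sales").collect()]
partition_counts = {}
for table in tables:
    try:
        partition_counts[table] = spark.sql(f"SHOW PARTITIONS sales.{table}").count()
    except Exception:
        partition_counts[table] = 0  # SHOW PARTITIONS fails on unpartitioned tables

for table, count in sorted(partition_counts.items(), key=lambda kv: -kv[1]):
    print(f"{table}: {count} partitions")
```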
Step Two: Choose Your Migration Method
Two methods exist to migrate data into Iceberg tables, and it is important to weigh the pros and cons of each. Shadow migration creates a second Iceberg table by copying the data out of the original Hive table, which lets you validate and repartition before cutting over. In-place migration converts the existing Hive table to Apache Iceberg by writing new metadata over the existing data files, so no data is copied.
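Both approaches can be driven from Spark using Iceberg's SQL extensions. A minimal sketch, assuming Spark is configured with the Iceberg runtime and the Hive tables live in the Spark session catalog; the catalog name "lake" and the schema, table, and column names are placeholders, and the exact catalog setup for the migrate procedure depends on your deployment.

```python
from pyspark.sql import SparkSession

# Minimal sketch, assuming Spark is configured with the Iceberg runtime, an
# Iceberg catalog named "lake", and the source Hive tables in the Spark
# session catalog; schema/table/column names are placeholders.
spark = SparkSession.builder.appName("hive-to-iceberg").getOrCreate()

# Shadow migration: restate the Hive table's data into a brand-new Iceberg
# table. Data is copied, so you can validate, repartition, and cut over later.
spark.sql("""
    CREATE TABLE lake.sales.orders_iceberg
    USING iceberg
    PARTITIONED BY (days(order_ts))
    AS SELECT * FROM spark_catalog.sales.orders
""")

# In-place migration: build Iceberg metadata over the existing data files
# instead of copying them. snapshot() is a non-destructive trial run;
# migrate() replaces the Hive table with an Iceberg table.
spark.sql("CALL lake.system.snapshot('sales.orders', 'sales.orders_snapshot')")
spark.sql("CALL lake.system.migrate('sales.orders')")
```

Shadow migration costs extra storage and compute during the copy but leaves the original table untouched; in-place migration is faster and cheaper but offers less room to validate before the switch.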
Step Three: Maintain Your Tables
With Apache Iceberg, it is not enough to migrate your tables and forget them. Routine maintenance tasks like compaction (rewriting small files), retention (expiring old snapshots), and vacuuming (removing orphan files) are needed to keep performance optimal.
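Iceberg ships Spark procedures for these tasks. A minimal sketch, assuming the same Iceberg-enabled Spark setup as above; the catalog name "lake" and the table "sales.orders" remain placeholders.

```python
from pyspark.sql import SparkSession

# Minimal sketch of routine Iceberg maintenance using Spark procedures.
# The catalog name "lake" and the table "sales.orders" are placeholders.
spark = SparkSession.builder.appName("iceberg-maintenance").getOrCreate()

# Compaction: rewrite many small data files into fewer, larger ones.
spark.sql("CALL lake.system.rewrite_data_files(table => 'sales.orders')")

# Retention: expire snapshots older than your retention window so the data
# and metadata files they alone reference can be cleaned up.
spark.sql("""
    CALL lake.system.expire_snapshots(
        table => 'sales.orders',
        older_than => TIMESTAMP '2024-01-01 00:00:00'
    )
""")

# Vacuuming: delete files in the table location that no snapshot references,
# for example leftovers from failed writes.
spark.sql("CALL lake.system.remove_orphan_files(table => 'sales.orders')")
```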
Step Four: Optimize Your Queries
Apache Iceberg tables maintain statistics about data distribution and partitioning in their metadata. Use these statistics to identify potential performance bottlenecks, such as skewed partitions or an excess of small files, and optimize your queries accordingly.
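Iceberg exposes this information through metadata tables such as partitions and files. A minimal PySpark sketch, again with placeholder catalog and table names.

```python
from pyspark.sql import SparkSession

# Minimal sketch: read Iceberg metadata tables to spot skewed partitions and
# small-file hotspots. Catalog/table names are placeholders.
spark = SparkSession.builder.appName("iceberg-stats").getOrCreate()

# Per-partition record and file counts, largest partitions first.
spark.sql("""
    SELECT `partition`, record_count, file_count
    FROM lake.sales.orders.partitions
    ORDER BY record_count DESC
""").show(truncate=False)

# Smallest data files first; many tiny files suggest compaction is overdue.
spark.sql("""
    SELECT file_path, file_size_in_bytes, record_count
    FROM lake.sales.orders.files
    ORDER BY file_size_in_bytes ASC
    LIMIT 20
""").show(truncate=False)
```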
The Starburst advantage
Complete Multi-Format Support
Query legacy Hive tables and new Iceberg tables from a single, unified engine.
Great Lakes Connectivity
With Great Lakes connectivity, query data without worrying about the underlying table format.
Data Optimization
Schedule routine data maintenance operations on your Apache Iceberg tables with Jobs.
Streaming Ingest
Continuously ingest data from Kafka-compatible topics into Apache Iceberg tables in cloud object storage.