Companies that have moved to Apache Iceberg with Starburst
Built as an abstraction layer for Hadoop, Apache Hive was not designed with modern cloud storage in mind. This leads to issues with performance and cost as data volume grows.
Consider migrating to Apache Iceberg if you’re experiencing the following:
Query latencies increase with data volume due to the overhead of file listing. Performing DML operations requires rewriting entire files.
Retrieval costs with Apache Hive increase with data volume due to file listing. Schema changes require expensive rewrite operations.
Hive is not ACID-compliant by default and Hive ACID tables are not universally supported by major engines.
Apache Iceberg was designed with modern cloud infrastructure in mind and therefore performs better and costs less than Apache Hive. It is also designed for huge tables and can be used in production environments where a single table contains petabytes of data.
Apache Iceberg is designed for high performance on object storage. The snapshot querying model allows the engine to locate data files by reading table metadata, removing the need for costly file listing.
Optimize cloud costs with a table format that is designed for modern object storage. Apache Iceberg’s granular partitioning minimizes data scanned and allows DML operations at the row level.
Enable collaborative data workflows by guaranteeing a shared and consistent data representation with ACID-compliant versioning in Apache Iceberg.
Query historical table states for auditing, root cause analysis, and trend analysis with time travel in Apache Iceberg.
Easily modify data in object storage to meet GDPR or other compliance requirements with row-level DML operations.
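As a rough illustration of the time travel and row-level DML points above, the following sketch builds the corresponding SQL as plain strings. It assumes a Trino-based engine with an Iceberg catalog; the table name `iceberg.sales.orders` and the column `customer_id` are hypothetical, and the helper functions are not part of any library.

```python
from datetime import datetime, timezone


def time_travel_query(table: str, as_of: datetime) -> str:
    """Build a query that reads an Iceberg table as it was at a past instant."""
    if as_of.tzinfo is None:
        raise ValueError("as_of must be timezone-aware")
    # Normalize to UTC so the literal is unambiguous.
    stamp = as_of.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    return f"SELECT * FROM {table} FOR TIMESTAMP AS OF TIMESTAMP '{stamp} UTC'"


def snapshot_query(table: str, snapshot_id: int) -> str:
    """Build a query that reads one specific snapshot by its ID."""
    return f"SELECT * FROM {table} FOR VERSION AS OF {snapshot_id}"


def erase_rows_sql(table: str, column: str, value: str) -> str:
    """Build a row-level DELETE, e.g. for an erasure request.

    The column name is a hypothetical example; single quotes in the
    value are doubled so the SQL literal stays well-formed.
    """
    escaped = value.replace("'", "''")
    return f"DELETE FROM {table} WHERE {column} = '{escaped}'"


if __name__ == "__main__":
    when = datetime(2024, 1, 15, 12, 0, 0, tzinfo=timezone.utc)
    print(time_travel_query("iceberg.sales.orders", when))
    print(snapshot_query("iceberg.sales.orders", 42))
    print(erase_rows_sql("iceberg.sales.orders", "customer_id", "c-1001"))
```

Note that deleted rows remain reachable through older snapshots until those snapshots are expired, so an erasure workflow would normally be paired with snapshot retention.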
One of the most common challenges is deciding which tables to migrate and which to leave in Apache Hive. It is important to consider the costs and benefits of migrating different workloads. Best practice is to start with highly partitioned tables accessed in latency-sensitive workloads.
Two methods exist to migrate data into Iceberg tables, and it is important to consider the pros and cons of each. The shadow migration process creates a second Iceberg table from the original Hive table, leaving the original untouched. The in-place method converts the existing Hive tables to Apache Iceberg without copying the data.
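The two migration methods can be sketched as SQL generated from Python. This is a minimal sketch under stated assumptions: a Trino-based engine with a Hive catalog named `hive` and an Iceberg catalog named `iceberg` (both names hypothetical), a CREATE TABLE AS SELECT for the shadow copy, and the Trino Iceberg connector's `migrate` procedure for the in-place conversion. The helper functions and table names are illustrative only.

```python
def shadow_migration_sql(hive_table: str, iceberg_table: str,
                         partition_columns: list[str]) -> str:
    """Shadow migration: copy the Hive table's rows into a new Iceberg table."""
    properties = ["format = 'PARQUET'"]
    if partition_columns:
        cols = ", ".join(f"'{c}'" for c in partition_columns)
        properties.append(f"partitioning = ARRAY[{cols}]")
    return (
        f"CREATE TABLE {iceberg_table} "
        f"WITH ({', '.join(properties)}) "
        f"AS SELECT * FROM {hive_table}"
    )


def in_place_migration_sql(iceberg_catalog: str, schema: str, table: str) -> str:
    """In-place migration: convert the existing Hive table, reusing its data files."""
    return (
        f"CALL {iceberg_catalog}.system.migrate("
        f"schema_name => '{schema}', table_name => '{table}')"
    )


if __name__ == "__main__":
    print(shadow_migration_sql("hive.sales.orders",
                               "iceberg.sales.orders",
                               ["order_date"]))
    print(in_place_migration_sql("iceberg", "sales", "orders"))
```

The shadow copy rewrites all data, which costs more up front but allows validation against the untouched Hive table; the in-place call avoids the copy but changes the original table.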
With Apache Iceberg, it is not enough to migrate and forget your tables. You need to perform routine maintenance tasks like vacuuming, compaction, and retention to guarantee optimal performance.
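One way to picture these maintenance tasks is as a small set of statements run on a schedule. The sketch below assumes a Trino-based engine and maps the tasks onto the Iceberg connector's `optimize` (compaction), `expire_snapshots` (retention), and `remove_orphan_files` (vacuuming) table procedures; that mapping is an interpretation, and the table name, thresholds, and helper function are hypothetical.

```python
def maintenance_statements(table: str,
                           retention: str = "7d",
                           file_size_threshold: str = "128MB") -> list[str]:
    """Return maintenance statements for one Iceberg table, in run order."""
    return [
        # Compaction: rewrite files smaller than the threshold into larger ones.
        f"ALTER TABLE {table} EXECUTE optimize("
        f"file_size_threshold => '{file_size_threshold}')",
        # Retention: drop snapshots older than the retention window.
        f"ALTER TABLE {table} EXECUTE expire_snapshots("
        f"retention_threshold => '{retention}')",
        # Vacuuming: delete files no longer referenced by any snapshot.
        f"ALTER TABLE {table} EXECUTE remove_orphan_files("
        f"retention_threshold => '{retention}')",
    ]


if __name__ == "__main__":
    for statement in maintenance_statements("iceberg.sales.orders"):
        print(statement)
```

Expiring snapshots shortens the window available for time travel, so the retention value should reflect how far back audits need to reach.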
Apache Iceberg tables maintain statistics about data distribution and partitioning. Use these statistics to identify potential performance bottlenecks and optimize your queries.
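As a hedged example of inspecting these statistics, the sketch below builds queries against the `$files` and `$partitions` metadata tables that the Trino Iceberg connector exposes. The catalog, schema, and table names and the 32 MB threshold are hypothetical, and the helper functions are illustrative only.

```python
def metadata_table(catalog: str, schema: str, table: str, suffix: str) -> str:
    """Return the quoted name of an Iceberg metadata table, e.g. "orders$files"."""
    return f'{catalog}.{schema}."{table}${suffix}"'


def small_file_count_sql(catalog: str, schema: str, table: str,
                         threshold_bytes: int) -> str:
    """Count data files below a size threshold, a common sign that compaction is due."""
    files = metadata_table(catalog, schema, table, "files")
    return (
        f"SELECT count(*) AS small_files FROM {files} "
        f"WHERE file_size_in_bytes < {threshold_bytes}"
    )


def partition_sizes_sql(catalog: str, schema: str, table: str) -> str:
    """List partitions by size to spot skew across partitions."""
    partitions = metadata_table(catalog, schema, table, "partitions")
    return (
        f"SELECT partition, record_count, file_count, total_size "
        f"FROM {partitions} ORDER BY total_size DESC"
    )


if __name__ == "__main__":
    print(small_file_count_sql("iceberg", "sales", "orders", 32 * 1024 * 1024))
    print(partition_sizes_sql("iceberg", "sales", "orders"))
```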
Get free advice from Hive to Iceberg migration experts
Query legacy Hive tables and new Iceberg tables from a single, unified engine.
Query data without worrying about the underlying table format with Great Lakes connectivity.
Schedule routine data maintenance operations on your Apache Iceberg tables with Jobs.
Continuously ingest data from Kafka-compliant topics into Apache Iceberg tables in cloud object storage.
“The combination of Starburst Galaxy and Apache Iceberg offers exceptional value, delivering far more for the same investment. It’s a clear win for efficiency and productivity in our data-driven environment.”
Data Team Lead, Kovi
“The move to Starburst and Iceberg has resulted in a 12x reduction in compute costs versus our previous data warehouse. This efficiency allows us to focus our attention on using analytics for revenue-generating opportunities.”
Sr. Data Engineer, Yello
© Starburst Data, Inc. Starburst and Starburst Data are registered trademarks of Starburst Data, Inc. All rights reserved. Presto®, the Presto logo, Delta Lake, and the Delta Lake logo are trademarks of LF Projects, LLC