Starburst + Confluent Tableflow
Toni Adams
SVP Partner and Alliances
Starburst


More deployment options
At Starburst, we’re excited about Apache Iceberg and its future impact on analytics and AI workloads. That’s why we designed the Icehouse architecture, combining the power of Trino + Iceberg. The result is an open data architecture that serves as the foundation of your data world, whether that’s data analytics, data applications, or AI workloads.
We’re also excited about connecting Kafka streaming ingest to the world of Iceberg. It’s why we created Starburst Galaxy streaming ingest, which helps move raw Apache Kafka data to Iceberg tables, providing transformation and data governance along the way.
Announcing Starburst support for Confluent Tableflow
Today, there’s one more reason to love Starburst + Iceberg as the foundation of your data architecture. We’re happy to announce that Starburst is supported by Confluent Tableflow. This extends our commitment to solving the long-standing challenge in data engineering of turning streaming data into queryable datasets without bespoke ETL pipelines—a challenge set to become even more important with the growth in AI workloads.
In recent years, two separate spheres have emerged in the data world: open table formats like Iceberg and Delta Lake, which organize data in tables, and raw streaming data, which is organized in topics. These two worlds don’t always work easily together. For data to be useful for analytics or AI, it needs to be structured, and that means turning it into tables. Moving Kafka topics into structured, queryable tables has not been easy. The data needs to be properly mapped, converted, and cleaned before it is usable, and this process has often been fragile and error-prone. That is why Starburst built streaming ingest.
Starburst is happy to announce another option – Tableflow.
What is Tableflow?
Confluent Tableflow is designed to move Kafka streaming data to data lakes and data warehouses, storing it in Iceberg or Delta Lake tables. It is another option for Starburst users who need access to operational data for data analytics and AI workflows. It’s available today on AWS and is coming soon to Azure and Google Cloud Platform (GCP).
Tableflow works by making streaming data from Confluent Cloud available to analytics, data applications, and AI using Iceberg. This approach removes much of the complexity of managing streaming data by mapping Kafka topics to schemas in Iceberg or Delta Lake for you. It is one more pathway to make managing disconnected data easier.
How Tableflow maps Kafka topics to Iceberg tables
How does Tableflow work? To make data useful for real-time analytics or real-time AI, it needs to be cleaned, prepared, and optimized for querying before being stored in a data lakehouse. Apache Flink® cleanses and prepares the high-throughput streaming data, and Confluent’s Tableflow optimizes it for analytical workloads.
How Tableflow maps Kafka topics to the Data Lakehouse
Tableflow maps data from raw Kafka topics to Iceberg or Delta Lake tables using the following process. This helps solve the problem of fragile pipelines and creates usable data without the risk of duplication or the need for cleanup.
- Data Conversion – Converts Kafka segments and schemas in Avro, JSON, or Protobuf into Iceberg and Delta-compatible schemas and Parquet files, using the Schema Registry in Confluent Cloud as the source of truth.
- Schema Evolution – Tableflow automatically detects schema changes such as adding fields or widening types and applies them to the respective table.
- Catalog Syncing – Sync Tableflow-created tables as external tables in AWS Glue, Snowflake Open Catalog, Apache Polaris, and Unity Catalog (coming soon).
- Table Maintenance and Metadata Management – Tableflow automatically compacts small files when it detects enough of them and also handles snapshot and version expiration.
- Choose your storage – You can store the data in your own Amazon S3 bucket or let Confluent host and manage the storage for you.
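To make the conversion and schema evolution steps more concrete, here is a minimal Python sketch. It is not Tableflow's implementation. It assumes flat schemas with primitive types only, uses the Avro-to-Iceberg primitive mapping and the type promotions described in the Iceberg table spec (int to long, float to double), and the names `convert_schema` and `plan_evolution` are hypothetical helpers invented for illustration.

```python
# Illustrative sketch only: this is NOT how Tableflow is implemented.
# Avro primitive -> Iceberg primitive, per the Iceberg spec's Avro mapping.
AVRO_TO_ICEBERG = {
    "boolean": "boolean",
    "int": "int",
    "long": "long",
    "float": "float",
    "double": "double",
    "string": "string",
    "bytes": "binary",
}

# Primitive type promotions that the Iceberg spec allows.
ALLOWED_WIDENINGS = {("int", "long"), ("float", "double")}


def convert_schema(avro_fields):
    """Map {field name: Avro primitive} to {field name: Iceberg primitive}."""
    return {name: AVRO_TO_ICEBERG[avro_type] for name, avro_type in avro_fields.items()}


def plan_evolution(current, incoming):
    """List the changes needed to evolve `current` into `incoming`.

    Only added columns and allowed type widenings are accepted;
    any other type change raises ValueError.
    """
    changes = []
    for name, new_type in incoming.items():
        old_type = current.get(name)
        if old_type is None:
            changes.append(("add_column", name, new_type))
        elif old_type != new_type:
            if (old_type, new_type) not in ALLOWED_WIDENINGS:
                raise ValueError(f"unsupported change for {name}: {old_type} -> {new_type}")
            changes.append(("widen_type", name, new_type))
    return changes


# Hypothetical topic schema before and after a producer-side change.
v1 = convert_schema({"order_id": "int", "amount": "float"})
v2 = convert_schema({"order_id": "long", "amount": "float", "note": "string"})
print(plan_evolution(v1, v2))
```

In this sketch, widening `order_id` and adding `note` are accepted, while a narrowing change such as long to int is rejected. That is the kind of guardrail that keeps an evolving topic from breaking the table downstream.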
The result is Kafka data transformed into a form that is usable for analytics, data applications, or AI workloads. All of this happens without duplication and without fragile pipelines handling delicate streaming data. Instead, the data you need becomes instantly accessible to your analytics and AI tooling.
Starburst’s expanding commitment to Iceberg with Tableflow
Like Starburst streaming ingest, Tableflow allows you to map streaming data to structured data using Iceberg. Once mapped to Iceberg, this data becomes available to analytics, data applications, or AI.
Starburst-Tableflow Iceberg REST Catalog integration
Once Tableflow has completed the mapping process, users can analyze their data using the data platform of their choice, including Amazon Athena, Databricks, Snowflake, Dremio, Imply, Onehouse, and Starburst. The Iceberg REST catalog facilitates this integration, creating a powerful, versatile connection and serving as the authoritative repository for all Iceberg tables generated by the platform.
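As a rough illustration of what the REST catalog provides, the Python sketch below builds the "load table" URL defined by the Iceberg REST catalog specification (`/v1/{prefix}/namespaces/{namespace}/tables/{table}`). Any engine that speaks the protocol resolves a table through a call of this shape. The catalog URI, the namespace, and the table name are placeholders, and the helper `load_table_url` is hypothetical. The sketch also assumes a single-level namespace and that the table is named after the Kafka topic.

```python
from urllib.parse import quote


def load_table_url(catalog_uri, namespace, table, prefix=""):
    """Build the Iceberg REST 'load table' URL for a single table.

    Follows the path layout /v1/{prefix}/namespaces/{namespace}/tables/{table}
    from the Iceberg REST catalog spec. Assumes a single-level namespace.
    """
    parts = [catalog_uri.rstrip("/"), "v1"]
    if prefix:
        parts.append(quote(prefix, safe=""))
    parts += ["namespaces", quote(namespace, safe=""), "tables", quote(table, safe="")]
    return "/".join(parts)


# Placeholder values: replace with the endpoint and names from your own environment.
url = load_table_url("https://catalog.example.com/iceberg", "my_namespace", "orders")
print(url)
```

A query engine sends an authenticated GET request to a URL like this one and receives the table's current metadata location, which is how several engines can read the same Tableflow-created table without copying it.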
See the image below for more information on the data architecture of this integration.
Starburst streaming ingest and Icehouse Architecture
For Starburst, this represents yet another commitment to an Iceberg-based Icehouse architecture. Tableflow makes the world of Kafka data accessible to Iceberg in new ways, and Starburst is there to help facilitate and enable these workflows. This extends the capabilities we already offer through Starburst Galaxy streaming ingest.
In this, we double down on our belief that the future of data is bound up with the future of Apache Iceberg and other data lakehouse open table formats. The future of data is not only in analytics but also in data applications and AI workloads.
All of these beliefs are represented in our integration with Tableflow, and we are excited to work more closely with Confluent and other partners to help make all data accessible through a single foundation for all your data workloads.