Starburst + Confluent Tableflow

Bringing Apache Iceberg and Apache Kafka® together
  • Toni Adams, SVP Partner and Alliances, Starburst


At Starburst, we’re excited about Apache Iceberg and its future impact on analytics and AI workloads. That’s why we designed the Icehouse architecture, combining the power of Trino and Iceberg. The result is an open data architecture that can serve as the foundation of your data world, whether that’s data analytics, data applications, or AI workloads.

We’re also excited about connecting Kafka streaming ingest to the world of Iceberg. It’s why we created Starburst Galaxy streaming ingest, which helps move raw Apache Kafka data to Iceberg tables, providing transformation and data governance along the way.

Announcing Starburst support for Confluent Tableflow 

Today, there’s one more reason to love Starburst + Iceberg as the foundation of your data architecture. We’re happy to announce that Starburst is supported by Confluent Tableflow. This extends our commitment to solving the long-standing challenge in data engineering of turning streaming data into queryable datasets without bespoke ETL pipelines—a challenge set to become even more important with the growth in AI workloads.

Confluent Chief Product Officer Shaun Clowes:

"With Tableflow, we’re bringing our expertise of connecting operational data to the analytical world. Now, data scientists and data engineers have access to a single, real-time source of truth across the enterprise, making it possible to build and scale the next generation of AI-driven applications."

In recent years, two separate spheres have emerged in the data world: open table formats like Iceberg and Delta Lake, which organize data in tables, and raw streaming data, which is organized in topics. But these two worlds don’t always work easily together. For data to be useful for analytics or AI, it needs to be structured, and that means turning it into tables. Moving Kafka topics into structured, queryable tables has not been easy, which is why Starburst built streaming ingest. The data needs to be properly mapped, converted, and cleaned before it is usable, and this process has often been fragile and error-prone.

Starburst is happy to announce another option – Tableflow. 

What is Tableflow?

Confluent Tableflow is designed to move Kafka streaming data to data lakes and data warehouses, storing it in Iceberg or Delta Lake tables. It’s another option for users who access operational data for analytics and AI workflows using Starburst. It’s available today on AWS and is coming soon to Azure and Google Cloud Platform (GCP).

Tableflow works by making streaming data from Confluent Cloud available to analytics, data applications, and AI using Iceberg. This approach eliminates much of the complexity inherent in managing streaming data by mapping Kafka topics to schemas in Iceberg or Delta Lake. It is one more pathway to make managing disconnected data easier.


How Tableflow maps Kafka topics to Iceberg tables 

How does Tableflow work? To make data useful for real-time analytics or real-time AI, it needs to be cleaned, prepared, and optimized for querying before being stored in a data lakehouse. Apache Flink® handles cleansing and preparing high-throughput streaming data, while Confluent’s Tableflow handles optimizing it for analytical workloads.

How Tableflow maps Kafka topics to the Data Lakehouse  

Tableflow maps data from raw Kafka topics to Iceberg or Delta Lake tables using the following process. This helps solve the problem of fragile pipelines and creates usable data without the risk of duplication or the need for cleanup. 

  1. Data Conversion – Converts Kafka segments and schemas in Avro, JSON, or Protobuf into Iceberg- and Delta-compatible schemas and Parquet files, using the Schema Registry in Confluent Cloud as the source of truth.
  2. Schema Evolution – Automatically detects schema changes, such as adding fields or widening types, and applies them to the corresponding table.
  3. Catalog Syncing – Syncs Tableflow-created tables as external tables in AWS Glue, Snowflake Open Catalog, Apache Polaris, and Unity Catalog (coming soon).
  4. Table Maintenance and Metadata Management – Automatically compacts small files when it detects enough of them, and also handles snapshot and version expiration.
  5. Choose Your Storage – Stores the data in your own Amazon S3 bucket, or lets Confluent host and manage the storage for you.
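To make the data conversion step concrete, here is a minimal sketch of the kind of mapping it involves: turning a flat Avro record schema (as registered in Schema Registry) into Iceberg-style column definitions. Tableflow’s actual implementation is internal to Confluent Cloud; the function name, the type table, and the example schema below are illustrative assumptions, not Confluent code.

```python
# Illustrative Avro-to-Iceberg primitive type mapping (assumption, not
# Tableflow's actual table).
AVRO_TO_ICEBERG = {
    "boolean": "boolean",
    "int": "int",
    "long": "long",
    "float": "float",
    "double": "double",
    "string": "string",
    "bytes": "binary",
}

def avro_to_iceberg_columns(avro_schema: dict) -> list[tuple[str, str, bool]]:
    """Map a flat Avro record schema to (name, iceberg_type, required) tuples."""
    columns = []
    for field in avro_schema["fields"]:
        ftype = field["type"]
        required = True
        # Avro expresses optional fields as a union with "null".
        if isinstance(ftype, list):
            ftype = [t for t in ftype if t != "null"][0]
            required = False
        columns.append((field["name"], AVRO_TO_ICEBERG[ftype], required))
    return columns

orders = {
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "order_id", "type": "long"},
        {"name": "amount", "type": "double"},
        {"name": "note", "type": ["null", "string"]},
    ],
}

print(avro_to_iceberg_columns(orders))
# [('order_id', 'long', True), ('amount', 'double', True), ('note', 'string', False)]
```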

The result is Kafka data that is transformed and made usable for analytics, data applications, or AI workloads. All of this happens without duplication, without switching tools, and without fragile pipelines handling delicate streaming data. Instead, the data you need becomes instantly accessible to your analytics and AI tooling.
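The schema evolution step described above can also be sketched in a few lines: compare an existing table schema with an incoming record schema and report the additive changes (new columns, safe type widenings) that can be applied automatically. The widening table and function names are hypothetical, chosen only to illustrate the idea.

```python
# Safe type promotions, modeled on Iceberg's schema-evolution rules
# (e.g. int -> long, float -> double). Illustrative, not Tableflow's code.
WIDENINGS = {("int", "long"), ("float", "double")}

def diff_schemas(current: dict[str, str], incoming: dict[str, str]) -> list[str]:
    """Return the additive schema changes needed to go from current to incoming."""
    changes = []
    for name, new_type in incoming.items():
        old_type = current.get(name)
        if old_type is None:
            changes.append(f"ADD COLUMN {name} {new_type}")
        elif old_type != new_type:
            if (old_type, new_type) in WIDENINGS:
                changes.append(f"ALTER COLUMN {name} TYPE {new_type}")
            else:
                raise ValueError(f"incompatible change for {name}: {old_type} -> {new_type}")
    return changes

print(diff_schemas(
    {"order_id": "int", "amount": "double"},
    {"order_id": "long", "amount": "double", "note": "string"},
))
# ['ALTER COLUMN order_id TYPE long', 'ADD COLUMN note string']
```

Changes that cannot be applied additively (such as narrowing a type) raise an error rather than silently corrupting the table, which is the behavior you want from an automated pipeline.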


Starburst’s expanding commitment to Iceberg with Tableflow

Like Starburst streaming ingest, Tableflow allows you to map streaming data to structured data using Iceberg. Once mapped to Iceberg, this data becomes available to analytics, data applications, or AI. 

Starburst-Tableflow Iceberg REST Catalog integration

Once Tableflow has completed the mapping process, users can analyze their data using the data platform of their choice, including Amazon Athena, Databricks, Snowflake, Dremio, Imply, Onehouse, and Starburst. The Iceberg REST catalog facilitates this integration, creating a powerful, versatile connection and serving as the authoritative repository for all Iceberg tables generated by the platform.
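On the Starburst side, an Iceberg REST catalog is typically wired up through the Iceberg connector’s catalog properties. The fragment below is a minimal sketch based on Trino’s Iceberg connector configuration; the endpoint URI, warehouse name, and credential are placeholders you would replace with the values Tableflow provides, and the exact properties for your deployment may differ.

```properties
# Hypothetical Iceberg REST catalog configuration for a Trino/Starburst
# catalog pointing at Tableflow-managed tables. Values are placeholders.
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=https://<tableflow-rest-endpoint>
iceberg.rest-catalog.warehouse=<warehouse>
iceberg.rest-catalog.security=OAUTH2
iceberg.rest-catalog.oauth2.credential=<client-id>:<client-secret>
```

Once the catalog is configured, the Tableflow-created Iceberg tables appear like any other schema and can be queried with standard SQL.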

See the image below for more information on the data architecture of this integration. 

Image depicting confluent tableflow integration, including integration between Tableflow and Starburst

Starburst streaming ingest and Icehouse Architecture

For Starburst, this represents yet another commitment to an Iceberg-based Icehouse architecture. Tableflow makes the world of Kafka data accessible to Iceberg in new ways, and Starburst is there to help enable these workflows. This adds to and extends the capabilities already present in Starburst Galaxy streaming ingest.

In this, we double down on our belief that the future of data is bound up with the future of Apache Iceberg and other data lakehouse open table formats. The future of data is not only in analytics but also in data applications and AI workloads. 

All of these beliefs are represented in our integration with Tableflow, and we are excited to work more closely with Confluent and other partners to help make all data accessible through a single foundation for all your data workloads.