Starburst Galaxy’s Streaming Ingest Now in Public Preview
Earlier this year at Data Universe 2024, we announced our fully managed Icehouse architecture powered by Trino and Apache Iceberg, laying the groundwork for Starburst Galaxy’s open hybrid lakehouse. Today, we’re taking another step forward with Starburst Streaming Ingest, now available in Public Preview!
Building an open lakehouse isn’t just about deploying Trino and Iceberg. It’s about overcoming a maze of operational challenges that can overwhelm even the most seasoned data teams. From complex data ingestion to intricate file management, many organizations get stuck solving technical problems instead of driving innovation. With Starburst Galaxy, we’ve reimagined the entire process. It offers point-and-click simplicity for near-real-time data ingestion into optimized Iceberg tables, a unified governance layer that ensures security without sacrificing agility, and intelligent automation of Trino clusters that takes the stress out of capacity management.
Our customers are already realizing the value of our ingest capabilities. Let’s look at how Going, an innovator in the travel industry, is using Starburst Galaxy to power predictive, real-time solutions to further deliver on their mission: to help travelers save money on flights.
Going is all-in on Starburst Streaming Ingest
Going faced a significant challenge: they needed a cost-effective way to rapidly analyze petabytes of travel data to enhance their travel recommendations. At an ingest rate of ~60 GB/minute (1 GB/second), traditional batch processing methods were too slow, and scaling their optimized data warehouse would have been both extremely difficult and expensive enough to erode their margins. That’s when they turned to Starburst Galaxy.
By implementing Starburst Galaxy’s Streaming Ingest, Going’s engineering team streamlined their development process and reduced operational overhead. They now ingest airline booking data streams into highly optimized Iceberg tables, creating a robust data lake for their predictive models running on Amazon SageMaker. This enhanced pipeline enables Going to offer near-real-time personalized flight and travel package recommendations, significantly improving the customer experience. The team also anticipates notable increases in conversion rates.
“Being able to leverage the power of Starburst Galaxy and Apache Iceberg has allowed us to scale up the volume of deals we find exponentially from thousands to millions and vastly improves our ability to model on historic data to develop further predictive insights,” says Ken Pickering, VP of Engineering at Going.
The results speak for themselves:
- Reduced time to evaluate and deploy a production architecture from several months to just 2 months
- Achieved 4x faster time-to-insight with real-time ingestion and optimized, price-performant SQL analytics
- Improved model accuracy due to the availability of fresh, up-to-the-minute data
- Enhanced customer satisfaction through more timely and personalized travel recommendations
- Potential for additional revenue streams across new markets, thanks to more comprehensive and current data
But this is just the beginning for Going. With initial success, they’re setting their sights even higher. They aim to scale up to 50-60 petabytes of data within the next 6-9 months!
How Streaming Ingest works in Starburst Galaxy
Iceberg is taking the data world by storm, and for good reason! It’s revolutionizing how we handle large-scale data lakes. But let’s be real: managing Iceberg tables on your own can be complex and resource-intensive. With Galaxy, ingesting into Iceberg is a simple point-and-click setup, and we take care of all the behind-the-scenes orchestration and optimizations for you. By managing these crucial tasks, we free data engineering and data science teams to focus on what really matters: building innovative features and customer-facing experiences that set you apart. Here’s how it works:
1. Data ingestion
Streaming Ingest works with Apache Kafka, Confluent Cloud, or Amazon MSK as the streaming source.
2. Automatic transformation
Within seconds, incoming messages are ingested into Iceberg tables in your own S3 bucket. During this process, the data is automatically transformed into a relational format. This approach eliminates the need for complex ETL pipelines and significantly reduces the time and resources required for data preparation.
3. Exactly-once processing
Galaxy guarantees that no messages are missed or duplicated, ensuring data integrity. This is crucial for maintaining accurate analytics and building trust in your data lake.
4. Automated maintenance
Background jobs handle compaction, data retention, and performance optimization without manual intervention. This automation reduces the operational overhead typically associated with managing large-scale data lakes.
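To make steps 2 and 3 a little more concrete, here is a minimal Python sketch of the general ideas: flattening a nested JSON message into a relational row, and using per-partition offsets so that a redelivered batch is not written twice. It is an illustration only, not how Galaxy is implemented. `ToyTable`, `flatten`, and `commit_batch` are hypothetical names, the in-memory list stands in for an Iceberg table, and the sample messages are made up.

```python
import json
from dataclasses import dataclass, field


def flatten(record, prefix=""):
    """Turn a nested JSON object into one flat row (nested keys joined by '_')."""
    row = {}
    for key, value in record.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{column}_"))
        else:
            row[column] = value
    return row


@dataclass
class ToyTable:
    """Hypothetical in-memory stand-in for a table plus its committed offsets."""
    rows: list = field(default_factory=list)
    committed: dict = field(default_factory=dict)  # partition -> last committed offset

    def commit_batch(self, partition, messages):
        """Write only messages newer than the last committed offset.

        `messages` is a list of (offset, json_payload) pairs. A real system
        would commit the rows and the new offset atomically; here both
        updates simply happen together in memory.
        """
        last = self.committed.get(partition, -1)
        fresh = [(offset, payload) for offset, payload in messages if offset > last]
        if not fresh:
            return 0  # whole batch was already written; nothing to do
        self.rows.extend(flatten(json.loads(payload)) for _, payload in fresh)
        self.committed[partition] = max(offset for offset, _ in fresh)
        return len(fresh)


# Made-up sample messages, for illustration only.
batch = [
    (0, '{"route": {"origin": "JFK", "dest": "LIS"}, "price": 412}'),
    (1, '{"route": {"origin": "SFO", "dest": "NRT"}, "price": 688}'),
]

table = ToyTable()
first = table.commit_batch(0, batch)     # writes both messages
replayed = table.commit_batch(0, batch)  # simulated redelivery after a retry
print(first, replayed, len(table.rows))
```

The point of the design is that the offset travels with the data: because the table itself records how far each partition has been written, a retry can be replayed safely without producing duplicates or gaps.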
Starburst Galaxy is the easiest way to build and manage an Icehouse
Starburst Galaxy’s Streaming Ingest drastically simplifies the process of building and operating an Icehouse architecture. With an easy setup that reduces time-to-value, automated maintenance that minimizes operational overhead, and built-in scalability, Starburst offers a unified platform that eliminates the need for multiple disparate systems. Our future-proof architecture, built on the best of open standards, ensures your Icehouse can adapt to changing data needs.
Starburst delivers a cost-effective solution that doesn’t compromise on performance. In essence, we’ve transformed the complex task of managing real-time data pipelines into a streamlined, efficient process that allows you to focus on deriving insights rather than managing infrastructure.
Want to learn more?
Whether you’re in finance, healthcare, retail, media and entertainment, or any other data-driven industry, Starburst Galaxy’s Streaming Ingest can help you turn your data into a dynamic source of real-time insights and actionable intelligence.
Here’s how to get started:
- Streaming Ingest is now in Public Preview! Log in to your Galaxy account or start for free today.
- Want a demo first? Schedule time with our team to see how Streaming Ingest can address your specific use cases and data challenges.