We are excited to announce that GigaOm has recognized Starburst as an Outperforming Leader for the second consecutive year in the 2024 GigaOm Radar for Data Lakes and Lakehouses. This year, GigaOm evaluated ten vendors, and we are honored to be one of only three in a clear leadership position.
Starburst is trusted by some of the most complex and regulated data ecosystems in the world, including 7 of the top banks, 6 of the top 10 pharma companies, and 4 of the top 7 US telcos. We continue to place our customers’ data and analytics needs at the center of our innovation strategy, doubling down on a simple yet powerful end-to-end platform for diverse data architectures across on-premises, hybrid, and multi-cloud environments.
2024 GigaOm Radar for Data Lakes and Lakehouses
Download a complimentary copy of the report
Open data lakehouse architecture: data warehouse performance and data lake flexibility
Without question, organizations are inundated with vast amounts of data, and top data leaders know how to harness data’s through line and translate it into the organization’s bottom line. The ability to effectively manage and use all relevant data is a competitive advantage, and it is fast becoming a requirement for staying on top.
What data management systems have these data-driven organizations leveraged that we can learn from? According to the latest GigaOm report, those that have created an optimal blend for analytics have done so largely through early adoption of data lakehouses, which combine the best of data lakes, data warehouses, and data virtualization.
When considering an open data lakehouse approach, let’s examine what organizations must focus on.
Central to the data lakehouse ecosystem are features such as open table formats (e.g., Apache Iceberg, Delta Lake, and Apache Hudi) and open file formats (e.g., Parquet and Avro), which are designed to enhance data structure and improve query performance.
Open data lakehouses go beyond file and table formats: they also offer the best combination of price and performance in analytics query engines built on popular open-source projects (e.g., Trino, formerly known as PrestoSQL). Added query acceleration techniques such as in-memory caching, indexing, and vector processing play pivotal roles in enabling efficient analytics across diverse and distributed data sets without necessitating extensive data transformation.
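The caching intuition behind such acceleration is easy to demonstrate in miniature. The sketch below is a generic illustration, not Starburst’s implementation: the first call to a “query” pays the full scan cost, while repeated calls with the same parameters are answered from memory. The function and data here are hypothetical stand-ins.

```python
# Generic illustration (not Starburst's internals) of why in-memory caching
# accelerates repeated analytic queries: the first call scans the data,
# identical repeat calls are served from the cache.
from functools import lru_cache

ROWS = list(range(1_000_000))  # stand-in for a large table scan

calls = {"scans": 0}

@lru_cache(maxsize=None)
def total_for_filter(threshold: int) -> int:
    calls["scans"] += 1  # count how often we actually scan the data
    return sum(r for r in ROWS if r < threshold)

total_for_filter(500_000)  # cold: performs the full scan
total_for_filter(500_000)  # warm: answered from cache, no scan
print(calls["scans"])      # 1
```

Real engines apply the same principle at the level of index structures and cached data blocks rather than whole query results, but the cost asymmetry between cold and warm reads is the point.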
As organizations consider integrating data lakehouse solutions, aligning these technologies with existing infrastructure is crucial to avoid vendor lock-in and ensure seamless integration.
The journey toward adopting a data lakehouse approach requires intentionality and strategy: identify specific business needs, pain points, use cases, and existing systems. Yes, you can leverage your data sources wherever the data is stored and limit extract, transform, and load (ETL) work.
This foundational understanding will empower organizations to confidently navigate the data lakehouse vendor landscape and make informed decisions.
What makes the Starburst Open Data Lakehouse an outperforming leader?
At Starburst, we believe that the modern data lake will become the data center of gravity for many organizations. As such, we provide an open data lakehouse to activate the data in and around the lake. This is important, as data centralization, in all its grandeur and over-promised glory, should be optional, based on sound business decisions, not a requirement to use specific tools.
Of course, we know it’s never that simple. Intentional or not, data silos are everywhere, so Starburst is here to help. Our customers typically have their data center of gravity in a data lake (AWS, Azure, GCP, and Hadoop) but also depend on many other data sources outside the lake, which may evolve as their needs change.
For instance, modern applications often require a combination of data technologies, including historical data stored on a lake, streaming event data, log data, application data, and more. And there will always be new data types, data sources, and workloads created by changing business needs and the development of new technologies and applications.
Starburst helps you access the data in object storage and directly access data outside the lake. We can also help you move data into the lake and transform it within your lake for better sharing and consumption.
Inside the Starburst Open Data Lakehouse, there are five core layers:
- Data access
Connecting to all your data sources in your lake object store, such as HDFS, Amazon S3, ADLS, and Google Cloud Storage, and around your data lake(s), such as ClickHouse, MongoDB, MySQL, Postgres, Salesforce, Snowflake, and more. This also includes our unique Stargate connectivity, which enables customers to connect sources across regions, across clouds, and between cloud and on-prem environments.
Another essential component of data access within Starburst is the ability for customers to choose their open table format rather than being forced into one particular standard: bring your Apache Iceberg tables, Delta Lake tables, and other table metadata. This also reduces the burden on data engineering teams to build and maintain complex data pipelines for every data request. Instead of a single “true” data source, you have a single point of access.
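In open-source Trino, on which Starburst is built, each source is wired up as a catalog via a small properties file, and every catalog then becomes queryable from the same SQL interface. The fragment below is a sketch: the hostnames, database names, and credentials are placeholders, and exact property names for your connector version should be checked against the Trino/Starburst connector documentation.

```
# etc/catalog/sales_pg.properties -- expose a PostgreSQL database as a catalog
# (host, database, and credentials are placeholders)
connector.name=postgresql
connection-url=jdbc:postgresql://pg.example.com:5432/sales
connection-user=analyst
connection-password=example-secret

# etc/catalog/lake.properties -- expose Iceberg tables in object storage
# connector.name=iceberg
# hive.metastore.uri=thrift://metastore.example.com:9083
```

With both catalogs in place, a single SQL statement can join a table in the lake catalog against a table in the PostgreSQL catalog, which is the “single point of access” described above.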
- Security and governance
This layer includes all the capabilities needed to manage access, privacy, and encryption, as well as monitoring and logging, so you can adhere to regulatory requirements or your own data governance policies. This also includes leveraging third-party data catalogs such as AWS Glue, Databricks Unity Catalog, and Hive metastore.
- Query engine
At Starburst, we offer the best combination of price and performance on the market. To deliver on that claim, the Starburst query engine uses open-source Trino, initially developed as Presto at Facebook (now Meta), with performance-boosting optimizations, including Warp Speed (smart indexing and caching), which delivers query response times similar to an optimized data warehouse.
Also, as part of the engine, we offer enhanced fault-tolerant execution, which means long-running queries and complex transformation jobs will not fail due to out-of-memory limitations. Designed for massively parallel processing (MPP) analytics across petabytes of distributed data, the Starburst query engine has been optimized to facilitate access, management, and querying of data for strategic insights. It is built for the modern enterprise that cannot compromise on scale and concurrency.
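For readers curious how this looks in open-source Trino terms: fault-tolerant execution is enabled with a retry policy plus an exchange manager that spools intermediate data, so a failed task can be retried rather than failing the whole query. The fragment below is a sketch with a placeholder bucket name; Starburst’s managed offerings expose equivalent settings, and exact properties should be verified against the Trino documentation.

```
# config.properties -- enable fault-tolerant execution (task-level retries)
retry-policy=TASK

# exchange-manager.properties -- spooling storage for intermediate exchange
# data, so failed tasks can be retried without restarting the query
# (bucket name is a placeholder)
exchange-manager.name=filesystem
exchange.base-directories=s3://example-exchange-spooling-bucket
```

Task-level retries trade some spooling overhead for resilience, which is why this mode suits long-running batch and transformation workloads in particular.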
- Modeling / semantic layer
Finally, the modeling and semantic layer includes all the features to help you build, organize, and share data models with standard SQL. Within this layer, we also offer federated Data Products: the ability to quickly discover, build, govern, analyze, and share curated data sets across the organization. GigaOm cited this as a strength for Starburst in this year’s radar:
“In addition to metadata management, Starburst’s data catalog features include the ability to create, govern, and share data as a product. The Data Products feature allows users to provide both technical and business-focused documentation for a dataset, link sample visualizations and dashboards to the dataset, view lineage, and describe considerations for use and target use cases. Thus, as the vendor says, it empowers business users to interact with a Starburst Data Product and to understand its context and how it is used and to share it with other internal or external users.”
- Ecosystem of partners
We also work with an extensive network of technology and consulting partners to bring you connectors and APIs for the most popular third-party integrations, should you prefer a specific vendor or already have one in place. Examples include all major BI vendors, such as Tableau and Looker; raw data transformation tools like dbt; and several security and governance providers, such as Immuta, Collibra, and Alation.
What’s next?
Being recognized as an outperforming platform leader for two years in a row is a testament to our obsession with innovating on behalf of our customers: liberating them to see the invisible and achieve the impossible using all their data, while receiving the best combination of price and performance for their analytics.
We believe that whether businesses are trying to fuel BI or executive dashboards, enhance predictive analytics, or bring to market the next generation of AI-powered intelligent data applications, easy and secure access to all the relevant data and the ability to execute highly performant analytics will be required capabilities.
We look forward to continuing to learn fast and innovate on behalf of our customers everywhere — thank you for being on this journey with us.
What are some next steps you can take?
Below are three ways you can continue your journey to accelerate data access at your company:
1.
2. Automate the Icehouse: our fully managed open lakehouse platform
3. Follow us on YouTube, LinkedIn, and X (Twitter).