Tag: Data Lake

Showing 73 results

What is a data lake?

November 12, 2024

This article defines what data lakes are, why they are important, and how they compare to other big data storage technologies, including databases, data...

Automated Table Maintenance for Apache Iceberg Tables

July 18, 2024

Table maintenance is necessary for Apache Iceberg tables in order to keep your data optimized and performant. That extra effort is worth the reward...

Introducing AI-powered data lake analytics in Starburst Galaxy

November 28, 2023

Generative AI took the world by storm in 2023 and there has been a tremendous amount of hype around the possibilities it brings to...

BestSecret’s data journey: Moving beyond Snowflake

August 16, 2023

Of all the seventy-plus speakers at the festival, there was one presentation that I found to be particularly interesting – and not because the speaker also happens to be our customer. That presentation was from Lutz Künneke, Director of Engineering, and Isa Inalcik, Senior Data Engineer, at BestSecret, a leading European online destination for off-price fashion based near Munich, Germany. As Künneke got to the stage, the first words out of his mouth were: “We are moving off of Snowflake.” 

GigaOm TCO report: Starburst data lakehouse enables 3x faster time to insight at half the cost

August 16, 2023

In a new report, Cloud Data Warehouse vs. Cloud Data Lakehouse: A Snowflake vs. Starburst TCO and Performance Comparison, GigaOm concluded that a Starburst lakehouse architecture could achieve superior price-performance and significantly faster time-to-insight at a much lower total cost of ownership (TCO).

Testing the boundaries of partitioning for data lake analytics

July 27, 2023

Discover how Starburst’s nanoblock indexing accelerates data lake analytics by optimizing queries and reducing data reads. Try it in Starburst Galaxy for accelerated performance!

Starburst data lake certification and training

July 24, 2023

A data analytics certification program for learning about topics such as data lakes, data lakehouses, and modern table formats like Apache Iceberg.

Data pipelines and data lakes: Transforming raw data into actionable insights

July 20, 2023

ETL operates as the engine behind the data pipeline process, moving data from a raw state to a consumable one. Let’s unpack the way in which this typically operates in a modern data lake or data lakehouse. Later, we’ll take a tour to see how Starburst Galaxy fits in this picture and how it can be used to construct the Land, Structure and Consume layers typical of a modern data lake.

Google Looker and Starburst Galaxy: Modern, trusted BI for your modern data lake

June 20, 2023

With the Looker and Starburst Galaxy integration, teams can now extend Looker beyond data in Google Cloud services like BigQuery to other cloud data sources – including data in AWS and Azure. This means that Looker can now support customers with multi-cloud environments.

Designing a data lake and analytics architecture

June 6, 2023

Of all the choices a startup has to make in its early stages, deciding on the right data analytics architecture might not seem critical,...

Accelerate AI with a data lake analytics platform

May 22, 2023

A data lake analytics platform is needed to bridge the gap between what can be a large number of analytical AI tools and the data lakes, lakehouses, legacy systems, and other technologies in the ecosystem.

BCG landmark research: Spiraling data costs and complexity reach a tipping point

May 1, 2023

The number of unique data vendors has tripled in the past decade (from about 50 to close to 150 today), driven in large part by massive data stack investments, which totaled about $245 billion between 2012 and 2021.

Fueling Trino large-scale geospatial analysis with Starburst Warp Speed

March 27, 2023

In our last post, we discussed two methods for running geospatial analysis with Trino and the Hive connector and explored a few optimization techniques...

Lie #3 — You’re ready for the AI + ML deep end

February 3, 2023

You’ve hired pedigreed data scientists and engineers, invested in shiny new software, and perhaps even reorganized your entire business, all in the hopes of...

Lie #1 — A single source of truth

February 1, 2023

Technology vendors have long peddled a version of nirvana where all of a company’s data would be centralized in one location. The “single source...

Simplified Cloud Storage Governance with Starburst and Immuta

January 4, 2023

Accessing data in cloud storage has been an ongoing challenge for analysts, data engineers, and organizations as a whole. Additional work is required to...

Over 80 Data & Analytics Statistics, Data, Trends, and Facts

December 28, 2022

Most organizations have data and continue to generate and collect it on a daily basis, but have a far more difficult time in getting...

4 data trends to watch in 2023

December 19, 2022

By Martial Coiffe & Victor Coustenoble. 2022 confirmed that data architecture remains at the heart of the concerns of companies and organizations in France,...

Tableau Cloud + Starburst: New Connector Supports Shift to Cloud-based SaaS

December 19, 2022

The shift to cloud-based software-as-a-service platforms is accelerating in just about every tech industry. So it wasn’t much of a surprise to the analytics...

Apache Iceberg Time Travel & Rollbacks in Trino

December 7, 2022

This post is part of the Iceberg blog series. Read the entire series: Introduction to Apache Iceberg in Trino; Iceberg Partitioning and Performance Optimizations...

How data and schema interact with a data lake and data warehouse

December 6, 2022

To someone who has spent years building enterprise data warehouses, a data lake architecture may at first glance appear to be similar to a data warehouse. After...

Building and governing a data mesh with Starburst and AWS Lake Formation

November 29, 2022

The increasing popularity of data lakes isn't surprising anyone in the analytics space. The appeal of importing data from multiple sources into a single...

Apache Iceberg Schema Evolution in Trino

November 22, 2022

This post is part of the Iceberg blog series. Read the entire series: Introduction to Apache Iceberg in Trino; Iceberg Partitioning and Performance Optimizations...

Reliving the Hype: Highlights from Trino Summit 2022

November 18, 2022

Last week in San Francisco was one for the Trino history books. After three years of planning, rescheduling, planning, and rescheduling some more, Starburst...

Apache Iceberg DML (update/delete/merge) & Maintenance in Trino

November 17, 2022

This post is part of the Iceberg blog series. Read the entire series: Introduction to Apache Iceberg in Trino; Iceberg Partitioning and Performance Optimizations...

Explore A New Way Of Utilizing A Data Lakehouse

November 10, 2022

A data lakehouse combines the principles of a data lake and a data warehouse to include the best of both worlds. Data lakehouses are...

Iceberg Partitioning and Performance Optimizations in Trino

November 8, 2022

This post is part of the Iceberg blog series. Read the entire series: Introduction to Apache Iceberg in Trino; Iceberg Partitioning and Performance Optimizations...

6 Considerations for Choosing the Right Cloud Data Lake Solution

October 26, 2022

Data lakes have amazing attributes. For one, they enable us to handle vast, complex datasets. Data lakes offer an up-to-date stream of data that...

Data lake vs Data Virtualization

October 13, 2022

Data lakes deliver unprecedented agility. A data lake is an essential tool for big data analytics. A key advantage of developing a data lake...

Second Edition of Trino: The Definitive Guide

October 5, 2022

Starburst has played a key role in the Trino community for a long time now. We contribute to the success of Trino every day....

Building Reporting Structures on S3 using Starburst Galaxy and Apache Iceberg

October 4, 2022

AWS S3 has become one of the most widely used storage platforms in the world. Companies store a variety of data on S3 from...

The Data Virtualization Evolution is Just Beginning

October 4, 2022

Data virtualization revolutionized the data infrastructure space by serving data consumers directly on top of data stores, without the need to move data elsewhere....

Delivering Text Search Capabilities Directly on the Data Lake with Starburst

September 29, 2022

In the big data analytics world, enabling analytics on unstructured text is a powerful capability. For that reason, it would be of use that...

Five Exciting Big Data Trends Worth Taking a Closer Look

September 20, 2022

After Covid-19, many business executives faced one of the toughest leadership tests: turning this challenge into an amazing opportunity. What did the business...

Rethinking SIEM Solutions

September 13, 2022

As organizations strive to become more agile, there has been a mass movement jumping headfirst into what is called a security data lake. Gartner...

The Difference Between Micro-Partitioning vs. Indexing and a Better Way

September 8, 2022

When optimizing your analytics database performance, one of the most important decisions is to choose how data is stored and accessed. There are two...

Data Lake Solutions Foster a Range of Analytics Use Cases

August 31, 2022

Data lakes enable the implementation of a wide range of solutions, including raw data collection, flexible data access for users, and building fast and...

Identify threats faster with a security data lake

August 26, 2022

The glory days of SIEM are over. Security teams are not only measured by their ability to collect as much data as possible, but...

Scaling Up: When to Migrate from PostgreSQL to a Data Lake

July 13, 2022

One of the true pillars of the tech revolution, PostgreSQL is an OLTP database designed primarily to handle transactional workloads. The technology has been...

Starburst Acquires Varada To Deliver Faster (and Cheaper) Data Lake Analytics

June 23, 2022

I’m excited to announce the acquisition of Varada, a data analytics accelerator, based out of Tel Aviv, Israel. Varada offers a data lake analytics...

Employee Perspective: Accelerating Data-Driven Insights in AdTech

June 16, 2022

Before I joined Starburst, I worked in the AdTech industry, where companies buy and sell user data for targeted online advertising campaigns or ML/AI-based...

Data Lake Analytics for Smart, Modern Data Management

May 27, 2022

Best-in-class organizations need fast, reliable data analytics that enable business leadership to identify patterns and key insights that will help them predict the best...

The Past, Present, and Future of Trino

May 24, 2022

Recently, I had the pleasure of chatting with Ravit Jain on his show “The Ravit Show” to discuss the evolution of Trino and where...

Starburst and Databricks Collaborate on the Trino Delta Lake Connector

March 24, 2022

This blog was co-authored by Claudius Li, Product Manager at Starburst, and Joe Lodin, Information Engineer at Starburst. Starburst recently donated the Delta Lake...

The Benefits of a Big Data SQL Query Engine

February 16, 2022

So why use a big data SQL query engine? Well, have you suffered from the following problems with processing and analyzing big data via...

Top 6 Reasons to Migrate to the Cloud

January 25, 2022

Starburst released the 2021 State of Data market research report, conducted by Enterprise Management Associates (EMA), in collaboration with Red Hat, early last year....

Starburst Stargate: One Cluster to Rule Them All

December 9, 2021

I think of Starburst Stargate as the Lord of the Rings feature. Or the galactic empire feature. In a prior blog post, I introduced...

Data warehouse vs Lake vs Lakehouse architecture

December 6, 2021

As companies shift their analytical ecosystems from on-premise to cloud and try to avoid “data lock-in”, we’re noticing some very interesting data patterns. This...

Tableau is Just Better with Starburst

November 15, 2021

I’m one of those strange people who has always enjoyed doing performance testing. The thought of spinning up lots of machines to do my...

The Analytics Engine for Distributed Data

October 1, 2021

The idea of a single source of truth has been around since the beginning of big data. However, over the years, through the data...

Data Mesh: Embracing Decentralized Data Paradigms

September 20, 2021

Many data and analytics practitioners have heard about this socio-technical paradigm shift, Data Mesh, and would like to learn more. But before describing what...

Dynamic Filtering: Supporting High Speed Access to Data

September 20, 2021

Analysts are often tasked with deriving insights for business units where the data can span multiple locations. This is increasingly true today when the...

Accelerating Data Science with Trino

August 31, 2021

At our Datanova for Data Scientists conference on July 14, I held a discussion with Dain Sundstrom and David Phillips, CTOs of Starburst, about...

Hybrid Distributed Data Store and RDBMS

August 12, 2021

As companies shift their analytical ecosystems from on-premise to cloud and try to avoid “data lock-in”, we’re noticing some very interesting data patterns. This...

Moving On-Premise Data to Azure Cloud ADLS

August 4, 2021

Microsoft has migrated thousands of customers to its Azure cloud platform and has quickly become the second most popular cloud provider. Companies have easily...

Trino on Ice IV: Deep Dive Into Iceberg Internals

June 8, 2021

Trino on ice I: A gentle introduction to Iceberg; Trino on ice II: In-place table evolution and cloud compatibility with Iceberg; Trino on ice...

Starburst Supports Launch of Delta Sharing, the First Open Protocol for Secure Data Sharing

May 26, 2021

At Starburst, we believe in building optionality into your data architecture & strategy. To us, optionality means building for flexibility so that you don’t...

Trino on Ice III: Iceberg Concurrency Model, Snapshots, and the Iceberg Spec

May 25, 2021

Trino on ice I: A gentle introduction to Iceberg; Trino on ice II: In-place table evolution and cloud compatibility with Iceberg; Trino on ice...

Trino on Ice II: In-Place Table Evolution and Cloud Compatibility with Iceberg

May 11, 2021

Trino on ice I: A gentle introduction to Iceberg; Trino on ice II: In-place table evolution and cloud compatibility with Iceberg; Trino on ice...

Trino On Ice I: A Gentle Introduction To Iceberg

April 27, 2021

Trino on ice I: A gentle introduction to Iceberg; Trino on ice II: In-place table evolution and cloud compatibility with Iceberg; Trino on ice...

Understanding the Starburst and Trino Hive Connector Architecture

February 18, 2021

After a decade of running Hive queries on their data lakes, many companies are astonished at the speeds in which they are able to...

The Future of Analytics: In Conversation With Matt Fuller

February 5, 2021

Datanova is just next week. More than 2,000 data and analytics leaders will join us to learn more about how to unlock the value...

6 Reasons to Attend Datanova 2021: #2, The Oxford Debate

January 25, 2021

Datanova 2021 is going to have plenty of panels and informative content for anyone interested in the future of big data management. We're also...

Top 10 Reasons to Migrate from EMR Trino to Starburst Enterprise

November 13, 2020

In today’s data architecture economy, there is no shortage of options when it comes to choosing various distributions and deployment strategies for a given...

The Death of Apache Drill

August 6, 2020

One of the things that really drew me to and got me excited about Trino over 4 years ago was that it wasn’t tied...

Presto & Data Science: Getting Data Into the Hands of Data Scientists (Faster)

June 26, 2020

A few days ago I read a Gartner report stating that data scientists spend 23% of their time on data collection and preparation. I...

How a Telecommunications Giant Established Universal Data Access

April 3, 2020

Our customer base has been growing quickly, and we’re excited to share a case study highlighting one of our largest clients, a telecommunications...

The 4 Stages to Big Data Nirvana (In the Cloud)

July 18, 2019

Nirvana: a state of perfect happiness; an ideal or idyllic place. In big data, “Nirvana” is a wishlist of items: the ability to...

Starburst Enterprise & Databricks Delta Lake Support

June 13, 2019

TL;DR: Starburst Enterprise is now compatible with Databricks Delta Lake. Delta Lake: The big data ecosystem has many components, but the one...

The Art of Abstraction: the continuing separation of compute and storage for data analytics

December 4, 2018

We recently invited 451 Research VP Matt Aslett to share his thoughts and observations on the practice of separating the storage and computation of...

Data Lakes without Hadoop

May 14, 2018

It seems like migrating to the cloud has dominated the news and a lot of companies are shuttering their data centers and letting cloud...

Building data lakes using AWS S3 object storage

February 21, 2018

With Amazon’s Simple Storage Service (Amazon S3), the object storage solution from Amazon Web Services (AWS), you can build a scalable, cost-efficient data lake...

Start Free with Starburst Galaxy

Up to $500 in usage credits included

  • Query your data lake fast with Starburst's best-in-class MPP SQL query engine
  • Get up and running in less than 5 minutes
  • Easily deploy clusters in AWS, Azure and Google Cloud
For more deployment options:
Download Starburst Enterprise