ACID transactions are methods for ensuring database integrity.
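For illustration, here is a minimal sketch of an ACID transaction in standard SQL, assuming a hypothetical accounts table: atomicity guarantees that either both updates take effect or neither does.

```sql
-- Transfer funds between two accounts as a single atomic unit.
-- The "accounts" table and its columns are hypothetical.
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
COMMIT;  -- both changes become visible together, or roll back together
```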
AI data analytics is the application of artificial intelligence and machine learning technologies to traditional analytics.
Both data strategy and AI strategy are integral to an organization’s success in the modern technological landscape, and yet they serve distinct purpos...
Anti-money laundering or AML consists of the regulations and practices used to prevent the abuse of the financial system in support of terrorism and o...
Apache Airflow is an open-source data workflow management framework based on Python that makes pipelines more dynamic, extensible, and scalable than t...
Apache Hadoop or Hadoop Distributed File System (HDFS) is an open-source framework for distributed storage and processing of large datasets
Apache Hive is a fault-tolerant data warehouse system built on top of Hadoop’s distributed storage architecture and is used to enable analytics at sca...
Apache Hudi (pronounced “hoodie”) is a transactional data lake platform first developed by Uber to bring data warehouse-like analytics capabilities to...
Apache Iceberg or Iceberg is an open-source table format that adds data warehouse-level capabilities to a traditional data lake.
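As a hedged sketch of those warehouse-level capabilities, the example below uses Trino's Iceberg connector syntax; the catalog, schema, and table names are hypothetical.

```sql
-- Create an Iceberg table backed by Parquet files in a data lake.
CREATE TABLE iceberg.sales.orders (
    order_id BIGINT,
    amount   DECIMAL(10, 2),
    order_ts TIMESTAMP(6)
)
WITH (format = 'PARQUET');

-- Time travel: query the table as of an earlier point in time.
SELECT *
FROM iceberg.sales.orders
FOR TIMESTAMP AS OF TIMESTAMP '2024-01-01 00:00:00 UTC';
```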
Apache Impala is a SQL query engine for Hadoop-based data architectures.
The Apache Parquet file format is a way to bring columnar storage to Hadoop-based data lakes. Parquet supports efficient compression and encoding sche...
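As an illustrative sketch (Hive-style catalog and table names are hypothetical), the file format is typically chosen per table, and Parquet's columnar layout means queries that select a few columns read only those columns from storage.

```sql
-- Declare a Parquet-backed table in a Hive-style catalog.
CREATE TABLE hive.lake.events (
    event_id   BIGINT,
    event_type VARCHAR,
    created_at TIMESTAMP
)
WITH (format = 'PARQUET');

-- Column pruning: only the event_type column is scanned.
SELECT event_type, COUNT(*) FROM hive.lake.events GROUP BY event_type;
```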
Apache Spark is an analytics engine built for processing massive datasets. Spark’s ability to process vast quantities of data within Apache’s big data...
Attribute-based access control (ABAC) is a method for dynamically applying access policies based on specific attributes of the user, the data or system...
Although businesses have always crunched the numbers, “business analytics” refers to a more rigorous approach that applies statistical analysis and ot...
Centralized data is the long-established practice of gathering all data the company generates into an enterprise database, a data warehouse, or, more...
Change data capture (CDC) is the process of identifying incremental changes to source systems and transmitting those changes in real time to a target...
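As a hedged illustration, a batch of captured changes is often applied to a target with a standard SQL MERGE; the change table, its op marker column, and all other names here are hypothetical.

```sql
-- Apply captured inserts, updates, and deletes to the target table.
MERGE INTO customers AS t
USING customer_changes AS c
    ON t.customer_id = c.customer_id
WHEN MATCHED AND c.op = 'D' THEN DELETE
WHEN MATCHED THEN UPDATE SET email = c.email, updated_at = c.updated_at
WHEN NOT MATCHED THEN
    INSERT (customer_id, email, updated_at)
    VALUES (c.customer_id, c.email, c.updated_at);
```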
Cloud data covers any data stored or processed on internet-accessible remote servers, whether company-owned or hosted by third-party cloud services.
A cloud data lakehouse is a data platform that unifies enterprise data sources within a performant, cost-effective cloud architecture.
Cloud data migration is the process that moves data from legacy systems to cloud platforms.
A cloud data warehouse is a cloud-based version of the traditional on-premises enterprise data warehouse. Given the large amounts of data businesses g...
A cloud-native approach to software development takes full advantage of the cloud’s scalability, elasticity, resiliency, and efficiency.
Cloud computing makes extensive use of object storage. This has many advantages, including cost, speed, and scalability.
A compute engine, also called a query engine or an execution engine, is a component of a data processing platform. For example, Trino interprets ANSI-...
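As an illustrative sketch of what a query engine like Trino does (catalog and table names are hypothetical), a single SQL statement can join tables that live in entirely different systems:

```sql
-- Federated join across a relational database and a data lake catalog.
SELECT c.name, SUM(o.amount) AS total_spend
FROM postgresql.public.customers AS c
JOIN hive.lake.orders AS o
  ON c.customer_id = o.customer_id
GROUP BY c.name;
```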
Customer 360 is a strategic priority that requires the entire organization to create unified, end-to-end customer experiences. However, harnessing all...
Dark data is the dormant contents of a company’s data lakes and other repositories.
Data analytics is the process that converts raw data into actionable insights. In data-driven organizations, analytics increasingly relies on large da...
A data analytics architecture is a set of policies and standards that guides the organization as it builds analytical processes. More than technical o...
A data application (or data app) processes and analyzes big data to rapidly deliver insights or take autonomous action.
Data architecture is a framework that guides how to collect, store, manage, and use data in ways that support an organization’s business goals.
Data blending is the process of combining data sets from different data sources to generate actionable insights that answer specific business question...
Data catalogs are data source inventories. They collect metadata about the source’s various assets.
Data classification is a framework for organizing data in ways that improve data management, information security, and risk management.
Data complexity is an emergent property of enterprise data shaped by volume, velocity, variety, veracity, value, and vigilance — the V’s of big data.
Data compliance consists of the governance processes for meeting the requirements of internal, industry, and regulatory standards for data security an...
Data democratization is the goal for organizations and employees to quickly and securely access data so that they can analyze it and make data-driven...
Data discovery is a technique for gathering data, evaluating it for potential insights, and performing advanced analytics to create actionable insight...
Data engineering emerged as a specialization of software engineering in response to exploding data volumes.
Data exploration is an essential preliminary step to analyzing large datasets. Analysts use visualization and statistical methods to understand the qu...
A data fabric is a data management architecture that uses artificial intelligence and machine learning algorithms to automate data ingestion best prac...
Data federation, or federated data, involves the creation of a virtual database that maps an enterprise’s many different sources and makes them accessi...
A data governance framework is a concept within the discipline of data management that takes a holistic approach to an organization’s data and its lif...
Ingestion lands raw data from external sources into a central repository. From there, integration pipelines will transform data to meet data quality,...
Data integration is a series of data management procedures for bringing datasets from different sources into data lakes, data warehouses, or other dat...
Data lake storage houses a wide variety of data types, including structured, semi-structured, and unstructured data. Each of these data types serves...
A data lakehouse combines a data lake and a data warehouse, creating a centralized data repository that uses cost-effective data storage, usually in...
Data lineage refers to the process and tools used to track the origin, movement, characteristics, and transformations of data as it flows through the...
A data mart is a repository of data curated to support the needs of a specific department, line of business, or business function.
Data Mesh – an approach founded by Zhamak Dehghani – refers to a decentralized, distributed approach to enterprise data management. It is a holistic c...
Data modernization is the process of moving data from the legacy systems of a fragmented, siloed infrastructure to an interconnected ecosystem of mode...
Data observability is the set of practices that help organizations understand data health and performance across the enterprise.
A data pipeline moves data from raw state to another location by executing a series of processing steps. This allows the data to be used by data consu...
A data platform is a technology stack or single solution for managing enterprise data. This system ingests and prepares data at scale for operational...
Data preparation is the process that turns raw data from disparate internal and external sources into usable datasets.
Data privacy comprises the rights of consumers to control when and how organizations may collect and use their personally identifiable information (PI...
Data products are curated collections of datasets and business-approved metadata designed to solve specific, targeted questions.
Data quality is the state of the data, reflected in its accuracy, completeness, reliability, relevance, and timeliness.
A data security strategy protects digital information from the consequences of human error, unauthorized access, and cyberattacks. These consequences...
Data sharing gives multiple users or applications simultaneous, consistent, and high-fidelity access to the same datasets.
Data silos are partially or wholly inaccessible data sets that result from a combination of technical and cultural forces. Proprietary databases and l...
Data sovereignty is a legal concept defining jurisdiction over data. Specifically, sovereignty establishes the principle that any data collected or st...
A data swamp is the inevitable outcome of a company’s misunderstanding of how data lakes work. Without a clear and well-supported big data strategy, l...
Data transformation is the process of converting and cleaning raw data from one data source to meet the requirements of its new location. Also called...
Data virtualization is a solution that creates intermediate layers between data consumers and disparate data source systems. These systems give consum...
A data warehouse is a central repository for structured enterprise data. These systems ingest raw data from various data sources through extract, tran...
A data warehouse architecture refers to how data gets loaded from source systems into data warehouses and how it is accessed by data consumers. In the...
A database is a large collection of data organized for rapid search and retrieval by a computer.
A database management system (DBMS) is used to manage a database, enabling users to create, read, update, delete, and secure the data within it.
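For illustration, those four operations map directly onto SQL statements; this minimal sketch assumes a hypothetical users table.

```sql
INSERT INTO users (id, name) VALUES (1, 'Ada');       -- create
SELECT name FROM users WHERE id = 1;                  -- read
UPDATE users SET name = 'Ada Lovelace' WHERE id = 1;  -- update
DELETE FROM users WHERE id = 1;                       -- delete
```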
Decentralized data architectures decouple the operational plane — where and how data is stored — from the analytical plane — how the business uses dat...
A Delta Lake is an open-source data platform architecture that addresses the weaknesses of data warehouses and data lakes in modern big data analytics...
Distributed data is a practice that stores data where it lives, empowering business analysis through a single point of access.
ETL pipelines are automated data migration techniques for the ingestion of data from various sources into a target system.
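As a hedged sketch of the transform-and-load steps of such a pipeline (all table and column names are hypothetical), raw staging rows are cleansed and typed before landing in the target table:

```sql
-- Transform raw staged records and load them into the target system.
INSERT INTO warehouse.sales_clean
SELECT
    CAST(order_id AS BIGINT)       AS order_id,
    TRIM(LOWER(customer_email))    AS customer_email,  -- normalize
    CAST(amount AS DECIMAL(10, 2)) AS amount
FROM staging.sales_raw
WHERE order_id IS NOT NULL;  -- drop unusable rows
```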
Fault tolerance is the degree to which failures in a subsystem do not cause the overall system to stop operating. In the context of enterprise analyti...
Financial analytics is the application of big data analytics techniques to support data-driven decision-making, improve financial risk management, ens...
Apache Hadoop clusters let companies manage big data processing on commodity hardware. This distributed computing model provided a more cost-effective...
The Hadoop Distributed File System (HDFS) is a scalable, open-source file system designed to run on commodity hardware while managing the large amount...
The Apache Hadoop Ecosystem is a collection of open-source software projects designed to work with Hadoop distributed data processing platforms.
Healthcare analytics is the application of advanced data analytics solutions to the healthcare industry’s unique requirements. Unifying large volumes...
Hybrid cloud is an architecture that manages storage, networking, and compute resources across different environments. This structure may include on-p...
Hypothesis-driven development (HDD), also known as hypothesis-driven product development, is an approach used in software development and product mana...
According to the National Institute of Standards and Technology (NIST), incident response is the reaction to violations of computer security policies...
Massively parallel processing is an architecture for distributing workloads across hundreds or thousands of separate processors. Although parallel com...
A multi-cloud infrastructure uses cloud services from two or more vendors.
Object storage is an alternative to traditional file systems for storing large amounts of unstructured data in scalable, cost-efficient, and performan...
Online analytical processing (OLAP) systems are data analysis platforms that centralize large amounts of data from disparate sources.
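As an illustrative OLAP-style aggregation (table and column names are hypothetical), ROLLUP produces per-region subtotals, per-region-and-year detail, and a grand total in a single pass:

```sql
SELECT region, year, SUM(revenue) AS revenue
FROM sales
GROUP BY ROLLUP (region, year);
```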
An open data lakehouse is a data analytics architecture that combines a data lake’s cost-effective storage with a data warehouse’s robust analytics.
An open data warehouse is an open source alternative to monolithic, proprietary applications like Teradata or Snowflake.
An open file format is a specification for the way data gets written to storage.
Open table formats are designed to provide enhanced performance and compliance capabilities for data lakes using cloud-based object storage.
PostgreSQL is an open-source relational database management system (RDBMS) with a rich feature set, reliability, and performance that competes with a...
Presto (also known as PrestoDB) and Trino (formerly PrestoSQL) are both SQL query engines designed for high-performance SQL...
Query acceleration is a set of techniques for minimizing data processing workloads when analyzing a large amount of data.
A query engine takes a request for data, translates it from human to machine language, and then fulfills the request by retrieving specific data.
Real-time analytics is the ingesting, processing, and analyzing of the output from real-time data sources such as Internet of Things (IoT) sensors or...
Reference data categorizes information and defines the ranges of permissible values to ensure consistency in use across business processes and between...
Risk management is the process of identifying, assessing, analyzing, prioritizing, mitigating, controlling, and monitoring potential exposures to busi...
Role-based access control (RBAC) is a system of fine-grained access privileges granted to authorized users to perform a defined set of tasks.
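A minimal sketch of RBAC in SQL follows, assuming an engine that supports roles (exact syntax varies by system; the role, table, and user names are hypothetical).

```sql
CREATE ROLE analyst;                        -- define the role
GRANT SELECT ON sales TO ROLE analyst;      -- privileges attach to the role
GRANT analyst TO USER alice;                -- users inherit via membership
```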
Schema discovery is a data engineering practice for finding and documenting the structure of data sources within a repository, such as a relational da...
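For illustration, interactive schema discovery often starts with standard metadata commands; this sketch uses Trino-style syntax with hypothetical catalog and table names.

```sql
SHOW SCHEMAS FROM hive;        -- what schemas exist in the catalog?
SHOW TABLES FROM hive.lake;    -- what tables exist in a schema?
DESCRIBE hive.lake.orders;     -- list column names and types
```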
Schema-on-read approaches only apply a schema when a query accesses a table. Any required transformations happen at runtime.
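As a hedged sketch of schema-on-read (Hive-style DDL; the path and names are hypothetical), an external table declares a schema over files that already exist, and that schema is applied only when queries read the data:

```sql
CREATE EXTERNAL TABLE logs_raw (
    ts      TIMESTAMP,
    level   STRING,
    message STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://my-bucket/logs/';  -- raw files stay where they are
```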
A data lake is a centralized repository for large volumes of raw data from multiple sources that simplifies big data analytics and optimizes data infr...
A semantic layer is an interface sitting between data consumers and enterprise data sources, abstracting the underlying data architecture.
A single source of truth (SSOT) is a centralized location of master data for an organization’s decision-making processes. Theoretically, a data wareho...
Star schema is a popular architecture for organizing data in the context of data warehousing. It is characterized by a central fact table that is dire...
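As an illustrative star-schema query (table and column names are hypothetical), the central fact table joins directly to each dimension table:

```sql
SELECT d.year, p.category, SUM(f.sales_amount) AS sales
FROM fact_sales AS f
JOIN dim_date    AS d ON f.date_key    = d.date_key
JOIN dim_product AS p ON f.product_key = p.product_key
GROUP BY d.year, p.category;
```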
Starburst is the data company, not the candy company. Our data lakehouse platform combines the best of data lakes, data warehouses and data virtualiza...
Streaming data is the continuous dataflow generated by transactional systems, activity logs, Internet of Things (IoT) devices, and other real-time dat...
Structured query language or SQL is a powerful language that plays a vital role in managing and analyzing data in relational databases.
Unstructured data does not conform to any preset schema or format. Traditionally, unstructured data was rare, but this has evolved due to the rise of...