Data Glossary

ACID Transactions #

ACID transactions guarantee atomicity, consistency, isolation, and durability, the four properties that keep a database valid when errors or failures occur.
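A minimal sketch of one of these guarantees, atomicity, using Python's built-in sqlite3 module. The accounts table, the sample balances, and the helper names (make_demo_db, transfer, balances) are illustrative assumptions, not taken from any particular product.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Credit dst, then debit src, as one all-or-nothing transaction."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # An overdraft violates the CHECK constraint and raises
            # IntegrityError, which also undoes the credit above.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


conn = make_demo_db()
print(transfer(conn, "alice", "bob", 30), balances(conn))
print(transfer(conn, "alice", "bob", 500), balances(conn))
```

The second call attempts an overdraft, so it returns False and both balances stay as they were after the first transfer. The credit to bob does not survive on its own.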

AI Analytics #

AI data analytics is the application of artificial intelligence and machine learning technologies to traditional analytics.

AI Data Strategy #

Both data strategy and AI strategy are integral to an organization’s success in the modern technological landscape, and yet they serve distinct purpos... Learn More ›

Anti-Money Laundering #

Anti-money laundering consists of the regulations and practices used to prevent the abuse of the financial system in support of terrorism and other cr... Learn More ›

Apache Airflow #

Apache Airflow is a widely adopted orchestration engine that allows you to schedule and run complex data pipelines. Airflow provides many plug-and-pla... Learn More ›

Apache Hadoop #

Apache Hadoop is an open-source framework for distributed storage and processing of large datasets.

Apache Hive #

Apache Hive is a data warehouse system built on top of Hadoop’s distributed storage architecture.

Apache Hudi #

Apache Hudi (pronounced “hoodie”) is a transactional data lake platform first developed by Uber to bring data warehouse-like analytics capabilities to... Learn More ›

Apache Iceberg #

Apache Iceberg is an open-source table format that adds data warehouse-level capabilities to a traditional data lake.

Apache Impala #

Impala is an SQL query engine for Hadoop-based data architectures.

Apache Parquet #

The Apache Parquet file format is a way to bring columnar storage to Hadoop-based data lakes. Parquet supports efficient compression and encoding sche... Learn More ›

Apache Spark #

Apache Spark is an analytics engine built for processing massive datasets. Spark’s ability to process vast quantities of data within Apache’s big data... Learn More ›

Attribute-Based Access Control (ABAC) #

Attribute-based access control (ABAC) is a method for dynamically applying access policies based on specific attributes of the user, the data or system... Learn More ›

Business Analytics #

Although businesses have always crunched the numbers, “business analytics” refers to a more rigorous approach that applies statistical analysis and ot... Learn More ›

Centralized Data #

Centralized data is the long-established practice of gathering all data the company generates into an enterprise database, a data warehouse, or, more... Learn More ›

Cloud Data #

Cloud data covers any data stored or processed on internet-accessible remote servers, whether company-owned or hosted by third-party cloud services.

Cloud Data Migration #

Cloud data migration is the process that moves data from legacy systems to cloud platforms.

Cloud Data Warehouse #

A cloud data warehouse is a cloud-based version of the traditional on-premises enterprise data warehouse. Given the large amounts of data businesses g... Learn More ›

Cloud Native #

A cloud-native approach to software development takes full advantage of the cloud’s scalability, elasticity, resiliency, and efficiency.

Cloud Object Storage #

Cloud computing makes extensive use of object storage. This has many advantages, including cost, speed, and scalability.

Customer Data Platform #

Customer 360 is a strategic priority that requires the entire organization to create unified, end-to-end customer experiences. However, harnessing all... Learn More ›

Dark Data #

Dark data is the dormant contents of a company’s data lakes and other repositories.

Data Analytics #

Data analytics is the process that converts raw data into actionable insights. In data-driven organizations, analytics increasingly relies on large da... Learn More ›

Data Analytics Architecture #

A data analytics architecture is a set of policies and standards that guides the organization as it builds analytical processes. More than technical o... Learn More ›

Data Applications #

A data application (or data app) processes and analyzes big data to rapidly deliver insights or take autonomous action.

Data Architecture #

Data architecture is a framework that guides how to collect, store, manage, and use data in ways that support an organization’s business goals.

Data Blending #

Data blending is the process of combining data sets from different data sources to generate actionable insights that answer specific business question... Learn More ›

Data Catalog #

Data catalogs are data source inventories. They collect metadata about the source’s various assets.

Data Classification #

Data classification is a framework for organizing data in ways that improve data management, information security, and risk management.

Data Complexity #

Data complexity is an emergent property of enterprise data shaped by volume, velocity, variety, veracity, value, and vigilance — the V’s of big data.

Data Compliance #

Data compliance consists of the governance processes for meeting the requirements of internal, industry, and regulatory standards for data security an... Learn More ›

Data Democratization #

Data democratization is the goal for organizations and employees to quickly and securely access data so that they can analyze it and make data-driven... Learn More ›

Data Discovery #

Data discovery is a technique for gathering data, evaluating it for potential insights, and performing advanced analytics to create actionable insight... Learn More ›

Data Engineering #

Data engineering emerged as a specialization of software engineering in response to exploding data volumes.

Data Exploration #

Data exploration is an essential preliminary step to analyzing large datasets. Analysts use visualization and statistical methods to understand the qu... Learn More ›

Data Fabric #

A data fabric is a data management architecture that uses artificial intelligence and machine learning algorithms to automate data ingestion best prac... Learn More ›

Data Federation #

Data federation involves the creation of a virtual database that maps an enterprise’s many different sources and makes them accessible through a singl... Learn More ›

Data Governance #

Data governance is a concept within the discipline of data management that takes a holistic approach to an organization’s data and its lifecycle: data... Learn More ›

Data Integration #

Data integration is a series of data management procedures for bringing datasets from different sources into data lakes, data warehouses, or other dat... Learn More ›

Data Lake #

A data lake is a single store of data that can include structured data from relational databases, semi-structured data and unstructured data.

Data Lake Storage #

Data lake storage houses a wide variety of data types, including structured, semi-structured, and unstructured data. Each of these data types serves... Learn More ›

Data Lakehouse #

Combining data lakes and data warehouses, a data lakehouse is a centralized data repository that uses cost-effective data storage, usually in the clo... Learn More ›

Data Lineage #

Data lineage refers to the process and tools used to track the origin, movement, characteristics, and transformations of data as it flows through the... Learn More ›

Data Mart #

A data mart is a repository of data curated to support the needs of a specific department, line of business, or business function.

Data Mesh #

Data Mesh – an approach founded by Zhamak Dehghani – refers to a decentralized, distributed approach to enterprise data management. It is a holistic c... Learn More ›

Data Modernization #

Data modernization is the process of moving data from the legacy systems of a fragmented, siloed infrastructure to an interconnected ecosystem of mode... Learn More ›

Data Observability #

Data observability is the set of practices that help organizations understand data health and performance across the enterprise.

Data Pipeline #

A data pipeline moves data from raw state to another location by executing a series of processing steps. This allows the data to be used by data consu... Learn More ›

Data Platform #

A data platform is a technology stack or single solution for managing enterprise data. This system ingests and prepares data at scale for operational... Learn More ›

Data Preparation #

Data preparation is the process that turns raw data from disparate internal and external sources into usable datasets.

Data Privacy #

Data privacy comprises the rights of consumers to control when and how organizations may collect and use their personally identifiable information (PI... Learn More ›

Data Products #

Data products are curated collections of datasets and business-approved metadata designed to solve specific, targeted questions.

Data Quality #

Data quality is the state of the data, reflected in its accuracy, completeness, reliability, relevance, and timeliness.

Data Security #

A data security strategy protects digital information from the consequences of human error, unauthorized access, and cyberattacks. These consequences... Learn More ›

Data Sharing #

Data sharing gives multiple users or applications simultaneous, consistent, and high-fidelity access to the same datasets.

Data Silos #

Data silos are partially or wholly inaccessible data sets that result from a combination of technical and cultural forces. Proprietary databases and l... Learn More ›

Data Sovereignty #

Data sovereignty is a legal concept defining jurisdiction over data. Specifically, sovereignty establishes the principle that any data collected or st... Learn More ›

Data Swamp #

A data swamp is the inevitable outcome of a company’s misunderstanding of how data lakes work. Without a clear and well-supported big data strategy, l... Learn More ›

Data Transformation #

Data transformation is the process of converting and cleaning raw data from one data source to meet the requirements of its new location. Also called... Learn More ›

Data Virtualization #

Data virtualization is a solution that creates intermediate layers between data consumers and disparate data source systems. These systems give consum... Learn More ›

Data Warehouse #

A data warehouse is a central repository for structured enterprise data. These systems ingest raw data from various data sources through extract, tran... Learn More ›

Data Warehouse Architecture #

A data warehouse architecture refers to how data gets loaded from source systems into data warehouses and how it is accessed by data consumers. In the... Learn More ›

Database #

A database is a large collection of data organized for rapid search and retrieval by a computer.

Database Management System #

A database management system (DBMS) is software used to manage a database. It enables users to create, read, update, delete, and secure data within that database.

Decentralized Data #

Decentralized data architectures decouple the operational plane — where and how data is stored — from the analytical plane — how the business uses dat... Learn More ›

Delta Lake #

A Delta Lake is an open-source data platform architecture that addresses the weaknesses of data warehouses and data lakes in modern big data analytics... Learn More ›

Distributed Data #

Distributed data is a practice that stores data where it lives, empowering business analysis through a single point of access.

Extract, Transform, Load (ETL) #

ETL pipelines are automated data migration techniques for the ingestion of data from various sources into a target system.
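The three stages can be sketched in a few lines of Python using only the standard library. Everything specific here is an assumption made for illustration: the sample CSV, the orders table, the cleaning rules, and the function names.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system.
RAW_CSV = """order_id,customer,amount
1, Alice ,19.99
2,bob,5.00
3,,12.50
"""


def extract(text):
    """Extract: parse the raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize names, convert types, drop incomplete rows."""
    cleaned = []
    for row in rows:
        customer = row["customer"].strip().lower()
        if not customer:
            continue  # skip rows with no customer name
        amount_cents = round(float(row["amount"]) * 100)
        cleaned.append((int(row["order_id"]), customer, amount_cents))
    return cleaned


def load(conn, records):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER)"
    )
    with conn:  # commit on success, roll back on error
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)


def run_pipeline(text, conn):
    """Run extract, transform, and load in order."""
    load(conn, transform(extract(text)))


target = sqlite3.connect(":memory:")
run_pipeline(RAW_CSV, target)
print(target.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
```

Production pipelines typically add scheduling, incremental loads, and error handling on top of this basic shape.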

Fault Tolerance #

Fault tolerance is the degree to which failures in a subsystem do not cause the overall system to stop operating. In the context of enterprise analyti... Learn More ›

Hybrid Cloud #

Hybrid cloud is an architecture that manages storage, networking, and compute resources across different environments. This structure may include on-p... Learn More ›

Hypothesis-Driven Development #

Hypothesis-driven development (HDD), also known as hypothesis-driven product development, is an approach used in software development and product mana... Learn More ›

Incident Response #

According to the National Institute of Standards and Technology (NIST), incident response is the reaction to violations of computer security policies... Learn More ›

Massively Parallel Processing #

Massively parallel processing is an architecture for distributing workloads across hundreds or thousands of separate processors. Although parallel com... Learn More ›

Multi-cloud #

A multi-cloud infrastructure uses cloud services from two or more vendors.

Object Storage #

Object storage is an alternative to traditional file systems for storing large amounts of unstructured data in scalable, cost-efficient, and performan... Learn More ›

Open Data Lakehouse #

An open data lakehouse is a data analytics architecture that combines a data lake’s cost-effective storage with a data warehouse’s robust analytics.

Open Data Warehouse #

An open data warehouse is an open source alternative to monolithic, proprietary applications like Teradata or Snowflake.

Open File Formats #

An open file format is a specification for the way data gets written to storage.

Open Table Formats #

Open table formats are designed to provide enhanced performance and compliance capabilities for data lakes using cloud-based object storage.

PostgreSQL #

PostgreSQL is an open-source relational database management system (RDBMS) with a rich feature set, reliability, and performance that competes with a... Learn More ›

Presto #

Presto (formerly PrestoDB) and Trino (formerly PrestoSQL) are both SQL query engines. They are both designed for high-performance SQL... Learn More ›

Query Acceleration #

Query acceleration is a set of techniques for minimizing data processing workloads when analyzing a large amount of data.

Query Engine #

A query engine takes a request for data, translates it from human to machine language, and then fulfills the request by retrieving specific data.

Risk Management #

Risk management is the process of identifying, assessing, analyzing, prioritizing, mitigating, controlling, and monitoring potential exposures to busi... Learn More ›

Role-Based Access Control #

Role-based access control is a system of fine-grained access privileges granted to authorized users to perform a defined set of tasks.
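As a rough sketch, the core idea is that privileges attach to roles and users acquire them only through role membership. The role names, users, and actions below are hypothetical.

```python
# Hypothetical mapping of roles to the actions each role may perform.
ROLE_PERMISSIONS = {
    "analyst": {"read"},
    "engineer": {"read", "write"},
    "admin": {"read", "write", "grant"},
}

# Hypothetical assignment of users to roles.
USER_ROLES = {
    "dana": {"analyst"},
    "lee": {"analyst", "engineer"},
}


def is_allowed(user, action):
    """Return True if any of the user's roles permits the action."""
    roles = USER_ROLES.get(user, set())
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in roles)


print(is_allowed("dana", "read"))
print(is_allowed("dana", "write"))
print(is_allowed("lee", "write"))
```

Unknown users have no roles in this sketch, so every check on them is denied by default.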

Schema Discovery #

Schema discovery is a data engineering practice for finding and documenting the structure of data sources within a repository, such as a relational da... Learn More ›

Security Lake #

A data lake is a centralized repository for large volumes of raw data from multiple sources that simplifies big data analytics and optimizes data infr... Learn More ›

SQL #

SQL stands for structured query language. SQL is a powerful language that plays a vital role in managing and analyzing data in relational databases, m... Learn More ›

Star Schema #

In the context of data warehousing, the star schema is a popular architecture for organizing data. It is characterized by a central fact table that is... Learn More ›

Starburst #

Starburst is the data company, not the candy company. Our data lakehouse platform combines the best of data lakes, data warehouses and data virtualiza... Learn More ›

Streaming Data #

Streaming data is the continuous dataflow generated by transactional systems, activity logs, Internet of Things (IoT) devices, and other real-time dat... Learn More ›

Trino #

Trino is an open source distributed SQL query engine built in Java, designed to run fast analytic queries against various data sources ranging in size... Learn More ›

Unstructured Data #

Unstructured data is not conformed to any preset schema or format. Traditionally, unstructured data was rare, but this has evolved due to the rise of... Learn More ›
