Data Analytics Architecture

With a data analytics architecture in place, organizations are more likely to generate actionable insights that drive operational efficiency and business growth.

How can data analytics architecture be used to drive business results?

Big data’s promise has been that answers to every business question are somewhere in the petabytes, exabytes, and zettabytes of data sloshing around enterprise data stores.

Once people dive into the data, the theory goes, companies can base decisions on hard numbers. Machine learning and artificial intelligence would leverage enormous volumes of data to enable powerful new use cases such as making workflows more efficient or unveiling opportunities for growth.

As a result, data analytics architectures would unlock the competitive advantage of big data analytics. But first, let’s take a look at data architectures based on legacy technologies.

Big data challenges: data silos, legacy data warehouses and data lakes

As enticing as big data’s promises are, the realities have fallen short. Established enterprises have data architectures comprising multiple generations of legacy technologies. These disparate repositories use inconsistent data structures and metadata. Organizational silos lead to data silos, making exploration and data collection difficult at best.

Centralized systems like data warehouses and legacy data lakes promised a solution. They could provide the consistent single source of truth data scientists and other users depend on. Yet the legacy systems never went away.

Data management in these not-quite-monolithic architectures is complicated and expensive. Every data analysis request requires custom data pipelines. Any attempt to create new dashboard interfaces or data visualizations must compete for scarce engineering resources.

A startup’s data architecture faces different challenges. From data integration to scalability, an under-resourced startup rarely has the experience or time to build things right.

No matter the size of the enterprise, big data analysis remains difficult. Extended time to insight continues to prevent companies from realizing their full potential.

Big data benefits: operational efficiency, customer experiences, competitive advantage

Meanwhile, corporate decision-making informed by data-driven insights makes businesses more efficient, enhances customer experiences, drives growth, and accelerates them past the competition.

Operational efficiency: Every aspect of a company’s operations generates streams of real-time data — and this isn’t limited to IT networks or websites. Industrial Internet of Things (IIoT) sensors continuously monitor manufacturing processes, environmental conditions, supply chains, and more. Harnessing this data through analytics lets companies optimize their operations like never before.

Customer experiences: Breaking down the silos separating websites, service organizations, and sales teams lets companies understand how they interact with their customers. This knowledge translates into more personalized interactions that increase engagement and customer satisfaction.

Business growth: Exploring the interactions between multiple datasets can reveal patterns the business can leverage to drive revenue growth. For example, pharmaceutical companies can use artificial intelligence and predictive algorithms to identify successful drugs earlier in the development process.

Competitiveness: A business that more effectively extracts value from its data can make better decisions faster. Its operations become more efficient, customers have better experiences, and new opportunities appear everywhere. This agile decision-making will push the company further and further ahead of its competitors.

Key components of a data analytics architecture

A data analytics architecture — one that’s deliberately thought through and well-executed — can sweep these challenges aside. Enterprises use their business strategies to determine what insights they need to make data-driven decisions. The data analytics architecture describes the who, what, how, and why of the analytics process.

For example, if empowering individual employees matters, then employees can’t depend on data engineers for every query. A data analytics architecture would define the types of analytics tools, from dashboards to SQL-powered software, that different categories of decisions require.

A clearly defined data analytics architecture (data storage, data ingestion, data analysis) founded upon enterprise business strategy informs the broader data architecture which, in turn, shapes the information infrastructure. How the company stores, handles, and uses its vast data stores will more closely align with strategy over time.

1. Data storage

Where companies store data plays a critical role in data analytics since having data closer to the user speeds retrieval and analysis. This is the reasoning that led companies to replace disparate relational databases with data warehouses and then legacy data lake platforms.

These systems hoovered vast amounts of data into a central location for processing and access by data users — with data engineers’ help. As mentioned earlier, these monolithic systems didn’t replace every legacy system and were costly to maintain.

A modern data lake is the center of gravity of a Starburst-enabled data analytics architecture. With Starburst’s single point of access, powered by connections to over 50 enterprise data sources, most data can remain at the source. Data teams still consolidate the company’s most important data in the data lake, but they no longer worry about capturing every data point that might be important.

Starburst’s data lake analytics platform makes it easier to manage data lakes while ensuring variety, quality, accuracy, and freshness.
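To make the single point of access concrete, here is a minimal sketch of federated exploration in Trino-style SQL. It assumes a Starburst cluster with two configured catalogs, `lake` (the data lake) and `postgres` (an operational PostgreSQL database); all catalog, schema, and table names are illustrative, not defaults.

```sql
-- List every configured data source (catalog) behind the single point of access.
SHOW CATALOGS;

-- A federated query: curated lake data joined with data that stays at its
-- source. lake.sales.orders and postgres.crm.customers are hypothetical names.
SELECT o.customer_id,
       c.segment,
       sum(o.order_total) AS lifetime_value
FROM lake.sales.orders AS o          -- data lake tables
JOIN postgres.crm.customers AS c     -- remains in the source PostgreSQL system
  ON o.customer_id = c.customer_id
GROUP BY o.customer_id, c.segment
ORDER BY lifetime_value DESC
LIMIT 100;
```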

2. Data ingestion, processing, and transformation

Supporting business analytics from monolithic data storage platforms requires a massive commitment to data pipeline development — and maintenance. Engineers must develop extract, transform, and load (ETL) pipelines for every request. They must also vigilantly monitor data sources to ensure changes do not break these pipelines.

A modern data analytics architecture creates an abstraction layer that virtualizes the company’s data architecture. Authorized users can explore data at any source through a single point of access like Starburst without needing ETL pipelines.

Many projects that once required data engineering time and resources will never need a data pipeline. Large projects may still need pipelines, but with significantly reduced development times thanks to this ETL-free exploration phase.
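A rough sketch of that progression, using hypothetical catalog and table names: exploration happens as plain ad hoc SQL, and only a workload that proves recurring value gets materialized into the lake.

```sql
-- Exploration phase: an ad hoc query against a source system, no pipeline built.
SELECT s.device_id,
       avg(s.reading) AS avg_reading
FROM iot.telemetry.sensor_readings AS s   -- hypothetical source catalog
WHERE s.reading_ts >= TIMESTAMP '2024-01-01 00:00:00'
GROUP BY s.device_id;

-- Graduation phase: once the analysis earns a recurring audience, materialize
-- it into the lake with CREATE TABLE AS SELECT instead of a bespoke pipeline.
CREATE TABLE lake.analytics.device_daily_readings AS
SELECT s.device_id,
       date_trunc('day', s.reading_ts) AS reading_day,
       avg(s.reading) AS avg_reading
FROM iot.telemetry.sensor_readings AS s
GROUP BY s.device_id, date_trunc('day', s.reading_ts);
```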

3. Data analysis and exploration

Legacy data warehouses and data lakes suffer from poor data visibility. Data is hard to find. Its structure, format, and quality are inconsistent from source to source. As a result, decision-makers must wait for data teams to cleanse and process data before analysts can get to work.

Starburst’s data lake analytics platform renders the complexity of modern data architectures invisible to data consumers and the engineers who support them. Data workloads can connect to any storage layer, file format, or table format they need. At the same time, Starburst delivers the visibility and control needed by best-in-class data governance practices.
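As an illustration of that format independence, consider the sketch below. The names are hypothetical, and the `format` table property shown is the kind supported by the Hive and Iceberg connectors; the point is that the file format is declared once, at table creation, and never appears in the queries consumers write.

```sql
-- File format is a property of the table, set once at creation time.
CREATE TABLE lake.events.clicks (
    user_id  BIGINT,
    url      VARCHAR,
    click_ts TIMESTAMP(6)
)
WITH (format = 'PARQUET');

-- Consumers query the table without knowing or caring how it is stored.
SELECT url, count(*) AS clicks
FROM lake.events.clicks
GROUP BY url
ORDER BY clicks DESC
LIMIT 20;
```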

Data analytics architecture: A real-world healthcare case study

The healthcare organization Optum is a case study in using Starburst’s data lake analytics platform to create a modern data architecture.

1. Data sources

Optum’s infrastructure consists of many SAS, Microsoft SQL Server, Teradata, and Postgres databases as well as a petabyte-scale Hadoop data lake.

Starburst connectors let Optum create a virtual data architecture that unifies every data source within a single system. Data no longer needs to be moved or copied across silos to support the company’s analytics.

In addition, Starburst’s separation of storage from compute allows Optum to scale resources with demand, resulting in a 30% drop in resource utilization.

2. Data processing

When analysts needed data from multiple sources, Optum’s data team had to develop ETL pipelines to copy and process data. This expensive and time-consuming operation was inflexible and too unresponsive for the 10,000 users who need results in seconds.

With Starburst, data users can directly query any data source using the SQL tools they already know. No pipelines needed. Optum’s analysts get the results from ad hoc queries up to ten times faster than they did before, speeding their time to insight.
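Here is a hedged sketch of what such an ad hoc query might look like. The catalog, schema, and table names below are invented for illustration; they are not Optum’s actual schemas.

```sql
-- Hypothetical federated query spanning a SQL Server source and the
-- Hadoop data lake, written in the standard SQL analysts already know.
SELECT m.member_id,
       m.plan_type,
       count(c.claim_id) AS open_claims
FROM sqlserver.enrollment.members AS m   -- Microsoft SQL Server source
JOIN hive.claims.claim_events AS c       -- petabyte-scale Hadoop data lake
  ON c.member_id = m.member_id
WHERE c.status = 'OPEN'
GROUP BY m.member_id, m.plan_type
ORDER BY open_claims DESC;
```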

3. Real-time results while protecting PHI

As a healthcare company, Optum has a mission to protect the protected health information (PHI) in its systems. At the same time, the company must make data accessible to produce the insights that improve patient outcomes and business performance.

By providing a single point of access, Starburst gives Optum’s authorized users the access they need to make a difference. According to Optum, customer retention and satisfaction metrics have improved, and the company expects faster time to insight to save millions.

At the same time that Starburst helps Optum deliver results, our virtualized data layer helps protect Optum’s PHI. Starburst Enterprise provides a central hub for managing access. Fine-grained, role-based access policies can control access by table, column, and row to ensure authorized users are the only ones able to access sensitive data.
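For a flavor of the grant side of such policies, here is a minimal sketch using standard Trino-style SQL statements. The role, user, and table names are hypothetical, and Starburst Enterprise’s row filters and column masks are configured through its built-in access control rather than through these statements alone.

```sql
-- Create a role for claims analysts (hypothetical name).
CREATE ROLE claims_analyst;

-- Grant the role read access to a single table, and nothing more.
GRANT SELECT ON lake.claims.claim_events TO ROLE claims_analyst;

-- Users gain access only through the role, keeping policy in one place.
GRANT claims_analyst TO USER alice;
```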