Real-world insights from Asurion: Data quality in practice

By Megan Kemphaus, Customer Marketing Manager, Starburst Data


Why should organizations pay attention to data quality? And how do big data and inconsistent data affect data-driven decision-making? For Asurion’s data-driven leaders, maintaining data quality is crucial, especially when the business processes over twenty billion records daily.

Asurion, a technology care company, provides protection plans for devices with a power button. If you buy a phone from AT&T or Verizon, its protection plan likely comes from Asurion. Products purchased on Amazon often come with an option to add a protection plan, which is also from Asurion.

However fast and sophisticated an organization’s data platform is, the business ultimately needs data that is timely, accurate, and consistent. Data quality standards therefore lie at the core of the entire platform’s effectiveness.

This blog explores the key aspects of data quality, how it is managed, and how it is applied in practice, drawing on insights from a discussion at Data Universe.

Asurion’s data leaders Sandhya Devineni and Rajesh Gundugollu made a compelling case for a solid data quality foundation and shared with Data Universe attendees their journey in managing data quality at scale.

Here are several key insights:

Data quality as a foundation:

According to Asurion, “Data quality is the foundation of effective analytics. Poor data quality leads to poor decisions, which can have significant impacts on the business.” High-quality data improves business outcomes, such as better customer data management and enhanced decision-making processes.

Scalability challenges:

Asurion’s data platform runs over 10,000 jobs daily and processes more than 20 billion records. They identified over 80 data quality issues in just 90 days, underscoring the need for scalable solutions. Traditional methods like data profiling and predefined business rules were insufficient, prompting Asurion to develop a machine learning framework to detect bad data more effectively.

Machine learning for data quality:

By employing advanced machine learning algorithms, Asurion automates data cleansing processes, identifies patterns of data inconsistencies, and predicts potential data quality issues before they arise. This proactive approach allows Asurion to maintain high standards of data accuracy, completeness, and reliability, ultimately leading to more robust and insightful data analytics.

Key dimensions of data quality:

  • Accuracy: Ensures data is correct and free from errors. As Asurion highlights, “Accurate data is essential for trust. When our data is accurate, our stakeholders have confidence in our decisions.”
  • Completeness: Indicates whether all required data is present. “Incomplete data can lead to incomplete insights,” says Asurion.
  • Consistency: Ensures data is uniform across different datasets. “Consistency in data means that different parts of our business are speaking the same language,” notes Asurion.
  • Timeliness: Refers to the availability of data when needed. “Timely data means we can react to changes in the market swiftly.”
  • Uniqueness: Ensures each data record is unique without duplicates. Asurion stresses, “Duplicate records can cause confusion and errors in reporting.”
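Several of these dimensions can be turned into scores. The following Python sketch is illustrative only: the record layout, the `claim_id`, `device`, and `amount` fields, and the four-hour freshness limit are hypothetical and are not taken from Asurion’s framework.

```python
from datetime import datetime, timedelta

# Hypothetical sample rows; None marks a missing value.
records = [
    {"claim_id": "C1", "device": "phone", "amount": 120.0},
    {"claim_id": "C2", "device": None, "amount": 80.0},
    {"claim_id": "C2", "device": "tablet", "amount": 95.0},  # duplicate key
    {"claim_id": "C4", "device": "laptop", "amount": None},
]


def completeness(rows, column):
    """Share of rows in which `column` holds a non-null value."""
    if not rows:
        return 1.0
    present = sum(1 for r in rows if r.get(column) is not None)
    return present / len(rows)


def uniqueness(rows, key):
    """Distinct key values divided by row count (1.0 means no duplicates)."""
    if not rows:
        return 1.0
    return len({r[key] for r in rows}) / len(rows)


def is_timely(last_loaded_at, now, max_delay):
    """True if the latest load landed within the allowed delay."""
    return now - last_loaded_at <= max_delay


now = datetime(2024, 6, 1, 12, 0)
print(completeness(records, "device"))  # 0.75
print(uniqueness(records, "claim_id"))  # 0.75
print(is_timely(datetime(2024, 6, 1, 9, 0), now, timedelta(hours=4)))  # True
```

Expressing each dimension as a ratio or a pass/fail flag makes it easy to track over time and to alert when a score drops below an agreed target.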

Impact of machine learning on data quality:

Asurion’s machine learning initiative has significantly improved their data quality management. They reported a twofold reduction in data quality issues and a decrease in reprocessing costs due to proactive detection. This approach has also enhanced trust in their enterprise data by ensuring higher reliability and preventing defective data from reaching downstream processes.

Effective data governance:

Asurion emphasizes the importance of robust data governance in maintaining data quality. “Our data governance policies ensure that data is managed as a valuable asset,” states Asurion. By implementing comprehensive data governance frameworks, they ensure data integrity and consistency across all data assets, aligning data management with business strategies and regulatory requirements.

Metrics for data quality:

Asurion uses specific metrics to measure and monitor the quality of their data. “These metrics allow us to quantify the impact of our data quality initiatives and ensure continuous improvement,” says Asurion. Implementing these metrics helps in tracking progress and identifying areas that need attention.

Asurion’s approach to improving data quality

Data quality metrics and monitoring: Asurion has developed a robust system for measuring and monitoring data quality metrics. This includes identifying key data quality dimensions such as accuracy, completeness, uniqueness, timeliness, validity, and integrity.

Custom code framework: Asurion built a custom code framework for data validation. It lets teams write SQL queries that check accuracy by comparing values in source systems with the corresponding values in target systems.
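Asurion’s framework itself is not described in detail, so the sketch below only illustrates the source-to-target comparison pattern. It uses Python’s built-in `sqlite3` module, with hypothetical `source_orders` and `target_orders` tables standing in for the two systems; the aggregate checks chosen are assumptions.

```python
import sqlite3

# In-memory database standing in for a hypothetical source and target system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_orders (order_id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE target_orders (order_id INTEGER PRIMARY KEY, amount REAL);
    INSERT INTO source_orders VALUES (1, 10.0), (2, 20.0), (3, 30.0);
    INSERT INTO target_orders VALUES (1, 10.0), (2, 25.0);
""")

# Aggregate checks: the same SQL runs against both tables.
# Table names are fixed constants here, so str.format is safe.
CHECKS = {
    "row_count": "SELECT COUNT(*) FROM {table}",
    "amount_total": "SELECT ROUND(COALESCE(SUM(amount), 0), 2) FROM {table}",
}


def reconcile(conn, source, target):
    """Run each check on source and target; return only the mismatches."""
    mismatches = {}
    for name, sql in CHECKS.items():
        src = conn.execute(sql.format(table=source)).fetchone()[0]
        tgt = conn.execute(sql.format(table=target)).fetchone()[0]
        if src != tgt:
            mismatches[name] = (src, tgt)
    return mismatches


# Row-level check: source rows that are missing or differ in the target.
ROW_DIFF = """
    SELECT s.order_id, s.amount, t.amount
    FROM source_orders s
    LEFT JOIN target_orders t ON s.order_id = t.order_id
    WHERE t.order_id IS NULL OR s.amount <> t.amount
    ORDER BY s.order_id
"""

print(reconcile(conn, "source_orders", "target_orders"))
# {'row_count': (3, 2), 'amount_total': (60.0, 35.0)}
print(conn.execute(ROW_DIFF).fetchall())
# [(2, 20.0, 25.0), (3, 30.0, None)]
```

Cheap aggregate checks catch most gross mismatches; the row-level join is costlier but pinpoints exactly which records diverged.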

Data profiling tools: Asurion uses data profiling tools and open-source frameworks like AWS Deequ. These tools help Asurion ensure data completeness by systematically profiling and validating data across their data lakes.

Anomaly detection: Asurion implemented an anomaly detection framework. This framework uses machine learning and statistical models to identify anomalies in data volume and refresh delays, ensuring data timeliness.
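The source does not specify Asurion’s models. As a minimal stand-in, the sketch below applies a z-score test, one of the simplest statistical detectors, to a hypothetical seven-day history of row counts and refresh lags. The threshold of 3 standard deviations is an assumed, conventional choice.

```python
from statistics import mean, stdev


def is_anomalous(history, observed, threshold=3.0):
    """Flag `observed` if it lies more than `threshold` sample standard
    deviations from the mean of `history` (needs at least two points)."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is unusual.
        return observed != mu
    return abs(observed - mu) / sigma > threshold


# Hypothetical last seven days for one table.
daily_rows = [1000, 1020, 980, 1010, 990, 1005, 995]
refresh_lag_minutes = [30, 32, 29, 31, 30, 28, 30]

print(is_anomalous(daily_rows, 1015))         # False: normal volume
print(is_anomalous(daily_rows, 400))          # True: volume drop
print(is_anomalous(refresh_lag_minutes, 95))  # True: late refresh
```

The same test covers both volume and timeliness because each is just a numeric series per table. Production systems typically account for seasonality (weekday versus weekend loads), which a flat window like this ignores.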

Structural integrity checks: Asurion uses autoencoders to check structural integrity. Autoencoders help detect complex data quality issues by learning the inherent relationships within the data and identifying records that deviate from these learned patterns.
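A neural autoencoder does not fit in a short example, so the sketch below uses the linear special case. A linear autoencoder with a one-unit bottleneck, trained on squared error, learns the same subspace as the top principal component. That means it can be fit in closed form, here by power iteration. Records whose reconstruction error is far above the training errors break the learned relationship. The data, the two-field layout, and the factor-of-3 threshold are hypothetical. Asurion’s actual models are not shown in the source and are likely nonlinear.

```python
import math


def fit(rows, iters=100):
    """Fit the optimal 1-unit linear autoencoder: the data mean plus the
    top principal direction, found by power iteration on the covariance."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - mu[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / n for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # start vector, assumed not orthogonal to the answer
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mu, v


def reconstruction_error(row, mu, v):
    """Squared distance between a row and its encode-then-decode output."""
    c = [x - m for x, m in zip(row, mu)]
    code = sum(ci * vi for ci, vi in zip(c, v))  # encoder: 1-D bottleneck
    recon = [code * vi for vi in v]              # decoder: back to d dims
    return sum((ci - ri) ** 2 for ci, ri in zip(c, recon))


# Hypothetical training rows where the second field is roughly twice the first.
train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.0), (4.0, 8.1), (5.0, 9.9)]
mu, v = fit(train)
threshold = 3 * max(reconstruction_error(r, mu, v) for r in train)

for row in [(3.5, 7.0), (3.5, 1.0)]:
    err = reconstruction_error(row, mu, v)
    print(row, "anomalous" if err > threshold else "ok")
```

Note that neither field of `(3.5, 1.0)` is out of range on its own. Only the broken relationship between the fields gives it away, which is the kind of issue per-column rules tend to miss.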

Machine learning models: By leveraging advanced machine learning models, Asurion automates data cleansing processes. These models predict potential data quality issues before they arise, improving the reliability and accuracy of data.

Scalability and efficiency: Asurion’s solutions allow them to scale their data quality management processes. This includes automating the detection and prevention of data quality issues, reducing reprocessing costs, and enhancing trust in enterprise data.

Proactive data quality management: Asurion’s tools enable them to shift from reactive to proactive data quality management. This approach prevents defective data from propagating to downstream processes, ensuring higher data quality and reliability.

Impact on business outcomes: Asurion achieved a twofold reduction in data quality issues and improved the overall trustworthiness of their data. This proactive management also led to cost savings and better alignment of data quality with business goals.

New data observability features in Starburst Galaxy help maintain data integrity

Additionally, Starburst has introduced new data observability features in Starburst Galaxy, enhancing visibility and simplifying observability with support for column lineage, SQL-based data quality checks, and schema change monitoring. These features help data teams understand the flow and shape of their data more quickly and accurately.

Column lineage:

Column lineage captures and visualizes data flow between columns, enabling teams to carry out thorough impact analysis before making changes to data pipelines. This reduces the need for tedious code reviews and allows for quick troubleshooting of data quality issues.
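The following is not Starburst Galaxy’s lineage API. It is a generic illustration of why column-level lineage makes impact analysis cheap: once lineage is a graph, finding everything affected by a change is a graph traversal. The lineage map and column names are hypothetical.

```python
from collections import deque

# Hypothetical column-level lineage: each column feeds the columns listed.
LINEAGE = {
    "raw.claims.amount": ["staging.claims.amount_usd"],
    "raw.claims.currency": ["staging.claims.amount_usd"],
    "staging.claims.amount_usd": [
        "mart.claims_daily.total_usd",
        "mart.fraud_features.avg_claim_usd",
    ],
    "mart.claims_daily.total_usd": ["dashboard.finance.revenue_at_risk"],
}


def downstream(column, lineage):
    """Return every column reachable from `column` (breadth-first search)."""
    seen, queue = set(), deque([column])
    while queue:
        for child in lineage.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Impact analysis: what breaks if raw.claims.currency changes?
for col in sorted(downstream("raw.claims.currency", LINEAGE)):
    print(col)
```

Running the traversal in reverse (from a broken dashboard column back to its sources) is the troubleshooting case the paragraph above describes.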

SQL-based data quality checks:

This feature allows data teams to use SQL to author data quality rules, providing a flexible and comprehensive way to ensure data health. SQL-based checks can be authored for any data source, enabling effective monitoring and maintenance of data quality.
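Galaxy’s own rule syntax is not shown here. The sketch below illustrates one common convention for SQL-based checks, assumed for illustration: each rule is a query that counts violating rows, and zero means the rule passes. It runs on Python’s built-in `sqlite3` with a hypothetical `devices` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE devices (device_id TEXT, owner_email TEXT, purchase_price REAL);
    INSERT INTO devices VALUES
        ('D1', 'a@example.com', 799.0),
        ('D2', NULL,            499.0),
        ('D3', 'c@example.com', 349.0),
        ('D3', 'd@example.com', 299.0);
""")

# Each rule counts rows that violate it; a count of zero means "pass".
RULES = {
    "owner_email_not_null":
        "SELECT COUNT(*) FROM devices WHERE owner_email IS NULL",
    "price_non_negative":
        "SELECT COUNT(*) FROM devices WHERE purchase_price < 0",
    "device_id_unique": """
        SELECT COUNT(*) FROM (
            SELECT device_id FROM devices
            GROUP BY device_id HAVING COUNT(*) > 1
        )""",
}


def run_rules(conn, rules):
    """Execute every rule and return {rule_name: violation_count}."""
    return {name: conn.execute(sql).fetchone()[0] for name, sql in rules.items()}


results = run_rules(conn, RULES)
failed = sorted(name for name, violations in results.items() if violations)
print(results)
print("failed:", failed)
```

Because each rule is plain SQL, the same pattern works against any source the query engine can reach, and violation counts double as metrics that can be charted over time.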

Schema change monitoring & comparison:

Schema change notifications, logs, and daily schema snapshot comparisons provide data teams with greater visibility into schema changes. This helps prevent data downtime and ensures that any changes are quickly identified and addressed.
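Conceptually, a daily snapshot comparison reduces to diffing two column-to-type maps. The sketch below assumes each snapshot is a plain dictionary with hypothetical column names; the snapshot format Galaxy uses internally is not described in the source.

```python
def diff_schemas(old, new):
    """Compare two {column: type} snapshots and report what changed."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    retyped = sorted(
        (col, old[col], new[col])
        for col in set(old) & set(new)
        if old[col] != new[col]
    )
    return {"added": added, "removed": removed, "retyped": retyped}


# Hypothetical snapshots of one table taken a day apart.
yesterday = {"claim_id": "varchar", "amount": "decimal(10,2)", "filed_at": "date"}
today = {"claim_id": "varchar", "amount": "double", "filed_at_ts": "timestamp"}

print(diff_schemas(yesterday, today))
```

A rename shows up here as one removal plus one addition (`filed_at` to `filed_at_ts`), which is exactly the kind of change that silently breaks downstream queries.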

Mitigating data quality problems is essential for any organization aiming to thrive in today’s data-centric landscape. Starburst provides the tools and frameworks needed to tackle data quality challenges at scale, helping organizations maintain high standards of data accuracy, consistency, and reliability. By streamlining these processes, Starburst gives data-driven leaders and data stewards the high-quality data they need to make informed business decisions, meet their key performance indicators (KPIs), and substantially improve their ROI.