Operationalizing data products at scale with AI
Adrian Estala
VP, Field Chief Data Officer
Starburst
AI capabilities are at the top of every data strategy and referenced in every executive boardroom as a strategic business imperative. AI, in its broadest form, has been around since the 1950s and has long been a specialized data capability across industries. Yet, just a couple of years ago, AI was not at the top of most executive data agendas. How did AI suddenly become the dominant topic in digital transformation?
Data engineers and scientists point to some big shifts coming together at the right time. Rapid advances in GPUs and neural networks, which make it practical to train models on massive data assets, have fueled the exponential growth of AI innovation.
My view is that the biggest factor in the recent popularity of AI is the perception that it is easy for anyone to access. When ChatGPT became publicly available, it became as real as that self-driving car, except there was no wait-list or prohibitive cost. Everyone, everywhere was instantly empowered to ask their own questions and seek their own insight. To be proficient in using ChatGPT, you didn’t need two PhDs or to know how to talk to a REST interface; you just had to learn how to ask the question. ChatGPT users quickly shifted from using a traditional search engine to prompting an answer engine.
I see the same rapid shift when teams begin to use data products. The consumers stop worrying about searching and waiting for data and they start focusing on using the data to build business solutions and insight. Data products drive the perception that working with data doesn’t have to be difficult. Consumers feel that ChatGPT moment, where using data products is so easy that you forget how hard finding the right data used to be. Implemented correctly, AI and data products can work together to accelerate adoption and improve ease of use.
This article explores how AI and data products are coming together to improve the user experience and to accelerate the delivery of innovative business solutions. Some of these are based on conceptual discussions, and others are based on actual use cases. Early strategic vision is usually painted with some crayons (curiosity and imagination) and watercolors (experience and detail).
Viewing data products with a business lens
Before we get into some examples, I want to first position the business perspective. When we talk about AI, our business customers immediately think of automation and rapid answers. The how isn’t as important; it is all about the what and when. What can we do right now to give us a competitive advantage? What is the market doing that we are not? The business will view data products based on the solutions that they are enabling, and most business users may never see a raw data product; all they see is the final insight. When we talk about AI-driven data products, your business teams are going to hear “faster data solutions and better insight.” At its core, access to more data drives AI innovation.
This is where Starburst emerges as a transformative force, reshaping the landscape of data product management with a distinct business-oriented perspective, and fueling the exponential growth of AI adoption.
The image below demonstrates how Starburst is being used to accelerate analytics across leading data lake solutions while also enabling performant federation across other cross-cloud or on-prem data sources. For data science teams training new models, immediate access to the data and the power of the lakehouse are the game changers. Business teams can focus on inventing new AI-driven business solutions while Starburst abstracts the complexity of the back-end data architecture.
Streamline data product design
Envision a future where data products are automatically created and recommended to consumers based on enterprise and industry trends for their unique profile. To train the engines, we provide data on data usage trends, we provide ontologies, we provide user profiles, and we never stop training. The AI engine develops the products and automates the documentation. AI will pull together metadata, fill in the gaps and generate a data product description that is customized for each consumer. These pre-built ‘answers’ will accelerate the development of new ideas, new questions and new business solutions. Data access will be dynamically defined based on a set of attributes that create an accurate, instant risk profile.
AI can be used to analyze the historical performance, risk and user experience with different design patterns and automatically select the ideal design for specific product types. AI can also help to create a personalized user experience, by predicting which features a user will find most useful and customizing the design accordingly.
What we are seeing today:
- AI-driven knowledge graphs are very useful for exploring new data product opportunities. Even just pulling all the metadata together into a data catalog provides a great accelerator.
- A non-technical user can use natural language prompts in ChatGPT to generate a query for requested data sets. That query pulls the data together for a data product.
- Attribute-based access controls are being applied to data products. As these use cases mature, we can expect to see AI playing a stronger role.
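To make the attribute-based access control idea above concrete, here is a minimal sketch in Python. It is an illustration only, not how Starburst or any specific product implements access control: the `User` and `DataProduct` classes, the `can_access` function, and the attribute names (`department`, `region`) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    attributes: dict  # hypothetical, e.g. {"department": "sales", "region": "emea"}


@dataclass
class DataProduct:
    name: str
    # Hypothetical policy: for each attribute, the set of values that are allowed.
    required: dict


def can_access(user: User, product: DataProduct) -> bool:
    """Grant access only if the user satisfies every required attribute."""
    for key, allowed_values in product.required.items():
        # A missing attribute yields None, which is never in the allowed set.
        if user.attributes.get(key) not in allowed_values:
            return False
    return True


sales_360 = DataProduct(
    name="sales_360",
    required={"department": {"sales", "finance"}, "region": {"emea"}},
)
analyst = User("analyst", {"department": "sales", "region": "emea"})
contractor = User("contractor", {"department": "sales", "region": "apac"})

print(can_access(analyst, sales_360))     # True
print(can_access(contractor, sales_360))  # False
```

The decision depends on attributes rather than on a fixed list of named users, which is what would let a future AI engine adjust the policy or the risk profile without rewriting per-user grants.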
Optimize data product operations
I believe that data product operations will see the fastest evolution with AI, because the potential value is incredible. In the near future, AI will be used to automate data product operations from end to end. AI will manage data quality in real-time, checking for errors, inconsistencies, or anomalies. AI will be used to identify and mitigate potential security risks, calculating dynamic risk profiles as data sets are continuously aggregated.
Predictive maintenance algorithms will identify and correct operational bottlenecks and potential failure points before they result in downtime. If data changes on the back end, it will be detected and the data product will be automatically adjusted to ensure consistency. If data is suddenly unavailable, an algorithm could use trends to predict the missing data sets and ensure continuity of the front end solution.
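Detecting a back-end change is the first step toward adjusting a data product automatically. The sketch below shows one simple, rule-based way to detect it, by comparing the schema a data product expects against the schema it observes. It assumes schemas are available as plain column-to-type mappings; the `detect_schema_drift` function and the column names are hypothetical, and a production system would need far more than this.

```python
def detect_schema_drift(expected: dict, observed: dict) -> dict:
    """Compare two {column: type} mappings and report the differences."""
    added = sorted(set(observed) - set(expected))
    removed = sorted(set(expected) - set(observed))
    retyped = sorted(
        col
        for col in set(expected) & set(observed)
        if expected[col] != observed[col]
    )
    return {"added": added, "removed": removed, "retyped": retyped}


# Hypothetical schemas for an orders data product.
expected = {"order_id": "bigint", "amount": "double", "region": "varchar"}
observed = {"order_id": "bigint", "amount": "varchar", "channel": "varchar"}

drift = detect_schema_drift(expected, observed)
print(drift)
# {'added': ['channel'], 'removed': ['region'], 'retyped': ['amount']}
```

A report like this could feed an alert today and, as the use cases mature, an automated adjustment of the data product definition.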
What we are seeing today:
- AI is already being used across IT operations, from support chatbots to cybersecurity to predicting failures.
- AI-driven ‘virtual engineers’ are being used to review query performance across a global enterprise.
- AI is being used to identify and correct data quality issues, to validate data input and to highlight duplicate records.
- AI is being used with data observability tools to improve data lineage and overall data health.
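The input validation and duplicate detection mentioned above can be illustrated with a small sketch. To be clear, this is plain rule-based Python, not AI; it only shows the kind of baseline checks that an AI-assisted data quality tool might build on. The function names, field names, and sample records are hypothetical.

```python
from collections import Counter


def find_duplicates(records, key_fields):
    """Return the normalized keys that appear more than once."""
    keys = [
        # Normalize whitespace and case so near-identical values match.
        tuple(str(r.get(f, "")).strip().lower() for f in key_fields)
        for r in records
    ]
    counts = Counter(keys)
    return sorted(k for k, n in counts.items() if n > 1)


def validate(record, required_fields):
    """Return the required fields that are missing or empty."""
    return [f for f in required_fields if record.get(f) in (None, "")]


# Hypothetical customer records.
records = [
    {"customer_id": "C1", "email": "ana@example.com"},
    {"customer_id": "C2", "email": "Ana@Example.com "},
    {"customer_id": "C3", "email": ""},
]

print(find_duplicates(records, ["email"]))            # [('ana@example.com',)]
print(validate(records[2], ["customer_id", "email"]))  # ['email']
```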
Accelerate data product consumption
AI will materially simplify and enhance the user experience in the future. AI will provide personalized insights based on each consumer’s profile. As the consumer continues to reuse and create new data products, the AI engine will learn what types of data products to recommend and how to design them. Consumers will interact with data in more intuitive ways to accelerate ideation and new insight. AI can also automate the generation of reports and dashboards, reducing the need for manual analysis.
What we are seeing today:
- AI visualizations are already facilitating active exploration of data sets and this is improving rapidly.
- Data products are being used to accelerate AI models, enabling data scientists to quickly find and reuse data sets that are key to new models.
- Natural language questions are being used to help with data product exploration and integration.
- AI-driven chatbots are being used to provide consumer support, helping to address data product questions and tickets.
Challenges
This article presents an optimistic, ambitious view of how AI and data products could work together in the future. I want to paint a picture of the art of the possible, and ground it with some of the real advances that we are already seeing. We should also recognize that neither AI nor data products will fully succeed until we fix the challenges around data quality, compliance and business-focused data ontologies, to name a few. My advice for teams that are focused on these challenges is to pause and reset your approach. Every data governance leader should be managing a strategy based on how data will be consumed in the future (e.g. data products) and how that data will be transformed (e.g. AI).
Getting started
Data products are being used in every industry to serve many different types of solutions. If you are just getting started, the best advice is to start with a small team and a handful of data products. It is important to get the IT teams, and maybe one small business team, good at using data products within your environment before you expand. You can add automation and additional features in future phases; the initial goal is to learn how data products fit into your current ecosystem (process, people, technology). I would advise against building a big platform or migrating any data sets, as this will only add unnecessary cost, delay and complexity to your initial MVP. Transformation initiatives need early wins to build momentum and confidence to overcome the bigger challenges.