It is important to adopt a data-centric mindset and support it with ML operations (MLOps)
Artificial intelligence (AI) is one thing in the lab; the real world is another matter entirely. Many AI models fail to deliver reliable results when deployed. Others start well, but then results degrade, leaving their owners frustrated. Many businesses do not get the return on AI they expect. Why do AI models fail, and what is the remedy?
As companies have experimented more with AI models, there have been some successes, but also numerous disappointments. Dimensional Research reports that 96% of AI projects encounter problems with data quality, data labeling and model confidence.
To increase accuracy, AI researchers and enterprise developers often take the conventional, model-centric approach: they keep the model’s data fixed while tinkering with model architectures and fine-tuning algorithms. It’s like repairing a sail when the leak is in the hull: an improvement, but to the wrong part of the boat. Why? Good code cannot overcome bad data.
Instead, they should make sure the datasets fit the application. Traditional software is powered by code, while AI systems are built from both code (models and algorithms) and data. Take facial recognition, for example: early AI-powered apps were trained mostly on Caucasian faces rather than racially diverse ones. Not surprisingly, the results were less accurate for non-Caucasian users.
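One lightweight way to surface this kind of skew is to audit how each group is represented in the training set before any model work begins. The sketch below is illustrative only; `group_coverage` is a hypothetical helper, not from the article or any specific tool.

```python
from collections import Counter

def group_coverage(samples, key):
    """Return each group's share of a dataset.

    `samples` is a list of dicts; `key` names the attribute to audit.
    (Hypothetical helper, for illustration only.)
    """
    counts = Counter(s[key] for s in samples)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

# A toy training set, heavily skewed toward one group:
train = [{"group": "A"}] * 90 + [{"group": "B"}] * 10
coverage = group_coverage(train, "group")
print(coverage)  # {'A': 0.9, 'B': 0.1}
```

A report like this makes under-represented groups visible early, when rebalancing the dataset is still cheap.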
Good training data is only the starting point. In the real world, AI applications are often accurate at first but then deteriorate. When accuracy drops, many teams respond by tuning the software code. That doesn’t work, because the underlying problem is that real-world conditions have changed. The answer: to restore reliability, improve the data rather than the algorithms.
Since AI failures usually trace back to data quality and data drift, practitioners can use a data-centric approach to keep their AI applications healthy. Data is like food for AI, and in your application it must be a first-class citizen. Endorsing that idea is not enough; organizations need the infrastructure to keep the right data flowing.
MLOps: The “How” of Data-Centric AI
Continuously good data requires ongoing processes and practices known as MLOps, short for machine learning (ML) operations. MLOps’ core mission: provide the high-quality data that a data-centric AI approach requires.
MLOps addresses the specific challenges of data-centric AI, which are complex enough to guarantee stable employment for data scientists. Here is a sample:
- The wrong amount of data: Noise can distort small datasets, while large volumes of data can make labeling difficult. Both issues throw off the model. The right dataset size for your AI model depends on the problem you are addressing.
- Outliers: A common defect in the data used to train AI applications, outliers can skew results.
- Insufficient data range: This can leave a model unable to handle edge cases properly in the real world.
- Data drift: This often reduces the accuracy of the model over time.
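Data drift in particular lends itself to statistical detection. One common technique (my choice here, not one prescribed by the article) is a two-sample Kolmogorov-Smirnov test comparing a feature’s training-time distribution against what production currently sees:

```python
import random
from scipy.stats import ks_2samp

random.seed(0)
# Simulated feature values: what the model was trained on vs. what
# production traffic looks like now (mean has shifted by 0.8).
train_feature = [random.gauss(0.0, 1.0) for _ in range(1000)]
live_feature = [random.gauss(0.8, 1.0) for _ in range(1000)]

# A small p-value suggests the live distribution has drifted away
# from the training distribution.
stat, p_value = ks_2samp(train_feature, live_feature)
drifted = p_value < 0.01
print(f"KS statistic={stat:.3f}, p={p_value:.2e}, drift={drifted}")
```

In practice, a check like this would run per feature on each batch of production data, with the threshold tuned to tolerate normal seasonal variation.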
These issues are serious. A Google survey of 53 AI practitioners found that “data cascades (compounding events causing negative, downstream effects from data issues), triggered by conventional AI/ML practices that undervalue data quality … are pervasive (92% prevalence), invisible, delayed, but often avoidable.”
How do MLOps work?
Before deploying an AI model, researchers need a plan for maintaining its accuracy as new data arrives. Key steps:
- Audit and monitor model predictions to ensure consistent results
- Monitor the health of the data that powers the model; make sure there are no anomalies, missing values, duplicates or discrepancies in the distribution
- Ensure the system complies with privacy and consent rules
- When the accuracy of the model decreases, find the cause
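The data-health step above is easy to prototype. The following sketch uses pandas to run basic checks on an incoming batch; the column names and the shape of the report are invented for illustration, not a real monitoring tool’s API.

```python
import pandas as pd

# A hypothetical batch of incoming feature data with typical defects:
# one missing value and one fully duplicated row.
batch = pd.DataFrame({
    "user_id": [1, 2, 2, 3],
    "age": [34.0, 41.0, 41.0, None],
    "country": ["US", "DE", "DE", "US"],
})

# Basic health checks mirroring the monitoring steps above:
report = {
    "rows": len(batch),
    "missing_values": int(batch.isna().sum().sum()),
    "duplicate_rows": int(batch.duplicated().sum()),
}
print(report)  # {'rows': 4, 'missing_values': 1, 'duplicate_rows': 1}
```

A production version would also compare each column’s distribution against a reference profile, which is where drift detection plugs in.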
To practice good MLOps and develop AI responsibly, here are some questions to address:
- How do you catch data drift in your pipeline? Data drift can be harder to catch than data quality defects. Subtle changes in the data can have a big impact on certain model predictions and certain customers.
- Does your system reliably move data from point A to point B without compromising data quality? Fortunately, moving bulk data between systems has become much easier as ML tooling has improved.
- Can you automatically track and analyze data with alerts when data quality issues arise?
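The last question, automated tracking with alerts, reduces to comparing quality metrics against thresholds and notifying someone on a breach. A minimal sketch, with every name (`check_and_alert`, the metric keys) invented for illustration:

```python
def check_and_alert(metrics, thresholds, alert):
    """Fire `alert` for each metric that breaches its threshold.

    Illustrative only; not the API of any specific MLOps tool.
    Metrics without a configured threshold are ignored.
    """
    breaches = {name: value for name, value in metrics.items()
                if value > thresholds.get(name, float("inf"))}
    for name, value in breaches.items():
        alert(f"data-quality breach: {name}={value}")
    return breaches

# In a real pipeline, `alert` might post to a pager or chat channel;
# here it just appends to a list so the behavior is easy to inspect.
alerts = []
breaches = check_and_alert(
    metrics={"null_rate": 0.12, "duplicate_rate": 0.00},
    thresholds={"null_rate": 0.05, "duplicate_rate": 0.02},
    alert=alerts.append,
)
print(breaches)  # {'null_rate': 0.12}
```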
MLops: How to get started now
You may be wondering: how do we gear up to solve these problems? Building MLOps capabilities can begin modestly, with a data expert and your AI developers. As a young discipline, MLOps is still evolving. There is no gold standard or accepted framework yet that defines a good MLOps system or organization, but here are some basics:
- In developing models, AI researchers need to consider the data at every step, from product development through deployment and post-deployment. The ML community needs mature MLOps tools that help build high-quality, reliable and representative datasets to power AI systems.
- Post-deployment maintenance of AI applications cannot be an afterthought. Production systems should implement the ML equivalents of DevOps best practices, including logging, monitoring and CI/CD pipelines that account for data lineage, data drift and data quality.
- Create ongoing collaboration among all stakeholders, from executive leadership to subject-matter experts, ML and data scientists, ML engineers and SREs.
Continued success with AI/ML applications demands shifting from “get the code right and you’re done” to a consistent focus on data. Systematically improving data quality for a basic model beats chasing a state-of-the-art model with low-quality data.
MLOps is not yet a well-defined science, but it comprises the practices that make data-centric AI workable. We will learn a great deal about what works most effectively in the years ahead. In the meantime, you and your AI team can proactively, and creatively, build an MLOps framework and tune it to your models and applications.
Alessya Visnjic is the CEO of WhyLabs.
Welcome to the VentureBeat community!
DataDecisionMakers is where experts, including tech people working on data, can share data-related insights and innovations.
If you would like to read about the latest ideas and latest information, best practices and the future of data and data tech, join us at DataDecisionMakers.
You might even consider contributing an article of your own!