For a moment, imagine that you lead a customer success operations team responsible for compiling a weekly report for the CEO outlining customer churn and analytics data.
Then, minutes before you present, you learn of a problem with the data. It doesn’t matter how robust the ETL pipelines are or how often the team reviews its SQL queries – the data simply isn’t reliable. This puts you in the awkward position of going back to leadership and telling them that the information you just provided was incorrect. Interactions like these erode the CEO’s confidence not only in the data but also in the conclusions you draw from it. Something has to change.
In today’s business landscape, many companies manage petabytes of data – a volume far greater than any individual can comprehend, let alone monitor by hand. Keeping datasets of that size healthy is a challenge in its own right.
Observability is a familiar concept
So how do you manage the health of such large datasets? Think about cars. A car is a complex system, and the actions you take to deal with a flat tire are different from those for engine trouble. Fortunately, you do not have to inspect the entire vehicle whenever it breaks down. Instead, you rely on a tire-pressure warning or a check-engine light to tell you – usually before there are serious consequences – not only that a problem exists but also which part of the car is affected. This kind of automatic surfacing of problems is called observability.
In software engineering, this concept exists up and down the stack. In DevOps, for example, alerts and easily accessible dashboards give engineers a head start on troubleshooting. Companies like New Relic, DataDog and Dynatrace help software engineers quickly get to the root of problems in complex software systems. This is infrastructure observability. At the top of the stack, in the AI and machine learning layer, other companies give machine learning engineers visibility into how their production models behave in a constantly changing environment. This is machine learning observability.
What infrastructure observability does for software, and machine learning observability does for ML models, data observability does for dataset health management. These disciplines work in concert, and you will often rely on more than one of them to solve a problem.
What is data observability?
Data observability is the discipline of automatically surfacing the health of your data and repairing any problems as quickly as possible.
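To make the definition concrete, here is a minimal sketch of what automatically surfacing data health can look like in practice. This is an illustrative example, not any vendor’s API: the field names (`updated_at`, `customer_id`), thresholds, and record structure are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

def check_dataset_health(rows,
                         expected_min_rows=1000,
                         max_staleness=timedelta(hours=24),
                         max_null_rate=0.05):
    """Surface common data health problems: volume, freshness, completeness.

    `rows` is a hypothetical list of dicts, e.g. one per warehouse record.
    Returns a list of human-readable alert strings (empty means healthy).
    """
    alerts = []

    # Volume check: a sudden drop in row count often signals an upstream failure.
    if len(rows) < expected_min_rows:
        alerts.append(f"volume: only {len(rows)} rows (expected >= {expected_min_rows})")

    # Freshness check: flag data that has not been updated recently.
    if rows:
        newest = max(r["updated_at"] for r in rows)
        if datetime.now(timezone.utc) - newest > max_staleness:
            alerts.append(f"freshness: newest record is older than {max_staleness}")

        # Completeness check: a spike in nulls in a key column suggests
        # schema drift or a broken join upstream.
        nulls = sum(1 for r in rows if r.get("customer_id") is None)
        if nulls / len(rows) > max_null_rate:
            alerts.append(f"completeness: {nulls}/{len(rows)} rows missing customer_id")

    return alerts
```

In a real deployment, checks like these would run on a schedule against the warehouse and feed dashboards and alerting – the data equivalent of the check-engine light described above.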
It’s a fast-growing area with key players like Monte Carlo and Bigeye, as well as upstarts like Acceldata, Databand and Soda. The software infrastructure observability market, which is more mature than the data observability market, was estimated to be worth more than $5 billion in 2020 and has grown significantly since then. While the data observability market is not as developed at this point, it has plenty of room for growth, as it serves different users (data engineers versus software engineers) and solves different problems (datasets versus web applications). Overall, companies focused on data observability have collectively raised more than $250 million so far.
Why enterprises need data observability
Today, every company is a data company. This can take many forms: a technology company using data collected from users to better recommend content, a financial firm making major decisions based on third-party data, or a manufacturer maintaining large internal datasets on its safety systems. Today’s technological trends – from digital transformation to the shift to cloud computing and storage – only expand data’s influence.
Given how heavily organizations rely on data, any problem with that data can ripple throughout the enterprise, affecting customer service, marketing, operations, sales and ultimately revenue. When data powers automated systems or mission-critical decisions, the stakes are even higher.
If data is the new oil, it is important to monitor and maintain the integrity of this precious resource. Just as most of us don’t ignore a check-engine light, businesses that rely heavily on data should pair data observability practices with infrastructure and AI observability.
As datasets get bigger and data systems become more complex, data observability will continue to be an important tool for realizing maximum business value and sustainability.
Aparna Dhinakaran is a cofounder and chief product officer of machine learning observability provider Arize AI. She was recently named to the 2022 Forbes 30 Under 30 list in enterprise technology and is a member of the Cognitive World think tank on enterprise AI.
Welcome to the VentureBeat community!
DataDecisionMakers is where experts, including the technical people doing data work, can share data-related insights and innovation.
If you want to read about cutting-edge ideas and up-to-date information, best practices, and the future of data and data tech, join us at DataDecisionMakers.
You might even consider contributing an article of your own!
Read more from DataDecisionMakers