Data can be a company’s most valuable asset – sometimes worth more than the company itself. But if the data is inaccurate, or delivery problems cause persistent delays, the business cannot use it properly to make informed decisions.
Maintaining a solid understanding of a company’s data assets is not easy. Environments keep changing and growing more complex. Tracking a dataset’s origins, analyzing its dependencies, and keeping documentation up to date are all resource-intensive responsibilities.
This is where data operations (dataops) comes in. Dataops – not to be confused with its cousin, devops – began as a series of best practices for data analytics. Over time, it evolved into a fully formed practice of its own. Here’s its promise: dataops helps accelerate the data lifecycle, from the development of data-focused applications to the delivery of accurate, business-critical information to end-users and customers.
Dataops came about because most companies’ data estates were riddled with inefficiencies. Various IT silos did not communicate effectively (if they communicated at all). Tooling built for one team – which used data for a specific task – often kept a different team from gaining visibility. Data-source integration was haphazard, manual, and often problematic. The unfortunate result: the quality and value of the information delivered to end-users fell short of expectations or was outright inaccurate.
Because dataops offers a solution, people in the C-suite may worry that it is long on promises and short on value. It can seem risky to disrupt processes that already function. Do the advantages outweigh the inconvenience of defining, implementing, and adopting new processes? In my own organizational debates on this subject, I often cite the rule of ten: it costs ten times as much to complete a task when the data is flawed as it does when the data is good. By that argument, dataops is vital and worth the effort.
You may already be using dataops without knowing it
In broad terms, dataops improves communication among data stakeholders. It rids companies of their growing data silos. And dataops is nothing new: many agile companies already practice dataops constructs, even if they don’t use the term or aren’t aware of it.
Dataops can be transformative, but like any great framework, success requires a few ground rules. Here are the top three real-world must-haves for effective dataops.
1. Commit to observability in the dataops process
Observability is fundamental to the entire dataops process. It gives companies a bird’s-eye view of their continuous integration and continuous delivery (CI/CD) pipelines. Without observability, your company cannot safely automate or employ continuous delivery.
In a capable devops environment, observable systems provide that holistic view – and that view should be accessible across departments and incorporated into the CI/CD workflow. When you commit to observability, you position it to the left of your data pipeline – monitoring and tuning your systems of communication before data enters production. You should begin this process when designing your database, and observe your non-production systems along with the different consumers of that data. In doing so, you can see how well apps interact with your data – before the database moves into production.
Monitoring tools can help you stay informed and perform deeper diagnostics. In turn, your troubleshooting recommendations will improve, helping you correct issues before they escalate into errors. Monitoring gives data professionals this insight into the process. But remember to follow the “Hippocratic oath” of monitoring: first, do no harm.
If your monitoring creates so much overhead that performance suffers, you’ve crossed a line. Make sure your overhead stays low, especially when adding observability. When data monitoring is treated as the foundation of observability, data professionals can ensure operations proceed as expected.
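To make the low-overhead point concrete, here is a minimal sketch of what instrumenting a pipeline stage might look like in Python. The stage name, metrics store, and `drop_empty` function are illustrative assumptions, not part of any specific product; the point is that observation costs only a clock read and a list append.

```python
import time
from functools import wraps

# Illustrative in-memory metrics store: stage name -> list of (duration_s, rows_out).
# A real pipeline would ship these to a monitoring backend instead.
METRICS = {}

def observe(stage_name):
    """Record duration and output row count for a pipeline stage.

    Overhead is a single timer read plus an append, honoring the
    'first, do no harm' rule: observation must not slow the pipeline.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(rows):
            start = time.perf_counter()
            result = func(rows)
            elapsed = time.perf_counter() - start
            METRICS.setdefault(stage_name, []).append((elapsed, len(result)))
            return result
        return wrapper
    return decorator

@observe("clean")
def drop_empty(rows):
    """Hypothetical cleaning stage: drop falsy records."""
    return [r for r in rows if r]

cleaned = drop_empty(["a", "", "b", None, "c"])
print(cleaned)  # ['a', 'b', 'c']
```

Because the decorator is applied in non-production code first, you can watch how each stage behaves with realistic data before anything reaches production.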
2. Map your data estate
You should know your schema and your data. Both are fundamental to the dataops process.
First, document your overall data estate so you can understand changes and their impact. As database schemas change, you need to gauge their effects on applications and other databases. This impact analysis is only possible if you know where your data comes from and where it goes.
Beyond database schema and code changes, you must also maintain a full view of data privacy and data lineage to stay compliant. Tag locations and data types, especially personally identifiable information (PII), and know where all your data resides and where it flows. Where is sensitive information stored? What other apps and reports does that data flow into? Who can access it in each of those systems?
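The questions above can be answered mechanically once the estate is mapped. The sketch below assumes a hypothetical in-memory catalog and lineage map (in practice these would live in a metadata service); the table, column, and app names are made up for illustration.

```python
# Hypothetical catalog: table -> column -> metadata, with PII flagged per column.
CATALOG = {
    "customers": {
        "email":     {"type": "string",    "pii": True},
        "signup_ts": {"type": "timestamp", "pii": False},
    },
    "orders": {
        "customer_email": {"type": "string",  "pii": True},
        "total":          {"type": "decimal", "pii": False},
    },
}

# Hypothetical lineage: which downstream apps and reports each table flows into.
LINEAGE = {
    "customers": ["crm_app", "marketing_report"],
    "orders":    ["billing_app", "revenue_report"],
}

def pii_exposure(catalog, lineage):
    """For each table holding PII, list the PII columns and where that data flows."""
    exposure = {}
    for table, columns in catalog.items():
        pii_cols = [name for name, meta in columns.items() if meta["pii"]]
        if pii_cols:
            exposure[table] = {
                "columns": pii_cols,
                "flows_to": lineage.get(table, []),
            }
    return exposure

report = pii_exposure(CATALOG, LINEAGE)
print(report["customers"]["flows_to"])  # ['crm_app', 'marketing_report']
```

With this kind of map, “where does sensitive data go?” becomes a lookup rather than an investigation, and a schema change can be checked against the lineage map for downstream impact.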
3. Automate data testing
The widespread adoption of devops has produced a common culture of unit testing for code and applications. Often overlooked is testing of the data itself: its quality and how it works (or doesn’t) with code and applications. Effective data testing requires automation. It also requires constant testing against your newest data. New data isn’t tried and true; it’s volatile.
To make sure you have the most stable system available, test using the most volatile data you have. Break things early. Otherwise, you’ll push inefficient routines and processes into production and get a nasty surprise when the costs arrive.
The product you use to test that data – whether a third-party tool or your own scripts – needs to be solid, and it must be part of your automated test and build process. As data moves through the CI/CD pipeline, you should perform quality, access, and performance tests. In short, you want to understand what you have before you use it.
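As a minimal sketch of what such an automated data test might look like, the check below validates a batch of hypothetical order records (the `id`/`amount` fields and the specific rules are illustrative assumptions, not a prescribed schema). Run from a build step, a non-empty failure list would fail the pipeline before defective data reaches production.

```python
def run_data_tests(rows):
    """Run basic quality checks on a batch of records; return a list of failures.

    Checks (illustrative): every row has a unique, non-null 'id' and a
    non-negative numeric 'amount'. Meant to be invoked from an automated
    build step so bad data stops the pipeline early.
    """
    failures = []
    seen_ids = set()
    for i, row in enumerate(rows):
        row_id = row.get("id")
        if row_id is None:
            failures.append(f"row {i}: missing id")
        elif row_id in seen_ids:
            failures.append(f"row {i}: duplicate id {row_id}")
        else:
            seen_ids.add(row_id)
        amount = row.get("amount")
        if not isinstance(amount, (int, float)) or amount < 0:
            failures.append(f"row {i}: invalid amount {amount!r}")
    return failures

good = [{"id": 1, "amount": 9.5}, {"id": 2, "amount": 0}]
bad  = [{"id": 1, "amount": -3}, {"id": 1, "amount": "n/a"}]

print(run_data_tests(good))  # []
```

Running the same function over `bad` surfaces three failures (a negative amount, a duplicate id, and a non-numeric amount), which is exactly the kind of early breakage you want in CI rather than in production.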
Dataops is essential for becoming a data business. It is the ground floor of data transformation. These three must-haves will let you understand what you already have and what you need to reach the next level.
Douglas McDowell is the general manager of database at SolarWinds.
Welcome to the VentureBeat community!
DataDecisionMakers is a place where experts, including tech people working on data, can share data-related insights and innovations.
If you want to read about cutting-edge ideas, up-to-date information, best practices, and the future of data and data tech, join us at DataDecisionMakers.
You might even consider contributing an article of your own!
Read more from DataDecisionMakers