Adopting MLSecOps for secure machine learning at scale

Given the complexity, sensitivity and scale of the modern enterprise software stack, security has always been a central concern for most IT teams. But in addition to the well-known security challenges faced by DevOps teams, organizations now need to consider a new source of security risk: machine learning (ML).

ML adoption is skyrocketing across every industry; McKinsey found that by the end of last year, 56% of businesses had adopted ML in at least one business function. In the race to adopt, however, many organizations are grappling with the security challenges that come with ML, including the challenge of deploying and operating it responsibly. This is especially true in contexts where complex data and infrastructure are deployed at ML scale for an ever wider range of use cases.

Security concerns for ML become particularly pressing when the technology operates in a live enterprise environment, given the scale of potential disruption a breach could cause. ML also needs to integrate into the existing practices of IT teams and avoid becoming a source of friction and downtime for the enterprise. Alongside the principles governing the responsible use of AI, this means teams are changing their methods to build stronger security practices into their workflows.

Rise of MLSecOps

To address these concerns, machine learning practitioners are adapting the practices developed for DevOps and IT security to ML at scale. Professionals working in the industry are building a specialty that integrates security, DevOps and ML: machine learning security operations, or "MLSecOps" for short. As a practice, MLSecOps brings together automation and security policies across ML infrastructure, development and operations teams.

But what challenges does MLSecOps really solve? And how?

The rise of MLSecOps has been driven by growing awareness of a broad set of security challenges facing the industry. To give a sense of the scope and nature of the problems that MLSecOps has arisen in response to, let's cover two in detail: access to model endpoints and supply chain vulnerabilities.

Model access

Unrestricted access to machine learning models poses major security risks at several levels. The first and most intuitive level of access can be described as "black-box" access, meaning that an external party can query the model for predictions without seeing its internals. While this is key to letting a variety of applications and use cases generate commercial value from the model, unrestricted access to model predictions can present a variety of security risks.

An openly queryable model may be subject to adversarial attack. Such an attack involves reverse-engineering the model's behavior to generate "adversarial examples": inputs to the model with carefully crafted statistical noise added. This noise causes the model to misinterpret the input and predict a class different from the one a human would intuitively expect.

A textbook example of an adversarial attack involves a picture of a stop sign. When adversarial noise is added to the picture, an AI-powered self-driving car may conclude that it is a completely different sign, such as a "yield" sign, while the image still looks like a stop sign to a human.

Example of an adversarial attack on an image classifier. Image by ERCIM, credited to Fabio Carrara, Fabrizio Falchi, Giuseppe Amato (ISTI-CNR), Rudy Becarelli and Roberto Caldelli (CNIT research unit at MICC, University of Florence).
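To make this concrete, here is a minimal sketch of the fast gradient sign method (FGSM), one of the simplest ways such adversarial noise can be generated, written in PyTorch. The model, input tensor and label names are assumptions for illustration; this shows the general technique, not the specific attack pictured above.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.01):
    """Craft an adversarial example: nudge every pixel slightly in the
    direction that most increases the model's loss on the true label.

    `model` is any differentiable classifier, `image` a batched input
    tensor with values in [0, 1], and `label` the true class index
    (all illustrative names, assumed to exist).
    """
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # The sign of the input gradient points in the direction that
    # fools the model the most per unit of perturbation.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0, 1).detach()  # keep pixels in valid range
```

Because epsilon is tiny, the perturbed image looks unchanged to a human, yet the model's prediction can flip entirely, which is exactly the stop-sign scenario described above.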

Then there is "white-box" access, which involves access to the model's internals at various stages of its development. At a recent software development conference, we demonstrated how it is possible to inject malware into a model that triggers arbitrary and potentially malicious code when the model is deployed to production.
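While the conference demonstration itself is not reproduced here, one widely known vector illustrates the risk: many Python model serialization paths are built on pickle, which can execute arbitrary code at load time. The sketch below, using a deliberately harmless payload, shows why loading an untrusted model artifact amounts to running untrusted code.

```python
import os
import pickle

class MaliciousArtifact:
    """Stand-in for a tampered model file saved in a pickle-based format."""
    def __reduce__(self):
        # pickle calls __reduce__ to rebuild the object -- whatever it
        # returns runs the moment the file is loaded. An attacker could
        # substitute any shell command for this harmless echo.
        return (os.system, ("echo 'code executed while loading a model'",))

payload = pickle.dumps(MaliciousArtifact())
pickle.loads(payload)  # the command runs here, before any prediction is made
```

Mitigations include scanning and signing model artifacts and preferring serialization formats that store only weights rather than executable state.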

Other challenges arise around data leakage. Researchers have been able to reverse-engineer training data from a model's internal learned weights, which could leak sensitive and/or personally identifiable data, potentially causing significant damage.
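Full weight-inversion attacks are beyond a short example, but a related and simpler leakage signal, often called membership inference, is easy to sketch: overfit models tend to be measurably more confident on records they were trained on, letting an attacker guess whether a given record was in the training set. The toy check below (synthetic data and scikit-learn, all illustrative) shows the idea.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((400, 10))
y = rng.integers(0, 2, 400)  # random labels force the model to memorize
X_train, X_test, y_train, _ = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# The gap in average confidence between seen and unseen records is the
# leakage signal an attacker can exploit.
train_conf = model.predict_proba(X_train).max(axis=1).mean()
test_conf = model.predict_proba(X_test).max(axis=1).mean()
print(f"confidence on training records: {train_conf:.2f}")
print(f"confidence on unseen records:   {test_conf:.2f}")
```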

Supply chain vulnerabilities

Another security concern facing ML is one that much of the software industry is also grappling with: the software supply chain. Ultimately, the issue comes down to the fact that enterprise IT environments are incredibly complex and pull in many software packages in order to work. A breach in just one of the programs in an organization's supply chain can compromise an otherwise completely secure setup.

For a non-ML example, consider the 2020 SolarWinds breach, in which the US federal government and large parts of the corporate world were compromised through a supply chain weakness. The incident created an urgent push to harden the software supply chain in every sector, especially given the role of open-source software in the modern world; the White House has since hosted a high-level summit on these concerns.

Just as supply chain vulnerabilities can cause disruptions in any software environment, they can also strike the ecosystem surrounding an ML model. In this scenario, the effects could be even worse, given how much ML relies on open-source advances and how complex models can be, including the downstream supply chain of libraries required to operate them effectively.

For example, this month it was discovered that the long-established Ctx Python package on the PyPI open-source repository had been tampered with to include data-stealing code, with more than 27,000 copies of the compromised packages downloaded.

With Python being one of the most popular languages for ML, supply chain compromises such as the Ctx breach are especially pressing for ML models and their users. Any maintainer, contributor or user of software libraries has likely at some point experienced the challenges posed by the second-, third-, fourth- or higher-order dependencies that libraries bring to the table; for ML, these challenges can be significantly more complex.

Where does MLSecOps come in?

Something shared by both of the above examples is that, while they are technical problems, they do not require new technology to address them. Instead, these risks can be reduced by holding both existing processes and the people who run them to higher standards. I consider this the driving force behind MLSecOps: the centrality of robust processes for hardening ML for production environments.

For example, while we have covered two high-level areas specific to ML models and code, there is also a wide range of challenges surrounding ML system infrastructure. Best practices in authentication and authorization can be used to secure model access and endpoints, ensuring they are reachable only as usage requires. For example, model access can take advantage of multi-level permission systems, which reduce the risk of malicious parties gaining either black-box or white-box access. The role of MLSecOps here is to develop robust practices that tighten model access while hindering the work of data scientists and DevOps teams as little as possible, allowing teams to work efficiently and effectively.
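As a hedged sketch of what such multi-level permissions might look like in code (the scope names and stub model below are all illustrative assumptions), black-box prediction and white-box weight export can be gated behind different scopes:

```python
from functools import wraps

class StubModel:
    """Stand-in for a deployed model."""
    def predict(self, rows):
        return [0 for _ in rows]
    def get_weights(self):
        return {"layer0": [0.1, 0.2]}

model = StubModel()

def require_scope(scope):
    """Reject any call whose caller lacks the named permission scope."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(caller_scopes, *args, **kwargs):
            if scope not in caller_scopes:
                raise PermissionError(f"missing scope: {scope!r}")
            return fn(caller_scopes, *args, **kwargs)
        return wrapper
    return decorator

@require_scope("predict")
def predict(caller_scopes, inputs):
    return model.predict(inputs)        # black-box access: predictions only

@require_scope("export-weights")
def export_weights(caller_scopes):
    return model.get_weights()          # white-box access: far more sensitive

print(predict({"predict"}, [[1, 2]]))   # allowed
try:
    export_weights({"predict"})         # denied: caller lacks export scope
except PermissionError as err:
    print(err)
```

In a real deployment the scopes would come from the organization's identity provider, but the design choice is the same: prediction access alone should never imply access to model internals.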

The same is true for software supply chains: good MLSecOps calls for teams to build processes for regularly checking their dependencies, updating them as appropriate and acting as quickly as possible when a vulnerability surfaces. The challenge for MLSecOps is to develop these processes and integrate them into the day-to-day workflow of the rest of the IT team, largely automating them to minimize the time spent manually reviewing the software supply chain.
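One building block of such automation, sketched below under assumed package names and pins, is a CI check that verifies installed dependencies exactly match a pinned manifest; dedicated tools such as pip-audit go further by scanning those pins against known-vulnerability databases.

```python
from importlib.metadata import PackageNotFoundError, version

# Illustrative pins -- in practice these would come from a lock file.
PINNED = {"numpy": "1.22.4", "scikit-learn": "1.1.1"}

def audit(pinned):
    """Return packages whose installed version has drifted from its pin."""
    drift = []
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            drift.append(f"{name}: not installed (pinned {expected})")
            continue
        if installed != expected:
            drift.append(f"{name}: installed {installed}, pinned {expected}")
    return drift

for problem in audit(PINNED):
    print("dependency drift:", problem)  # fail the CI job on any drift
```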

What these examples hopefully show us is this: while an ML model and the environment associated with it can certainly be attacked, most security breaches come down simply to a lack of best practice at various stages of the development lifecycle.

The role of MLSecOps is to deliberately build security into the infrastructure that supports the end-to-end machine learning lifecycle, including the ability to identify what the vulnerabilities are, how they can be addressed and how those solutions can fit into the day-to-day work of team members.

MLSecOps is an emerging field, and the people working in and around it are still exploring and defining security vulnerabilities and best practices at every stage of the machine learning lifecycle. If you are an ML practitioner, this is a great time to contribute to the ongoing discussion as the field of MLSecOps continues to evolve.

Alejandro Saucedo is the Engineering Director of Machine Learning at Seldon.

