McKinsey donates machine learning pipeline tool Kedro to the Linux Foundation

The Linux Foundation, a nonprofit organization that provides vendor-neutral hubs for open source projects, announced today that McKinsey’s QuantumBlack will donate Kedro, a machine learning pipeline tool, to the open source community. The Linux Foundation will maintain Kedro under LF AI & Data, an umbrella organization founded in 2018 to promote innovation in AI by supporting technical projects, developer communities and companies.

“We are excited to welcome the Kedro project into LF AI & Data. It addresses many of the challenges that exist today in creating machine learning products and is a wonderful complement to our portfolio of hosted technical projects,” said Ibrahim Haddad, executive director of LF AI & Data. “We look forward to working with the community to enhance the project’s footprint and create new opportunities for collaboration with our members, hosted projects and the larger open source community.”

Importance of pipelines

A machine learning pipeline is a structure that regulates the flow of data into and out of a machine learning model. Pipelines include raw data, data processing, predictions and the variables that fine-tune the model’s behavior, with the goal of codifying the workflow so that it can be shared across the organization.

Many tools for building machine learning pipelines exist, but Kedro is relatively new to the scene. Launched in 2019 by McKinsey, it is a framework written in Python that borrows concepts from software engineering and brings them into the world of data science, laying the foundation for moving a project from an idea to a finished product.

According to Yetunde Dada, the product lead on Kedro, the framework was developed to address the major drawbacks of one-off scripts and “glue code” by focusing on creating maintainable, efficient data science code. By building in modularity, one objective was to inspire the creation of reusable analytics code and increase team collaboration.
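The idea of replacing one-off scripts with modular, reusable steps can be sketched in a few lines of plain Python. This is an illustration only and does not use Kedro's actual API: the `Node` class, the `run_pipeline` function, the step functions and the dataset names are all hypothetical, chosen to show how small functions with named inputs and outputs can be wired into a shareable workflow.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Node:
    """Hypothetical pipeline step: a plain function with named inputs and one named output."""
    func: Callable[..., Any]
    inputs: List[str]
    output: str


def run_pipeline(nodes: List[Node], catalog: Dict[str, Any]) -> Dict[str, Any]:
    """Run the nodes in the order listed, storing each result under its output name."""
    data = dict(catalog)  # work on a copy so the caller's catalog is left unchanged
    for node in nodes:
        args = [data[name] for name in node.inputs]
        data[node.output] = node.func(*args)
    return data


def drop_missing(raw: List[Optional[float]]) -> List[float]:
    """Data processing: discard missing readings."""
    return [x for x in raw if x is not None]


def fit_threshold(clean: List[float], params: Dict[str, float]) -> float:
    """Toy 'model': the mean of the data, scaled by a tunable parameter."""
    return params["scale"] * sum(clean) / len(clean)


def predict(clean: List[float], threshold: float) -> List[bool]:
    """Prediction: flag every value above the fitted threshold."""
    return [x > threshold for x in clean]


# Each step is reusable on its own; the pipeline only describes how they connect.
pipeline = [
    Node(drop_missing, ["raw"], "clean"),
    Node(fit_threshold, ["clean", "params"], "threshold"),
    Node(predict, ["clean", "threshold"], "predictions"),
]

catalog = {"raw": [1.0, None, 3.0, 5.0], "params": {"scale": 1.0}}
result = run_pipeline(pipeline, catalog)
print(result["predictions"])
```

Because every step is an ordinary function, it can be tested in isolation or reused in another pipeline, which is the kind of collaboration benefit the modular approach aims for.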

In the two and a half years that Kedro has been available on GitHub, its community and user base have grown to over 200,000 monthly downloads and more than 100 contributors. Indonesia’s largest wireless network provider, Telkomsel, uses Kedro as a standard across its entire data science organization.

Dada said in a statement that Kedro has reached a point where its development can advance only if it is improved by the best people in the world. She also described the donation as a significant sign of recognition for Kedro as a tool, which joins a collection of other cutting-edge open source projects such as Kubernetes, donated by Google; GraphQL, donated by Facebook; and MLflow and Delta Lake, donated by Databricks.

Future use

Open source software has become ubiquitous in the enterprise, where it is now also used in mission-critical settings. Although the integrity of that software has come into question, especially in light of recent developments, 79 percent of companies expect their use of open source software for emerging technologies to grow over the next two years, according to a 2021 Red Hat survey.

According to product manager Joel Schwarzmann, Kedro will continue to be the foundation of analytics projects within McKinsey once it has been donated to the open source community. “The ideas and guardrails that exist in Kedro are a reflection of that experience and are designed to help developers avoid common pitfalls and adhere to best practices,” Schwarzmann said in a blog post.

A spokeswoman added via email: “Kedro will focus on a stable API, or version 1.0; formal integration with developer tools and cloud platforms; and continued work on our experiment tracking functionality. We also want our users to be sure that upgrading Kedro versions is easy and that they can take advantage of new features. At the moment, Kedro supports basic integration with various cloud providers, and we want to work with cloud providers to create seamless integration. Experiment tracking is a way for data scientists to keep track of data science experiments, paving the way for users to find and promote models to production. We will expand this functionality with many more features according to the problems our users face.”

Kedro joins another open source pipeline tool, SynapseML, which Microsoft released in November. With SynapseML, as with Kedro, developers can build systems to address challenges across domains, including text analytics, translation, and speech processing.

