If data is the new oil of the digital economy, then artificial intelligence (AI) is the steam engine. Companies that harness the power of data and AI together hold the key to innovation, much as oil and the steam engine together fueled the industrial revolution.
In 2022, data and AI set the stage for the next chapter of the digital revolution, empowering companies around the world. How can companies ensure that accountability and ethics are at the root of these radical technologies?
Defining responsibility in data and AI
Arguably, one of the biggest contributing factors to bias in AI is the lack of diversity among the annotators and data labelers who produce the training data from which AI models eventually learn.
Saiph Savage, a panelist at VentureBeat’s Data Summit and assistant professor and director of the Civic AI Lab at Northeastern University’s Khoury College of Computer Sciences, says responsible AI starts with groundwork that includes startups.
“On the one hand, being able to hire a wide variety of workers to do data labeling for your company is something to think about,” Savage said during VentureBeat’s Data Summit conference. “Why? Let’s say you hire workers only from New York; the data will be labeled based on their biases.”
Industry experts understand that most of the AI models in production today require annotated, labeled data to drive the model’s intelligence and, ultimately, the machine’s overall capabilities.
The technologies this supports are also complex, including natural language processing (NLP), computer vision and sentiment analysis. Given these complexities, the margin for error in how AI is trained can unfortunately be very large.
Research has shown that even well-known NLP language models carry racial, religious, gender and occupational biases. Similarly, researchers have documented bias in computer vision algorithms, showing that these models automatically learn stereotypes from the way groups of people (by ethnicity, gender, weight and so on) are portrayed online. Sentiment analysis models face similar challenges.
“Responsible AI is a very important topic, but it is only as good as it is efficient,” said Olga Megorskaya, Data Summit panelist and CEO of global data-labeling platform Toloka AI. “If you are a business, implementing AI responsibly means that you constantly monitor the quality of the models deployed in your product and understand where the decisions made by the AI come from. [You must] understand the data on which these models were trained and constantly update the training data to match the current context in which the model operates. Second, responsible AI means responsible treatment of those who are actually working behind the scenes training AI models. And this is where we work closely with many researchers and universities.”
Clarity and transparency
If responsible AI is only as good as it is efficient, then the clarity and transparency behind an AI model are only as good as the annotators and labelers working with its data, and as the detail and insight made available to the customers of companies using services like Toloka.
Toloka, which launched in 2014, serves as a crowdsourcing and microtasking platform that quickly distributes data-markup tasks to individuals around the world, with the results ultimately used to improve machine learning and search algorithms.
In the last eight years, Toloka has expanded; today, the platform has more than 200,000 users contributing to data annotation and labeling from more than 100 countries around the world. The company also develops tools to help detect biases in datasets, as well as tools that provide quick feedback on labeling-project issues that may affect a requesting company’s interface, project or tooling. Toloka works closely with researchers in laboratories such as the Civic AI Lab at Northeastern University’s Khoury College of Computer Sciences, where Savage works.
According to Megorskaya, companies in the AI and data-labeling market should work toward transparency and inclusiveness in a way that “…match[es] the interests of both workers and businesses to create a win-win situation where everyone benefits from sound development.”
To ensure transparency and inclusiveness on both the internal and external fronts, Megorskaya recommends that enterprises adhere to the following:
- Continually curate the data the AI is trained on so that it reflects current, real-life situations.
- Measure the quality of your models and use that information to track their refinement and performance over time, building a metric for model quality.
- Be nimble and agile. Think of transparency as visibility into the guidelines that data labelers must follow when annotating.
- Make feedback accessible and prioritize addressing it.
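The second recommendation, measuring model and labeling quality over time, can be made concrete with two simple metrics: how often annotators agree with each other, and how often their majority vote matches a small expert-labeled “gold” set. The sketch below is a minimal illustration of that idea, not Toloka’s actual quality pipeline; the batch data and function names are hypothetical.

```python
from collections import Counter

def majority_label(labels):
    """Aggregate one item's labels from several annotators by majority vote."""
    return Counter(labels).most_common(1)[0][0]

def agreement_rate(annotations):
    """Fraction of items on which all annotators agree.

    `annotations` maps item id -> list of labels from different annotators.
    """
    unanimous = sum(1 for labels in annotations.values() if len(set(labels)) == 1)
    return unanimous / len(annotations)

def accuracy_against_gold(annotations, gold):
    """Majority-vote accuracy against a small expert-labeled gold set."""
    correct = sum(1 for item, labels in annotations.items()
                  if majority_label(labels) == gold[item])
    return correct / len(gold)

# Hypothetical batch of items labeled by three annotators each.
batch = {
    "img_1": ["cat", "cat", "cat"],
    "img_2": ["dog", "cat", "dog"],
    "img_3": ["dog", "dog", "dog"],
    "img_4": ["cat", "dog", "dog"],
}
gold = {"img_1": "cat", "img_2": "dog", "img_3": "dog", "img_4": "cat"}

print(agreement_rate(batch))               # 0.5: half the items are unanimous
print(accuracy_against_gold(batch, gold))  # 0.75: img_4's majority vote is wrong
```

Tracking these two numbers per labeling batch gives exactly the kind of over-time quality metric the list above describes: a drop in agreement or gold accuracy flags a batch whose guidelines may need adjusting.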
For example, Toloka’s platform provides visibility into available tasks as well as the guidelines labelers work from. This creates a direct, rapid feedback loop between the labeling workers and the companies requesting the work: if a labeling rule or guideline needs adjusting, the change can be made at a moment’s notice. This process lets labeling teams approach the rest of the work in a more integrated, accurate and up-to-date way, allowing a human-centered approach to addressing biases as they arise.
Bringing ‘humanity’ to the forefront of innovation
Both Megorskaya and Savage agree that when a company hands off its data labeling and annotation to third parties or outsourcing firms, that decision itself creates a rift in the responsible development, and ultimately the training, of its AI. Often, companies that label and train their AI models have no way to make direct contact with the individuals who are actually labeling the data.
By focusing on removing bias from the AI production pipeline and breaking the cycle of disconnected systems, Toloka says, AI and machine learning will become more inclusive and representative of society.
Toloka hopes to pave the way for this change by enabling engineers from requesting companies to meet face to face with data labelers. In doing so, they can see firsthand the diversity of the people who will ultimately shape their data and AI. Engineering without visibility into the real people, places and communities a technology will affect creates a gap, and bridging that gap creates a new level of responsible development for teams.
“In the modern world, no effective AI model can be trained on data collected by a narrow group of pre-selected people who spend their lives doing nothing but labeling it,” said Megorskaya.
Toloka is also creating datasheets that surface the biases a worker pool may carry. “When you’re labeling data, these sheets show information such as what the workers’ backgrounds are and what’s missing,” Savage said. “This is especially helpful for developers and researchers, so they can decide to bring in backgrounds and perspectives that may be missing in the next run and make the model more inclusive.”
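A datasheet like the one Savage describes boils down to summarizing who is in the annotator pool and which expected groups are absent. The sketch below is a hedged, simplified illustration of that idea; the `demographic_datasheet` function, the `region` field and the sample workers are all hypothetical, not part of Toloka’s actual product.

```python
from collections import Counter

def demographic_datasheet(workers, expected_groups):
    """Summarize annotator backgrounds and flag unrepresented groups.

    `workers` is a list of dicts with a 'region' field; `expected_groups`
    is the set of backgrounds the project ideally wants represented.
    """
    counts = Counter(w["region"] for w in workers)
    total = len(workers)
    coverage = {g: counts.get(g, 0) / total for g in sorted(expected_groups)}
    missing = sorted(g for g in expected_groups if counts.get(g, 0) == 0)
    return {"coverage": coverage, "missing": missing}

# Hypothetical annotator pool for one labeling project.
workers = [
    {"region": "North America"},
    {"region": "North America"},
    {"region": "Europe"},
]
sheet = demographic_datasheet(
    workers, {"North America", "Europe", "Africa", "Asia"}
)
print(sheet["missing"])  # ['Africa', 'Asia']
```

The `missing` list is the actionable output: it tells a requester which perspectives to recruit for before the next labeling run, which is exactly the decision Savage says these sheets support.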
While it may seem a daunting endeavor to incorporate ethnicities, backgrounds, experiences and upbringings from around the globe into every dataset and model, Savage and Megorskaya emphasize that the most important way for entrepreneurs, researchers and developers to move toward equitable and responsible AI is to involve the many stakeholders their technology will affect from the beginning, since biases become much harder to correct down the road.
“It can be difficult to say that AI can ever be fully responsible and ethical, but it is important to approach that objective as closely as possible,” said Megorskaya. “It’s important to have as comprehensive and inclusive a representation as possible, to give engineers the best tools to make AI as efficient and responsible as possible.”