“Heads up. Conversations like this can be intense. Don’t forget the human behind the screen.”
Twitter’s conversation alert is the latest move in a long-running battle to help us be more civil to each other online. Perhaps more troubling is the fact that we train large-scale AI language models on data from online conversations that are often toxic. No wonder we see bias reflected in machine-generated language. What if, as we build the metaverse (effectively the next version of the web), we use AI to filter out toxic dialogue for good?
A filter for language?
Right now, researchers are doing a lot of work to tune the accuracy of AI language models. In multilingual translation models, for example, a human in the loop can make a big difference. Human editors can check that cultural nuances are properly reflected in a translation and, in doing so, train the algorithm to avoid similar errors in the future. Think of humans as a tune-up for our AI systems.
If you envision the metaverse operating at scale, this kind of AI translation could instantly make us all multilingual when we talk to each other. A borderless society could level the playing field for people (and their avatars) who speak less common languages and potentially promote more intercultural understanding. It could also open up new opportunities for international trade.
There are, however, serious ethical questions around using AI as a filter for language. Yes, we can introduce some control over the style of language, flagging cases where a model is not performing as expected or even modifying the language itself. But how far is too far? While limiting abusive or derogatory speech and behavior, how do we continue to promote diversity of opinion?
A framework for algorithmic fairness
One way to make language algorithms less biased is to train them on synthetic data in addition to data from the open internet. Synthetic data can be generated on the basis of relatively small “real” datasets.
Synthetic datasets can be created to reflect the real-world population (not just those who speak loudest on the internet). It is relatively easy to see where the statistical properties of a given dataset are out of line with that population, and therefore where synthetic data can best be deployed.
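To make the idea concrete, here is a minimal Python sketch of how one might spot where a dataset is out of line and how much synthetic data each group would need. It is an illustration under stated assumptions, not a description of any production system: the function name `synthetic_needed`, the language labels, and the target shares are all hypothetical, and the sketch assumes every example carries a single group label.

```python
from collections import Counter


def synthetic_needed(samples, target_shares):
    """Estimate how many synthetic examples each group needs.

    samples:       list of group labels, one per real example (hypothetical data).
    target_shares: dict mapping group label to its share of the real-world
                   population (assumed to sum to 1).
    Real examples are never removed; groups are only topped up.
    Labels that do not appear in target_shares are ignored.
    """
    counts = Counter(samples)
    # The most over-represented group determines the final dataset size.
    final_size = max(
        counts.get(group, 0) / share
        for group, share in target_shares.items()
        if share > 0
    )
    needed = {}
    for group, share in target_shares.items():
        wanted = round(final_size * share)
        needed[group] = max(0, wanted - counts.get(group, 0))
    return needed


# Made-up example: a corpus dominated by English speakers.
corpus_labels = ["en"] * 80 + ["pt"] * 15 + ["sw"] * 5
population = {"en": 0.5, "pt": 0.3, "sw": 0.2}
print(synthetic_needed(corpus_labels, population))
# {'en': 0, 'pt': 33, 'sw': 27}
```

In this toy case the under-represented languages are the ones that would be topped up with synthetic examples, while the over-represented group is left untouched.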
All of this raises the question: will virtual data become an important part of making the virtual world fair and just? Can our decisions in the metaverse also affect how we think and communicate with each other in the real world? If the end result of these technology decisions is a more civil global discourse that helps us understand each other, synthetic data could be worth its algorithmic weight in gold.
However, as exciting as it may be to think that we can press a button and modify behavior to create a new kind of virtual world, this is not something that technologists alone will decide. It is not clear whether companies, governments or individuals will set the rules governing fairness and standards of conduct in the metaverse. With so many conflicting interests in the mix, it would be worthwhile to hear from leading tech experts and consumer advocates on how to proceed. It may be blue-sky thinking to expect a consortium for collaboration among all the competing interests, but we need to create one now to discuss fair language AI. Each year of inaction means dozens, if not hundreds, of metaverses will need to be retrofitted to meet any potential standards. These questions about what it means to have a truly accessible virtual ecosystem need to be discussed now, before the mass adoption of the metaverse, which will be here before we know it.
Vasco Pedro is the co-founder and CEO of the AI-powered language operations platform Unbabel. He spent more than a decade in academic research focused on language technologies and previously worked at Siemens and Google, where he helped develop technologies for data computation and language understanding.