This article was contributed by Taesu Kim, CEO of Neosapience.
An AI revolution is under way in content creation. Voice technology in particular has made great strides in the last few years. While this could lead to numerous new content experiences, not to mention dramatically reduced costs for content development and localization, there are many concerns about what the future holds.
Imagine that you are known for your distinctive voice and depend on it for your livelihood, like actors James Earl Jones, Christopher Walken, Samuel L. Jackson, Fran Drescher and Kathleen Turner, or musicians Adele, Billie Eilish, Snoop Dogg and Harry Styles. If a machine were trained to imitate them, would they lose all artistic control? Would they suddenly find themselves providing the voice-over for a YouTube channel in Russia? And, in practical terms, would they miss out on potential royalties? What about someone who is looking for a break, or perhaps a way to earn some extra cash by digitally licensing their voice or likeness?
A voice is more than a combination of sounds
There is something truly thrilling about typing a series of words, clicking a button and hearing your favorite superstar read them back with the natural rise and fall of their speech, a genuinely human sound, and changes in pitch and intonation. This is not the robotic voice we have come to expect from AI-generated characters. Instead, the character you create comes alive in all its dimensions.
This depth is what virtual actors and virtual identities previously lacked; the experience was, quite frankly, underwhelming. But modern AI-based voice technology can create identities whose complex characteristics come through in the sound of the voice. The same may be true for AI-based video actors who move, gesture and use facial expressions just as humans do. Without these nuances, characters fall flat.
As technology improves to the point that it can capture every characteristic of a person’s surface identity (their appearance, voice, mannerisms, tics and anything else that makes up what you see and hear of them, excluding their thoughts and feelings), that identity becomes an actor that can be deployed not only by big studios in big-budget movies or album releases. Anyone can select and hire that virtual actor using a service like Typecast. The key here is that they are an actor, and even novice actors get paid.
Understandably, there is some apprehension about how such likenesses could be co-opted and used without license, consent or payment. I would compare this to the issues we have seen whenever a new medium has come onto the scene. For example, digital music and video content, once thought to rob artists and studios of their income, have become thriving businesses, and these new money makers are indispensable to today’s bottom line. Solutions were developed as the technology advanced, and the same holds true again.
Maintaining your digital and virtual identity
Each human voice, like each face, has its own unique footprint made up of thousands of characteristics. This makes it very difficult to copy. In a world of deepfakes, misrepresentation and identity theft, a number of technologies can be put to work to prevent the misuse of AI speech synthesis or video synthesis.
Voice recognition, or speaker identification, is one example. Researchers and data scientists can identify and break down the characteristics of a particular speaker’s voice. In doing so, they can determine which unique voice was used in a video or audio snippet, or whether it was a combination of multiple voices blended together and converted through text-to-speech technology. Ultimately, such identification capabilities could be applied in a Shazam-like app. With this technology, AI-powered voice and video companies can detect whether their text-to-speech technology has been misused. The content can then be flagged and removed. Think of it as a new type of copyright monitoring system. Companies including YouTube and Facebook are already developing such technologies for music and video clips, and it won’t be long before this becomes the norm.
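To make the identification step concrete, here is a minimal sketch of how a clip might be matched against a set of known voices. It assumes each voice has already been reduced to a numeric "embedding" by some speaker model; the three-number vectors, the names `voice_a` and `voice_b`, the `identify_speaker` helper and the 0.8 threshold are all hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(clip_embedding, enrolled, threshold=0.8):
    """Return (name, score) of the closest enrolled voice,
    or (None, score) if nothing is similar enough."""
    best_name, best_score = None, -1.0
    for name, reference in enrolled.items():
        score = cosine_similarity(clip_embedding, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# Hypothetical embeddings; a real system would compute these from audio.
enrolled_voices = {
    "voice_a": [0.9, 0.1, 0.3],
    "voice_b": [0.1, 0.8, 0.5],
}
suspect_clip = [0.88, 0.15, 0.28]   # close to voice_a
unknown_clip = [-0.5, 0.2, -0.7]    # unlike either enrolled voice

match, score = identify_speaker(suspect_clip, enrolled_voices)
no_match, _ = identify_speaker(unknown_clip, enrolled_voices)
```

A clip that matches a licensed voice could then be checked against usage records and flagged if no license exists. Real embeddings have hundreds of dimensions, and the threshold would need tuning on real data.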
Deepfake detection is another area where significant research is being conducted. Technology is being developed to detect whether the face in a video is a real human or has been digitally manipulated. For example, one research team has developed a system that uses a convolutional neural network (CNN) to extract features at the frame level. It then uses those features to train a recurrent neural network (RNN) to classify digitally manipulated video, and it can do this quickly and at scale.
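The two-stage pipeline described above can be sketched in a few lines. This is a toy illustration of the data flow only, not the research team’s system: frames are tiny lists of pixel values, the "CNN" is a pair of one-dimensional edge filters, the "RNN" is a single recurrent unit, and every weight is hand-picked rather than trained. All names here are hypothetical.

```python
import math


def conv_features(frame, kernels):
    """Stand-in for a CNN: slide each 1-D kernel over a flattened frame,
    apply ReLU, then max-pool to one feature per kernel."""
    features = []
    for kernel in kernels:
        width = len(kernel)
        responses = [
            max(0.0, sum(w * frame[i + j] for j, w in enumerate(kernel)))
            for i in range(len(frame) - width + 1)
        ]
        features.append(max(responses))
    return features


def rnn_score(feature_seq, w_in, w_rec, w_out):
    """Minimal recurrent unit with a scalar hidden state that is carried
    from frame to frame; returns a sigmoid score after the last frame."""
    h = 0.0
    for feats in feature_seq:
        h = math.tanh(sum(w * f for w, f in zip(w_in, feats)) + w_rec * h)
    return 1.0 / (1.0 + math.exp(-w_out * h))


# Hand-picked values (assumptions), so abrupt pixel changes raise the score.
KERNELS = [[1.0, -1.0], [-1.0, 1.0]]
W_IN, W_REC, W_OUT = [1.0, 1.0], 0.5, 4.0


def classify_video(frames):
    """Per-frame features first, then one pass of the recurrent unit."""
    sequence = [conv_features(frame, KERNELS) for frame in frames]
    return rnn_score(sequence, W_IN, W_REC, W_OUT)


smooth_video = [[0.5, 0.5, 0.5, 0.5]] * 3
glitchy_video = [
    [0.5, 0.5, 0.5, 0.5],
    [0.1, 0.9, 0.1, 0.9],
    [0.9, 0.1, 0.9, 0.1],
]
smooth_score = classify_video(smooth_video)
glitchy_score = classify_video(glitchy_video)
```

In a real detector both stages would be deep networks whose weights are learned from labeled examples of genuine and manipulated video. The point of the recurrent stage is that evidence accumulates across frames instead of each frame being judged in isolation.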
These solutions may make some people uncomfortable, as many are still works in progress, but let’s allay these fears. Detection techniques are being actively developed with future needs in mind. In the meantime, we have to consider where we are now: synthesized audio and video must be very sophisticated to clone a person convincingly and deceive an audience.
An AI system designed to create voice and/or video can only learn from a clean dataset. Today, this means that the data can only come from filming or recording done in a studio. It is remarkably difficult to record data in a professional studio without the consent of the data subject; studios are unwilling to risk a lawsuit. In contrast, data crawled from YouTube or other sites provides such a noisy dataset that it can only produce low-quality audio or video, making it easier to find and remove illegal content. This automatically rules out the bad actors most likely to misuse and manipulate digital and virtual identities. By the time it is finally possible to create high-quality audio and video from noisy datasets, detection technology will be ready well in advance, providing adequate protection.
Virtual AI actors are still a new space, but one that is rapidly gaining momentum. New revenue streams and content development possibilities will continue to push virtual characters forward. This, in turn, will provide sufficient impetus to implement a new generation of sophisticated detection and digital rights management tools to govern the use of AI-powered virtual identities.
Taesu Kim is the CEO of Neosapience.