This week, Nvidia announced AI-focused hardware and software innovations during its March GTC 2022 conference. The company unveiled the Grace CPU Superchip, a data center processor designed to serve high-performance computing and AI applications. And it detailed the H100, the first in a new line of GPUs aimed at accelerating AI workloads, including the training of large natural language models.
But one announcement that slipped under the radar was the general availability of Nvidia’s Riva 2.0 SDK, as well as the company’s Riva Enterprise managed offering. Both can be used to build speech AI applications, and both point to the growing market for speech recognition. The speech and voice recognition market is expected to grow from $8.3 billion in 2021 to $22.0 billion by 2026, driven in part by enterprise adoption, according to Markets and Markets.
In 2018, a Pindrop survey of 500 IT and business decision makers found that 28% of consumers were using voice technology. Gartner, meanwhile, predicted in 2019 that 25% of digital workers would use virtual employee assistants every day by 2021. And a recent Opus survey found that 73% of executives see value in AI voice technology for “operational efficiency.”
“As speech AI expands into new applications, enterprise data scientists are looking to develop, customize and deploy speech applications,” an Nvidia spokesperson told VentureBeat via email. “Riva 2.0 includes robust integration with TAO, a low-code solution for data scientists to customize and deploy speech applications. This is an active area of focus and we plan to make the workflow more accessible to customers in the future. We’ve also introduced Riva on embedded platforms for early access, and there will be more to share at a later date.”
Nvidia says Snap, the company behind Snapchat, has integrated Riva’s automatic speech recognition and text-to-speech technology into its developer platform. RingCentral, another customer, is using Riva’s automatic speech recognition for live captioning in video conferencing.
Speech technology also includes voice generation tools, including “voice cloning” tools that use AI to mimic the pitch and prosody of a person’s speech. Last fall, Nvidia unveiled Riva Custom Voice, a new toolkit that the company claims enables customers to create custom, “human-like” voices with just 30 minutes of speech recording data.
Brand voices such as Progressive’s Flo are frequently tasked with recording phone trees and reading scripts for corporate training video series. For companies, the costs can add up: voice actors charge an average rate of $39.63 per hour, plus additional fees for interactive voice response (IVR) prompts. Voice synthesis can boost actor productivity by reducing the need for additional recording sessions, potentially freeing actors to pursue more creative work and saving businesses money in the process.
According to Markets and Markets, the global voice cloning market could grow in value from $456 million in 2018 to $1.739 billion by 2023.
As for what’s on the horizon, Nvidia sees the emergence of new voice applications in augmented reality, video conferencing and conversational AI. The company says customers are increasingly focused on customizing models for higher accuracy as well as on tailored voice experiences.
“Low-code solutions for speech AI [will continue to grow] because non-software developers are looking to create, fine-tune and deploy speech solutions,” the spokesperson continued, referring to low-code development platforms that require little to no coding to build voice apps. “New research is bringing emotional text-to-speech, which changes how humans interact with machines.”
As exciting as these technologies are, they will present new ethical challenges, and some already have. For example, fraudsters have used voice cloning to mimic a CEO’s voice and initiate fraudulent wire transfers. And some speech recognition and text-to-speech algorithms have been shown to recognize the voices of minority users less accurately than those of users with more common accents and dialects.
It is imperative that companies like Nvidia make an effort to meet these challenges before their technology is used in production. To its credit, the company has taken steps in the right direction, for example banning the use of Riva to create “fraudulent, false, misleading or deceptive” content as well as content that “promote[s] discrimination, bigotry, racism, hatred, harassment or harm against any individual or group.” Hopefully, more steps in this vein are coming.
As a postscript to this week’s newsletter, it is with sadness that I announce that I am leaving VentureBeat to pursue opportunities elsewhere. This edition of AI Weekly will be my last — a bittersweet occasion, truthfully, as I try to find the words to put to paper.
When I joined VentureBeat four years ago as an AI staff writer, I had only a vague idea of the difficult journey ahead. I was not exceptionally well versed in AI — my background was in consumer technology — and the industry’s jargon was overwhelming to me, not to mention often paradoxical. But as I have come to learn, especially from those on the academic side of data science, an open mind — and, frankly, a willingness to admit ignorance — is probably the most important component in understanding AI.
I have not always been successful in this. But as a reporter, I have tried not to lose sight of the fact that my domain knowledge pales in comparison to that of the titans of industry and academia. Whether covering stories about biases in computer vision models or the environmental impact of training language systems, my policy has been to defer to experts and present their perspectives, lightly edited, to readers. As I see it, my job is to contextualize and relay, not to pontificate. There is a place for pontification, but it is on the opinion pages, not in news articles.
I have also learned that a healthy amount of skepticism goes a long way in reporting on AI. Not only are there snake oil salesmen to be wary of, but well-oiled corporate PR machines, lobbyists and paid consultants whose claims of preventing harm often amount to the opposite. I have lost track of the number of ethics boards that have dissolved or proved to be toothless; of harmful algorithms sold to customers; and of companies that have tried to silence or retaliate against whistleblowers.
The silver lining is that regulators are growing wiser to the industry’s misdeeds. But, as elsewhere in Silicon Valley, techno-optimism has too often revealed itself to be little more than a propaganda tool.
It’s easy to get caught up in the novelty of new technologies. I once did — and still do. The challenge is recognizing the risks in that novelty. I am reminded of the novel When We Cease to Understand the World by the Chilean author Benjamín Labatut, which examines great scientific discoveries that led to prosperity and untold misery in equal measure. For example, the German chemist Fritz Haber developed the Haber-Bosch process, which synthesizes ammonia from nitrogen and hydrogen gases and almost certainly prevented famine by enabling the mass production of fertilizers. At the same time, the Haber-Bosch process made the production of explosives easier and cheaper, contributing to the millions of deaths suffered by soldiers during World War I.
AI, like the Haber-Bosch process, has tremendous potential for good — and good actors are working hard to realize it. But any technology can be abused, and it is the job of journalists to expose and spotlight those abuses, ideally to effect change. It is my hope that I, along with my esteemed colleagues at VentureBeat, have accomplished this in some small part. Here’s to the future of strong AI reporting.
For more AI coverage, be sure to subscribe to the AI Weekly newsletter and bookmark our AI channel, The Machine.
Thanks for reading,
Senior AI Staff Writer
VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact. Learn more.