The path to “generalized intelligence” (what many consider the stuff of science fiction) begins with ambient intelligence. And that future is unfolding now.
“We live in the golden realm of AI, where dreams and science fiction are becoming a reality,” said Rohit Prasad, senior vice president and head scientist for Alexa at Amazon.
Prasad spoke today about the evolution from ambient intelligence to generalized intelligence (GI) at re:MARS, Amazon’s conference on machine learning, automation, robotics and space.
“Ambient intelligence is when the underlying AI is available everywhere, helping people when they need it – and also learning to anticipate the needs – then fading into the background when they don’t need it,” Prasad said.
A key example, and a significant step toward GI, is Amazon’s Alexa, which Prasad described as a “personal assistant, mentor, partner.”
The virtual assistant is equipped with 30 ML systems that process a variety of sensory signals, he explained. It receives more than 1 billion requests a week in 17 languages across dozens of countries. It will also fly to the moon as part of the Artemis 1 mission, scheduled to launch in August, he said.
A future Alexa feature will be able to synthesize short audio clips into longer speech. Prasad gave the example of a deceased grandmother reading a bedtime story to her grandson.
“This required inventions where we had to learn to produce high-quality sound with less than a minute of recording versus hours of recording,” he said. He added that this involved framing the problem as a voice conversion task rather than a speech generation one.
Ambient intelligence: Reactive, proactive, predictive
As Prasad explained, ambient intelligence is both reactive (responding to direct requests) and proactive (anticipating needs). It does this using numerous sensing technologies: vision, sound, ultrasound, depth, mechanical and atmospheric sensors. These signals are then acted upon.
Altogether, this capability requires deep learning as well as natural language processing (NLP). Ambient intelligence “agents” are also self-supervising and self-learning, which allows them to generalize what they learn and apply it to new contexts.
Alexa’s self-learning mechanism, for example, automatically corrects millions of defects a week, he said. These include both customer errors and errors in its own natural language understanding (NLU) models.
He described this as the “most practical” route to GI, or the ability of AI entities to understand and learn any intellectual task that humans can.
Ultimately, “that’s why the ambient-intelligence path leads to generalized intelligence,” Prasad said.
What do GI agents really do?
Generalized intelligence has three key characteristics: GI “agents” can accomplish multiple tasks, adapt to changing environments, and learn new concepts and actions with minimal external human input.
GI also requires a significant dose of common sense. Alexa already demonstrates this, he said: if a user asks to set a reminder for the Super Bowl, for example, it will identify the date of the big game, convert it to the user’s time zone, and remind them before it starts. It also suggests routines and detects anomalies through its “hunch” feature.
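The time-zone step in the Super Bowl example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Alexa’s implementation: the function name `reminder_time`, the 30-minute lead time and the kickoff value are all hypothetical placeholders.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo


def reminder_time(event_start: datetime, user_tz: str,
                  lead: timedelta = timedelta(minutes=30)) -> datetime:
    """Return the reminder moment, expressed in the user's local time zone.

    Hypothetical helper: the name and the default lead time are assumptions.
    """
    if event_start.tzinfo is None:
        raise ValueError("event_start must be timezone-aware")
    # Convert the event start to the user's zone, then back off by the lead time.
    local_start = event_start.astimezone(ZoneInfo(user_tz))
    return local_start - lead


# Placeholder kickoff time, given in US Eastern time (illustrative value only).
kickoff = datetime(2023, 2, 12, 18, 30, tzinfo=ZoneInfo("America/New_York"))
remind_at = reminder_time(kickoff, "America/Los_Angeles")
print(remind_at.isoformat())  # 2023-02-12T15:00:00-08:00
```

Requiring a timezone-aware start time is a deliberate choice here: a naive datetime would be silently interpreted in the machine’s local zone, which is exactly the mistake the conversion is meant to avoid.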
However, he pointed out, GI is not an “omniscient, omnipotent” technology that can accomplish any task.
“We humans are still the best example of generalization,” he said, “and the standard for AI to aspire to.”
GI is already being realized, he pointed out: Foundational transformer-based large language models trained with self-supervision now power many tasks with far less manually labeled data than before. An example of this is Amazon’s Alexa Teacher Model, which draws on knowledge from NLU, speech recognition, dialogue prediction, and visual comprehension.
The goal is to take automated reasoning to new heights, starting with the “widespread use” of commonsense knowledge in conversational AI, he said.
To work toward this, Amazon has released a commonsense knowledge dataset with more than 11,000 newly collected dialogues to aid research in open-domain conversation.
The company has also explored a generative approach that it calls “think before you speak.” In this approach, the AI agent learns to externalize implicit commonsense knowledge (“think”), using a large language model in conjunction with a commonsense knowledge graph (such as the freely available semantic network ConceptNet). It then uses that knowledge to generate responses (“speak”).
Amazon is also training Alexa to answer complex questions that require multiple inference steps, and is enabling “conversation research” on ambient devices so users don’t have to pull out their phones or laptops to explore the web.
Prasad said this capability requires dialogue flow prediction through deep learning; web-scale neural information retrieval; and automated summarization that can condense information from multiple sources.
The Alexa Conversations dialogue manager helps Alexa decide what actions to take based on interactions, dialogue history and current inputs and questions, using query-guided and self-attention mechanisms. Neural information retrieval pulls information from a variety of modalities and languages based on billions of data points. Transformer-based models, trained using a multistage paradigm optimized for different data sources, help match queries semantically with relevant information. Deep learning models then distill the results for users while retaining critical information.
Prasad described the technology as multitasking, multilingual, and multimodal, allowing for “more natural, human-like communication.”
The ultimate goal is to make AI not only useful to consumers in their daily lives, but also convenient and intuitive, so that they want to use it and even come to rely on it. This is AI that thinks before it speaks, is equipped with commonsense knowledge graphs and can generate responses it is able to explain; in other words, it can handle questions and answers that are not always straightforward.
Ultimately, GI is becoming more attainable by the day, as “AI can generalize better than ever before,” Prasad said.
For retail, AI learns to let shoppers “just walk out”
Amazon is also using ML and AI to “rediscover” physical retail through capabilities such as futuristic palm scanning and smart carts in its Amazon Go stores. This lets customers “just walk out,” explained Dilip Kumar, vice president of physical retail and technology.
The company opened its first physical stores in January 2018. These have evolved from 1,800-square-foot convenience-style stores to 40,000-square-foot grocery-style stores, Kumar said. The company took this further with its Dash Cart in the summer of 2020 and Amazon One in the fall of 2020.
Advanced computer vision capabilities and ML algorithms allow people to scan their palms as they enter a store, pick up items, add them to their carts, and then exit.
Palm scanning was chosen because the gesture needed to be intentional and intuitive, Kumar explained. A customer’s palm is linked to their credit or debit card information, and accuracy is achieved in part through subsurface images of vein information.
Kumar said this “allows for accuracy at orders of magnitude greater than what facial recognition can do.”
Meanwhile, the carts are equipped with weight sensors that identify specific items and the number of items. Advanced algorithms can also handle the added complexity of “picks and returns” (when a customer changes their mind about an item) and filter out ambient noise.
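As a rough illustration of how a weight change might be mapped to an item and a count, consider the toy sketch below. It is not Amazon’s algorithm: the catalog, the unit weights, the thresholds and the function name `classify_change` are all hypothetical, and a real cart would combine weight with other signals such as computer vision.

```python
from typing import Optional, Tuple

# Hypothetical unit weights in grams; a real system would use a full catalog.
CATALOG = {"apple": 180.0, "cereal_box": 510.0, "soup_can": 305.0}

NOISE_FLOOR_G = 20.0  # assumed: smaller changes are treated as vibration or bumps
TOLERANCE_G = 15.0    # assumed: allowed gap between measured and expected weight
MAX_COUNT = 3         # assumed: most identical items added or removed at once


def classify_change(delta_g: float) -> Optional[Tuple[str, int]]:
    """Map a change in cart weight to (item, signed count), or None.

    A positive count is a pick (item added); a negative count is a return.
    """
    if abs(delta_g) < NOISE_FLOOR_G:
        return None  # too small to be an item: filtered out as noise
    sign = 1 if delta_g > 0 else -1
    best, best_err = None, TOLERANCE_G
    for name, unit_weight in CATALOG.items():
        for count in range(1, MAX_COUNT + 1):
            err = abs(abs(delta_g) - unit_weight * count)
            if err <= best_err:
                best, best_err = (name, sign * count), err
    return best


print(classify_change(362.0))   # two apples added
print(classify_change(-303.0))  # one soup can returned
print(classify_change(8.0))     # noise, ignored
```

Treating a return as a negative count keeps picks and returns on a single code path, and the noise floor shows one simple way small disturbances could be ignored.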
These algorithms are run locally in the store, in the cloud and on the edge, Kumar explained. “We can mix and match depending on the environment,” he said.
“The goal is to take this technology completely into the background, so that consumers can focus on shopping,” Kumar said. “We hid all this complexity from customers,” he said, so they could “immerse themselves in their shopping experience, their mission.”
Similarly, the company opened its first Amazon Style store in May 2022. Inside the store, customers can scan items on the shop floor, which are then automatically sent to a fitting room or pick-up desk. They are also offered suggestions for additional purchases.
Finally, Kumar said, “We are very early in our search, we are pushing the boundaries of ML. We have a lot of innovations.”
VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact.