SAN FRANCISCO – At OpenAI, one of the world’s most ambitious artificial intelligence laboratories, researchers are building technology that lets you create digital images simply by describing what you want to see.
They call it DALL-E, a nod to both the 2008 animated movie “WALL-E,” about an autonomous robot, and the surrealist artist Salvador Dalí.
OpenAI, backed by $1 billion in funding from Microsoft, is not yet sharing the technology with the general public. But on a recent afternoon, Alex Nichol, one of the researchers behind the system, demonstrated how it works.
When he asked for “a tea kettle in the shape of an avocado,” typing those words into a largely empty computer screen, the system created 10 distinct images of a dark green avocado tea kettle, some with stems and some without. “DALL-E is good at avocados,” Mr. Nichol said.
When he typed “cats playing chess,” it placed two fluffy kittens on either side of a checkered game board, 32 chess pieces lined up between them. When he summoned “a teddy bear playing a trumpet underwater,” one image showed tiny air bubbles rising from the end of the bear’s trumpet toward the surface of the water.
DALL-E can also edit photos. When Mr. Nichol erased the teddy bear’s trumpet and asked for a guitar instead, a guitar appeared between the furry arms.
A team of seven researchers spent two years developing the technology, which OpenAI ultimately plans to offer as a tool for people like graphic artists, providing new shortcuts and new ideas for creating and editing digital images. Computer programmers already use Copilot, a tool based on similar technology from OpenAI, to generate snippets of software code.
But for many experts, DALL-E is worrisome. As this kind of technology continues to improve, they say, it could help spread disinformation across the internet, feeding the kind of online campaigns that may have helped sway the 2016 presidential election.
“You could use it for good things, but certainly you could use it for all sorts of other crazy, worrying applications, and that includes deep fakes,” misleading photos and videos, said Subbarao Kambhampati, a professor of computer science at Arizona State University.
A decade and a half ago, the world’s leading AI labs built systems that could identify objects in digital images and even generate images of their own, including flowers, dogs, cars and faces. In the years since, they have built systems that can do much the same with written language, summarizing articles, answering questions, generating tweets and even writing blog posts.
Now, researchers are combining those techniques to create new forms of AI. DALL-E is a notable step forward because it juggles both language and images and, in some cases, grasps the relationship between the two.
“We can now use multiple, intersecting streams of information to create better and better technology,” said Oren Etzioni, chief executive of the Allen Institute for Artificial Intelligence, an artificial intelligence lab in Seattle.
The technology is not perfect. When Mr. Nichol asked DALL-E to “put the Eiffel Tower on the moon,” it did not quite grasp the idea. It put the moon in the sky above the tower. When he asked for “a living room filled with sand,” it produced a scene that looked more like a construction site than a living room.
But when Mr. Nichol tweaked his requests a little, adding or subtracting a few words here and there, it provided what he wanted. When he asked for “a piano in a living room filled with sand,” the image looked more like a beach in a living room.
DALL-E is what artificial intelligence researchers call a neural network, a mathematical system loosely modeled on the network of neurons in the brain. It is the same technology that recognizes the commands spoken into smartphones and identifies pedestrians as self-driving cars navigate city streets.
Neural networks learn skills by analyzing large amounts of data. By pinpointing patterns in thousands of avocado photos, for example, a network can learn to recognize an avocado. DALL-E looks for patterns as it analyzes millions of digital images as well as the text captions that describe what each image depicts. In this way, it learns to recognize the links between images and words.
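The idea of linking images and words through learned numerical features can be caricatured in a few lines of code. Everything below is invented for illustration — the feature vectors are made up, and a real system learns millions of them from data — but it shows how matching a caption to an image can reduce to comparing numbers:

```python
def dot(u, v):
    """Similarity score: higher means the two feature vectors agree more."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical learned feature vectors (made up for this sketch).
image_features = {
    "avocado_photo": [0.9, 0.1, 0.0],
    "kitten_photo":  [0.0, 0.8, 0.2],
}
caption_features = {
    "an avocado":      [0.8, 0.2, 0.0],
    "a fluffy kitten": [0.1, 0.9, 0.1],
}

def best_image(caption):
    # Pick the image whose features best align with the caption's features.
    feats = caption_features[caption]
    return max(image_features, key=lambda name: dot(image_features[name], feats))

print(best_image("an avocado"))       # avocado_photo
print(best_image("a fluffy kitten"))  # kitten_photo
```

In a real system, of course, the vectors are not hand-written; they are adjusted automatically as the network analyzes millions of image-caption pairs.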
When a person describes an image for DALL-E, it generates a set of key features that the image might include. One feature might be the line at the edge of a trumpet. Another might be the curve at the top of a teddy bear’s ear.
Then, a second neural network, called a diffusion model, creates the image, generating the pixels needed to realize those features. The latest version of DALL-E, unveiled Wednesday with a new research paper describing the system, generates high-resolution images that in many cases look like photos.
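The diffusion idea — start from pure noise and repeatedly remove a little of it until an image emerges — can be sketched in miniature. This is a deliberately tiny toy, not OpenAI’s model: a real diffusion model uses a trained neural network to decide how to denoise at each step, while here the “image” is just a short list of numbers nudged toward a made-up set of target features.

```python
import random

def toy_diffusion(target_features, steps=50, seed=0):
    """Toy sketch of diffusion: begin with random noise, then take many
    small steps, each one nudging every 'pixel' toward values consistent
    with the target features. (A real model predicts the nudge with a
    trained network; here we cheat and use the target directly.)"""
    rng = random.Random(seed)
    image = [rng.uniform(0.0, 1.0) for _ in target_features]  # pure noise
    for _ in range(steps):
        # Each denoising step removes a fraction of the remaining error.
        image = [px + 0.2 * (t - px) for px, t in zip(image, target_features)]
    return image

features = [0.1, 0.8, 0.5, 0.3]  # stand-in for the key features from the text
result = toy_diffusion(features)
print([round(px, 3) for px in result])  # converges to the target features
```

The essential point survives the simplification: the image is not produced all at once but refined gradually, noise giving way to structure step by step.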
Although DALL-E often fails to understand what someone has described and sometimes mangles the images it creates, OpenAI continues to improve the technology. Researchers can often hone the skills of a neural network by feeding it ever larger amounts of data.
They can also build more powerful systems by applying similar techniques to new kinds of data. The Allen Institute recently created a system that can analyze audio as well as images and text. After analyzing millions of YouTube videos, including audio tracks and captions, it learned to identify particular moments in TV shows and movies, such as a dog barking or a door shutting.
Experts believe researchers will continue to hone such systems. Ultimately, those systems could help companies improve search engines, digital assistants and other common technologies, as well as automate new tasks for graphic artists, programmers and other professionals.
But there are caveats to that potential. AI systems can show bias against women and people of color, in part because they learn their skills from enormous pools of online text, images and other data that themselves show bias. They could be used to generate pornography, hate speech and other offensive material. And many experts believe the technology will eventually make creating disinformation so easy that people will have to be skeptical of nearly everything they see online.
“We can forge text. We can put text into someone’s voice. And we can forge images and videos,” Dr. Etzioni said. “There is already disinformation online, but the worry is that this takes disinformation to new levels.”
OpenAI keeps a tight leash on DALL-E. It will not let outsiders use the system on their own. It puts a watermark in the corner of each image it generates. And though the lab plans to open the system to testers this week, the group will be small.
The system also includes filters that prevent users from generating what it deems inappropriate images. When asked for “a pig with the head of a sheep,” it declined to produce an image. The combination of the words “pig” and “head” most likely tripped OpenAI’s anti-bullying filters, according to the lab.
“This is not a product,” said Mira Murati, OpenAI’s head of applied research. “The idea is to understand capabilities and limitations and give us the opportunity to mitigate them.”
OpenAI can control DALL-E’s behavior in some ways. But others around the world may soon create similar technology, putting the same powers in the hands of almost anyone. Working from a research paper describing an early version of DALL-E, Boris Dayma, an independent researcher in Houston, has already built and released a simpler version of the technology.
“People need to know that the images they see may not be real,” he said.