This horse-riding astronaut is a milestone in AI’s ability to make sense of the world


Diffusion models are trained on images that have been completely corrupted with random noise. They learn to turn those images back into their original form. In DALL-E 2, there is no existing image to recover. So the diffusion model takes random pixels and, guided by CLIP, converts them into a brand-new image, created from scratch, that matches the text prompt.
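To make that concrete, here is a minimal toy sketch of the kind of sampling loop a denoising diffusion model runs (the standard DDPM update from Ho et al.). The `predict_noise` stand-in, the noise schedule, and the CLIP-embedding conditioning are illustrative assumptions, not OpenAI's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_noise(x, t, prompt_embedding):
    # Placeholder: the real model is a neural net trained to estimate
    # the noise added to a clean image at step t, conditioned (in
    # DALL-E 2's case) on a CLIP embedding of the text prompt.
    return 0.1 * x

T = 1000
betas = np.linspace(1e-4, 0.02, T)        # noise schedule (assumed values)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

prompt_embedding = rng.normal(size=512)   # stand-in CLIP text embedding
x = rng.normal(size=(64, 64, 3))          # start from pure random pixels

for t in reversed(range(T)):
    eps = predict_noise(x, t, prompt_embedding)
    # Remove the noise the model predicts was added at this step...
    x = (x - betas[t] / np.sqrt(1 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    # ...then re-inject a little noise on every step but the last.
    if t > 0:
        x += np.sqrt(betas[t]) * rng.normal(size=x.shape)

image = np.clip(x, -1, 1)  # x has been denoised into a generated image
```

Run in reverse like this, the same denoising skill learned from corrupted training images becomes a generator: random pixels in, a new image out.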

The diffusion model allows DALL-E 2 to produce high-resolution images more quickly than the original DALL-E. "That makes it much more practical and enjoyable to use," says Aditya Ramesh at OpenAI.

In the demo, Ramesh and his colleagues showed me pictures of a hedgehog using a calculator, a corgi and a panda playing chess, and a cat dressed as Napoleon holding a piece of cheese. I remark on the wild cast of subjects. "It's easy to burn through a whole workday thinking up prompts," he says.

“A sea otter in the style of Girl with a Pearl Earring by Johannes Vermeer” / “An ibis in the forest, painted in the style of John Audubon”

DALL-E 2 still slips up. For example, it can struggle with prompts that ask it to combine two or more objects with two or more attributes, such as “a red cube on top of a blue cube.” OpenAI thinks this is because CLIP does not always associate attributes with the right objects.

In addition to riffing on text prompts, DALL-E 2 can spin out variations of an existing image. Ramesh plugs in a photo of street art taken outside his apartment. The AI immediately produces alternate versions of the scene with different art on the wall. Each of these new images can in turn be used to kick off its own sequence of variations. "This feedback loop could be really useful for designers," Ramesh says.
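The article predates a public interface, but OpenAI later exposed this variations feature through its API. As a hedged sketch of how one might reproduce the demo, assuming the legacy `openai` Python package (v0.x) and a hypothetical local file `street_art.png`:

```python
import openai  # assumes the legacy openai Python package (v0.x API)

openai.api_key = "sk-..."  # your API key

# Ask for alternate versions of an existing image, as in the demo.
# Any returned image can itself be fed back in to start a new
# chain of variations.
with open("street_art.png", "rb") as f:  # hypothetical local file
    response = openai.Image.create_variation(image=f, n=4, size="1024x1024")

for item in response["data"]:
    print(item["url"])  # links to the generated variations
```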

An early user, the artist Holly Herndon, says she is using DALL-E 2 to create wall-sized compositions. "I can piece together huge artworks, like a patchwork tapestry or a narrative journey," she says. "It feels like working in a new medium."

User beware

DALL-E 2 looks much more like a polished product than the previous version. That wasn't the intention, says Ramesh. But OpenAI does plan to release DALL-E 2 to the public after an initial rollout to a small group of trusted users, much as it did with GPT-3. (You can sign up here for access.)

GPT-3 can produce toxic text. But OpenAI says it has used feedback from GPT-3's users to train a safer version, called InstructGPT. The company hopes to follow the same path with DALL-E 2, which will also be shaped by user feedback. OpenAI will encourage early users to try to break the AI, tricking it into generating offensive or harmful images. As it works through these problems, OpenAI will begin to make DALL-E 2 available to a wider group of people.

OpenAI is also releasing a user policy for DALL-E, which forbids asking the AI to generate offensive images: no violence, no pornography, and no political images. To prevent deepfakes, users will not be allowed to ask DALL-E to generate images of real people.

“A bowl of soup that looks like a monster, woven from wool” / “A Shiba Inu dog wearing a beret and black turtleneck”

In addition to the user policy, OpenAI has removed certain types of images from DALL-E 2's training data, including those showing graphic violence. OpenAI also says it will pay human moderators to review every image generated on its platform.

"Our main objective is to get a lot of feedback for the system before we start sharing it more widely," says Prafulla Dhariwal at OpenAI. "I hope eventually it becomes available, so that developers can build apps on it."

Creative intelligence

A multiskilled AI that can see the world and work with concepts across multiple modalities, such as language and vision, is a step toward more general-purpose intelligence. DALL-E 2 is one of the best examples yet.

But while Etzioni is impressed by the images DALL-E 2 produces, he is wary of what they mean for the overall advancement of AI. "This kind of improvement is not bringing us any closer to AGI," he says. "We already know that AI is remarkably capable at solving narrow tasks using deep learning. But it is still humans who frame these tasks and give deep learning its marching orders."

For Mark Riedl, an AI researcher at Georgia Tech in Atlanta, creativity is a great way to measure intelligence. Unlike the Turing test, which requires a machine to fool a human through conversation, Riedl's Lovelace 2.0 test judges a machine's intelligence by how well it responds to requests to create something, such as "a picture of a penguin in a spacesuit on Mars."
