It’s no secret that large models such as DALL-E 2 and Imagen are trained on vast numbers of documents and images scraped from the web, absorbing the worst aspects of that data along with the best. OpenAI and Google explicitly acknowledge this.
Scroll down the Imagen website — past the dragon fruit wearing a karate belt and the small cactus wearing a hat and sunglasses — to the section on societal impact, and you’ll find this: “[W]e have relied on text encoders trained on uncurated web-scale data … we also used the LAION-400M dataset, which is known to contain a wide range of inappropriate content including pornographic imagery, racist slurs, and harmful social stereotypes. Imagen … thus inherits the social biases and limitations of large language models. … [T]here is a risk that Imagen has encoded harmful stereotypes and representations, which guides our decision to not release Imagen for public use without further safeguards in place.”
That’s the same kind of acknowledgment OpenAI made when it announced GPT-3 in 2020: “Internet-trained models have Internet-scale biases.” And as Mike Cook, a researcher on AI creativity at Queen Mary University of London, points out, similar language appears in the ethics statements accompanying Google’s large language model PaLM and OpenAI’s DALL-E 2. In short, these companies know that their models are capable of producing awful content, and they have no idea how to fix it.