The only real limits to DALL-E Mini are the creativity of your own prompts and its uncanny brushwork. The accessible-to-all AI internet image generator can conjure up blurry, twisted, melting approximations of whatever scenario you can think up. Seinfeld nightmares? You got it. Courtroom sketches of animals, vehicles, and notable people in varying combinations? Easy peasy. Never-before-seen horror monsters from the mind of the mindless? Sure, whatever.
But give DALL-E Mini literally nothing, and it quickly reveals the limits of its own “imaginings.” Given no direction or guidance, the AI model seems to get stuck. With absolutely no prompt, the program will without a doubt give you back an image of a woman in a sari (a garment commonly worn across South Asia).
Even the tool’s developer, Boris Dayma, doesn’t know exactly why, according to reporting from Rest of World. “It’s quite interesting and I’m not sure why it happens,” he said to Rest of World about the phenomenon.
DALL-E Mini was inspired by DALL-E 2, a powerful image generator from OpenAI. The pictures that DALL-E 2 creates are much more realistic than those that “mini” can make, but the trade-off is that it requires too much computing power to be tossed around by just any old internet user. There’s a limited capacity and a waitlist.
So Dayma, who is unaffiliated with OpenAI, opted to create his own, less exclusive version, which launched in July 2021. In the past few weeks, it’s become wildly popular. The program has been managing about 5 million requests every day, Dayma told Rest of World. As of Monday, DALL-E Mini was renamed Craiyon and shifted to a new domain name, at the insistence of OpenAI.
Like any other artificial intelligence model, DALL-E Mini/Craiyon creates outputs based on training inputs. In the case of Mini, the program was trained on a diet of 15 million image and caption pairs and about 15 million additional images, plus the chaos of the open internet.
From Rest of World:
The DALL·E mini model was developed on three major datasets: Conceptual Captions dataset, which contains 3 million image and caption pairs; Conceptual 12M, which contains 12 million image and caption pairs, and The OpenAI’s corpus of about 15 million images. Dayma and DALL·E mini co-creator Pedro Cuenca noted that their model was also trained using unfiltered data on the internet, which opens it up for unknown and unexplainable biases in datasets that can trickle down to image generation models.
And this underlying data almost certainly has something to do with the sari phenomenon. The sari state of affairs, if you will.
Dayma suggested that images of South Asian women in saris may have been heavily represented in the original photosets that feed DALL-E Mini. He also suggested that the quirk could have something to do with caption length, as the AI might associate zero-character prompts with short image descriptions.
However, Michael Cook, an AI researcher at Queen Mary University of London, told Rest of World he wasn’t so sure about the overrepresentation theory. “Typically machine-learning systems have the inverse problem: they actually don’t include enough photos of non-white people,” he said.
Instead, Cook thinks the origin could lie in a language bias of the data filtering process. “One thing that did occur to me while reading around is that a lot of these datasets strip out text that isn’t English,” he said. Image captions that include Hindi, for example, might be getting removed, leaving images with no supporting, explanatory text or labels floating free in the primordial AI soup, he explained.
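To make Cook’s hypothesis concrete, here is a toy sketch in Python of how a language filter could leave images with empty captions. Everything in it is an assumption for illustration: the function names, the sample data, and the crude “keep only ASCII words” rule are hypothetical stand-ins, not the filtering that DALL-E Mini’s datasets actually used.

```python
# Toy illustration only: hypothetical names and data, not DALL-E Mini's real pipeline.

def strip_non_english(caption: str) -> str:
    """Crude stand-in for a language filter: keep only words made of ASCII characters."""
    kept = [word for word in caption.split() if word.isascii()]
    return " ".join(kept)


def filter_captions(pairs):
    """Apply the filter to every (image_id, caption) pair, keeping the image either way."""
    return [(image_id, strip_non_english(caption)) for image_id, caption in pairs]


# Hypothetical sample data: one English caption, one Hindi caption, one mixed.
SAMPLE_PAIRS = [
    ("img_001", "a dog on a beach"),
    ("img_002", "साड़ी में एक महिला"),
    ("img_003", "wedding साड़ी"),
]

filtered = filter_captions(SAMPLE_PAIRS)

# Images whose caption was stripped down to nothing by the filter.
empty_caption_ids = [image_id for image_id, caption in filtered if caption == ""]
print(empty_caption_ids)  # ['img_002']
```

In this sketch the image with the Hindi-only caption stays in the dataset but ends up paired with an empty string. If Cook is right, a model trained on many pairs like that could plausibly learn to connect an empty prompt with whatever those images tend to show.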
So far, neither Cook’s nor Dayma’s idea has been proven, but both are good examples of the kinds of problems that are very common in AI. Programmed and trained by humans, artificial intelligence is only as foolproof as its creators. If you feed an image generator a cookie, it’s going to spit out a bunch of cookies. And because we live in hell, AI carries the unfortunate burden of human prejudices and stereotypes along with it.
As fun as it might be to think that the “woman in sari” image is some sort of primal message from the depths of the unfettered internet, the reality is that it’s likely the byproduct of a data fluke or plain old bias. The woman in the sari is a mystery, but the existing problems of AI aren’t.