Why you can’t use Google’s impressive text-to-image generator Imagen yet

This article originally appeared on Popular Photography.

A cute corgi lives in a house made of sushi. A dragon fruit wearing a karate belt in the snow. A brain riding a rocket ship headed towards the moon. These are just a few examples of the AI-generated images created by Google’s Imagen text-to-image diffusion model. The results are extremely accurate, sometimes humorously so. Researchers from Google recently unveiled these results in a paper published last month, and discussed the moral repercussions that come with using this latest technology.

Google Imagen wins over the competition

In their research paper, Google’s computer scientists found that existing large language models, trained only on text, are surprisingly effective at encoding prompts for image generation. With Imagen, they simply increased the size of that language model and found that doing so led to more accurate results.
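
Imagen’s own code and weights have not been released, but the general recipe it follows, a frozen language model that encodes the prompt plus a diffusion model that turns that encoding into pixels, can be illustrated with open-source tools. The sketch below uses Hugging Face’s diffusers library and a publicly available Stable Diffusion checkpoint; the model name, prompt, and settings are illustrative choices, not anything from Google.

```python
# A rough sketch of driving a text-to-image diffusion pipeline from a prompt.
# This uses an open-source Stable Diffusion checkpoint via Hugging Face's
# diffusers library -- NOT Imagen, whose code and weights are unreleased.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # publicly available checkpoint (illustrative choice)
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")                   # a GPU is effectively required

# The prompt is encoded by a frozen text encoder, and the diffusion model
# iteratively denoises random noise into an image conditioned on that text.
prompt = "a dragon fruit wearing a karate belt in the snow"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("dragonfruit.png")
```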

Imagen’s FID score ranked well above other text-to-image synthesizers. Google Research, Brain Team

To measure results, the researchers evaluated Imagen on the Common Objects in Context (COCO) dataset, an open-source collection of annotated images on which companies and researchers can train and benchmark image-recognition algorithms. The Fréchet Inception Distance (FID) score is used to determine how closely a model’s generated images resemble real ones: a lower score means more similarity between real and generated images, and a perfect score is 0.0. Google’s Imagen diffusion model can create 1024-by-1024-pixel sample images with an FID score of 7.27.
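
To make the metric concrete, here is a minimal sketch of how an FID score is typically computed. This is the standard formula, not code from the Imagen paper, and it assumes `real_feats` and `gen_feats` are arrays of Inception-v3 features already extracted from real and generated images.

```python
# Minimal sketch of the Frechet Inception Distance (FID) calculation,
# assuming real_feats and gen_feats are (N, 2048) arrays of Inception-v3
# activations for real and generated images.
import numpy as np
from scipy.linalg import sqrtm

def fid_score(real_feats: np.ndarray, gen_feats: np.ndarray) -> float:
    # Fit a Gaussian to each feature set: mean vector and covariance matrix.
    mu_r, sigma_r = real_feats.mean(axis=0), np.cov(real_feats, rowvar=False)
    mu_g, sigma_g = gen_feats.mean(axis=0), np.cov(gen_feats, rowvar=False)

    # Matrix square root of the product of the two covariances.
    covmean = sqrtm(sigma_r @ sigma_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from numerical error

    # Frechet distance between the two Gaussians; lower means the generated
    # images are statistically closer to the real ones.
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean))
```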

According to the research paper, Imagen tops the charts with its FID score when compared to other models, including DALL-E 2, VQGAN+CLIP, and Latent Diffusion Models. Human raters also preferred Imagen, according to the findings.

A dragon fruit wearing a karate belt is just one of the many images Imagen is capable of creating. Google Research, Brain Team

“For photorealism, Imagen achieves 39.2% preference rate indicating high image quality generation,” Google computer scientists report. “On the set with no people, there is a boost in the preference rate of Imagen to 43.6%, indicating Imagen’s limited ability to generate photorealistic people. Imagen’s score on caption similarity is equal to the original reference images. This suggests Imagen’s ability to generate images that are consistent with COCO captions.”

In conjunction with the COCO dataset, the Google group also created their own benchmark, called DrawBench. It includes rigorous scenarios that test different models’ ability to synthesize images involving compositionality, cardinality, spatial relations, rare words, and challenging prompts, going beyond the limited prompts found in COCO.
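
To give a sense of what such a benchmark looks like in practice, here is a rough sketch of a DrawBench-style evaluation loop that renders categorized prompts with several models so human raters can compare the results side by side. The prompt text and the generate_image callables are hypothetical stand-ins; the official DrawBench prompt list accompanies the Imagen paper.

```python
# Sketch of a DrawBench-style evaluation loop: render every prompt in every
# category with each candidate model, saving images for side-by-side human
# rating. Prompts here are illustrative, not the official DrawBench list.
from pathlib import Path
from typing import Callable, Dict, List

# Example prompts grouped by categories named in the paper (illustrative text).
PROMPTS: Dict[str, List[str]] = {
    "cardinality":       ["three red cubes stacked on top of a blue sphere"],
    "spatial_relations": ["a small cat sitting to the left of a large dog"],
    "rare_words":        ["a pangolin playing a theremin"],
}

def run_drawbench(models: Dict[str, Callable[[str], object]],
                  out_dir: str = "drawbench_out") -> None:
    """models maps a model name to a function that turns a prompt into a PIL image."""
    for category, prompts in PROMPTS.items():
        for i, prompt in enumerate(prompts):
            for name, generate_image in models.items():
                image = generate_image(prompt)  # hypothetical text-to-image callable
                path = Path(out_dir, category, f"{i:03d}_{name}.png")
                path.parent.mkdir(parents=True, exist_ok=True)
                image.save(path)  # raters then compare images for the same prompt
```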

Though fun, the technology presents moral and ethical dilemmas. Google Research, Brain Team

Moral implications of Imagen and other AI text-to-image software

There’s a reason that none of the sample images include people. In their conclusion, the Imagen team discusses the possible moral repercussions of the technology and its societal impact, which are not always positive. The program already displays a Western bias. And although there is potential for endless creativity, there are also those who may try to use the software to cause harm. Imagen is currently not available for public use, though this could change.

“On one hand, generative methods could be leveraged for malicious reasons, including harassment and misinformation spreading, and raise many concerns about social and cultural exclusion,” the researchers write. “These considerations informed our decision not to release code or to make a public demo. In future work, we will explore a framework for responsible externalization that balances the value of external auditing with the risks of unrestricted open-access.”

The researchers acknowledge that more work is required before Imagen can be responsibly released to the public. Google Research, Brain Team

Additionally, the researchers found that Imagen exhibits bias due to the datasets it was trained on. “Dataset audits revealed that these datasets tend to reflect social stereotypes, oppressive views, and derogatory or otherwise harmful associations to marginalized identities,” they write.

While the technology is certainly entertaining (who wouldn’t love to see an alien octopus floating through a portal while reading a newspaper?), it is not yet something the public can use. It’s clear that Imagen, and other programs like it, will require more research and work before they can be released to the public. Some, like DALL-E 2, have deployed safeguards, but their efficacy remains to be seen. The Imagen team recognizes the enormous, but necessary, task of minimizing negative consequences.

“While we don’t directly address these issues in this work, an awareness of the limitations of our data guides our decision to not release Imagen for public usage,” they conclude. “We strongly discourage the use of text-to-image generation methods for any user-facing tool without paying close attention to the contents.”
