| Imagen | |
|---|---|
| An image generated with Imagen 4. Partial prompt: Softly illuminated afternoon valley with meandering river | |
| Developer | Google DeepMind |
| Initial release | May 2022 |
| Stable release | Imagen 4 / 20 May 2025 |
| Type | Text-to-image model |
| Website | Imagen website |
Imagen is a series of text-to-image models developed by Google DeepMind. They were developed by Google Brain until its merger with DeepMind in April 2023.[1] Imagen is primarily used to generate images from text prompts, similar to Stability AI's Stable Diffusion, OpenAI's DALL-E, or Midjourney.
The original version of the model was first discussed in a paper from May 2022.[2] The tool produces high-quality images and is available to all users with a Google account through services including Gemini, ImageFX, and Vertex AI.[3]
Imagen's original version was first presented in a paper published in May 2022. It featured the ability to generate high-fidelity images from natural language.[2] The second version, Imagen 2, was released in December 2023.[4] Its standout feature was text and logo generation.[5] Imagen 3 was released in August 2024.[6] Google claimed that this version provides better detail and lighting in generated images.[7] On 20 May 2025, at Google I/O 2025, the company released an improved model, Imagen 4.[8]
Imagen uses two key technologies. The first is the use of transformer-based large language models, notably T5, to understand text and subsequently encode it for image synthesis. The second is the use of cascaded diffusion models, which provide high-fidelity image generation. Imagen generates images in three stages, starting from a base resolution of 64×64, which is then upsampled to 256×256 and finally to 1024×1024.[2] Imagen 4 generates images at up to 2K resolution.[9]
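The three-stage cascade described above can be illustrated with a shape-level sketch. This is not Imagen's implementation: `encode_text`, `base_stage`, and `upsample_stage` are hypothetical placeholders (random vectors, random noise, and nearest-neighbour repetition) standing in for the T5 encoder, the base diffusion model, and the super-resolution diffusion models. Only the stage resolutions are taken from the text.

```python
import numpy as np

# Stage resolutions stated in the text: 64x64 -> 256x256 -> 1024x1024.
STAGE_SIZES = [64, 256, 1024]


def encode_text(prompt: str, dim: int = 8) -> np.ndarray:
    """Hypothetical stand-in for a text encoder: one random vector per word."""
    tokens = prompt.split()
    rng = np.random.default_rng(0)
    return rng.standard_normal((len(tokens), dim))


def base_stage(text_emb: np.ndarray, size: int) -> np.ndarray:
    """Placeholder for the base diffusion model: returns noise of the right shape."""
    rng = np.random.default_rng(1)
    return rng.random((size, size, 3))


def upsample_stage(image: np.ndarray, text_emb: np.ndarray, size: int) -> np.ndarray:
    """Placeholder for a super-resolution model: nearest-neighbour repetition."""
    factor = size // image.shape[0]
    return image.repeat(factor, axis=0).repeat(factor, axis=1)


def generate(prompt: str):
    """Run the cascade and record the image shape after every stage."""
    text_emb = encode_text(prompt)
    image = base_stage(text_emb, STAGE_SIZES[0])
    shapes = [image.shape]
    for size in STAGE_SIZES[1:]:
        # Each later stage is conditioned on the same text embedding.
        image = upsample_stage(image, text_emb, size)
        shapes.append(image.shape)
    return image, shapes


if __name__ == "__main__":
    _, shapes = generate("softly illuminated afternoon valley")
    print(shapes)
```

Each stage in this sketch increases the side length by a factor of four, which matches the resolutions listed above; in a real cascade the upsampling stages would be learned diffusion models rather than pixel repetition.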
Imagen can generate photorealistic images from text prompts.[3] It can also produce images in various styles, such as cinematic, 35mm film, illustration, and surreal. Like most text-to-image generative AI models, Imagen has difficulty rendering human fingers, text, ambigrams, and other forms of typography.
The model can generate images in five aspect ratios, namely 9:16, 3:4, 1:1, 4:3, and 16:9. Imagen can also refine already generated images by editing existing text prompts.[7]
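As a small illustration of the five aspect ratios listed above, the following sketch converts a ratio string into pixel dimensions. The helper name `dimensions_for` and the default long side of 1024 pixels are assumptions made for this example; the source does not state the exact output sizes Imagen uses for each ratio.

```python
# The five aspect ratios named in the text (width:height).
ASPECT_RATIOS = ("9:16", "3:4", "1:1", "4:3", "16:9")


def dimensions_for(aspect_ratio: str, long_side: int = 1024) -> tuple[int, int]:
    """Return (width, height) for a supported ratio.

    `long_side` is an assumed pixel budget for the longer edge,
    not a documented Imagen setting.
    """
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio}")
    w, h = (int(part) for part in aspect_ratio.split(":"))
    short_side = long_side * min(w, h) // max(w, h)
    # Landscape and square ratios put the long side on the width.
    return (long_side, short_side) if w >= h else (short_side, long_side)


if __name__ == "__main__":
    for ratio in ASPECT_RATIOS:
        print(ratio, dimensions_for(ratio))
```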