Imagen (text-to-image model)

From Wikipedia, the free encyclopedia
Image-generating machine learning model
Imagen
Image: An image generated with Imagen 4. Partial prompt: "Softly illuminated afternoon valley with meandering river"
Developer: Google DeepMind
Initial release: May 2022
Stable release: Imagen 4 / 20 May 2025
Type: Text-to-image model

Imagen is a series of text-to-image models developed by Google DeepMind. The models were developed by Google Brain until its merger with DeepMind in April 2023.[1] Imagen is primarily used to generate images from text prompts, similar to Stability AI's Stable Diffusion, OpenAI's DALL-E, or Midjourney.

The original version of the model was first discussed in a paper from May 2022.[2] The tool produces high-quality images and is available to all users with a Google account through services including Gemini, ImageFX, and Vertex AI.[3]

History

Imagen's original version was first presented in a paper published in May 2022. It featured the ability to generate high-fidelity images from natural language.[2] The second version, Imagen 2, was released in December 2023.[4] Its standout feature was text and logo generation.[5] Imagen 3 was released in August 2024.[6] Google claims that this version provides better detail and lighting in generated images.[7] On 20 May 2025, at Google I/O 2025, the company released an improved model, Imagen 4.[8]

Technology

Imagen uses two key technologies. The first is a transformer-based large language model, notably T5, which interprets the text prompt and encodes it for image synthesis. The second is a cascade of diffusion models that provides high-fidelity image generation. Imagen generates images in three stages, starting from a base of 64×64 pixels, which is then upsampled to 256×256 and 1024×1024.[2] Imagen 4 generates images at up to 2K resolution.[9]
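
The three-stage resolution schedule described above can be sketched as a small data structure. This is only an illustration of the arithmetic behind the cascade (64 to 256 to 1024 pixels per side), not Imagen's implementation; the `Stage` class, the stage names, and both helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Stage:
    """One step of a cascaded pipeline (hypothetical illustration)."""
    name: str                   # illustrative label, not an official name
    input_side: Optional[int]   # None for the base stage, which starts from text
    output_side: int            # side length in pixels of the square output


# Resolutions taken from the paragraph above: 64x64 base, then 256x256, then 1024x1024.
CASCADE = [
    Stage("base", None, 64),
    Stage("super_resolution_1", 64, 256),
    Stage("super_resolution_2", 256, 1024),
]


def upsampling_factors(stages: List[Stage]) -> List[int]:
    """Return the per-side upsampling factor of every super-resolution stage."""
    return [
        stage.output_side // stage.input_side
        for stage in stages
        if stage.input_side is not None
    ]


def pixel_counts(stages: List[Stage]) -> List[int]:
    """Return the number of pixels each stage produces."""
    return [stage.output_side ** 2 for stage in stages]


if __name__ == "__main__":
    print(upsampling_factors(CASCADE))
    print(pixel_counts(CASCADE))
```

Each super-resolution stage enlarges the image by a factor of four per side, so the final stage outputs 256 times as many pixels as the base stage.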

Capabilities

Imagen can generate photorealistic images from text prompts.[3] It can also produce images in various styles, such as cinematic, 35mm film, illustration, and surreal. Like most text-to-image generative AI models, Imagen has difficulty rendering human fingers, text, ambigrams and other forms of typography.

The model can generate images in five aspect ratios, namely 9:16, 3:4, 1:1, 4:3, and 16:9. Imagen can also refine already generated images by editing existing text prompts.[7]
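
As a rough illustration of what the five aspect ratios mean in pixels, the following sketch converts a ratio string into a width and height. The `dimensions` helper and the 1024-pixel long side are assumptions made for this example; the pixel sizes Imagen actually uses for each ratio are not stated here.

```python
from typing import Tuple

# The five aspect ratios listed above, written as width:height.
SUPPORTED_RATIOS = ("9:16", "3:4", "1:1", "4:3", "16:9")


def dimensions(ratio: str, long_side: int = 1024) -> Tuple[int, int]:
    """Return (width, height) for a ratio, scaled so the longer side is long_side.

    Hypothetical helper: the default of 1024 pixels is an assumption,
    not a documented Imagen output size.
    """
    if ratio not in SUPPORTED_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {ratio!r}")
    width_units, height_units = (int(part) for part in ratio.split(":"))
    scale = long_side / max(width_units, height_units)
    return round(width_units * scale), round(height_units * scale)


if __name__ == "__main__":
    for ratio in SUPPORTED_RATIOS:
        print(ratio, dimensions(ratio))
```

Under these assumptions, portrait ratios such as 9:16 yield a height larger than the width, and landscape ratios such as 16:9 yield the reverse.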

References

  1. Roth, Emma; Peters, Jay (2023-04-20). "Google's big AI push will combine Brain and DeepMind into one team". The Verge. Archived from the original on 2023-04-20. Retrieved 2025-03-18.
  2. Saharia, Chitwan; Chan, William; Saxena, Saurabh; Li, Lala; Whang, Jay; Denton, Emily; Ghasemipour, Seyed Kamyar Seyed; Ayan, Burcu Karagol; Mahdavi, S. Sara; Lopes, Rapha Gontijo; Salimans, Tim; Ho, Jonathan; Fleet, David J.; Norouzi, Mohammad (2022). "Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding". arXiv:2205.11487 [cs.CV].
  3. Peterson, Jake (2024-08-16). "Anyone With a Google Account Can Try Google's Latest AI Image Generator Right Now". Lifehacker. Retrieved 2025-03-18.
  4. "Imagen 2 - our most advanced text-to-image technology". Google DeepMind. 2025-03-12. Retrieved 2025-03-18.
  5. Wiggers, Kyle (2023-12-13). "Google debuts Imagen 2 with text and logo generation". TechCrunch. Retrieved 2025-03-18.
  6. Schoon, Ben (2024-08-16). "Google opens access to Imagen 3, its latest model for AI image generation". 9to5Google. Archived from the original on 2024-08-18. Retrieved 2025-03-18.
  7. Rowlands, Christian (2025-02-26). "Some of the most realistic AI images you'll see were created with this free tool". TechRadar. Retrieved 2025-03-18.
  8. Wiggers, Kyle (2025-05-20). "Imagen 4 is Google's newest AI image generator". TechCrunch. Retrieved 2025-03-18.
  9. "Imagen". Google DeepMind. Retrieved 2025-10-28.

Retrieved from "https://en.wikipedia.org/w/index.php?title=Imagen_(text-to-image_model)&oldid=1320755746"