| Gemini | |
|---|---|
| Developers | Google AI, Google DeepMind |
| Initial release | December 6, 2023 (beta version); February 8, 2024 (official rollout) |
| Stable release | 3.0 Pro / November 18, 2025 |
| Predecessor | PaLM |
| Available in | English and other languages |
| Type | Large language model |
| License | Proprietary |
| Website | deepmind |
Gemini is a family of multimodal large language models (LLMs) developed by Google DeepMind, and the successor to LaMDA and PaLM 2. Comprising Gemini Pro, Gemini Flash, and Gemini Flash-Lite, it was announced on December 6, 2023. It powers the chatbot of the same name.
Google announced Gemini, a large language model (LLM) developed by subsidiary Google DeepMind, during the Google I/O keynote on May 10, 2023. It was positioned as a more powerful successor to PaLM 2, which was also unveiled at the event, with Google CEO Sundar Pichai stating that Gemini was still in its early developmental stages.[1][2] Unlike other LLMs, Gemini was said to be unique in that it was not trained on a text corpus alone and was designed to be multimodal, meaning it could process multiple types of data simultaneously, including text, images, audio, video, and computer code.[3] It had been developed as a collaboration between DeepMind and Google Brain, two branches of Google that had been merged as Google DeepMind the previous month.[4] In an interview with Wired, DeepMind CEO Demis Hassabis touted Gemini's advanced capabilities, which he believed would allow the algorithm to trump OpenAI's ChatGPT, which runs on GPT-4 and whose growing popularity had been aggressively challenged by Google with LaMDA and Bard. Hassabis highlighted the strengths of DeepMind's AlphaGo program, which gained worldwide attention in 2016 when it defeated Go champion Lee Sedol, saying that Gemini would combine the power of AlphaGo and other Google–DeepMind LLMs.[5]
In August 2023, The Information published a report outlining Google's roadmap for Gemini, revealing that the company was targeting a launch date of late 2023. According to the report, Google hoped to surpass OpenAI and other competitors by combining conversational text capabilities present in most LLMs with artificial intelligence–powered image generation, allowing it to create contextual images and be adapted for a wider range of use cases.[6] As with Bard,[7] Google co-founder Sergey Brin was summoned out of retirement to assist in the development of Gemini, along with hundreds of other engineers from Google Brain and DeepMind;[6][8] he was later credited as a "core contributor" to Gemini.[9] Because Gemini was being trained on transcripts of YouTube videos, lawyers were brought in to filter out any potentially copyrighted materials.[6]
With news of Gemini's impending launch, OpenAI hastened its work on integrating GPT-4 with multimodal features similar to those of Gemini.[10] The Information reported in September that several companies had been granted early access to "an early version" of the LLM, which Google intended to make available to clients through Google Cloud's Vertex AI service. The publication also stated that Google was arming Gemini to compete with both GPT-4 and Microsoft's GitHub Copilot.[11][12]
On December 6, 2023, Pichai and Hassabis announced "Gemini 1.0" at a virtual press conference.[13][14] It comprised three models: Gemini Ultra, designed for "highly complex tasks"; Gemini Pro, designed for "a wide range of tasks"; and Gemini Nano, designed for "on-device tasks". At launch, Gemini Pro and Nano were integrated into Bard and the Pixel 8 Pro smartphone, respectively, while Gemini Ultra was set to power "Bard Advanced" and become available to software developers in early 2024. Other products that Google intended to incorporate Gemini into included Search, Ads, Chrome, Duet AI on Google Workspace, and AlphaCode 2.[15][14] It was made available only in English.[14][16] Google touted Gemini as its "largest and most capable AI model", designed to emulate human behavior,[17][14][18] but stated that it would not be made widely available until the following year due to the need for "extensive safety testing".[13] Gemini was trained on and powered by Google's Tensor Processing Units (TPUs),[13][16] and its name is a reference to the DeepMind–Google Brain merger as well as NASA's Project Gemini.[19]
Gemini Ultra was said to have outperformed GPT-4, Anthropic's Claude 2, Inflection AI's Inflection-2, Meta's LLaMA 2, and xAI's Grok 1 on a variety of industry benchmarks,[20][13] while Gemini Pro was said to have outperformed GPT-3.5.[3] Gemini Ultra was also the first language model to outperform human experts on the 57-subject Massive Multitask Language Understanding (MMLU) test, obtaining a score of 90%.[3][19] Gemini Pro was made available to Google Cloud customers on AI Studio and Vertex AI on December 13, while Gemini Nano would be made available to Android developers as well.[21][22][23] Hassabis further revealed that DeepMind was exploring how Gemini could be "combined with robotics to physically interact with the world".[24] In accordance with an executive order signed by U.S. President Joe Biden in October, Google stated that it would share testing results of Gemini Ultra with the federal government of the United States. Similarly, the company was engaged in discussions with the government of the United Kingdom to comply with the principles laid out at the AI Safety Summit at Bletchley Park in November.[3]
In June 2025, Google introduced Gemini CLI, an open-source AI agent that brings the capabilities of Gemini directly to the terminal, offering advanced coding, automation, and problem-solving features with generous free usage limits for individual developers.[25]
Google partnered with Samsung to integrate Gemini Nano and Gemini Pro into its Galaxy S24 smartphone lineup in January 2024.[26][27] The following month, Bard and Duet AI were unified under the Gemini brand,[28][29] with "Gemini Advanced with Ultra 1.0" debuting via a new "AI Premium" tier of the Google One subscription service.[30] Gemini Pro also received a global launch.[31]
In February 2024, Google launched Gemini 1.5 in a limited capacity, positioned as a more powerful and capable model than 1.0 Ultra.[32][33][34] This "step change" was achieved through various technical advancements, including a new architecture, a mixture-of-experts approach, and a larger one-million-token context window, which equates to roughly an hour of silent video, 11 hours of audio, 30,000 lines of code, or 700,000 words.[35] The same month, Google debuted Gemma, a family of free and open-source LLMs that serve as a lightweight version of Gemini. It came in two sizes, with two billion and seven billion parameters, respectively. Multiple publications viewed this as a response to Meta and others open-sourcing their AI models, and a stark reversal from Google's longstanding practice of keeping its AI proprietary.[36][37][38] Google announced an additional model, Gemini 1.5 Flash, on May 14 at the 2024 I/O keynote.[39]
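The mixture-of-experts approach mentioned above routes each token through only a few specialist feed-forward networks, so the total parameter count can grow without a proportional increase in per-token compute. The following is a minimal sketch of such a routed layer; the sizes, ReLU experts, and top-2 router are illustrative assumptions, not Gemini's published configuration.

```python
# Minimal sketch of a mixture-of-experts (MoE) feed-forward layer: a router
# scores all experts for a token and only the top-k experts run. Sizes and
# routing details are illustrative assumptions, not Gemini's configuration.
import numpy as np

def moe_layer(x, experts, router_w, top_k=2):
    """x: (d_model,) activation for one token; experts: list of (W_in, W_out)."""
    logits = router_w @ x                                   # one score per expert
    top = np.argsort(logits)[-top_k:]                       # indices of the top-k experts
    gate = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over the chosen experts
    out = np.zeros_like(x)
    for g, i in zip(gate, top):
        W_in, W_out = experts[i]
        out += g * (W_out @ np.maximum(W_in @ x, 0.0))      # gated expert FFN output
    return out

d_model, d_ff, n_experts = 8, 16, 4
rng = np.random.default_rng(0)
experts = [(rng.normal(size=(d_ff, d_model)), rng.normal(size=(d_model, d_ff)))
           for _ in range(n_experts)]
router_w = rng.normal(size=(n_experts, d_model))
print(moe_layer(rng.normal(size=d_model), experts, router_w).shape)  # (8,)
```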
Two updated Gemini models, Gemini-1.5-Pro-002 and Gemini-1.5-Flash-002, were released on September 24, 2024.[40]
On December 11, 2024, Google announced Gemini 2.0 Flash Experimental,[41] a significant update to its Gemini AI model. This iteration offers improved speed and performance over its predecessor, Gemini 1.5 Flash. Key features include a Multimodal Live API for real-time audio and video interactions, enhanced spatial understanding, native image and controllable text-to-speech generation (with watermarking), and integrated tool use, including Google Search.[42] It also introduces improved agentic capabilities, a new Google Gen AI SDK,[43] and "Jules", an experimental AI coding agent for GitHub. Additionally, Google Colab is integrating Gemini 2.0 to generate data science notebooks from natural language. Gemini 2.0 was made available to all users through the Gemini chat interface as "Gemini 2.0 Flash Experimental".
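A minimal sketch of a text request through the Google Gen AI SDK mentioned above, assuming the publicly distributed google-genai Python package; the API key and prompt are placeholders rather than material from the cited sources.

```python
# Minimal sketch of a text request through the Google Gen AI SDK
# (the google-genai Python package); key and prompt are placeholders.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Summarize the history of Project Gemini in two sentences.",
)
print(response.text)  # text of the first candidate response
```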
On January 30, 2025, Google released Gemini 2.0 Flash as the new default model, with Gemini 1.5 Flash still available for use. This was followed by the release of Gemini 2.0 Pro on February 5, 2025. Additionally, Google released Gemini 2.0 Flash Thinking Experimental, which details the language model's thinking process when responding to prompts.[44]
On March 12, 2025, Google also announced Gemini Robotics, a vision-language-action model based on the Gemini 2.0 family of models.[45]
The next day, Google announced that Gemini in Android Studio would be able to understand simple UI mockups and transform them into working Jetpack Compose code.[46]
Gemini 2.5 Pro Experimental was released on March 25, 2025. Google described it as its most intelligent AI model yet, featuring enhanced reasoning and coding capabilities,[47][48][49] and as a "thinking model" capable of reasoning through steps before responding, using techniques like chain-of-thought prompting,[47][49][50] while maintaining native multimodality and launching with a one-million-token context window.[47][49]
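Chain-of-thought prompting, named above, is a general technique in which the prompt itself asks the model to reason step by step before giving its final answer. A minimal sketch using the same hypothetical client setup follows; the prompt and model identifier are illustrative, not official examples.

```python
# Minimal sketch of chain-of-thought prompting: the prompt explicitly asks the
# model to reason step by step. Model name and prompt are placeholders.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")
prompt = (
    "A train leaves at 14:05 and arrives at 17:50. How long is the journey?\n"
    "Think step by step, then state the final answer on its own line."
)
response = client.models.generate_content(model="gemini-2.5-pro", contents=prompt)
print(response.text)
```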
At Google I/O 2025, Google announced significant updates to its Gemini core models.[51][52] Gemini 2.5 Flash became the default model, delivering faster responses.[51][52] Gemini 2.5 Pro was introduced as the most advanced Gemini model, featuring improved reasoning and coding capabilities and the new Deep Think mode for complex tasks.[53] Both 2.5 Pro and Flash support native audio output and improved security.
On June 17, 2025, Google announced general availability for 2.5 Pro and Flash. The same day, it introduced Gemini 2.5 Flash-Lite, a model optimized for speed and cost-efficiency.[54]
On November 18, 2025, Google announced the release of 3.0 Pro and 3.0 Deep Think.[55] These new models replace 2.5 Pro and Flash, and are the most powerful models available as of November 2025. On release, 3.0 Pro outperformed major AI models in 19 out of 20 benchmarks tested, including surpassing OpenAI's GPT-5 Pro on Humanity's Last Exam, with an accuracy of 41% compared to OpenAI's 31.64%.[56]
The following table lists the main model versions of Gemini, describing the significant changes included with each version:[57][58]
| Version | Release date | Status[59][54] | Description |
|---|---|---|---|
| Bard | 21 March 2023 | Discontinued | Google's first experimental chatbot service based on LaMDA.[60] |
| 1.0 Nano | 6 December 2023 | Discontinued | Designed for on-device tasks and first available in Google's Pixel 8 Pro.[61] |
| 1.0 Pro | 13 December 2023 | Discontinued | Designed for a diverse range of tasks.[61] |
| 1.0 Ultra | 8 February 2024 | Discontinued | Google's most powerful offering in the Gemini 1.0 family.[61] |
| 1.5 Pro | 15 February 2024 | Discontinued | As a successor to the 1.0 series of models, 1.5 Pro offers significantly increased context size (up to 1 million tokens). It is designed to be the most capable model in the Gemini 1.5 family.[62] |
| 1.5 Flash | 14 May 2024 | Discontinued | A lighter-weight, faster model in the 1.5 family; it also served as Gemini's free model. |
| 2.0 Flash | 30 January 2025 | Active | Developed by Google with a focus on multimodality, agentic capabilities, and speed.[63] |
| 2.0 Flash-Lite | 1 February 2025 | Active | First-ever Gemini Flash-Lite model designed for cost-efficiency and speed.[64] |
| 2.5 Pro | 25 March 2025 | Active | |
| 2.5 Flash | 17 April 2025 | Active | An incremental improvement over Gemini 2.0 Flash. |
| 2.5 Flash-Lite | 17 June 2025 | Active | |
| 2.5 Flash Image (Nano Banana) | 26 August 2025 | Active | |
| 3.0 Pro | 18 November 2025 | Active | The most powerful AI model as of November 2025. On release it outperformed every major AI model in 19 of the 20 benchmarks Google tested it with, and currently tops the LMArena leaderboard.[65][66] |
| 3.0 Pro Image (Nano Banana Pro) | 20 November 2025 | Active | An improved version of Nano Banana, with better text rendering and better real-world knowledge.[67] |
Nano Banana (officially Gemini 2.5 Flash Image) is an image generation and editing model powered by generative artificial intelligence and developed by Google DeepMind, a subsidiary of Google. A text-to-image variant of the Gemini family of large language models, it first appeared publicly as an anonymous model on the crowd-sourced AI evaluation platform LMArena in early August 2025, where "Nano Banana" was the codename used while it was undergoing secret public testing. The nickname originated from internal placeholder naming but caught on quickly once the codename started being used in online groups.[68] Google confirmed the model's identity as Gemini 2.5 Flash Image in an official announcement when it was released publicly on August 26, 2025, through the Gemini chatbot and other Google AI services.[69][70] Upon release, it became a viral Internet sensation on social media, particularly for its photorealistic "3D figurine" images. The model lets users change hairstyles, change backdrops, and mix photos using natural language cues, while subject consistency allows the same person or item to be recognized across revisions.
Multi-image fusion joins photographs into one seamless output, and world knowledge allows context-aware changes. It also provides SynthID watermarking, an invisible digital signature embedded in outputs to identify AI-generated content.[70][71]
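A rough sketch of how such a natural-language edit might be requested programmatically follows, assuming the google-genai Python SDK; the model identifier, file names, and response-handling details are assumptions rather than material from the cited sources.

```python
# Rough sketch of a natural-language image edit; the model identifier,
# file names, and response handling are assumptions, not official examples.
from google import genai
from PIL import Image

client = genai.Client(api_key="YOUR_API_KEY")
response = client.models.generate_content(
    model="gemini-2.5-flash-image",  # "Nano Banana"; exact identifier assumed
    contents=[Image.open("portrait.jpg"), "Change the backdrop to a beach at sunset."],
)
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:  # edited image returned as inline bytes
        with open("edited.png", "wb") as f:
            f.write(part.inline_data.data)
```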
Following its release, Nano Banana was made available in the Gemini app, Google AI Studio, and through Vertex AI. According to Google, it helped attract over 10 million new users to the Gemini app and facilitated more than 200 million image edits within weeks of launch.[72][73] Users began associating Nano Banana with a viral trend in which people turned their selfies into 3D figurines that looked like toys, which spread quickly on sites like Instagram and X (formerly Twitter).[74][75] After the model was integrated into X, users could tag Nano Banana directly in posts to generate images from prompts, which made it even more popular.[74]
A September 2025 review in TechRadar reported that Nano Banana was more realistic and consistent across multiple prompts than ChatGPT's image generation.[76] A review in Tom's Guide praised its ability to handle creative and lively image edits.[77]
Another review in PC Gamer noted that the model lacked some basic editing tools, such as cropping, and that it sometimes did not apply requested changes, instead reverting to the original image.[71]
Nano Banana performed well in architectural visualization, producing imagery at the correct scale even with complex geometry.[78]
As Gemini is multimodal, each context window can contain multiple forms of input. The different modes can be interleaved and do not have to be presented in a fixed order, allowing for a multimodal conversation. For example, the user might open the conversation with a mix of text, picture, video, and audio, presented in any order, and Gemini might reply with the same free ordering. Input images may be of different resolutions, while video is inputted as a sequence of images. Audio is sampled at 16 kHz and then converted into a sequence of tokens by the Universal Speech Model. Gemini's dataset is multimodal and multilingual, consisting of "web documents, books, and code, and includ[ing] image, audio, and video data".[79]
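A minimal sketch of such an interleaved request as seen through the Gemini API follows, assuming the google-genai Python SDK; the model identifier and file names are placeholders, not drawn from the cited report.

```python
# Sketch of an interleaved multimodal prompt (text and images mixed in one
# request); model identifier and file names are placeholders.
from google import genai
from PIL import Image

client = genai.Client(api_key="YOUR_API_KEY")
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        "Compare these two photos and describe what changed:",
        Image.open("before.jpg"),
        Image.open("after.jpg"),
        "Answer in two sentences.",
    ],
)
print(response.text)
```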
Gemini and Gemma models are decoder-only transformers, with modifications to allow efficient training and inference on TPUs. The 1.0 generation uses multi-query attention.[79]
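Multi-query attention differs from standard multi-head attention in that all heads share a single key and value projection, which reduces memory traffic during inference. The sketch below shows the basic computation; the dimensions are illustrative assumptions, not Gemini's actual configuration, and the causal mask is omitted for brevity.

```python
# Minimal sketch of multi-query attention (MQA): per-head query projections,
# one shared key/value projection. Shapes are illustrative; no causal mask.
import numpy as np

def multi_query_attention(x, w_q, w_k, w_v, w_o):
    """x: (seq, d_model); w_q: (heads, d_model, d_head); w_k, w_v: (d_model, d_head)."""
    heads, _, d_head = w_q.shape
    k = x @ w_k                                   # (seq, d_head), shared across heads
    v = x @ w_v                                   # (seq, d_head), shared across heads
    outs = []
    for h in range(heads):
        q = x @ w_q[h]                            # (seq, d_head), per-head queries
        scores = q @ k.T / np.sqrt(d_head)        # scaled dot-product attention
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        outs.append(weights @ v)                  # (seq, d_head)
    return np.concatenate(outs, axis=-1) @ w_o    # (seq, d_model)

seq, d_model, heads, d_head = 4, 16, 4, 8
rng = np.random.default_rng(0)
out = multi_query_attention(
    rng.normal(size=(seq, d_model)),
    rng.normal(size=(heads, d_model, d_head)),
    rng.normal(size=(d_model, d_head)),
    rng.normal(size=(d_model, d_head)),
    rng.normal(size=(heads * d_head, d_model)),
)
print(out.shape)  # (4, 16)
```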
| Generation | Variant | Release date | Parameters | Context length | Notes |
|---|---|---|---|---|---|
| 1.0 | Nano-1 | 6 December 2023 | 1.8B | 32,768 | Distilled from "larger Gemini models", 4-bit quantized[79] |
| 1.0 | Nano-2 | 6 December 2023 | 3.25B | 32,768 | Distilled from "larger Gemini models", 4-bit quantized[79] |
| 1.0 | Pro | 13 December 2023 | ? | 32,768 | |
| 1.0 | Ultra | 8 February 2024 | ? | 32,768 | |
| 1.5 | Pro | 15 February 2024 | ? | 10,000,000[80][81] | 1 million tokens in production API |
| 1.5 | Flash | 14 May 2024 | ? | | |
No whitepapers were published for Gemini 2.0, 2.5, and 3.0.
Gemini's launch was preceded by months of intense speculation and anticipation, which MIT Technology Review described as "peak AI hype".[82][20] In August 2023, Dylan Patel and Daniel Nishball of research firm SemiAnalysis penned a blog post declaring that the release of Gemini would "eat the world" and outclass GPT-4, prompting OpenAI CEO Sam Altman to ridicule the duo on X (formerly Twitter).[83][84] Business magnate Elon Musk, who co-founded OpenAI, weighed in, asking, "Are the numbers wrong?"[85] Hugh Langley of Business Insider remarked that Gemini would be a make-or-break moment for Google, writing: "If Gemini dazzles, it will help Google change the narrative that it was blindsided by Microsoft and OpenAI. If it disappoints, it will embolden critics who say Google has fallen behind."[86]
Reacting to its unveiling in December 2023, University of Washington professor emeritus Oren Etzioni predicted a "tit-for-tat arms race" between Google and OpenAI. Professor Alexei Efros of the University of California, Berkeley praised the potential of Gemini's multimodal approach,[19] while scientist Melanie Mitchell of the Santa Fe Institute called Gemini "very sophisticated". Professor Chirag Shah of the University of Washington was less impressed, likening Gemini's launch to the routineness of Apple's annual introduction of a new iPhone. Similarly, Stanford University's Percy Liang, the University of Washington's Emily Bender, and the University of Galway's Michael Madden cautioned that it was difficult to interpret benchmark scores without insight into the training data used.[82][87] Writing for Fast Company, Mark Sullivan opined that Google had the opportunity to challenge the iPhone's dominant market share, believing that Apple was unlikely to have the capacity to develop functionality similar to Gemini with its Siri virtual assistant.[88] Google shares spiked by 5.3 percent the day after Gemini's launch.[89][90]
Google faced criticism for a Gemini demonstration video that was not conducted in real time.[91]
Gemini 2.5 Pro Experimental debuted at the top position on the LMArena leaderboard, a benchmark measuring human preference, indicating strong performance and output quality.[47][49] The model achieved state-of-the-art or highly competitive results across various benchmarks evaluating reasoning, knowledge, science, math, coding, and long-context performance, such as Humanity's Last Exam, GPQA, AIME 2025, SWE-bench, and MRCR.[47][92][49][48] Initial reviews highlighted its improved reasoning capabilities and performance gains compared to previous versions.[48][50] Published benchmarks also showed areas where contemporary models from competitors like Anthropic, xAI, or OpenAI held advantages.[92][49]