OpenAI compatibility
Gemini models are accessible using the OpenAI libraries (Python and TypeScript/JavaScript) along with the REST API. Only Google Cloud Auth is supported when using the OpenAI library in Vertex AI. If you aren't already using the OpenAI libraries, we recommend that you call the Gemini API directly.
Python
import openai
from google.auth import default
import google.auth.transport.requests

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "global"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain to me how AI works"},
    ],
)

print(response.choices[0].message)

What changed?
- api_key=credentials.token: To use Google Cloud authentication, get a Google Cloud auth token using the sample code.
- base_url: This tells the OpenAI library to send requests to Google Cloud instead of the default URL.
- model="google/gemini-2.0-flash-001": Choose a compatible Gemini model out of the models that Vertex hosts.
Thinking
Gemini 2.5 models are trained to think through complex problems, leading to significantly improved reasoning. The Gemini API comes with a "thinking budget" parameter which gives fine-grained control over how much the model will think.
Unlike the Gemini API, the OpenAI API offers three levels of thinking control: "low", "medium", and "high", which are mapped behind the scenes to 1K, 8K, and 24K thinking token budgets.
Not specifying a reasoning effort at all is equivalent to not specifying a thinking budget.
For more direct control of thinking budgets and other thinking-related configs from the OpenAI-compatible API, use extra_body.google.thinking_config.
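As a sketch of what that looks like in practice, the helper below builds the nested dict that the OpenAI SDK's extra_body kwarg merges into the request JSON, so the body carries extra_body.google.thinking_config. The field names thinking_budget and include_thoughts, and the exact nesting, are assumptions drawn from the Gemini thinking_config; verify them against the current Vertex AI reference before relying on them.

```python
# Hypothetical helper (not part of any SDK): builds the payload that routes
# thinking settings through the OpenAI-compatible endpoint via extra_body.
def thinking_payload(budget_tokens, include_thoughts=False):
    # Assumed field names, mirroring the Gemini thinking_config.
    return {
        "extra_body": {
            "google": {
                "thinking_config": {
                    "thinking_budget": budget_tokens,
                    "include_thoughts": include_thoughts,
                }
            }
        }
    }

payload = thinking_payload(800, include_thoughts=True)
print(payload["extra_body"]["google"]["thinking_config"])

# Pass it through the client like so (client as constructed above):
# response = client.chat.completions.create(
#     model="google/gemini-2.5-flash",
#     messages=[{"role": "user", "content": "Explain to me how AI works"}],
#     **payload,
# )
```

Unlike reasoning_effort, this gives you an exact token budget rather than one of three preset levels.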
Python
import openai
from google.auth import default
import google.auth.transport.requests

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "global"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    reasoning_effort="low",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain to me how AI works"},
    ],
)

print(response.choices[0].message)

Streaming
The Gemini API supports streaming responses.
Python
import openai
from google.auth import default
import google.auth.transport.requests

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "global"

credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

client = openai.OpenAI(
    base_url=f"https://aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model="google/gemini-2.0-flash",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    stream=True,
)

for chunk in response:
    print(chunk.choices[0].delta)

Function calling
Function calling makes it easier to get structured data outputs from generative models and is supported in the Gemini API.
Python
import openai
from google.auth import default
import google.auth.transport.requests

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "global"

credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

client = openai.OpenAI(
    base_url=f"https://aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. Chicago, IL",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]

messages = [{"role": "user", "content": "What's the weather like in Chicago today?"}]

response = client.chat.completions.create(
    model="google/gemini-2.0-flash",
    messages=messages,
    tools=tools,
    tool_choice="auto",
)

print(response)

Image understanding
Gemini models are natively multimodal and provide best-in-class performance on many common vision tasks.
Python
from google.auth import default
import google.auth.transport.requests
import base64
from openai import OpenAI

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "global"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = OpenAI(
    base_url=f"https://aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Getting the base64 string
# base64_image = encode_image("Path/to/image.jpeg")

response = client.chat.completions.create(
    model="google/gemini-2.0-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
            ],
        }
    ],
)

print(response.choices[0])

Audio understanding
Analyze audio input:
Python
from google.auth import default
import google.auth.transport.requests
import base64
from openai import OpenAI

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "global"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = OpenAI(
    base_url=f"https://aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

with open("/path/to/your/audio/file.wav", "rb") as audio_file:
    base64_audio = base64.b64encode(audio_file.read()).decode("utf-8")

response = client.chat.completions.create(
    model="google/gemini-2.0-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe this audio"},
                {
                    "type": "input_audio",
                    "input_audio": {"data": base64_audio, "format": "wav"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)

Structured output
Gemini models can output JSON objects in anystructure you define.
Python
from google.auth import default
import google.auth.transport.requests
from pydantic import BaseModel
from openai import OpenAI

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "global"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = OpenAI(
    base_url=f"https://aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

completion = client.beta.chat.completions.parse(
    model="google/gemini-2.0-flash",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "John and Susan are going to an AI conference on Friday."},
    ],
    response_format=CalendarEvent,
)

print(completion.choices[0].message.parsed)

Current limitations
- Access tokens live for 1 hour by default. After expiration, they must be refreshed. See this code example for more information.
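For long-running processes, one way to handle expiry is to refresh lazily before each call. The sketch below assumes a google-auth credentials object, which exposes a valid property and a refresh() method; fresh_token is a hypothetical helper, not part of any SDK.

```python
# Hypothetical helper: return a usable access token, refreshing the
# google-auth credentials object only when the cached token is no longer valid.
def fresh_token(credentials, request):
    if not credentials.valid:  # google-auth marks credentials invalid once the token expires
        credentials.refresh(request)
    return credentials.token

# Typical wiring (requires Application Default Credentials):
# import google.auth, google.auth.transport.requests
# credentials, _ = google.auth.default(
#     scopes=["https://www.googleapis.com/auth/cloud-platform"])
# request = google.auth.transport.requests.Request()
# api_key = fresh_token(credentials, request)
```

Note that the OpenAI client captures api_key at construction time, so for sessions that outlive the token you would need to update the client's key (or construct a new client) with a freshly fetched token before later calls.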
What's next
Unlock Gemini's potential using the Google Gen AI Libraries.
See more examples using the Chat Completions API with the OpenAI-compatible syntax.
See which Gemini models and parameters are supported in the Overview page.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.