OpenAI compatibility
Gemini models are accessible using the OpenAI libraries (Python and TypeScript/JavaScript) along with the REST API. Only Google Cloud Auth is supported when using the OpenAI library in Vertex AI. If you aren't already using the OpenAI libraries, we recommend that you call the Gemini API directly.
Python
import openai
from google.auth import default
import google.auth.transport.requests

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "global"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain to me how AI works"},
    ],
)

print(response.choices[0].message)

What changed?
- api_key=credentials.token: To use Google Cloud authentication, get a Google Cloud auth token using the sample code.
- base_url: This tells the OpenAI library to send requests to Google Cloud instead of the default URL.
- model="google/gemini-2.0-flash-001": Choose a compatible Gemini model out of the models that Vertex hosts.
Thinking
Gemini 2.5 models are trained to think through complex problems, leading to significantly improved reasoning. The Gemini API comes with a "thinking budget" parameter which gives fine-grained control over how much the model will think.
Unlike the Gemini API, the OpenAI API offers three levels of thinking control: "low", "medium", and "high", which are mapped behind the scenes to 1K, 8K, and 24K thinking token budgets.
Not specifying a reasoning effort at all is equivalent to not specifying a thinking budget.
For more direct control of thinking budgets and other thinking-related configs from the OpenAI-compatible API, use extra_body.google.thinking_config.
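As a sketch of what that looks like in practice, the helper below builds the nested dict that the OpenAI SDK's extra_body kwarg merges into the request JSON, so the body carries extra_body.google.thinking_config. The field names thinking_budget and include_thoughts, and the exact nesting, are assumptions drawn from the Gemini thinking_config; verify them against the current Vertex AI reference before relying on them.

```python
# Hypothetical helper (not part of any SDK): builds the payload that routes
# thinking settings through the OpenAI-compatible endpoint via extra_body.
def thinking_payload(budget_tokens, include_thoughts=False):
    # Assumed field names, mirroring the Gemini thinking_config.
    return {
        "extra_body": {
            "google": {
                "thinking_config": {
                    "thinking_budget": budget_tokens,
                    "include_thoughts": include_thoughts,
                }
            }
        }
    }

payload = thinking_payload(800, include_thoughts=True)
print(payload["extra_body"]["google"]["thinking_config"])

# Pass it through the client like so (client as constructed above):
# response = client.chat.completions.create(
#     model="google/gemini-2.5-flash",
#     messages=[{"role": "user", "content": "Explain to me how AI works"}],
#     **payload,
# )
```

Unlike reasoning_effort, this gives you an exact token budget rather than one of three preset levels.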
Python
import openai
from google.auth import default
import google.auth.transport.requests

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "global"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    reasoning_effort="low",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain to me how AI works"},
    ],
)

print(response.choices[0].message)

Streaming
The Gemini API supports streaming responses.
Python
import openai
from google.auth import default
import google.auth.transport.requests

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "global"

credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

client = openai.OpenAI(
    base_url=f"https://aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model="google/gemini-2.0-flash",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    stream=True,
)

for chunk in response:
    print(chunk.choices[0].delta)

Function calling
Function calling makes it easier to get structured data outputs from generative models and is supported in the Gemini API.
Python
import openai
from google.auth import default
import google.auth.transport.requests

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "global"

credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

client = openai.OpenAI(
    base_url=f"https://aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. Chicago, IL",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]

messages = [{"role": "user", "content": "What's the weather like in Chicago today?"}]

response = client.chat.completions.create(
    model="google/gemini-2.0-flash",
    messages=messages,
    tools=tools,
    tool_choice="auto",
)

print(response)

Image understanding
Gemini models are natively multimodal and provide best-in-class performance on many common vision tasks.
Python
from google.auth import default
import google.auth.transport.requests
import base64
from openai import OpenAI

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "global"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = OpenAI(
    base_url=f"https://aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Getting the base64 string
# base64_image = encode_image("Path/to/image.jpeg")

response = client.chat.completions.create(
    model="google/gemini-2.0-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
            ],
        }
    ],
)

print(response.choices[0])

Audio understanding
Analyze audio input:
Python
from google.auth import default
import google.auth.transport.requests
import base64
from openai import OpenAI

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "global"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = OpenAI(
    base_url=f"https://aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

with open("/path/to/your/audio/file.wav", "rb") as audio_file:
    base64_audio = base64.b64encode(audio_file.read()).decode("utf-8")

response = client.chat.completions.create(
    model="google/gemini-2.0-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe this audio"},
                {
                    "type": "input_audio",
                    "input_audio": {"data": base64_audio, "format": "wav"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)

Structured output
Gemini models can output JSON objects in anystructure you define.
Python
from google.auth import default
import google.auth.transport.requests
from pydantic import BaseModel
from openai import OpenAI

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "global"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = OpenAI(
    base_url=f"https://aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

completion = client.beta.chat.completions.parse(
    model="google/gemini-2.0-flash",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "John and Susan are going to an AI conference on Friday."},
    ],
    response_format=CalendarEvent,
)

print(completion.choices[0].message.parsed)

Current limitations
- Access tokens live for 1 hour by default. After expiration, they must be refreshed. See this code example for more information.
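For long-running processes, one way to handle expiry is to refresh lazily before each call. The sketch below assumes a google-auth credentials object, which exposes a valid property and a refresh() method; fresh_token is a hypothetical helper, not part of any SDK.

```python
# Hypothetical helper: return a usable access token, refreshing the
# google-auth credentials object only when the cached token is no longer valid.
def fresh_token(credentials, request):
    if not credentials.valid:  # google-auth marks credentials invalid once the token expires
        credentials.refresh(request)
    return credentials.token

# Typical wiring (requires Application Default Credentials):
# import google.auth, google.auth.transport.requests
# credentials, _ = google.auth.default(
#     scopes=["https://www.googleapis.com/auth/cloud-platform"])
# request = google.auth.transport.requests.Request()
# api_key = fresh_token(credentials, request)
```

Note that the OpenAI client captures api_key at construction time, so for sessions that outlive the token you would need to update the client's key (or construct a new client) with a freshly fetched token before later calls.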
What's next
Unlock Gemini's potential using the Google Gen AI Libraries.
See more examples using the Chat Completions API with the OpenAI-compatible syntax.
See which Gemini models and parameters are supported in the Overview page.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.