Examples
To see an example of using the Chat Completions API, run the "Call Gemini with the OpenAI Library" notebook in one of the following environments:
Open in Colab | Open in Colab Enterprise | Open in Vertex AI Workbench | View on GitHub
Call Gemini with the Chat Completions API
The following sample shows you how to send non-streaming requests:
REST
```shell
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/${MODEL_ID}",
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'
```
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
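The Python samples on this page build the OpenAI-compatible base URL from your project and location. As a minimal sketch (the helper name and example values are hypothetical), the URL for Google-managed models has this shape:

```python
def chat_completions_base_url(project_id: str, location: str) -> str:
    # Builds the OpenAI-compatible endpoint URL for Google-managed models,
    # matching the base_url used in the samples on this page.
    return (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/endpoints/openapi"
    )

print(chat_completions_base_url("example-project", "us-central1"))
```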
```python
from google.auth import default
import google.auth.transport.requests

import openai

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response)
```

The following sample shows you how to send streaming requests to a Gemini model by using the Chat Completions API:
REST
```shell
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/${MODEL_ID}",
    "stream": true,
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'
```
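With `"stream": true`, the REST endpoint returns server-sent events, one JSON chunk per `data:` line. A minimal sketch of assembling the text on the client (the payloads below are illustrative, not captured output):

```python
import json

# Illustrative SSE lines, shaped like OpenAI-compatible streaming chunks.
sse_lines = [
    'data: {"choices": [{"delta": {"content": "Once"}}]}',
    'data: {"choices": [{"delta": {"content": " upon a time"}}]}',
    "data: [DONE]",
]

text = ""
for line in sse_lines:
    payload = line.removeprefix("data: ").strip()
    if payload == "[DONE]":  # sentinel marking the end of the stream
        break
    chunk = json.loads(payload)
    text += chunk["choices"][0]["delta"].get("content", "")

print(text)  # → Once upon a time
```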
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
```python
from google.auth import default
import google.auth.transport.requests

import openai

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)
for chunk in response:
    print(chunk)
```

Send a prompt and an image to the Gemini API in Vertex AI
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
```python
from google.auth import default
import google.auth.transport.requests

import openai

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the following image:"},
                {
                    "type": "image_url",
                    "image_url": "gs://cloud-samples-data/generative-ai/image/scones.jpg",
                },
            ],
        }
    ],
)
print(response)
```

Call a self-deployed model with the Chat Completions API
The following sample shows you how to send non-streaming requests:
REST
```shell
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/global/endpoints/${ENDPOINT}/chat/completions \
  -d '{
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'
```
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
```python
from google.auth import default
import google.auth.transport.requests

import openai

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"
# model_id = "gemma-2-9b-it"
# endpoint_id = "YOUR_ENDPOINT_ID"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/{endpoint_id}",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model=model_id,
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response)
```

The following sample shows you how to send streaming requests to a self-deployed model by using the Chat Completions API:
REST
```shell
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/global/endpoints/${ENDPOINT}/chat/completions \
  -d '{
    "stream": true,
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'
```
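The streaming SDK samples on this page print each raw chunk; in practice you usually concatenate the delta content from each chunk to recover the full reply. A minimal sketch using stand-in objects shaped like the openai SDK's streaming chunks (the real ones come from iterating the `stream=True` response):

```python
from types import SimpleNamespace

def join_stream(chunks) -> str:
    # Concatenate the incremental text carried in each streaming chunk.
    text = ""
    for chunk in chunks:
        delta = chunk.choices[0].delta
        if delta.content:  # the final chunk may carry no content
            text += delta.content
    return text

# Stand-in chunks mimicking the SDK's ChatCompletionChunk shape.
fake_chunks = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="Why is"))]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=" the sky blue?"))]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=None))]),
]
print(join_stream(fake_chunks))  # → Why is the sky blue?
```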
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
```python
from google.auth import default
import google.auth.transport.requests

import openai

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"
# model_id = "gemma-2-9b-it"
# endpoint_id = "YOUR_ENDPOINT_ID"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/{endpoint_id}",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model=model_id,
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)
for chunk in response:
    print(chunk)
```

extra_body examples
You can use either the SDK or the REST API to pass in `extra_body`.
Add `thought_tag_marker`
```json
{
  ...,
  "extra_body": {
    "google": {
      ...,
      "thought_tag_marker": "..."
    }
  }
}
```

Add `extra_body` using the SDK
```python
client.chat.completions.create(
    ...,
    extra_body={
        'extra_body': {
            'google': {
                ...
            }
        }
    },
)
```

extra_content examples
You can populate this field by using the REST API directly.
`extra_content` with string `content`
```json
{"messages": [{"role": "...", "content": "...", "extra_content": {"google": {...}}}]}
```

Per-message `extra_content`
```json
{"messages": [{"role": "...", "content": [{"type": "...", ..., "extra_content": {"google": {...}}}]}]}
```

Per-tool call `extra_content`
```json
{"messages": [{"role": "...", "tool_calls": [{..., "extra_content": {"google": {...}}}]}]}
```

Sample curl requests
You can use these curl requests directly, rather than going through the SDK.
Use `thinking_config` with `extra_body`
```shell
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-central1/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/gemini-2.5-flash-preview-04-17",
    "messages": [{
      "role": "user",
      "content": [{
        "type": "text",
        "text": "Are there any prime numbers of the form n*ceil(log(n))?"
      }]
    }],
    "extra_body": {
      "google": {
        "thinking_config": {
          "include_thoughts": true,
          "thinking_budget": 10000
        },
        "thought_tag_marker": "think"
      }
    },
    "stream": true
  }'
```

Multimodal requests
The Chat Completions API supports a variety of multimodal input, including both audio and video.
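The curl requests below pass media by reference inside the message content parts. The same parts can be sent through the SDK as a plain `messages` list, as in the earlier image sample. A sketch of the audio payload shape, mirroring the `input_audio` curl request below (the values are the sample's Cloud Storage URI and format, not live output):

```python
# Messages payload for an audio request, as you would pass it to
# client.chat.completions.create(model=..., messages=messages).
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this: "},
            {
                "type": "input_audio",
                "input_audio": {
                    "format": "audio/mp3",
                    "data": "gs://cloud-samples-data/generative-ai/audio/pixel.mp3",
                },
            },
        ],
    }
]
print(messages[0]["content"][1]["type"])  # → input_audio
```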
Use `image_url` to pass in image data
```shell
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT}/locations/us-central1/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/gemini-2.0-flash-001",
    "messages": [{
      "role": "user",
      "content": [
        { "type": "text", "text": "Describe this image" },
        { "type": "image_url", "image_url": "gs://cloud-samples-data/generative-ai/image/scones.jpg" }
      ]
    }]
  }'
```

Use `input_audio` to pass in audio data
```shell
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT}/locations/us-central1/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/gemini-2.0-flash-001",
    "messages": [{
      "role": "user",
      "content": [
        { "type": "text", "text": "Describe this: " },
        { "type": "input_audio", "input_audio": {
            "format": "audio/mp3",
            "data": "gs://cloud-samples-data/generative-ai/audio/pixel.mp3"
        } }
      ]
    }]
  }'
```

Structured output
You can use the `response_format` parameter to get structured output.
Example using SDK
```python
from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

completion = client.beta.chat.completions.parse(
    model="google/gemini-2.5-flash-preview-04-17",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    response_format=CalendarEvent,
)
print(completion.choices[0].message.parsed)
```

Using the global endpoint in OpenAI compatible mode
The following sample shows how to use the global endpoint in OpenAI compatible mode:
REST
```shell
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/global/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/gemini-2.0-flash-001",
    "messages": [{
      "role": "user",
      "content": "Hello World"
    }]
  }'
```
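To call the global endpoint from the SDK, point `base_url` at the same path as the curl request above. As an assumption mirrored from that request (not confirmed elsewhere on this page), the global endpoint drops the regional `{location}-` host prefix and uses `global` as the location:

```python
project_id = "example-project"  # hypothetical value

# Mirrors the curl request above: no regional host prefix, location "global".
global_base_url = (
    "https://aiplatform.googleapis.com/v1beta1/"
    f"projects/{project_id}/locations/global/endpoints/openapi"
)
print(global_base_url)
```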
What's next
- See examples of calling the Inference API with the OpenAI-compatible syntax.
- See examples of calling the Function Calling API with OpenAI-compatible syntax.
- Learn more about the Gemini API.
- Learn more about migrating from Azure OpenAI to the Gemini API.