Examples

Call Gemini with the Chat Completions API

The following sample shows you how to send non-streaming requests:

REST

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/${MODEL_ID}",
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.auth import default
import google.auth.transport.requests

import openai

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response)

The following sample shows you how to send streaming requests to a Gemini model by using the Chat Completions API:

REST

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/${MODEL_ID}",
    "stream": true,
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.auth import default
import google.auth.transport.requests

import openai

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)
for chunk in response:
    print(chunk)

Send a prompt and an image to the Gemini API in Vertex AI

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.auth import default
import google.auth.transport.requests

import openai

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe the following image:"},
            {
                "type": "image_url",
                "image_url": "gs://cloud-samples-data/generative-ai/image/scones.jpg",
            },
        ],
    }],
)
print(response)

Call a self-deployed model with the Chat Completions API

The following sample shows you how to send non-streaming requests:

REST

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/global/endpoints/${ENDPOINT}/chat/completions \
  -d '{
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.auth import default
import google.auth.transport.requests

import openai

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"
# model_id = "gemma-2-9b-it"
# endpoint_id = "YOUR_ENDPOINT_ID"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/{endpoint_id}",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model=model_id,
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response)

The following sample shows you how to send streaming requests to a self-deployed model by using the Chat Completions API:

REST

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/global/endpoints/${ENDPOINT}/chat/completions \
  -d '{
    "stream": true,
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.auth import default
import google.auth.transport.requests

import openai

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"
# model_id = "gemma-2-9b-it"
# endpoint_id = "YOUR_ENDPOINT_ID"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/{endpoint_id}",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model=model_id,
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)
for chunk in response:
    print(chunk)

extra_body examples

You can use either the SDK or the REST API to pass in extra_body.

Add thought_tag_marker

{
  ...,
  "extra_body": {
    "google": {
      ...,
      "thought_tag_marker": "..."
    }
  }
}

Add extra_body using the SDK

client.chat.completions.create(
    ...,
    extra_body={
        'extra_body': {
            'google': {
                ...
            }
        }
    },
)
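To make the doubly nested shape above concrete, here is a minimal sketch of the keyword arguments such an SDK call might receive; the model ID and the thinking_config values are illustrative placeholders taken from the curl sample later on this page, not required settings.

```python
import json

# Hypothetical kwargs for client.chat.completions.create(**sdk_kwargs).
# The SDK's extra_body kwarg wraps a Vertex AI "extra_body" field, which
# in turn wraps the Google-specific settings - hence the double nesting.
sdk_kwargs = {
    "model": "google/gemini-2.5-flash-preview-04-17",  # placeholder model ID
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    "extra_body": {
        "extra_body": {
            "google": {
                "thinking_config": {
                    "include_thoughts": True,
                    "thinking_budget": 10000,  # placeholder budget
                },
                "thought_tag_marker": "think",
            }
        }
    },
}

print(json.dumps(sdk_kwargs["extra_body"], indent=2))
```

Note that the outer `extra_body` is consumed by the OpenAI SDK itself and merged into the request body, which is why the Google-specific payload must be nested a second time.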

extra_content examples

You can populate this field by using the REST API directly.

extra_content with string content

{
  "messages": [
    {
      "role": "...",
      "content": "...",
      "extra_content": {
        "google": {
          ...
        }
      }
    }
  ]
}

Per-message extra_content

{
  "messages": [
    {
      "role": "...",
      "content": [
        {
          "type": "...",
          ...,
          "extra_content": {
            "google": {
              ...
            }
          }
        }
      ]
    }
  ]
}

Per-tool call extra_content

{
  "messages": [
    {
      "role": "...",
      "tool_calls": [
        {
          ...,
          "extra_content": {
            "google": {
              ...
            }
          }
        }
      ]
    }
  ]
}
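As a concrete sketch of the placements above, the following builds the string-content and per-tool-call variants as Python dicts before serialization; the role values, tool call fields, and the `example_setting` key inside the google block are all hypothetical placeholders.

```python
import json

# extra_content attached to a message whose content is a plain string.
string_content_body = {
    "messages": [{
        "role": "user",
        "content": "Why is the sky blue?",
        "extra_content": {"google": {"example_setting": "value"}},  # placeholder
    }]
}

# extra_content attached to a single tool call on an assistant message.
per_tool_call_body = {
    "messages": [{
        "role": "assistant",
        "tool_calls": [{
            "id": "call_1",  # placeholder tool call ID
            "type": "function",
            "function": {"name": "get_weather", "arguments": "{}"},  # placeholder
            "extra_content": {"google": {"example_setting": "value"}},
        }],
    }]
}

# Each body serializes to the JSON shapes shown above.
print(json.dumps(per_tool_call_body, indent=2))
```

Either dict could then be sent as the request body of a direct REST call to the chat/completions endpoint.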

Sample curl requests

You can use these curl requests directly, rather than going through the SDK.

Use thinking_config with extra_body

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-central1/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/gemini-2.5-flash-preview-04-17",
    "messages": [{
      "role": "user",
      "content": [{
        "type": "text",
        "text": "Are there any prime numbers of the form n*ceil(log(n))?"
      }]
    }],
    "extra_body": {
      "google": {
        "thinking_config": {
          "include_thoughts": true,
          "thinking_budget": 10000
        },
        "thought_tag_marker": "think"
      }
    },
    "stream": true
  }'

Multimodal requests

The Chat Completions API supports a variety of multimodal inputs, including both audio and video.

Use image_url to pass in image data

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT}/locations/us-central1/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/gemini-2.0-flash-001",
    "messages": [{
      "role": "user",
      "content": [
        { "type": "text", "text": "Describe this image" },
        { "type": "image_url", "image_url": "gs://cloud-samples-data/generative-ai/image/scones.jpg" }
      ]
    }]
  }'

Use input_audio to pass in audio data

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT}/locations/us-central1/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/gemini-2.0-flash-001",
    "messages": [{
      "role": "user",
      "content": [
        { "type": "text", "text": "Describe this: " },
        { "type": "input_audio", "input_audio": {
          "format": "audio/mp3",
          "data": "gs://cloud-samples-data/generative-ai/audio/pixel.mp3"
        } }
      ]
    }]
  }'

Structured output

You can use the response_format parameter to get structured output.

Example using SDK

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

completion = client.beta.chat.completions.parse(
    model="google/gemini-2.5-flash-preview-04-17",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    response_format=CalendarEvent,
)
print(completion.choices[0].message.parsed)
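If you call the endpoint over REST instead of through the parse() helper, you can pass a raw JSON schema in the response_format field. The sketch below is a rough hand-written equivalent of the schema the CalendarEvent model above implies; the exact payload the SDK generates may differ in details such as strictness flags.

```python
import json

# A hand-written json_schema response_format mirroring the CalendarEvent
# Pydantic model above (name/date strings plus a participants list).
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "CalendarEvent",
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "date": {"type": "string"},
                "participants": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["name", "date", "participants"],
        },
    },
}

# This dict would go in the request body alongside "model" and "messages",
# e.g. client.chat.completions.create(..., response_format=response_format).
print(json.dumps(response_format, indent=2))
```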

Using the global endpoint in OpenAI compatible mode

The following sample shows how to use the global endpoint in OpenAI compatible mode:

REST

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/global/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/gemini-2.0-flash-001",
    "messages": [{
      "role": "user",
      "content": "Hello World"
    }]
  }'

What's next

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.