Examples

Call Gemini with the Chat Completions API

The following sample shows you how to send non-streaming requests:

REST

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/${MODEL_ID}",
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.auth import default
import google.auth.transport.requests

import openai

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response)

The following sample shows you how to send streaming requests to a Gemini model by using the Chat Completions API:

REST

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/${MODEL_ID}",
    "stream": true,
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.auth import default
import google.auth.transport.requests

import openai

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)
for chunk in response:
    print(chunk)

Send a prompt and an image to the Gemini API in Vertex AI

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.auth import default
import google.auth.transport.requests

import openai

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe the following image:"},
            {
                "type": "image_url",
                "image_url": "gs://cloud-samples-data/generative-ai/image/scones.jpg",
            },
        ],
    }],
)
print(response)

Call a self-deployed model with the Chat Completions API

The following sample shows you how to send non-streaming requests:

REST

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/global/endpoints/${ENDPOINT}/chat/completions \
  -d '{
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.auth import default
import google.auth.transport.requests

import openai

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"
# model_id = "gemma-2-9b-it"
# endpoint_id = "YOUR_ENDPOINT_ID"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/{endpoint_id}",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model=model_id,
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response)

The following sample shows you how to send streaming requests to a self-deployed model by using the Chat Completions API:

REST

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/global/endpoints/${ENDPOINT}/chat/completions \
  -d '{
    "stream": true,
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.auth import default
import google.auth.transport.requests

import openai

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"
# model_id = "gemma-2-9b-it"
# endpoint_id = "YOUR_ENDPOINT_ID"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/{endpoint_id}",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model=model_id,
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)
for chunk in response:
    print(chunk)

extra_body examples

You can use either the SDK or the REST API to pass in extra_body.

Add thought_tag_marker

{
  ...,
  "extra_body": {
    "google": {
      ...,
      "thought_tag_marker": "..."
    }
  }
}

Add extra_body using the SDK

client.chat.completions.create(
    ...,
    extra_body={
        'extra_body': {
            'google': {
                ...
            }
        }
    },
)
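To make the doubly nested shape above concrete, here is a minimal sketch of the keyword arguments such an SDK call might receive; the model ID and the thinking_config values are illustrative placeholders taken from the curl sample later on this page, not required settings.

```python
import json

# Hypothetical kwargs for client.chat.completions.create(**sdk_kwargs).
# The SDK's extra_body kwarg wraps a Vertex AI "extra_body" field, which
# in turn wraps the Google-specific settings - hence the double nesting.
sdk_kwargs = {
    "model": "google/gemini-2.5-flash-preview-04-17",  # placeholder model ID
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    "extra_body": {
        "extra_body": {
            "google": {
                "thinking_config": {
                    "include_thoughts": True,
                    "thinking_budget": 10000,  # placeholder budget
                },
                "thought_tag_marker": "think",
            }
        }
    },
}

print(json.dumps(sdk_kwargs["extra_body"], indent=2))
```

Note that the outer `extra_body` is consumed by the OpenAI SDK itself and merged into the request body, which is why the Google-specific payload must be nested a second time.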

extra_content examples

You can populate this field by using the REST API directly.

extra_content with string content

{
  "messages": [
    {
      "role": "...",
      "content": "...",
      "extra_content": {
        "google": {
          ...
        }
      }
    }
  ]
}

Per-message extra_content

{
  "messages": [
    {
      "role": "...",
      "content": [
        {
          "type": "...",
          ...,
          "extra_content": {
            "google": {
              ...
            }
          }
        }
      ]
    }
  ]
}

Per-tool call extra_content

{
  "messages": [
    {
      "role": "...",
      "tool_calls": [
        {
          ...,
          "extra_content": {
            "google": {
              ...
            }
          }
        }
      ]
    }
  ]
}
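As a concrete sketch of the placements above, the following builds the string-content and per-tool-call variants as Python dicts before serialization; the role values, tool call fields, and the `example_setting` key inside the google block are all hypothetical placeholders.

```python
import json

# extra_content attached to a message whose content is a plain string.
string_content_body = {
    "messages": [{
        "role": "user",
        "content": "Why is the sky blue?",
        "extra_content": {"google": {"example_setting": "value"}},  # placeholder
    }]
}

# extra_content attached to a single tool call on an assistant message.
per_tool_call_body = {
    "messages": [{
        "role": "assistant",
        "tool_calls": [{
            "id": "call_1",  # placeholder tool call ID
            "type": "function",
            "function": {"name": "get_weather", "arguments": "{}"},  # placeholder
            "extra_content": {"google": {"example_setting": "value"}},
        }],
    }]
}

# Each body serializes to the JSON shapes shown above.
print(json.dumps(per_tool_call_body, indent=2))
```

Either dict could then be sent as the request body of a direct REST call to the chat/completions endpoint.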

Sample curl requests

You can use these curl requests directly, rather than going through the SDK.

Use thinking_config with extra_body

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-central1/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/gemini-2.5-flash-preview-04-17",
    "messages": [{
      "role": "user",
      "content": [{
        "type": "text",
        "text": "Are there any prime numbers of the form n*ceil(log(n))?"
      }]
    }],
    "extra_body": {
      "google": {
        "thinking_config": {
          "include_thoughts": true,
          "thinking_budget": 10000
        },
        "thought_tag_marker": "think"
      }
    },
    "stream": true
  }'

Multimodal requests

The Chat Completions API supports a variety of multimodal inputs, including both audio and video.

Use image_url to pass in image data

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT}/locations/us-central1/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/gemini-2.0-flash-001",
    "messages": [{
      "role": "user",
      "content": [
        { "type": "text", "text": "Describe this image" },
        { "type": "image_url", "image_url": "gs://cloud-samples-data/generative-ai/image/scones.jpg" }
      ]
    }]
  }'

Use input_audio to pass in audio data

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT}/locations/us-central1/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/gemini-2.0-flash-001",
    "messages": [{
      "role": "user",
      "content": [
        { "type": "text", "text": "Describe this: " },
        { "type": "input_audio", "input_audio": {
          "format": "audio/mp3",
          "data": "gs://cloud-samples-data/generative-ai/audio/pixel.mp3"
        } }
      ]
    }]
  }'

Structured output

You can use the response_format parameter to get structured output.

Example using SDK

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

completion = client.beta.chat.completions.parse(
    model="google/gemini-2.5-flash-preview-04-17",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    response_format=CalendarEvent,
)
print(completion.choices[0].message.parsed)
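If you call the endpoint over REST instead of through the parse() helper, you can pass a raw JSON schema in the response_format field. The sketch below is a rough hand-written equivalent of the schema the CalendarEvent model above implies; the exact payload the SDK generates may differ in details such as strictness flags.

```python
import json

# A hand-written json_schema response_format mirroring the CalendarEvent
# Pydantic model above (name/date strings plus a participants list).
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "CalendarEvent",
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "date": {"type": "string"},
                "participants": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["name", "date", "participants"],
        },
    },
}

# This dict would go in the request body alongside "model" and "messages",
# e.g. client.chat.completions.create(..., response_format=response_format).
print(json.dumps(response_format, indent=2))
```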

Using the global endpoint in OpenAI compatible mode

The following sample shows how to use the global endpoint in OpenAI compatible mode:

REST

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/global/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/gemini-2.0-flash-001",
    "messages": [{
      "role": "user",
      "content": "Hello World"
    }]
  }'

What's next

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.