Model endpoint management overview
Preview: This product is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA products are available "as is" and might have limited support. For more information, see the launch stage descriptions.
This page describes how to register an AI model endpoint and invoke predictions with model endpoint management in Cloud SQL. To use AI models in production environments, see Build generative AI applications using Cloud SQL and Work with vector embeddings.
Overview
Model endpoint management lets you register a model endpoint, manage model endpoint metadata in your Cloud SQL instance, and then interact with the models using SQL queries. Cloud SQL provides the google_ml_integration extension, which includes functions to add and register the model endpoint metadata related to the models. You can use these models to generate vector embeddings or invoke predictions.
You can register the following model types by using model endpoint management:
- Vertex AI text embedding models.
- Custom-hosted text embedding models hosted in networks within Google Cloud.
- Generic models with a JSON-based API. Examples of these models include the following:
  - The gemini-pro model from the Vertex AI Model Garden
  - The open_ai model for OpenAI models
  - Models hosted in networks within Google Cloud
How it works
You can use model endpoint management to register a model endpoint that complies with the following requirements:
- The model input and output support the JSON format.
- You can use the REST protocol to call the model.
When you register a model endpoint, model endpoint management assigns it a unique model ID that serves as a reference to the model. You can use this model ID to query models, as follows:
- Generate embeddings to translate text prompts to numerical vectors. You can store generated embeddings as vector data when the pgvector extension is enabled in the database. For more information, see Query and index embeddings with pgvector.
- Invoke predictions to call a model using SQL within a transaction.
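As a rough sketch of the first use case, the following stores a generated embedding in a pgvector column and then runs a similarity search. The table and column names are hypothetical, MODEL_ID is a placeholder for an already registered text embedding endpoint, and the model_id and content parameter names are assumptions to verify against the model endpoint management reference. The vector size of 768 assumes a textembedding-gecko model; adjust it for other models.

```sql
-- Assumes the pgvector extension is available on the instance.
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical table; 768 assumes the output size of textembedding-gecko.
CREATE TABLE IF NOT EXISTS product_notes (
  id BIGSERIAL PRIMARY KEY,
  note TEXT NOT NULL,
  note_embedding vector(768)
);

-- Generate the embedding and store it as vector data.
-- Parameter names (model_id, content) are assumptions; check the reference.
INSERT INTO product_notes (note, note_embedding)
VALUES (
  'Waterproof hiking boots',
  google_ml.embedding(
    model_id => 'MODEL_ID',
    content => 'Waterproof hiking boots')::vector
);

-- Find the five notes closest to a query text by cosine distance (<=>).
SELECT id, note
FROM product_notes
ORDER BY note_embedding <=> google_ml.embedding(
    model_id => 'MODEL_ID',
    content => 'boots for wet trails')::vector
LIMIT 5;
```

The explicit ::vector cast is used because the embedding function returns an array of real values, which pgvector can convert to its vector type.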
Your applications can manage their model endpoints using the google_ml_integration extension. This extension provides the following SQL functions:
- google_ml.create_model(): registers the model endpoint that's used in the prediction or embedding function
- google_ml.create_sm_secret(): uses secrets in Google Cloud Secret Manager, where the API keys are stored
- google_ml.embedding(): generates text embeddings
- google_ml.predict_row(): generates predictions when you call generic models that support the JSON input and output formats
Key concepts
Before you start using model endpoint management, understand the concepts required to connect to and use the models.
Model provider
The model provider is the supported model hosting provider. The following table shows the model provider value that you must set based on the model provider that you use:
| Model provider | Set in function as… |
|---|---|
| Vertex AI (includes Gemini) | google |
| Anthropic | anthropic |
| Hugging Face | hugging_face |
| OpenAI | open_ai |
| Other models hosted outside of Vertex AI, Anthropic, Hugging Face, and OpenAI | custom |
The default model provider is custom.
Model types
The model type identifies the kind of AI model. When you register a model endpoint, you can set either the text_embedding or the generic model type for the endpoint.
- Text embedding models with built-in support
  - Model endpoint management provides built-in support for all versions of the textembedding-gecko model. To register these model endpoints, use the google_ml.create_model() function. Cloud SQL sets up default transform functions for these models automatically.
  - The model type for these models is text_embedding.
- Other text embedding models
  - For other text embedding models, you need to create transform functions to handle the input and output formats that the model supports. Optionally, you can use the HTTP header generation function, which generates custom headers required by your model.
  - The model type for these models is text_embedding.
- Generic models
  - Model endpoint management also supports registering all other model types apart from text embedding models. To invoke predictions for generic models, use the google_ml.predict_row() function. You can set model endpoint metadata, such as a request endpoint and HTTP headers that are specific to your model.
  - You can't pass transform functions when you register a generic model endpoint. Ensure that when you invoke predictions, the input to the function is in the JSON format, and that you parse the JSON output to derive the final output.
  - The model type for these models is generic. Because generic is the default model type, setting the model type is optional when you register model endpoints of this type.
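To make the difference between the model types concrete, here is a minimal sketch of two registrations. The named parameters (model_id, model_provider, model_qualified_name, model_type, model_auth_type, model_request_url) are assumptions based on the model endpoint management reference and should be checked there before use. MODEL_ID, GENERIC_MODEL_ID, and MODEL_REQUEST_URL are placeholders, and textembedding-gecko@001 is just one example version.

```sql
-- Text embedding model with built-in support: no transform functions are
-- passed because Cloud SQL sets up the default ones automatically.
CALL google_ml.create_model(
  model_id => 'MODEL_ID',
  model_provider => 'google',
  model_qualified_name => 'textembedding-gecko@001',
  model_type => 'text_embedding',
  model_auth_type => 'cloudsql_service_agent_iam');

-- Generic model: model_type is omitted because generic is the default.
-- The request URL is model-specific and is left as a placeholder here.
CALL google_ml.create_model(
  model_id => 'GENERIC_MODEL_ID',
  model_request_url => 'MODEL_REQUEST_URL',
  model_provider => 'google',
  model_auth_type => 'cloudsql_service_agent_iam');
```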
Authentication methods
You can use the google_ml_integration extension to specify different authentication methods to access your model. Setting an authentication method is optional and is required only if your model requires authentication. For Vertex AI models, the Cloud SQL service account is used for authentication. For other models, you can store the API key or bearer token as a secret in Secret Manager and reference it with the google_ml.create_sm_secret() SQL function.
The following table shows the authentication methods that you can set:
| Authentication method | Set in function as… | Model provider |
|---|---|---|
| Cloud SQL service agent | cloudsql_service_agent_iam | Vertex AI provider |
| Secret Manager | secret_manager | Models hosted outside of Vertex AI |
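The Secret Manager method might look like the following sketch for a model hosted outside of Vertex AI. The parameter names (secret_id, secret_path, model_auth_id, and the others) are assumptions to confirm in the model endpoint management reference, and every uppercase value is a placeholder.

```sql
-- Register a reference to a secret that holds the API key or bearer token.
-- secret_id and secret_path are assumed parameter names.
CALL google_ml.create_sm_secret(
  secret_id => 'SECRET_ID',
  secret_path => 'projects/PROJECT_ID/secrets/SECRET_MANAGER_SECRET_ID/versions/VERSION_NUMBER');

-- Register a generic endpoint that authenticates with that secret.
-- custom is the default provider; it is spelled out here for clarity.
CALL google_ml.create_model(
  model_id => 'MODEL_ID',
  model_request_url => 'MODEL_REQUEST_URL',
  model_provider => 'custom',
  model_auth_type => 'secret_manager',
  model_auth_id => 'SECRET_ID');
```

Keeping the key in Secret Manager means the credential itself never appears in SQL text; only the secret's identifier does.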
Prediction functions
The google_ml_integration extension includes the following prediction functions:
- google_ml.embedding()
  - Calls a registered text embedding model endpoint to generate embeddings. It includes built-in support for the textembedding-gecko model by Vertex AI.
  - For text embedding models without built-in support, the input and output parameters are unique to a model and need to be transformed for the function to call the model. Create a transform input function to transform the input of the prediction function to the model-specific input, and a transform output function to transform the model-specific output to the prediction function output.
- google_ml.predict_row()
  - Calls a registered generic model endpoint to invoke predictions, provided that the endpoint supports a JSON-based API.
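The two prediction functions could be called roughly as follows. The parameter names (model_id, content, request_body) are assumptions to verify in the reference. The request body and the output key are hypothetical, because both depend entirely on the API of the model that you registered.

```sql
-- Generate an embedding with a registered text embedding endpoint.
SELECT google_ml.embedding(
  model_id => 'MODEL_ID',
  content => 'Cloud SQL is a managed database service');

-- Invoke a prediction on a registered generic endpoint.
-- The JSON body below is a hypothetical request shape, not a real model API.
SELECT google_ml.predict_row(
  model_id => 'GENERIC_MODEL_ID',
  request_body => '{"prompt": "Summarize the benefits of managed databases."}');

-- Generic models return JSON, so parse the response to get the final output.
-- The output key is hypothetical; use the field that your model returns.
SELECT google_ml.predict_row(
  model_id => 'GENERIC_MODEL_ID',
  request_body => '{"prompt": "Summarize the benefits of managed databases."}'
) -> 'output' AS model_output;
```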
Transform functions
Transform functions modify the input to a format that the model understands, and convert the model response to the format that the prediction function expects. The transform functions are used when registering a text embedding model endpoint without built-in support. The signature of the transform functions depends on the prediction function for the model type.
You can't use transform functions when registering a generic model endpoint.
The following shows the transform function signatures used with the prediction function for text embedding models:

    -- Define custom model-specific input/output transform functions.
    CREATE OR REPLACE FUNCTION input_transform_function(model_id VARCHAR(100), input_text TEXT) RETURNS JSON;
    CREATE OR REPLACE FUNCTION output_transform_function(model_id VARCHAR(100), response_json JSON) RETURNS real[];

For more information about how to create transform functions, see Transform functions example.
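As an illustration of what function bodies with these signatures might contain, the following sketch targets a hypothetical model that accepts a request of the form {"prompt": ["<text>"]} and returns {"embedding": [<numbers>]}. The function names and both JSON shapes are assumptions for illustration only; a real model defines its own formats.

```sql
-- Input transform: wrap the text in the hypothetical request format
-- {"prompt": ["<input text>"]}.
CREATE OR REPLACE FUNCTION example_input_transform(model_id VARCHAR(100), input_text TEXT)
RETURNS JSON
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN json_build_object('prompt', json_build_array(input_text));
END;
$$;

-- Output transform: read the hypothetical response format
-- {"embedding": [0.1, 0.2, ...]} and return the numbers as real[].
CREATE OR REPLACE FUNCTION example_output_transform(model_id VARCHAR(100), response_json JSON)
RETURNS REAL[]
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN ARRAY(
    SELECT elem::REAL
    FROM json_array_elements_text(response_json -> 'embedding')
         WITH ORDINALITY AS t(elem, idx)
    ORDER BY idx  -- keep the vector components in their original order
  );
END;
$$;
```

Both functions use only standard PostgreSQL JSON functions, so they can be tried on their own before being supplied at registration time, for example with SELECT example_output_transform('m', '{"embedding": [0.5, 1.5]}'::json);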
HTTP header generation function
The HTTP header generation function returns JSON key-value pairs that are used as HTTP headers. The signature of the prediction function determines the signature of the header generation function.
The following example shows the signature for the google_ml.embedding() prediction function:

    CREATE OR REPLACE FUNCTION generate_headers(model_id VARCHAR(100), input TEXT) RETURNS JSON;

For the google_ml.predict_row() prediction function, the signature is as follows:

    CREATE OR REPLACE FUNCTION generate_headers(model_id VARCHAR(100), input JSON) RETURNS JSON;

For more information about how to create a header generation function, see Header generation function example.
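A header generation function with the google_ml.embedding() signature could be filled in along these lines. The function name and the X-Example-Model header are hypothetical; a real model documents the headers that it requires.

```sql
-- Returns the HTTP headers as JSON key-value pairs.
CREATE OR REPLACE FUNCTION example_embedding_headers(model_id VARCHAR(100), input TEXT)
RETURNS JSON
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN json_build_object(
    'Content-Type', 'application/json',
    'X-Example-Model', model_id  -- hypothetical custom header
  );
END;
$$;
```

A variant for google_ml.predict_row() would differ only in declaring the second parameter as JSON instead of TEXT.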
Limitations
- To use AI models with your Cloud SQL instance, the maintenance version of your instance must be R20240910.01_02 or later. To upgrade your instance to this version, see Perform self-service maintenance.
What's next
- Set up authentication for model providers.
- Register a model endpoint with model endpoint management.
- Learn about the model endpoint management reference.
Last updated 2025-11-24 UTC.