Use Hugging Face Models

Hugging Face provides pre-trained models, fine-tuning scripts, and development APIs that make the process of creating and discovering LLMs easier. Model Garden can serve Text Embedding, Text To Image, Text Generation, and Image Text To Text models in Hugging Face.

Deployment options for Hugging Face models

You can deploy supported Hugging Face models in Vertex AI or Google Kubernetes Engine (GKE). The deployment option you choose can depend on the model you're using and how much control you want over your workloads.

Deploy in Vertex AI

Vertex AI offers a managed platform for building and scaling machine learning projects without in-house MLOps expertise. You can use Vertex AI as the downstream application that serves the Hugging Face models. We recommend using Vertex AI if you want end-to-end MLOps capabilities, value-added ML features, and a serverless experience for streamlined development.

  1. To deploy a supported Hugging Face model in Vertex AI, go to Model Garden.

    Go to Model Garden

  2. Go to the Open models on Hugging Face section and click Show more.

  3. Find and select a model to deploy.

  4. Optional: For the Deployment environment, select Vertex AI.

  5. Optional: Specify the deployment details.

  6. Click Deploy.

To get started, see the following examples:
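
For instance, the following sketch (not an official sample) uses the google-cloud-aiplatform Python SDK to upload a Hugging Face model behind a vLLM serving container and deploy it to a Vertex AI endpoint. The container URI, serving arguments, request schema, and model ID shown here are illustrative assumptions; the Model Garden notebooks contain the exact values for each model.

```python
# Sketch: deploy a Hugging Face model to Vertex AI with a vLLM serving
# container, then send a prediction. The container URI, serving args, and the
# request schema are illustrative assumptions; check the Model Garden notebook
# for the values that match your model.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

VLLM_IMAGE_URI = "us-docker.pkg.dev/<vllm-serving-image>"  # placeholder URI
HF_MODEL_ID = "google/gemma-2b-it"                         # example model ID

model = aiplatform.Model.upload(
    display_name="gemma-2b-it-vllm",
    serving_container_image_uri=VLLM_IMAGE_URI,
    serving_container_args=[
        f"--model={HF_MODEL_ID}",
        "--tensor-parallel-size=1",
        "--max-model-len=4096",
    ],
    serving_container_ports=[8080],
    serving_container_predict_route="/generate",
    serving_container_health_route="/ping",
    serving_container_environment_variables={
        "HF_TOKEN": "your-hugging-face-token",  # required for gated models
    },
)

endpoint = model.deploy(
    machine_type="g2-standard-12",
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
)

# The instance schema depends on the serving container; this prompt/max_tokens
# style payload is an assumption based on vLLM-style servers.
response = endpoint.predict(
    instances=[{"prompt": "What is Model Garden?", "max_tokens": 128}]
)
print(response.predictions)
```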

Deploy in GKE

Google Kubernetes Engine (GKE) is the Google Cloud solution for managed Kubernetes that provides scalability, security, resilience, and cost effectiveness. We recommend this option if you have existing Kubernetes investments, your organization has in-house MLOps expertise, or if you need granular control over complex AI/ML workloads with unique security, data pipeline, and resource management requirements.

  1. To deploy a supported Hugging Face model in GKE, go to Model Garden.

    Go to Model Garden

  2. Go to the Open models on Hugging Face section and click Show more.

  3. Find and select a model to deploy.

  4. For the Deployment environment, select GKE.

  5. Follow the deployment instructions.


What does "Supported by Vertex AI" mean?

We automatically add the latest, most popular Hugging Face models to Model Garden. This process includes the automatic generation of a deployment configuration for each model.

To address concerns regarding vulnerabilities and malicious code, we use the Hugging Face Malware Scanner to assess the safety of files within each Hugging Face model repository on a daily basis. If a model repository is flagged as containing malware, we immediately remove the model from the Hugging Face gallery page.

While a model being designated as supported by Vertex AI signifies that it has undergone testing and is deployable on Vertex AI, we don't guarantee the absence of vulnerabilities or malicious code. We recommend that you conduct your own security verifications before deploying any model in your production environment.

Tune deployment configurations for specific use cases

The default deployment configuration provided with the one-click deployment option can't satisfy every requirement, given the diverse range of use cases and varying priorities for latency, throughput, cost, and accuracy.

Therefore, you can initially experiment with the one-click deployment to establish a baseline, and then fine-tune the deployment configurations by using the Colab notebooks (vLLM, TGI, TEI, HF pytorch inference) or the Python SDK. This iterative approach lets you tailor the deployment to your precise needs and get the best possible performance for your specific application.
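
As one illustration of the knobs involved, the following sketch is a tuned variant of the earlier deployment example, biased toward throughput. The machine type, accelerator count, and vLLM flags are assumptions for illustration, not recommendations; the right values depend on your model size, context length, and traffic profile.

```python
# Sketch: a tuned variant of the earlier deployment, biased toward throughput.
# Machine type, accelerator count, and vLLM flags are illustrative assumptions.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

VLLM_IMAGE_URI = "us-docker.pkg.dev/<vllm-serving-image>"  # placeholder URI
HF_MODEL_ID = "google/gemma-2b-it"                         # example model ID

model = aiplatform.Model.upload(
    display_name="gemma-2b-it-vllm-tuned",
    serving_container_image_uri=VLLM_IMAGE_URI,
    serving_container_args=[
        f"--model={HF_MODEL_ID}",
        "--tensor-parallel-size=2",      # shard the model across two GPUs
        "--max-model-len=8192",          # longer context uses more GPU memory
        "--gpu-memory-utilization=0.9",  # leave headroom for traffic spikes
    ],
    serving_container_ports=[8080],
    serving_container_predict_route="/generate",
    serving_container_health_route="/ping",
)

endpoint = model.deploy(
    machine_type="g2-standard-24",  # two NVIDIA L4 GPUs
    accelerator_type="NVIDIA_L4",
    accelerator_count=2,
)
```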

What should you do if the model you want isn't listed in Model Garden?

If a specific model you're looking for isn't listed in Model Garden, then it's not supported by Vertex AI. The following sections describe why that might be and what you can do.

Why isn't the model listed?

The following reasons explain why a model might not be in Model Garden:

  • It's not a top trending model: We often prioritize models that are widely popular and have strong community interest.
  • It's not yet compatible: The model might not work with a supported serving container. For example, the vLLM container supports text-generation and image-text-to-text models.
  • Unsupported pipeline tasks: The model has a task that we don't yet fully support. We support the following tasks: text-generation, text2text-generation, text-to-image, feature-extraction, sentence-similarity, and image-text-to-text. (A quick way to check a model's task is sketched after this list.)
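
If you're unsure which task a model maps to, its pipeline tag on the Hugging Face Hub tells you. The following is a minimal sketch using the huggingface_hub library; the model ID is only an example.

```python
# Sketch: check a model's pipeline task before looking for it in Model Garden.
# The model ID is an example; replace it with the repository you care about.
from huggingface_hub import model_info

SUPPORTED_TASKS = {
    "text-generation",
    "text2text-generation",
    "text-to-image",
    "feature-extraction",
    "sentence-similarity",
    "image-text-to-text",
}

info = model_info("google/gemma-2b-it")
print(info.pipeline_tag)  # for example: "text-generation"
print("supported" if info.pipeline_tag in SUPPORTED_TASKS else "not supported")
```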

What are your options?

You can still work with models that aren't available in Model Garden:

  • Deploy it yourself using the Colab notebooks: We have the following Colab notebooks (vLLM, TGI, TEI, HF pytorch inference), which provide the flexibility to deploy models with custom configurations. This gives you complete control over the process.
  • Submit a feature request: Work with your support engineer and submit a feature request through Model Garden, or refer to Vertex Generative AI support for additional help.
  • Keep an eye on updates: We regularly add new models to Model Garden. The model you're looking for might become available in the future, so check back periodically!
