Embeddings APIs overview
Embeddings are numerical representations of text, images, or videos that capture relationships between inputs. Machine learning models, especially generative AI models, are suited for creating embeddings by identifying patterns within large datasets. Applications can use embeddings to process and produce language, recognizing complex meanings and semantic relationships specific to your content. You interact with embeddings every time you complete a Google Search or see music streaming recommendations.
Embeddings work by converting text, images, and videos into arrays of floating point numbers, called vectors. These vectors are designed to capture the meaning of the text, images, and videos. The length of the embedding array is called the vector's dimensionality. For example, one passage of text might be represented by a vector containing hundreds of dimensions. Then, by calculating the numerical distance between the vector representations of two pieces of text, an application can determine the similarity between the objects.
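For example, a common way to measure that distance is cosine similarity. The following minimal sketch uses NumPy and two toy vectors that stand in for embeddings returned by an embedding model:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Return the cosine similarity between two embedding vectors.

    Values close to 1.0 mean the inputs are semantically similar;
    values near 0 (or negative) mean they are unrelated.
    """
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional vectors; real embeddings typically have hundreds of dimensions.
cat = np.array([0.9, 0.1, 0.3, 0.0])
kitten = np.array([0.85, 0.15, 0.25, 0.05])
car = np.array([0.1, 0.9, 0.0, 0.4])

print(cosine_similarity(cat, kitten))  # high score: similar meaning
print(cosine_similarity(cat, car))     # lower score: different meaning
```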
Vertex AI supports two types of embedding models: text and multimodal.
Text embeddings use cases
Some common use cases for text embeddings include:
- Semantic search: Search text ranked by semantic similarity.
- Classification: Return the class of items whose text attributes are similar to the given text.
- Clustering: Cluster items whose text attributes are similar to the given text (see the sketch after this list).
- Outlier detection: Return items whose text attributes are least related to the given text.
- Conversational interface: Cluster groups of sentences that can lead to similar responses, as in a conversation-level embedding space.
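As an illustration of the clustering use case, the following sketch groups a handful of placeholder embedding vectors with scikit-learn's KMeans. In practice, the vectors would come from an embedding model, and the texts shown in the comments are only examples:

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder embeddings for six short texts (real embeddings are much longer).
embeddings = np.array([
    [0.90, 0.10, 0.20],  # "How do I reset my password?"
    [0.88, 0.12, 0.18],  # "I forgot my password"
    [0.10, 0.90, 0.30],  # "What is your refund policy?"
    [0.12, 0.85, 0.28],  # "Can I get my money back?"
    [0.40, 0.40, 0.90],  # "The app crashes on startup"
    [0.42, 0.38, 0.88],  # "App freezes when I open it"
])

# Group the texts into three clusters of similar meaning.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(embeddings)
print(kmeans.labels_)  # e.g. [0 0 1 1 2 2]: each pair of related texts shares a cluster
```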
Example use case: Develop a book recommendation chatbot
If you want to develop a book recommendation chatbot, the first thing to do is to use a deep neural network (DNN) to convert each book into an embedding vector, where one embedding vector represents one book. You can feed, as input to the DNN, just the book title or just the text content. Or you can use both of these together, along with any other metadata describing the book, such as the genre.
The embeddings in this example could consist of thousands of book titles with summaries and their genres. Books like Wuthering Heights by Emily Brontë and Persuasion by Jane Austen might have representations that are similar to each other (a small distance between their numerical representations), whereas the numerical representation for The Great Gatsby by F. Scott Fitzgerald would be further away, because its time period, genre, and summary are less similar.
The inputs are the main influence on the orientation of the embedding space. For example, if we only had book title inputs, then two books with similar titles, but very different summaries, could be close together. However, if we include the title and summary, then these same books are less similar (further away) in the embedding space.
Working with generative AI, this book-suggestion chatbot could summarize, suggest, and show you books which you might like (or dislike), based on your query.
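The following sketch shows one possible shape of that flow: it embeds a few book titles and summaries, then returns the book closest to a user query. It assumes the Vertex AI Python SDK (google-cloud-aiplatform) and the text-embedding-004 model; the project ID, region, model name, and book data are placeholders you would replace with your own.

```python
import numpy as np
import vertexai
from vertexai.language_models import TextEmbeddingModel

# Assumed project and region; replace with your own values.
vertexai.init(project="my-project", location="us-central1")
model = TextEmbeddingModel.from_pretrained("text-embedding-004")

# Example book metadata; in practice this could include genre and full summaries.
books = {
    "Wuthering Heights": "A brooding gothic romance on the Yorkshire moors.",
    "Persuasion": "A quiet second-chance romance in Regency England.",
    "The Great Gatsby": "Jazz Age wealth, parties, and disillusionment in 1920s New York.",
}

# One embedding vector per book, using the title plus summary as the input text.
book_vectors = {
    title: np.array(embedding.values)
    for title, embedding in zip(
        books,
        model.get_embeddings([f"{title}. {summary}" for title, summary in books.items()]),
    )
}

def recommend(query: str) -> str:
    """Return the book whose embedding is closest to the query embedding."""
    q = np.array(model.get_embeddings([query])[0].values)
    scores = {
        title: float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
        for title, v in book_vectors.items()
    }
    return max(scores, key=scores.get)

print(recommend("a moody 19th-century love story"))
```

A real chatbot would store the book vectors in a vector database rather than recomputing them per request, and would pass the top matches to a generative model to compose the response.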
Multimodal embeddings use cases
Some common use cases for multimodal embeddings include:
Image and text use cases:
- Image classification: Takes an image as input and predicts one or more classes (labels).
- Image search: Search relevant or similar images.
- Recommendations: Generate product or ad recommendations based on images.
Image, text, and video use cases:
- Recommendations: Generate product or advertisement recommendations based on videos (similarity search).
- Video content search:
  - Using semantic search: Take a text as an input, and return a set of ranked frames matching the query.
  - Using similarity search:
    - Take a video as an input, and return a set of videos matching the query.
    - Take an image as an input, and return a set of videos matching the query.
- Video classification: Takes a video as input and predicts one or more classes.
Example use case: Online retail experience
Online retailers are increasingly leveraging multimodal embeddings to enhance the customer experience. Every time you see personalized product recommendations while shopping or get visual results from a text search, you are interacting with an embedding.
If you want to create a multimodal embedding for an online retail use case, start by processing each product image to generate a unique image embedding, which is a mathematical representation of its visual style, color palette, key details, and more. Simultaneously, convert product descriptions, customer reviews, and other relevant textual data into text embeddings that capture their semantic meaning and context. By merging these image and text embeddings into a unified search and recommendation engine, the store can offer personalized recommendations of visually similar items based on a customer's browsing history and preferences. Additionally, it enables customers to search for products using natural language descriptions, with the engine retrieving and displaying the most visually similar items that match their search query. For example, if a customer searches "Black summer dress", the search engine can display dresses that are black, cut for summer, made of lighter material, and possibly sleeveless. This powerful combination of visual and textual understanding creates a streamlined shopping experience that enhances customer engagement and satisfaction, and ultimately can drive sales.
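As a rough sketch of how the image and text sides could be embedded into the same space, the snippet below assumes the Vertex AI Python SDK and the multimodalembedding@001 model; the project ID, region, image file, and product text are placeholders:

```python
import numpy as np
import vertexai
from vertexai.vision_models import Image, MultiModalEmbeddingModel

# Assumed project, region, and local image path; replace with your own values.
vertexai.init(project="my-project", location="us-central1")
model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")

# Embed a product image and its description into the same vector space.
product = model.get_embeddings(
    image=Image.load_from_file("black_summer_dress.jpg"),
    contextual_text="Sleeveless black summer dress in lightweight linen",
    dimension=1408,
)

# Embed a free-text shopping query, then compare it against the image embedding.
query = model.get_embeddings(contextual_text="Black summer dress", dimension=1408)

img_vec = np.array(product.image_embedding)
txt_vec = np.array(query.text_embedding)
score = float(np.dot(img_vec, txt_vec) / (np.linalg.norm(img_vec) * np.linalg.norm(txt_vec)))
print(f"query-to-image similarity: {score:.3f}")
```

In a full system, the product image and text vectors would be indexed ahead of time so that each query embedding can be matched against the whole catalog with a vector search service.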
What's next
- To learn more about embeddings, see Meet AI's multitool: Vector embeddings.
- To take a foundational ML crash course on embeddings, see Embeddings.
- To learn more about how to store vector embeddings in a database, see the Overview of Vector Search.
- To learn about responsible AI best practices and Vertex AI's safety filters, see Responsible AI.
- To learn how to get embeddings, see the following documents: