Embeddings

  • This module explains how to create embeddings: lower-dimensional representations of sparse data that address two problems with one-hot encoding, large input vectors and the lack of meaningful relationships between vectors.

  • One-hot encoding creates large input vectors, leading to a huge number of weights in a neural network, requiring more data, computation, and memory.

  • One-hot encoding vectors lack meaningful relationships, failing to capture semantic similarities between items; for example, that hot dogs are more similar to shawarmas than to salads.

  • Embeddings offer a solution by providing dense vector representations that capture semantic relationships and reduce the dimensionality of data, improving efficiency and performance in machine learning models.

  • This module assumes familiarity with introductory machine learning concepts like linear regression, categorical data, and neural networks.

Estimated module length: 45 minutes

Prerequisites

This module assumes you are familiar with the concepts covered in the following modules:

  • Linear regression
  • Categorical data
  • Neural networks

Imagine you're developing a food-recommendation application, where users input their favorite meals, and the app suggests similar meals that they might like. You want to develop a machine learning (ML) model that can predict food similarity, so your app can make high-quality recommendations ("Since you like pancakes, we recommend crepes").

To train your model, you curate a dataset of 5,000 popular meal items, including borscht, hot dog, salad, pizza, and shawarma.

Figure 1. Sampling of meal items included in the food dataset: borscht, hot dog, salad, pizza, and shawarma.

You create a meal feature that contains a one-hot encoded representation of each of the meal items in the dataset. Encoding refers to the process of choosing an initial numerical representation of data to train the model on.

Figure 2. One-hot encodings of borscht, hot dog, and shawarma. Each one-hot encoding vector has a length of 5,000 (one entry for each meal item in the dataset). The ellipsis in the diagram represents the 4,995 entries not shown.
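To make the encoding concrete, here's a minimal sketch of building such one-hot vectors in Python with NumPy. The five-item vocabulary and the one_hot helper are illustrative stand-ins, not part of any particular library:

```python
import numpy as np

# Illustrative vocabulary; the full dataset has 5,000 meal items.
meal_items = ["borscht", "hot dog", "salad", "pizza", "shawarma"]
index_of = {meal: i for i, meal in enumerate(meal_items)}

def one_hot(meal: str, vocab_size: int) -> np.ndarray:
    """Return a vector of zeros with a single 1 at the meal's index."""
    vec = np.zeros(vocab_size)
    vec[index_of[meal]] = 1.0
    return vec

print(one_hot("borscht", len(meal_items)))   # [1. 0. 0. 0. 0.]
print(one_hot("shawarma", len(meal_items)))  # [0. 0. 0. 0. 1.]
```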

Pitfalls of sparse data representations

Reviewing these one-hot encodings, you notice several problems with this representation of the data.

  • Number of weights. Large input vectors mean a huge number of weights for a neural network. With M entries in your one-hot encoding, and N nodes in the first layer of the network after the input, the model has to train M × N weights for that layer. (The sketch after this list works through the arithmetic.)
  • Number of datapoints. The more weights in your model, the more data you need to train effectively.
  • Amount of computation. The more weights, the more computation required to train and use the model. It's easy to exceed the capabilities of your hardware.
  • Amount of memory. The more weights in your model, the more memory that is needed on the accelerators that train and serve it. Scaling this up efficiently is very difficult.
  • Difficulty of supporting on-device machine learning (ODML). If you're hoping to run your ML model on local devices (as opposed to serving it), you'll need to focus on making your model smaller by decreasing the number of weights.
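To put numbers on the first bullet, here's a back-of-the-envelope sketch. The vocabulary size comes from the example above; the layer width and embedding size are assumed values for illustration:

```python
# Weights in the first dense layer: M input entries x N nodes.
vocab_size = 5_000   # M: one entry per meal item (from the example above)
hidden_nodes = 128   # N: an assumed width for the first hidden layer

print(f"One-hot input:   {vocab_size * hidden_nodes:,} weights")  # 640,000

# Feeding the layer a d-dimensional embedding instead shrinks it to d x N.
# (The embedding table itself adds vocab_size x d parameters, but a lookup
# is cheap compared with a 5,000-wide matrix multiply.)
embedding_dim = 16   # d: an assumed embedding size
print(f"Embedding input: {embedding_dim * hidden_nodes:,} weights")  # 2,048
```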

In this module, you'll learn how to create embeddings: lower-dimensional representations of sparse data that address these issues.
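As a preview of the idea, here's a minimal sketch assuming plain NumPy: each meal maps to a short dense vector drawn from a table of learned weights, for which random values stand in below. The vocabulary index for hot dog is hypothetical:

```python
import numpy as np

vocab_size = 5_000   # one row per meal item
embedding_dim = 16   # assumed size; in practice this is tuned

# In a trained model these rows are learned; random values stand in here.
rng = np.random.default_rng(seed=0)
embedding_table = rng.normal(size=(vocab_size, embedding_dim))

# Looking up an embedding is simple row indexing -- no 5,000-entry vector.
hot_dog_id = 1  # hypothetical vocabulary index for "hot dog"
hot_dog_vector = embedding_table[hot_dog_id]
print(hot_dog_vector.shape)  # (16,)

# Dense vectors make similarity measurable, e.g. with cosine similarity,
# so "hot dog" can land closer to "shawarma" than to "salad".
def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```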
