Collaborative filtering
Page Summary

- Collaborative filtering leverages similarities between users and items to provide recommendations, unlike content-based filtering.
- This method allows for serendipitous recommendations by suggesting items based on the preferences of similar users.
- Collaborative filtering models can automatically learn embeddings, eliminating the need for manual feature engineering.
- These embeddings represent users and items in a shared space where their proximity reflects similarity and preference.
- Training these models involves optimizing embeddings to align with user feedback, bringing similar users and preferred items closer together in the embedding space.
To address some of the limitations of content-based filtering, collaborative filtering uses similarities between users and items simultaneously to provide recommendations. This allows for serendipitous recommendations; that is, collaborative filtering models can recommend an item to user A based on the interests of a similar user B. Furthermore, the embeddings can be learned automatically, without relying on hand-engineering of features.
Movie recommendation example
Consider a movie recommendation system in which the training data consists of a feedback matrix in which:
- Each row represents a user.
- Each column represents an item (a movie).
The feedback about movies falls into one of two categories:
- Explicit—users specify how much they liked a particular movie by providing a numerical rating.
- Implicit—if a user watches a movie, the system infers that the user is interested.
To simplify, we will assume that the feedback matrix is binary; that is, a value of 1 indicates interest in the movie.
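To make this concrete, here is a minimal sketch of what such a binary feedback matrix might look like; the number of users, number of movies, and the 0/1 entries are hypothetical placeholders, not data from this course:

```python
import numpy as np

# Rows are users, columns are movies; a 1 means the user watched the movie.
# Hypothetical 4-user x 5-movie feedback matrix for illustration only.
feedback = np.array([
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
])

num_users, num_movies = feedback.shape
print(num_users, num_movies)  # 4 5
```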
When a user visits the homepage, the system should recommend movies based on both:
- similarity to movies the user has liked in the past
- movies that similar users liked
For the sake of illustration, let's hand-engineer some features for the movies described in the following table:
| Movie | Rating | Description |
|---|---|---|
| The Dark Knight Rises | PG-13 | Batman endeavors to save Gotham City from nuclear annihilation in this sequel to The Dark Knight, set in the DC Comics universe. |
| Harry Potter and the Sorcerer's Stone | PG | An orphaned boy discovers he is a wizard and enrolls in Hogwarts School of Witchcraft and Wizardry, where he wages his first battle against the evil Lord Voldemort. |
| Shrek | PG | A lovable ogre and his donkey sidekick set off on a mission to rescue Princess Fiona, who is imprisoned in her castle by a dragon. |
| The Triplets of Belleville | PG-13 | When professional cyclist Champion is kidnapped during the Tour de France, his grandmother and overweight dog journey overseas to rescue him, with the help of a trio of elderly jazz singers. |
| Memento | R | An amnesiac desperately seeks to solve his wife's murder by tattooing clues onto his body. |
1D embedding
Suppose we assign to each movie a scalar in \([-1, 1]\) that describes whether the movie is for children (negative values) or adults (positive values). Suppose we also assign a scalar to each user in \([-1, 1]\) that describes the user's interest in children's movies (closer to -1) or adult movies (closer to +1). The product of the movie embedding and the user embedding should be higher (closer to 1) for movies that we expect the user to like.
In the diagram below, each checkmark identifies a movie that a particular user watched. The third and fourth users have preferences that are well explained by this feature—the third user prefers movies for children and the fourth user prefers movies for adults. However, the first and second users' preferences are not well explained by this single feature.
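As a rough sketch of the 1D case, we can pick scalar embeddings by hand and use the product of a user scalar and a movie scalar as the predicted affinity. The specific values below are illustrative assumptions, not the ones in the diagram:

```python
import numpy as np

# Hypothetical 1D embeddings in [-1, 1]: negative = children's, positive = adult.
movie_embedding = np.array([-0.9, -0.6, 0.5, 0.9])  # e.g. Shrek ... Memento
user_embedding = np.array([0.8, -0.7])               # adult-leaning user, child-leaning user

# Predicted affinity for every (user, movie) pair is the product of the two scalars.
scores = np.outer(user_embedding, movie_embedding)
print(np.round(scores, 2))
# Higher values correspond to movies we expect that user to like.
```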
2D embedding
One feature was not enough to explain the preferences of all users. To overcome this problem, let's add a second feature: the degree to which each movie is a blockbuster or an arthouse movie. With a second feature, we can now represent each movie with the following two-dimensional embedding:
We again place our users in the same embedding space to best explain the feedback matrix: for each (user, item) pair, we would like the dot product of the user embedding and the item embedding to be close to 1 when the user watched the movie, and to 0 otherwise.
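One way to sketch this in code is to compute all user-movie dot products at once and compare them against the binary feedback matrix. The 2D embedding values and the small feedback matrix below are made-up assumptions for illustration:

```python
import numpy as np

# Hypothetical 2D embeddings: [children's vs. adult, arthouse vs. blockbuster].
U = np.array([[0.9, 0.2],    # user embeddings, one row per user
              [-0.8, 0.6]])
V = np.array([[-0.9, 0.8],   # movie embeddings, one row per movie
              [0.7, -0.5],
              [0.4, 0.9]])

# Predicted score for every (user, movie) pair is the dot product of the embeddings.
predictions = U @ V.T        # shape (num_users, num_movies)

# Ideally predictions are near 1 where the feedback matrix is 1, and near 0 elsewhere.
feedback = np.array([[0, 1, 1],
                     [1, 0, 0]])
squared_error = np.sum((feedback - predictions) ** 2)
print(np.round(predictions, 2), np.round(squared_error, 2))
```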
In this example, we hand-engineered the embeddings. In practice, the embeddings can be learned automatically, which is the power of collaborative filtering models. In the next two sections, we will discuss different models to learn these embeddings, and how to train them.
The collaborative nature of this approach is apparent when the model learns the embeddings. Suppose the embedding vectors for the movies are fixed. Then, the model can learn an embedding vector for the users to best explain their preferences. Consequently, embeddings of users with similar preferences will be close together. Similarly, if the embeddings for the users are fixed, then we can learn movie embeddings to best explain the feedback matrix. As a result, embeddings of movies liked by similar users will be close in the embedding space.
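This alternating idea can be sketched with a simple gradient-descent loop on the squared error between the dot products and the feedback matrix. This is a minimal illustration under assumed values (toy feedback matrix, embedding dimension, learning rate), not the exact training procedure covered in the following sections:

```python
import numpy as np

rng = np.random.default_rng(0)
feedback = np.array([[1., 0., 1.],
                     [0., 1., 0.],
                     [1., 1., 0.]])
num_users, num_movies = feedback.shape
dim, lr, steps = 2, 0.1, 200

U = rng.normal(scale=0.1, size=(num_users, dim))   # user embeddings
V = rng.normal(scale=0.1, size=(num_movies, dim))  # movie embeddings

for step in range(steps):
    error = U @ V.T - feedback          # residual on every (user, movie) cell
    if step % 2 == 0:
        U -= lr * error @ V             # hold movie embeddings fixed, update users
    else:
        V -= lr * error.T @ U           # hold user embeddings fixed, update movies

print(np.round(U @ V.T, 2))             # predictions approximate the feedback matrix
```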