Content-based filtering

Page Summary

Content-based filtering suggests items similar to a user's preferences by analyzing item features and user interactions.
User and item features are represented in a feature matrix, where common features indicate higher similarity.
Dot product is used as a similarity metric, with higher values indicating stronger relevance between user and item.
Recommendations are tailored to individual users based on their specific features and interactions, without using data from other users.
The system identifies the best recommendations by calculating dot products and selecting items with the highest scores.

Content-based filtering uses item features to recommend other items similar to what the user likes, based on their previous actions or explicit feedback.

To demonstrate content-based filtering, let's hand-engineer some features for the Google Play store. The following figure shows a feature matrix where each row represents an app and each column represents a feature. Features could include categories (such as Education, Casual, Health), the publisher of the app, and many others. To simplify, assume this feature matrix is binary: a non-zero value means the app has that feature.

You also represent the user in the same feature space. Some of the user-related features could be explicitly provided by the user. For example, a user selects "Entertainment apps" in their profile. Other features can be implicit, based on the apps they have previously installed. For example, the user installed another app published by Science R Us.
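As a minimal sketch of representing the user in the same feature space, the following combines one explicit profile selection with one implicit signal from install history. The four-feature vocabulary `FEATURES` and the helper `build_user_vector` are hypothetical names chosen for illustration; they are not part of any Google Play API.

```python
# Hypothetical feature vocabulary shared by users and apps (an assumption for illustration).
FEATURES = ["Education", "Entertainment", "Health", "Science R Us"]


def build_user_vector(explicit, implicit, features=FEATURES):
    """Return a binary vector with 1 for every feature the user has signalled."""
    active = set(explicit) | set(implicit)
    unknown = active - set(features)
    if unknown:
        raise ValueError(f"features not in vocabulary: {sorted(unknown)}")
    return [1 if name in active else 0 for name in features]


# Explicit: selected in the profile. Implicit: inferred from a previously installed app.
user_vector = build_user_vector(
    explicit=["Entertainment"],
    implicit=["Science R Us"],
)
print(user_vector)
```

Treating explicit and implicit signals identically is a simplification; a real system might weight them differently.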

The model should recommend items relevant to this user. To do so, you must first pick a similarity metric (for example, dot product). Then, you must set up the system to score each candidate item according to this similarity metric. Note that the recommendations are specific to this user, as the model did not use any information about other users.
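The scoring step described above could be sketched as follows, assuming dot product as the similarity metric. The helper names (`dot`, `rank_candidates`) and the three candidate vectors are made up for illustration and do not reproduce the figure's values. Only the one user's vector enters the computation, which is why the result is specific to that user.

```python
def dot(x, y):
    """Dot product of two equal-length feature vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must live in the same feature space")
    return sum(a * b for a, b in zip(x, y))


def rank_candidates(user, items):
    """Score every candidate against the user and sort from best to worst."""
    scored = [(name, dot(user, vector)) for name, vector in items.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical binary vectors over a four-feature space (assumed values).
user = [1, 0, 1, 1]
candidates = {
    "app_a": [1, 0, 1, 0],
    "app_b": [0, 1, 0, 0],
    "app_c": [0, 0, 0, 1],
}

ranking = rank_candidates(user, candidates)
print(ranking)  # [('app_a', 2), ('app_c', 1), ('app_b', 0)]
```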

[Figure: a feature matrix showing a user and the apps that may be recommended]

Using dot product as a similarity measure

Consider the case where the user embedding \(x\) and the app embedding \(y\) are both binary vectors. Since \(\langle x, y \rangle = \sum_{i = 1}^d x_i y_i\), a feature appearing in both \(x\) and \(y\) contributes a 1 to the sum. In other words, \(\langle x, y \rangle\) is the number of features that are active in both vectors simultaneously. A high dot product then indicates more common features, thus a higher similarity.
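To check the counting interpretation concretely, the short sketch below compares the dot product of two binary vectors with the size of the intersection of their active feature sets. The vectors and the helper `active_features` are arbitrary illustrations, not values from the figure.

```python
def dot(x, y):
    """Compute <x, y> as the sum of x_i * y_i."""
    return sum(xi * yi for xi, yi in zip(x, y))


def active_features(vector):
    """Indices of the features set to 1 in a binary vector."""
    return {i for i, value in enumerate(vector) if value == 1}


x = [1, 1, 0, 1, 0]  # hypothetical user embedding
y = [1, 0, 0, 1, 1]  # hypothetical app embedding

shared = active_features(x) & active_features(y)

# Each term x_i * y_i is 1 only when both entries are 1, so the sum counts shared features.
print(dot(x, y), sorted(shared))  # 2 [0, 3]
```

This equivalence holds only for binary vectors; with real-valued features the dot product also reflects feature magnitudes.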

Try it yourself!

Calculate the dot product between the user and each app in the preceding feature matrix. Then use those scores to answer the question below:

Which app should we recommend?

The educational app created by Science R Us: correct. This item has the highest dot product at 2. Our user really likes science and educational apps.

The health app created by Healthcare: incorrect. This app scores a 1. It isn't the worst recommendation our system could make, but it certainly isn't the best.

The casual app created by TimeWastr: incorrect. This app has the lowest dot product at 0. Our user isn't interested in casual apps like games.
