Content-based filtering uses item features to recommend other items similar to what the user likes, based on their previous actions or explicit feedback.
To demonstrate content-based filtering, let's hand-engineer some features for the Google Play store. The following figure shows a feature matrix where each row represents an app and each column represents a feature. Features could include categories (such as Education, Casual, Health), the publisher of the app, and many others. To simplify, assume this feature matrix is binary: a non-zero value means the app has that feature.
You also represent the user in the same feature space. Some of the user-related features could be explicitly provided by the user. For example, a user selects "Entertainment apps" in their profile. Other features can be implicit, based on the apps they have previously installed. For example, the user installed another app published by Science R Us.
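As a minimal sketch of how such a user vector might be assembled, the snippet below merges explicit profile choices with the features of previously installed apps. The feature list, the `build_user_vector` helper, and the rule of taking the union of installed-app features are all illustrative assumptions, not part of the original example.

```python
# Hypothetical feature vocabulary shared by apps and users (illustrative only).
FEATURES = ["Education", "Casual", "Health", "Entertainment", "Science R Us"]

def build_user_vector(explicit, installed_app_features):
    """Return a binary user vector over FEATURES.

    explicit: set of features the user selected in their profile.
    installed_app_features: list of feature sets, one per installed app.
    Assumption: a feature is active if it is explicit OR appears in any installed app.
    """
    active = set(explicit)
    for app_features in installed_app_features:
        active.update(app_features)
    return [1 if feature in active else 0 for feature in FEATURES]

user = build_user_vector(
    explicit={"Entertainment"},                              # explicit feedback
    installed_app_features=[{"Education", "Science R Us"}],  # implicit feedback
)
print(user)  # [1, 0, 0, 1, 1]
```

A real system would likely weight explicit and implicit signals differently; the binary union here only mirrors the simplified binary feature matrix.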
The model should recommend items relevant to this user. To do so, you must first pick a similarity metric (for example, dot product). Then, you must set up the system to score each candidate item according to this similarity metric. Note that the recommendations are specific to this user, because the model does not use any information about other users.
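The two steps above (pick a similarity metric, then score every candidate) can be sketched as follows. The app names, feature vectors, and the `recommend` helper are hypothetical, and dot product is used only because it is the example metric mentioned in the text.

```python
def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def recommend(user, apps, k=2):
    """Score every candidate app against the user and return the top k.

    user: feature vector for one user.
    apps: dict mapping app name -> feature vector in the same feature space.
    """
    scored = [(name, dot(user, vec)) for name, vec in apps.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest score first
    return scored[:k]

# Hypothetical binary feature vectors (same column order for user and apps).
apps = {
    "app_a": [1, 0, 0, 0, 1],
    "app_b": [0, 1, 0, 1, 0],
    "app_c": [0, 0, 1, 0, 0],
}
user = [1, 0, 0, 1, 1]

top = recommend(user, apps)
print(top)  # [('app_a', 2), ('app_b', 1)]
```

Only this user's vector enters the scoring, which is why the resulting ranking is independent of any other user's behavior.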
Using dot product as a similarity measure
Consider the case where the user embedding \(x\) and the app embedding \(y\) are both binary vectors. Since \(\langle x, y \rangle = \sum_{i = 1}^d x_i y_i\), a feature appearing in both \(x\) and \(y\) contributes a 1 to the sum, and every other feature contributes 0. In other words, \(\langle x, y \rangle\) is the number of features that are active in both vectors simultaneously. A higher dot product therefore indicates more shared features, and thus a higher similarity.
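As a small worked example, assume \(d = 5\) and take two hypothetical binary vectors (chosen for illustration, not taken from the figure): \(x = (1, 0, 0, 1, 1)\) for the user and \(y = (1, 0, 0, 0, 1)\) for an app. Then

\[
\langle x, y \rangle = 1 \cdot 1 + 0 \cdot 0 + 0 \cdot 0 + 1 \cdot 0 + 1 \cdot 1 = 2 .
\]

Only the first and fifth features are active in both vectors, so the dot product equals 2, the number of shared features. The fourth feature is active for the user but not for the app, so it contributes nothing.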
Try it yourself!
Calculate the dot product between the user and each app in the preceding example. Then use that information to answer the question below: