Retrieval
Suppose you have an embedding model. Given a user, how would you decide which items to recommend?

At serve time, given a query, you start by doing one of the following:
- For a matrix factorization model, the query (or user) embedding is known statically, and the system can simply look it up from the user embedding matrix.
- For a DNN model, the system computes the query embedding \(\psi(x)\) at serve time by running the network on the feature vector \(x\).

Once you have the query embedding \(q\), search for item embeddings \(V_j\) that are close to \(q\) in the embedding space. This is a nearest neighbor problem. For example, you can return the top k items according to the similarity score \(s(q, V_j)\).
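The top-k lookup described above can be sketched in a few lines. This is a minimal illustration assuming dot-product similarity and a NumPy matrix of item embeddings; the function name `top_k_items` and the toy data are hypothetical, not part of the course material:

```python
import numpy as np

def top_k_items(q, V, k=5):
    """Return indices of the k item embeddings most similar to query q.

    Uses dot-product similarity s(q, V_j) = <q, V_j>; cosine similarity
    would work the same way after normalizing the rows of V.
    """
    scores = V @ q                   # similarity of q to every item
    return np.argsort(-scores)[:k]   # indices of the k highest scores

# Toy corpus: 4 items with 3-dimensional embeddings.
V = np.array([[1.0, 0.0, 0.0],
              [0.9, 0.1, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
q = np.array([1.0, 0.0, 0.0])       # query embedding from either model above
print(top_k_items(q, V, k=2))       # the two items closest to q
```

For a real corpus you would swap the toy matrix for the learned item embedding table and, as discussed below, avoid scoring every row exhaustively.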

You can use a similar approach in related-item recommendations. For example, when the user is watching a YouTube video, the system can first look up the embedding of that item, and then look for embeddings of other items \(V_j\) that are close in the embedding space.
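The related-item case differs only in where the query embedding comes from: it is looked up from the item table rather than computed for a user. A minimal sketch (the helper `related_items` and the toy embeddings are illustrative assumptions):

```python
import numpy as np

def related_items(item_id, V, k=3):
    """Recommend items related to item_id: look up its embedding,
    then find the nearest *other* items in the embedding space."""
    q = V[item_id]              # embedding of the item being watched
    scores = V @ q              # similarity to every item, incl. itself
    scores[item_id] = -np.inf   # exclude the seed item from the results
    return np.argsort(-scores)[:k]

V = np.array([[1.0, 0.0],      # toy item embedding table
              [0.9, 0.1],
              [0.0, 1.0],
              [0.1, 0.9]])
print(related_items(0, V, k=2))  # neighbors of item 0
```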
Large-scale retrieval
To compute the nearest neighbors in the embedding space, the system can exhaustively score every potential candidate. Exhaustive scoring can be expensive for very large corpora, but you can use either of the following strategies to make it more efficient:
- If the query embedding is known statically, the system can perform exhaustive scoring offline, precomputing and storing a list of the top candidates for each query. This is a common practice for related-item recommendation.
- Use approximate nearest neighbors.
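The first strategy, offline precomputation for statically known queries, can be sketched as follows. The batch scoring of all (query, item) pairs happens once offline, and serving reduces to a table lookup; the function name and toy matrices are illustrative assumptions:

```python
import numpy as np

def precompute_top_candidates(U, V, k=2):
    """Offline: exhaustively score every (query, item) pair and store
    the top-k item indices for each statically known query embedding."""
    scores = U @ V.T                          # shape [num_queries, num_items]
    top = np.argsort(-scores, axis=1)[:, :k]  # k best items per query row
    return {q: list(top[q]) for q in range(U.shape[0])}

U = np.array([[1.0, 0.0],    # static query (user) embeddings
              [0.0, 1.0]])
V = np.array([[0.9, 0.1],    # item embeddings
              [0.1, 0.9],
              [0.8, 0.2]])
table = precompute_top_candidates(U, V, k=2)

# At serve time, retrieval is just a lookup in the precomputed table:
print(table[0])
```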
Google provides an open-source tool on GitHub called ScaNN (Scalable Nearest Neighbors). This tool performs efficient vector similarity search at scale.
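To illustrate the idea behind approximate nearest neighbors, here is a toy random-hyperplane hashing sketch. This is not ScaNN's actual algorithm (ScaNN is quantization-based); it only shows how hashing lets the system score a small bucket of candidates instead of the full corpus. All names and data here are hypothetical:

```python
import numpy as np

def build_buckets(V, planes):
    """Hash each item by the signs of its projections onto random
    hyperplanes; nearby vectors tend to land in the same bucket."""
    codes = (V @ planes.T > 0)            # [num_items, num_planes] booleans
    buckets = {}
    for j, code in enumerate(map(tuple, codes)):
        buckets.setdefault(code, []).append(j)
    return buckets

def approx_top_k(q, V, planes, buckets, k=1):
    """Score only items whose hash code matches the query's hash code,
    rather than exhaustively scoring every item."""
    code = tuple(q @ planes.T > 0)
    candidates = buckets.get(code, [])
    scores = [(float(V[j] @ q), j) for j in candidates]
    return [j for _, j in sorted(scores, reverse=True)[:k]]

V = np.array([[ 1.0,  0.2],   # toy item embeddings
              [ 0.9, -0.1],
              [-1.0,  0.3],
              [-0.8, -0.5]])
planes = np.array([[1.0, 0.0],  # 2 hyperplanes -> up to 4 buckets
                   [0.0, 1.0]])
buckets = build_buckets(V, planes)
q = np.array([1.0, 0.1])
print(approx_top_k(q, V, planes, buckets, k=1))  # -> [0]
```

Real systems such as ScaNN combine partitioning with learned quantization and careful pruning; this sketch trades recall for speed in the crudest possible way.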
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated (UTC): 2024-11-11.