# Embeddings: Embedding space and static embeddings

An [**embedding**](/machine-learning/glossary#embedding-vector) is a
vector representation of data in
[**embedding space**](/machine-learning/glossary#embedding-space). Generally
speaking, a model finds potential embeddings by projecting the high-dimensional
space of initial data vectors into a lower-dimensional space.
For a discussion of high-dimensional versus
low-dimensional data, see the
[Categorical Data](/machine-learning/crash-course/categorical-data/one-hot-encoding)
module.
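To make that projection concrete, here is a minimal NumPy sketch, assuming a made-up vocabulary of 5,000 meal items and an embedding dimension of 3; in a trained model the projection matrix is learned, not random:

```python
import numpy as np

# Hypothetical sizes: 5,000 possible meal items, embedded into 3 dimensions.
vocab_size, embedding_dim = 5000, 3

# In a trained model these weights are learned; here they are just random.
rng = np.random.default_rng(seed=0)
projection = rng.normal(size=(vocab_size, embedding_dim))

# A sparse one-hot vector for meal item #42 in the high-dimensional space.
one_hot = np.zeros(vocab_size)
one_hot[42] = 1.0

# Projecting into embedding space is a matrix multiply...
embedding = one_hot @ projection  # shape: (3,)

# ...which collapses to a simple row lookup, so the one-hot vector is
# never materialized in practice.
assert np.allclose(embedding, projection[42])
```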
[null,null,["最后更新时间 (UTC):2025-05-15。"],[[["\u003cp\u003eEmbeddings are low-dimensional representations of high-dimensional data, often used to capture semantic relationships between items.\u003c/p\u003e\n"],["\u003cp\u003eEmbeddings place similar items closer together in the embedding space, allowing for efficient machine learning on large datasets.\u003c/p\u003e\n"],["\u003cp\u003eThe distance between points in an embedding space represents the relative similarity between the corresponding items.\u003c/p\u003e\n"],["\u003cp\u003eReal-world embeddings can encode complex relationships, like those between countries and their capitals, allowing models to detect patterns.\u003c/p\u003e\n"],["\u003cp\u003eStatic embeddings like word2vec represent all meanings of a word with a single point, which can be a limitation in some cases.\u003c/p\u003e\n"]]],[],null,["# Embeddings: Embedding space and static embeddings\n\nAn [**embedding**](/machine-learning/glossary#embedding-vector) is a\nvector representation of data in\n[**embedding space**](/machine-learning/glossary#embedding-space). Generally\nspeaking, a model finds potential embeddings by projecting the high-dimensional\nspace of initial data vectors into a lower-dimensional space.\nFor a discussion of high-dimensional versus\nlow-dimensional data, see the\n[Categorical Data](/machine-learning/crash-course/categorical-data/one-hot-encoding)\nmodule.\n\nEmbeddings make it easier to do machine learning on large\n[**feature vectors**](/machine-learning/glossary#feature-vector), such\nas the sparse vectors representing meal items discussed in the\n[previous section](/machine-learning/crash-course/embeddings). Sometimes the relative positions of items in\nembedding space have a potential semantic relationship, but often the process of\nfinding a lower-dimensional space, and relative positions in that space, is not\ninterpretable by humans, and the resulting embeddings are difficult to\nunderstand.\n\nStill, for the sake of human understanding, to give an idea of how embedding\nvectors represent information, consider the\nfollowing one-dimensional representation of the dishes\n,\n,\n,\n, and\n,\non a scale of \"least like a\n\"\nto \"most like a sandwich.\" The single dimension is an imaginary measure of\n\"sandwichness.\"\n**Figure 3.** Foods along an imagined dimension of \"sandwichness.\"\n\nWhere on this line would an\n\nfall? Arguably, it could be placed between `hot dog` and `shawarma`. But apple\nstrudel also seems to have an additional dimension of *sweetness*\nor *dessertness* that makes it very different from the other options.\nThe following figure visualizes this by adding a \"dessertness\" dimension:\n**Figure 4.** Foods plotted by both \"sandwichness\" and \"dessertness.\"\n\nAn embedding represents each item in *n* -dimensional space with *n*\nfloating-point numbers (typically in the range --1 to 1 or 0 to 1).\nThe embedding in Figure 3 represents each food in one-dimensional space\nwith a single coordinate, while Figure 4 represents each food in\ntwo-dimensional space with two coordinates. In Figure 4, \"apple strudel\"\nis in the upper-right quadrant of the graph and could be\nassigned the point (0.5, 0.3), whereas \"hot dog\" is in the bottom-right\nquadrant of the graph and could be assigned the point (0.2, --0.5).\n\nIn an embedding, the distance between any two items can be calculated\nmathematically, and can be interpreted as a measure of relative similarity\nbetween those two items. 
Notice also that in the 2D space in Figure 4, `apple strudel` is much farther
from `shawarma` and `hot dog` than it would be in the 1D space, which matches
intuition: `apple strudel` is not as similar to a hot dog or a shawarma as hot
dogs and shawarmas are to each other.

Now consider borscht, which is much more liquid than the other items. This
suggests a third dimension, *liquidness*, or how liquid a food might be.
Adding that dimension, the items could be visualized in 3D in this way:

**Figure 5.** Foods plotted by "sandwichness," "dessertness," and "liquidness."

Where in this 3D space would tangyuan go? It's soupy, like borscht, and a
sweet dessert, like apple strudel, and most definitely not a sandwich. Here
is one possible placement:

**Figure 6.** Adding tangyuan to the previous image, high on "dessertness" and "liquidness" and low on "sandwichness."

Notice how much information is expressed in these three dimensions.
You could imagine adding additional dimensions, like how meaty or
baked a food might be, though 4D, 5D, and higher-dimensional spaces are
difficult to visualize.

Real-world embedding spaces
---------------------------

In the real world, embedding spaces are *d*-dimensional, where *d* is much
higher than 3, though lower than the dimensionality of the data, and
relationships between data points are not necessarily as intuitive as in the
contrived illustration above. (For word embeddings, *d* is often 256, 512, or
1024.^[1](#fn1)^)

In practice, the ML practitioner usually sets the specific task and the number
of embedding dimensions. The model then tries to arrange the training
examples to be close in an embedding space with the specified number of
dimensions, or tunes for the number of dimensions, if *d* is not fixed.
The individual dimensions are rarely as understandable as
"dessertness" or "liquidness." Sometimes what they "mean" can be inferred,
but this is not always the case.

Embeddings will usually be specific to the task, and differ from each other
when the task differs. For example, the embeddings generated by a vegetarian
versus non-vegetarian classification model will be different from the
embeddings generated by a model that suggests dishes based on time of
day or season. "Cereal" and "breakfast sausage" would probably be close
together in the embedding space of a time-of-day model but far apart in the
embedding space of a vegetarian versus non-vegetarian model, for example.

Static embeddings
-----------------

While embeddings differ from task to task, one task has some
general applicability: predicting the context of a word. Models trained to
predict the context of a word assume that words appearing in similar contexts
are semantically related. For example, training data that includes the
sentences "They rode a burro down into the Grand Canyon" and "They rode a horse
down into the canyon" suggests that "horse" appears in similar contexts to
"burro." It turns out that embeddings based on semantic similarity work well
for many general language tasks.

While it's an older example, and largely superseded by other models, the
**word2vec** model remains useful for illustration. `word2vec` trains on a
corpus of documents to obtain a single global embedding per word. When each
word or data point has a single embedding vector, this is called a
**static embedding**. The following video walks through a simplified
illustration of `word2vec` training.

**Note**: **word2vec** can refer to both an algorithm for obtaining static word embeddings and a set of word vectors that were pretrained with that algorithm. It's used in both senses in this module.
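As a rough sketch of how a static embedding can be trained in practice, the snippet below uses the open-source Gensim library's `Word2Vec` implementation (assuming Gensim 4.x) on a toy corpus built from the two example sentences above; a real corpus would need to be far larger to produce meaningful vectors:

```python
from gensim.models import Word2Vec

# Toy corpus: each document is a list of lowercase tokens. Real training
# corpora contain many thousands of sentences.
corpus = [
    ["they", "rode", "a", "burro", "down", "into", "the", "grand", "canyon"],
    ["they", "rode", "a", "horse", "down", "into", "the", "canyon"],
]

# vector_size is the embedding dimensionality d; window is the number of
# context words considered on each side of the target word.
model = Word2Vec(sentences=corpus, vector_size=8, window=3, min_count=1)

# One static embedding vector per word in the vocabulary.
print(model.wv["horse"])                      # an 8-dimensional vector
print(model.wv.similarity("horse", "burro"))  # cosine similarity of the two
```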
Research suggests that these static embeddings, once trained, encode some
degree of semantic information, particularly in relationships between words.
That is, words that are used in similar contexts will be closer to each other
in embedding space. The specific embedding vectors generated
will depend on the corpus used for training.
See T. Mikolov et al. (2013),
["Efficient estimation of word representations in vector space"](https://arxiv.org/abs/1301.3781),
for details.

| **Key terms:**
|
| - [Embedding vector](/machine-learning/glossary#embedding-vector)
| - [Embedding space](/machine-learning/glossary#embedding-space)

***

1. François Chollet, [Deep Learning with Python](https://www.manning.com/books/deep-learning-with-python)
   (Shelter Island, NY: Manning, 2017), 6.1.2. [↩](#fnref1)