Machine Learning Glossary: Sequence Models
This page contains Sequence Models glossary terms. For all glossary terms, click here.
B
bigram
An N-gram in which N=2.
E
exploding gradient problem
The tendency for gradients in deep neural networks (especially recurrent neural networks) to become surprisingly steep (high). Steep gradients often cause very large updates to the weights of each node in a deep neural network.
Models suffering from the exploding gradient problem become difficult or impossible to train. Gradient clipping can mitigate this problem.
Compare to the vanishing gradient problem.
F
forget gate
The portion of a Long Short-Term Memory cell that regulates the flow of information through the cell. Forget gates maintain context by deciding which information to discard from the cell state.
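As a rough sketch, the forget gate computes a vector of values between 0 and 1 and multiplies it elementwise into the previous cell state: a value near 0 erases that component, a value near 1 keeps it. The weight matrix `W_f`, bias `b_f`, and the convention of concatenating `[h_prev, x_t]` below follow the common LSTM formulation and are illustrative assumptions, not part of this glossary entry:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(h_prev, x_t, c_prev, W_f, b_f):
    """Apply the forget gate to the previous cell state.

    Each element of f_t lies in (0, 1): near 0 discards that component
    of the cell state, near 1 keeps it.
    """
    concat = np.concatenate([h_prev, x_t])  # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ concat + b_f)       # forget-gate activations
    return f_t * c_prev                     # partially erased cell state
```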
G
gradient clipping
A commonly used mechanism to mitigate the exploding gradient problem by artificially limiting (clipping) the maximum value of gradients when using gradient descent to train a model.
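A minimal sketch of one common variant, clipping by global norm, in plain NumPy; the threshold `max_norm` is an assumed hyperparameter. Rescaling (rather than truncating elementwise) bounds the gradient's magnitude while preserving its direction:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale gradient arrays so their combined L2 norm is at most
    max_norm; the gradient's direction is left unchanged."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if global_norm > max_norm:
        grads = [g * (max_norm / global_norm) for g in grads]
    return grads
```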
L
Long Short-Term Memory (LSTM)
A type of cell in a recurrent neural network used to process sequences of data in applications such as handwriting recognition, machine translation, and image captioning. LSTMs address the vanishing gradient problem that occurs when training RNNs on long data sequences by maintaining history in an internal memory state based on new input and context from previous cells in the RNN.
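For concreteness, here is a hedged NumPy sketch of a single LSTM timestep in its common formulation; the stacked weight matrix `W` and bias `b` (sized 4× the hidden dimension so they can be split across the four gates) are assumptions of this sketch, not part of the glossary entry:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM timestep. W and b map [h_{t-1}, x_t] to the stacked
    pre-activations of the forget, input, candidate, and output gates."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    f, i, g, o = np.split(z, 4)                   # four equal slices
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # gates in (0, 1)
    g = np.tanh(g)                                # candidate cell values
    c_t = f * c_prev + i * g                      # forget old, admit new
    h_t = o * np.tanh(c_t)                        # exposed hidden state
    return h_t, c_t
```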
LSTM
Abbreviation for Long Short-Term Memory.
N
N-gram
An ordered sequence of N words. For example, "truly madly" is a 2-gram. Because order is relevant, "madly truly" is a different 2-gram than "truly madly".
| N | Name(s) for this kind of N-gram | Examples |
|---|---------------------------------|----------|
| 2 | bigram or 2-gram | to go, go to, eat lunch, eat dinner |
| 3 | trigram or 3-gram | ate too much, happily ever after, the bell tolls |
| 4 | 4-gram | walk in the park, dust in the wind, the boy ate lentils |
Many natural language understanding models rely on N-grams to predict the next word that the user will type or say. For example, suppose a user typed "happily ever". An NLU model based on trigrams would likely predict that the user will next type the word "after".
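A toy sketch of that prediction scheme: count which word follows each two-word prefix in a corpus, then return the most frequent continuation. The one-line corpus below is fabricated purely for illustration:

```python
from collections import Counter, defaultdict

def train_trigrams(tokens):
    """Count, for every two-word prefix, the words that follow it."""
    counts = defaultdict(Counter)
    for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
        counts[(a, b)][c] += 1
    return counts

def predict_next(counts, a, b):
    """Return the most frequent continuation of the prefix (a, b), if any."""
    following = counts.get((a, b))
    return following.most_common(1)[0][0] if following else None

corpus = "they lived happily ever after the end".split()
model = train_trigrams(corpus)
print(predict_next(model, "happily", "ever"))  # -> after
```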
Contrast N-grams with bag of words, which are unordered sets of words.
See Large language models in Machine Learning Crash Course for more information.
R
recurrent neural network
A neural network that is intentionally run multiple times, where parts of each run feed into the next run. Specifically, hidden layers from the previous run provide part of the input to the same hidden layer in the next run. Recurrent neural networks are particularly useful for evaluating sequences, so that the hidden layers can learn from previous runs of the neural network on earlier parts of the sequence.
For example, the following figure shows a recurrent neural network that runs four times. Notice that the values learned in the hidden layers from the first run become part of the input to the same hidden layers in the second run. Similarly, the values learned in the hidden layer on the second run become part of the input to the same hidden layer in the third run. In this way, the recurrent neural network gradually trains and predicts the meaning of the entire sequence rather than just the meaning of individual words.
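A minimal NumPy sketch of that recurrence, assuming illustrative parameter names `W_xh`, `W_hh`, and `b_h`; each loop iteration plays the role of one run in the figure, with the hidden state carried forward:

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence. The hidden state produced by
    each step is fed back as part of the input to the next step."""
    h = np.zeros(W_hh.shape[0])  # initial hidden state
    hidden_states = []
    for x_t in xs:               # one "run" per sequence element
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        hidden_states.append(h)
    return hidden_states
```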
RNN
Abbreviation for recurrent neural network.
S
sequence model
A model whose inputs have a sequential dependence. For example, predicting the next video watched from a sequence of previously watched videos.
T
timestep
One "unrolled" cell within a recurrent neural network. For example, the following figure shows three timesteps (labeled with the subscripts t-1, t, and t+1):
trigram
An N-gram in which N=3.
V
vanishing gradient problem
The tendency for the gradients of early hidden layers of some deep neural networks to become surprisingly flat (low). Increasingly lower gradients result in increasingly smaller changes to the weights on nodes in a deep neural network, leading to little or no learning. Models suffering from the vanishing gradient problem become difficult or impossible to train. Long Short-Term Memory cells address this issue.
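A back-of-the-envelope illustration, assuming sigmoid activations (whose derivative never exceeds 0.25): backpropagation multiplies the gradient by at most 0.25 at each layer, so the signal reaching early layers decays geometrically with depth:

```python
# Sigmoid's derivative never exceeds 0.25, so each sigmoid layer
# attenuates the backpropagated gradient by at least a factor of 4.
grad = 1.0
for _ in range(20):  # 20 layers, best-case attenuation per layer
    grad *= 0.25
print(grad)          # ~9.1e-13: early layers barely learn
```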
Compare to the exploding gradient problem.