Step 5: Tune Hyperparameters
We had to choose a number of hyperparameters for defining and training the model. We relied on intuition, examples, and best practice recommendations. Our first choice of hyperparameter values, however, may not yield the best results; it only gives us a good starting point for training. Every problem is different, and tuning these hyperparameters will help refine our model to better represent the particularities of the problem at hand. Let's take a look at some of the hyperparameters we used and what it means to tune them:
Number of layers in the model: The number of layers in a neural network is an indicator of its complexity. We must be careful in choosing this value. Too many layers will allow the model to learn too much information about the training data, causing overfitting. Too few layers can limit the model's learning ability, causing underfitting. For text classification datasets, we experimented with one-, two-, and three-layer MLPs. Models with two layers performed well, and in some cases better than three-layer models. Similarly, we tried sepCNNs with four and six layers, and the four-layer models performed well.
Number of units per layer: The units in a layer must hold the information for the transformation that the layer performs. For the first layer, this is driven by the number of features. In subsequent layers, the number of units depends on the choice of expanding or contracting the representation from the previous layer. Try to minimize the information loss between layers. We tried unit values in the range [8, 16, 32, 64], and 32/64 units worked well.
Dropout rate: Dropout layers are used in the model for regularization. They define the fraction of input to drop as a precaution against overfitting. Recommended range: 0.2–0.5.
Learning rate: This is the rate at which the neural network weights change between iterations. A large learning rate may cause large swings in the weights, and we may never find their optimal values. A low learning rate is good, but the model will take more iterations to converge. It is a good idea to start low, say at 1e-4. If training is very slow, increase this value. If the model is not learning, try decreasing the learning rate. A minimal code sketch after this list shows how these hyperparameters can be exposed as tunable arguments when building a model.
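Below is a minimal sketch, assuming a TensorFlow/Keras setup rather than the guide's exact code, of a small MLP in which the four hyperparameters above (number of layers, units per layer, dropout rate, and learning rate) are explicit, tunable arguments:

```python
import tensorflow as tf


def build_mlp(input_dim, num_classes, layers=2, units=64,
              dropout_rate=0.3, learning_rate=1e-4):
    """Builds a small MLP whose hyperparameters are passed in as arguments."""
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(input_dim,)))
    # Dropout rate: fraction of inputs dropped as a precaution against overfitting.
    model.add(tf.keras.layers.Dropout(rate=dropout_rate))
    # Number of layers and units per layer control the model's capacity.
    for _ in range(layers - 1):
        model.add(tf.keras.layers.Dense(units=units, activation='relu'))
        model.add(tf.keras.layers.Dropout(rate=dropout_rate))
    model.add(tf.keras.layers.Dense(units=num_classes, activation='softmax'))

    # Learning rate: how fast the weights change between iterations.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'])
    return model
```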
There are a couple of additional hyperparameters we tuned that are specific to our sepCNN model:
Kernel size: The size of the convolution window. Recommended values: 3 or 5.
Embedding dimensions: The number of dimensions we want to use to represent word embeddings, i.e., the size of each word vector. Recommended values: 50–300. In our experiments, we used GloVe embeddings with 200 dimensions in a pre-trained embedding layer. A sketch after this list shows where these two settings plug into a sepCNN-style model definition.
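The following is a similar sketch, again assuming TensorFlow/Keras rather than the guide's exact code, of a sepCNN-style model in which the embedding dimensions and kernel size are explicit arguments. The names num_features (vocabulary size) and input_length are placeholders, not values from this guide:

```python
import tensorflow as tf


def build_sepcnn(num_features, input_length, num_classes,
                 embedding_dim=200, kernel_size=3, filters=64,
                 dropout_rate=0.3, learning_rate=1e-4):
    """Builds a small sepCNN-style model with tunable hyperparameters."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(input_length,), dtype='int32'),
        # Embedding dimensions: the size of each word vector (50-300 recommended).
        tf.keras.layers.Embedding(input_dim=num_features,
                                  output_dim=embedding_dim),
        # Kernel size: the width of the convolution window (3 or 5 recommended).
        tf.keras.layers.SeparableConv1D(filters=filters,
                                        kernel_size=kernel_size,
                                        activation='relu',
                                        padding='same'),
        tf.keras.layers.SeparableConv1D(filters=filters,
                                        kernel_size=kernel_size,
                                        activation='relu',
                                        padding='same'),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dropout(rate=dropout_rate),
        tf.keras.layers.Dense(num_classes, activation='softmax'),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'])
    return model
```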
Play around with these hyperparameters and see what works best; a simple manual search over a few candidate values, as sketched below, is often enough. Once you have chosen the best-performing hyperparameters for your use case, your model is ready to be deployed.
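As one possible way to run such a search, the sketch below reuses the build_mlp sketch from above, loops over a few candidate values, and keeps the settings with the best validation accuracy. Here x_train, y_train, x_val, y_val, and the class count are assumed placeholders, not data from this guide:

```python
# Manual hyperparameter search: train one model per combination and keep the best.
best_acc, best_params = 0.0, None
for units in (32, 64):
    for dropout_rate in (0.2, 0.3, 0.5):
        model = build_mlp(input_dim=x_train.shape[1], num_classes=4,
                          units=units, dropout_rate=dropout_rate)
        history = model.fit(x_train, y_train,
                            validation_data=(x_val, y_val),
                            epochs=20, batch_size=128, verbose=0)
        # Track the best validation accuracy reached with these settings.
        val_acc = max(history.history['val_accuracy'])
        if val_acc > best_acc:
            best_acc, best_params = val_acc, (units, dropout_rate)

print('Best validation accuracy:', best_acc, 'with (units, dropout):', best_params)
```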
[null,null,["上次更新時間:2025-07-27 (世界標準時間)。"],[[["\u003cp\u003eInitial hyperparameter choices provide a starting point for model training, but further tuning is crucial to optimize performance for specific text classification problems.\u003c/p\u003e\n"],["\u003cp\u003eThe number of layers in a neural network impacts its complexity, with two-layer MLPs and four-layer sepCNNs showing promising results in text classification.\u003c/p\u003e\n"],["\u003cp\u003eKey hyperparameters to adjust include the number of units per layer (32 or 64 performed well), dropout rate (0.2-0.5 recommended), and learning rate (start low and adjust based on training progress).\u003c/p\u003e\n"],["\u003cp\u003eFor sepCNN models, optimizing kernel size (3 or 5) and embedding dimensions (50-300) further enhances performance.\u003c/p\u003e\n"],["\u003cp\u003eExperimenting with different hyperparameter combinations is essential to achieve the best model performance for your specific use case before deployment.\u003c/p\u003e\n"]]],[],null,["# Step 5: Tune Hyperparameters\n\nWe had to choose a number of hyperparameters for defining and training the\nmodel. We relied on intuition, examples and best practice recommendations. Our\nfirst choice of hyperparameter values, however, may not yield the best results.\nIt only gives us a good starting point for training. Every problem is different\nand tuning these hyperparameters will help refine our model to better represent\nthe particularities of the problem at hand. Let's take a look at some of the\nhyperparameters we used and what it means to tune them:\n\n- **Number of layers in the model** : The number of layers in a neural network is\n an indicator of its complexity. We must be careful in choosing this value. Too\n many layers will allow the model to learn too much information about the\n training data, causing overfitting. Too few layers can limit the model's\n learning ability, causing underfitting. For text classification datasets, we\n experimented with one, two, and three-layer MLPs. Models with two layers\n performed well, and in some cases better than three-layer models. Similarly, we\n tried [sepCNN](https://developers.google.com/machine-learning/glossary?utm_source=DevSite&utm_campaign=Text-Class-Guide&utm_medium=referral&utm_content=glossary&utm_term=sepCNN#depthwise-separable-convolutional-neural-network-sepcnn)s\n with four and six layers, and the four-layer models performed well.\n\n- **Number of units per layer** : The units in a layer must hold the information\n for the transformation that a layer performs. For the first layer, this is\n driven by the number of features. In subsequent layers, the number of units\n depends on the choice of expanding or contracting the representation from the\n previous layer. Try to minimize the information loss between layers. We tried\n unit values in the range `[8, 16, 32, 64]`, and 32/64 units worked well.\n\n- **Dropout rate** : Dropout layers are used in the model for\n [regularization](https://developers.google.com/machine-learning/glossary/?utm_source=DevSite&utm_campaign=Text-Class-Guide&utm_medium=referral&utm_content=glossary&utm_term=dropout-regularization#dropout_regularization).\n They define the fraction of input to drop as a precaution for overfitting.\n Recommended range: 0.2--0.5.\n\n- **Learning rate**: This is the rate at which the neural network weights change\n between iterations. A large learning rate may cause large swings in the weights,\n and we may never find their optimal values. 
A low learning rate is good, but the\n model will take more iterations to converge. It is a good idea to start low, say\n at 1e-4. If the training is very slow, increase this value. If your model is not\n learning, try decreasing learning rate.\n\nThere are couple of additional hyperparameters we tuned that are specific to our\nsepCNN model:\n\n1. **Kernel size**: The size of the convolution window. Recommended values: 3 or\n 5.\n\n2. **Embedding dimensions**: The number of dimensions we want to use to represent\n word embeddings---i.e., the size of each word vector. Recommended values: 50--300.\n In our experiments, we used GloVe embeddings with 200 dimensions with a pre-\n trained embedding layer.\n\nPlay around with these hyperparameters and see what works best. Once you have\nchosen the best-performing hyperparameters for your use case, your model is\nready to be deployed."]]