Classification: Prediction bias
As mentioned in the Linear regression module, calculating prediction bias is a quick check that can flag issues with the model or training data early on.
Prediction bias is the difference between the mean of a model's predictions and the mean of the ground-truth labels in the data. A model trained on a dataset where 5% of the emails are spam should predict, on average, that 5% of the emails it classifies are spam. In other words, the mean of the labels in the ground-truth dataset is 0.05, and the mean of the model's predictions should also be 0.05. If this is the case, the model has zero prediction bias. Of course, the model might still have other problems.
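The check itself is just the difference of two means. Below is a minimal sketch in Python (assuming NumPy and made-up labels and prediction values, not data from the module) that computes prediction bias for the spam example above:

```python
import numpy as np

# Hypothetical ground truth: 1,000 emails, 5% of which are spam.
labels = np.zeros(1000)
labels[:50] = 1.0  # mean of labels = 0.05

# Hypothetical model outputs: predicted spam probabilities whose
# mean happens to be close to 0.05.
rng = np.random.default_rng(seed=0)
predictions = rng.uniform(0.0, 0.1, size=1000)

# Prediction bias = mean of predictions - mean of ground-truth labels.
prediction_bias = predictions.mean() - labels.mean()
print(f"Mean label:      {labels.mean():.3f}")
print(f"Mean prediction: {predictions.mean():.3f}")
print(f"Prediction bias: {prediction_bias:.3f}")
```

A value near zero is what you want; a large positive or negative value is the signal to start looking for the causes listed later in this section.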
If the model instead predicts that an email is spam 50% of the time, then something is wrong with the training dataset, the new dataset the model is applied to, or with the model itself. Any significant difference between the two means suggests that the model has some prediction bias.
Prediction bias can be caused by:
- Biases or noise in the data, including biased sampling for the training set
- Too-strong regularization, meaning that the model was oversimplified and lost some necessary complexity
- Bugs in the model training pipeline
- The set of features provided to the model being insufficient for the task