# Classification: Prediction bias

As mentioned in the
[Linear regression](/machine-learning/crash-course/linear-regression)
module, calculating
[**prediction bias**](/machine-learning/glossary#prediction_bias)
is a quick check that can flag issues with the model or training data
early on.

Prediction bias is the difference between the mean of a model's
[**predictions**](/machine-learning/glossary#prediction)
and the mean of
[**ground-truth**](/machine-learning/glossary#ground-truth) labels in the
data. A model trained on a dataset where 5% of the emails are spam should
predict, on average, that 5% of the emails it classifies are spam. In other
words, the mean of the labels in the ground-truth dataset is 0.05, and the
mean of the model's predictions should also be 0.05. If this is the case,
the model has zero prediction bias. Of course, the model might still have
other problems.

If the model instead predicts 50% of the time that an email is spam, then
something is wrong with the training dataset, the new dataset the model is
applied to, or with the model itself. Any significant difference between the
two means suggests that the model has some prediction bias.
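The check above can be sketched in a few lines of Python. This is a minimal illustration, not part of the course materials; the function name `prediction_bias` and the toy dataset are assumptions made for the example.

```python
import random

def prediction_bias(predictions, labels):
    """Prediction bias: mean of the predictions minus mean of the ground-truth labels."""
    return sum(predictions) / len(predictions) - sum(labels) / len(labels)

# Toy spam dataset (1.0 = spam, 0.0 = not spam), with roughly 5% spam.
random.seed(0)
labels = [1.0 if random.random() < 0.05 else 0.0 for _ in range(1000)]
spam_rate = sum(labels) / len(labels)

# A well-calibrated model's predicted probabilities average out to the
# true spam rate, so its prediction bias is near zero.
calibrated = [spam_rate] * len(labels)
print(prediction_bias(calibrated, labels))   # ~0.0

# A model that predicts 0.5 for every email has a large positive bias,
# signaling a problem with the data or the model.
overconfident = [0.5] * len(labels)
print(prediction_bias(overconfident, labels))
```

A bias near zero here only means the averages line up; as the text notes, the model can still have other problems.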
[null,null,["최종 업데이트: 2024-08-13(UTC)"],[[["\u003cp\u003ePrediction bias, calculated as the difference between the average prediction and the average ground truth, is a quick check for model or data issues.\u003c/p\u003e\n"],["\u003cp\u003eA model with zero prediction bias ideally predicts the same average outcome as observed in the ground truth data, such as a spam detection model predicting the same percentage of spam emails as actually present in the dataset.\u003c/p\u003e\n"],["\u003cp\u003eSignificant prediction bias can indicate problems in the training data, the model itself, or the new data being applied to the model.\u003c/p\u003e\n"],["\u003cp\u003eCommon causes of prediction bias include biased data, excessive regularization, bugs in the training process, and insufficient features provided to the model.\u003c/p\u003e\n"]]],[],null,["# Classification: Prediction bias\n\nAs mentioned in the\n[Linear regression](/machine-learning/crash-course/linear-regression)\nmodule, calculating\n[**prediction bias**](/machine-learning/glossary#prediction_bias)\nis a quick check that can flag issues with the model or training data\nearly on.\n\nPrediction bias is the difference between the mean of a model's\n[**predictions**](/machine-learning/glossary#prediction)\nand the mean of\n[**ground-truth**](/machine-learning/glossary#ground-truth) labels in the\ndata. A model trained on a dataset\nwhere 5% of the emails are spam should predict, on average, that 5% of the\nemails it classifies are spam. In other words, the mean of the labels in the\nground-truth dataset is 0.05, and the mean of the model's predictions should\nalso be 0.05. If this is the case, the model has zero prediction bias. Of\ncourse, the model might still have other problems.\n\nIf the model instead predicts 50% of the time that an email is spam, then\nsomething is wrong with the training dataset, the new dataset the model is\napplied to, or with the model itself. 
Any\nsignificant difference between the two means suggests that the model has\nsome prediction bias.\n\nPrediction bias can be caused by:\n\n- Biases or noise in the data, including biased sampling for the training set\n- Too-strong regularization, meaning that the model was oversimplified and lost some necessary complexity\n- Bugs in the model training pipeline\n- The set of features provided to the model being insufficient for the task\n\n| **Key terms:**\n|\n| - [Ground truth](/machine-learning/glossary#ground-truth)\n| - [Prediction](/machine-learning/glossary#prediction)\n- [Prediction bias](/machine-learning/glossary#prediction_bias) \n[Help Center](https://support.google.com/machinelearningeducation)"]]