Fairness: Identifying bias
As you prepare your data for model training and evaluation, it's important to keep issues of fairness in mind and audit for potential sources of bias, so you can proactively mitigate its effects before releasing your model into production.

Where might bias lurk? Here are some red flags to look out for in your dataset.
Missing feature values

If your dataset has one or more features with missing values for a large number of examples, that could be an indicator that certain key characteristics of your dataset are under-represented.
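As a quick first check, you can count missing values per feature. This is a minimal sketch using pandas with a made-up toy dataset; the column names and values are illustrative only, not from a real rescue-dog dataset:

```python
import pandas as pd

# Hypothetical rescue-dog examples; None marks a missing temperament value.
df = pd.DataFrame({
    "breed": ["toy poodle", "golden retriever", "basset hound", "french bulldog"],
    "temperament": ["excitable", None, "calm", None],
    "weight_lbs": [12, 65, 48, 11],
})

# Count and rank missing values per feature to spot
# potentially under-represented characteristics.
missing_counts = df.isna().sum().sort_values(ascending=False)
missing_fraction = df.isna().mean().sort_values(ascending=False)
print(missing_counts)
print(missing_fraction)
```

A feature with a large missing fraction, like `temperament` here, is a candidate for the kind of follow-up investigation described in the exercise below.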
Exercise: Check your understanding

You're training a model to predict the adoptability of rescue dogs based on a variety of features, including breed, age, weight, temperament, and quantity of fur shed each day. Your goal is to ensure the model performs equally well on all types of dogs, irrespective of their physical or behavioral characteristics.

You discover that 1,500 of the 5,000 examples in the training set are missing temperament values. Which of the following are potential sources of bias you should investigate?
Temperament data is more likely to be missing for certain breeds of dogs.

If the availability of temperament data correlates with dog breed, this might result in less accurate adoptability predictions for certain dog breeds.

Temperament data is more likely to be missing for dogs under 12 months in age.

If the availability of temperament data correlates with age, this might result in less accurate adoptability predictions for puppies versus adult dogs.

Temperament data is missing for all dogs rescued from big cities.

At first glance, this might not appear to be a potential source of bias, since the missing data would affect all dogs from big cities equally, irrespective of their breed, age, weight, and so on. However, we still need to consider that the location a dog is from might effectively serve as a proxy for those physical characteristics. For example, if dogs from big cities are significantly more likely to be smaller than dogs from more rural areas, that could result in less accurate adoptability predictions for lower-weight dogs or certain small-dog breeds.

Temperament data is missing from the dataset at random.

If temperament data is truly missing at random, that would not be a potential source of bias. However, temperament data might only appear to be missing at random, and further investigation might reveal an explanation for the discrepancy. It's therefore important to do a thorough review to rule out other possibilities, rather than assume data gaps are random.
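Checks like the ones above can be run directly against the data: break missingness out by another feature and see whether the rates differ. A minimal pandas sketch, with invented breeds and values:

```python
import pandas as pd

# Toy data: does temperament missingness correlate with breed?
df = pd.DataFrame({
    "breed": ["poodle", "poodle", "bulldog", "bulldog", "bulldog", "hound"],
    "temperament": ["calm", None, None, None, "calm", "excitable"],
})

# Fraction of missing temperament values within each breed.
# A roughly uniform rate is consistent with data missing at random;
# large differences between groups warrant investigation.
missing_by_breed = df["temperament"].isna().groupby(df["breed"]).mean()
print(missing_by_breed)
```

The same pattern works for any grouping feature (age bucket, location, size group); a statistical test could then confirm whether an observed difference is larger than chance.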
Unexpected feature values

When exploring data, you should also look for examples containing feature values that stand out as especially uncharacteristic or unusual. These unexpected feature values could indicate problems that occurred during data collection, or other inaccuracies that could introduce bias.
Exercise: Check your understanding

Review the following hypothetical set of examples for training a rescue-dog adoptability model.
| breed | age (yrs) | weight (lbs) | temperament | shedding_level |
|---------------------|-----------|--------------|-------------|----------------|
| toy poodle | 2 | 12 | excitable | low |
| golden retriever | 7 | 65 | calm | high |
| labrador retriever | 35 | 73 | calm | high |
| french bulldog | 0.5 | 11 | calm | medium |
| unknown mixed breed | 4 | 45 | excitable | high |
| basset hound | 9 | 48 | calm | medium |
Can you identify any problems with the feature data?

Click here to see the answer
| breed | age (yrs) | weight (lbs) | temperament | shedding_level |
|---------------------|-----------|--------------|-------------|----------------|
| toy poodle | 2 | 12 | excitable | low |
| golden retriever | 7 | 65 | calm | high |
| labrador retriever | **35** | 73 | calm | high |
| french bulldog | 0.5 | 11 | calm | medium |
| unknown mixed breed | 4 | 45 | excitable | high |
| basset hound | 9 | 48 | calm | medium |
The oldest dog whose age was verified by Guinness World Records was Bluey, an Australian Cattle Dog who lived to be 29 years and 5 months. Given that, it seems quite implausible that the labrador retriever is actually 35 years old; it's more likely that the dog's age was either calculated or recorded inaccurately (perhaps the dog is actually 3.5 years old). This error could also be indicative of broader accuracy issues with age data in the dataset that merit further investigation.
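Implausible values like this can be surfaced automatically with a simple range check. The sketch below assumes a 0–30 year plausibility bound for dog ages; that bound is an assumption informed by the verified record above, not a rule that comes with the dataset:

```python
import pandas as pd

# Toy subset of the example table above.
df = pd.DataFrame({
    "breed": ["toy poodle", "labrador retriever", "french bulldog"],
    "age_yrs": [2, 35, 0.5],
})

# Assumed plausibility bounds for dog age in years.
MIN_AGE, MAX_AGE = 0, 30

# Flag rows whose age falls outside the believable range for manual review.
implausible = df[(df["age_yrs"] < MIN_AGE) | (df["age_yrs"] > MAX_AGE)]
print(implausible)
```

Flagged rows should be reviewed rather than silently dropped or corrected, since the pattern of errors (for example, a systematic decimal-point slip) can itself reveal data-collection problems.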
Data skew

Any sort of skew in your data, where certain groups or characteristics may be under- or over-represented relative to their real-world prevalence, can introduce bias into your model.

When auditing model performance, it's important not only to look at results in aggregate, but also to break out results by subgroup. For example, in the case of our rescue-dog adoptability model, it's not sufficient to simply look at overall accuracy to ensure fairness. We should also audit performance by subgroup to ensure the model performs equally well for each dog breed, age group, and size group.

Later in this module, in Evaluating for Bias, we'll take a closer look at different methods for evaluating models by subgroup.
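This kind of subgroup audit can be sketched in a few lines of pandas. The labels and predictions below are made up for illustration; the point is that a healthy-looking aggregate accuracy can hide much worse performance on individual subgroups:

```python
import pandas as pd

# Invented evaluation results: true labels and model predictions per example.
results = pd.DataFrame({
    "breed": ["poodle", "poodle", "bulldog", "bulldog", "hound", "hound"],
    "label": [1, 0, 1, 1, 0, 1],
    "prediction": [1, 0, 0, 1, 0, 0],
})

correct = results["label"] == results["prediction"]

# Aggregate accuracy alone can mask subgroup disparities.
overall_accuracy = correct.mean()

# Accuracy broken out per breed subgroup.
accuracy_by_breed = correct.groupby(results["breed"]).mean()
print(overall_accuracy)
print(accuracy_by_breed)
```

Here the aggregate accuracy looks moderate, but the per-breed breakdown shows the model is perfect on one breed and much worse on the others; the same breakdown can be repeated for age and size groups.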
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2024-11-10 UTC.