Fairness: Identifying bias
As you prepare your data for model training and evaluation, it's important to keep issues of fairness in mind and audit for potential sources of bias, so you can proactively mitigate their effects before releasing your model into production.
Where might bias lurk? Here are some red flags to look out for in your dataset.
Missing feature values
If your dataset has one or more features with missing values for a large number of examples, that could be an indicator that certain key characteristics of your dataset are under-represented.
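As a quick first pass, you can tally missing values per feature before digging into why they're missing. Below is a minimal pandas sketch; the DataFrame and column names are hypothetical stand-ins for your own training set.

```python
import pandas as pd

# Hypothetical rescue-dog examples; substitute your own training data.
df = pd.DataFrame({
    "breed": ["toy poodle", "golden retriever", "labrador retriever"],
    "age_yrs": [2.0, 7.0, None],
    "temperament": ["excitable", None, "calm"],
})

# Count and rate of missing values for each feature. Features with a
# high missing rate deserve a closer look before training.
audit = pd.DataFrame({
    "missing_count": df.isna().sum(),
    "missing_rate": df.isna().mean(),
})
print(audit)
```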
Exercise: Check your understanding
You're training a model to predict the adoptability of rescue dogs based on a variety of features, including breed, age, weight, temperament, and the quantity of fur shed each day. Your goal is to ensure the model performs equally well for all types of dogs, irrespective of their physical or behavioral characteristics.
You discover that 1,500 of the 5,000 examples in the training set are missing temperament values. Which of the following are potential sources of bias you should investigate?
Temperament data is more likely to be missing for certain breeds of dogs.
If the availability of temperament data correlates with dog breed, this could result in less accurate adoptability predictions for certain breeds.
Temperament data is more likely to be missing for dogs under 12 months of age.
If the availability of temperament data correlates with age, this could result in less accurate adoptability predictions for puppies compared with adult dogs.
Temperament data is missing for all dogs rescued from big cities.
At first glance, this might not appear to be a potential source of bias, since the missing data would affect all dogs from big cities equally, irrespective of their breed, age, weight, and so on. However, we still need to consider whether a dog's location of origin might effectively serve as a proxy for those physical characteristics. For example, if dogs from big cities are significantly smaller than dogs from more rural areas, that could result in less accurate adoptability predictions for lower-weight dogs or certain small breeds.
Temperament data is missing from the dataset at random.
If temperament data is truly missing at random, that would not be a potential source of bias. However, data that appears to be missing at random may, on further investigation, turn out to have an explanation for the discrepancy. So it's important to do a thorough review to rule out other possibilities, rather than assume the data gaps are random.
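One concrete way to test the "missing at random" assumption is to compare missingness rates across subgroups. Here is a minimal pandas sketch, assuming a hypothetical DataFrame with breed and temperament columns; a large disparity across breeds would be a red flag.

```python
import pandas as pd

# Hypothetical training examples; replace with your own dataset.
df = pd.DataFrame({
    "breed": ["toy poodle", "toy poodle", "basset hound", "basset hound"],
    "temperament": ["excitable", None, "calm", "calm"],
})

# Rate of missing temperament values, broken out by breed. A large
# disparity across breeds suggests the data is *not* missing at random
# and warrants further investigation.
missing_rate_by_breed = df["temperament"].isna().groupby(df["breed"]).mean()
print(missing_rate_by_breed)
```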
Unexpected feature values
When exploring data, you should also look for examples containing feature values that stand out as especially uncharacteristic or unusual. These unexpected values could indicate problems that occurred during data collection, or other inaccuracies that could introduce bias.
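Simple summary statistics are often enough to surface values like these. The sketch below uses pandas `describe()` on hypothetical numeric columns mirroring the exercise data that follows.

```python
import pandas as pd

# Hypothetical numeric features; substitute your own columns.
df = pd.DataFrame({
    "age_yrs": [2, 7, 35, 0.5, 4, 9],
    "weight_lbs": [12, 65, 73, 11, 45, 48],
})

# Summary statistics surface implausible extremes at a glance:
# here the maximum age of 35 years immediately stands out.
print(df.describe())
```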
Exercise: Check your understanding
Review the following hypothetical set of examples for training a rescue-dog adoptability model.
| breed | age (yrs) | weight (lbs) | temperament | shedding_level |
|---------------------|-----------|--------------|-------------|----------------|
| toy poodle | 2 | 12 | excitable | low |
| golden retriever | 7 | 65 | calm | high |
| labrador retriever | 35 | 73 | calm | high |
| french bulldog | 0.5 | 11 | calm | medium |
| unknown mixed breed | 4 | 45 | excitable | high |
| basset hound | 9 | 48 | calm | medium |
Can you identify any problems with the feature data?
Click here to see the answer
| breed | age (yrs) | weight (lbs) | temperament | shedding_level |
|---------------------|-----------|--------------|-------------|----------------|
| toy poodle | 2 | 12 | excitable | low |
| golden retriever | 7 | 65 | calm | high |
| labrador retriever | **35** | 73 | calm | high |
| french bulldog | 0.5 | 11 | calm | medium |
| unknown mixed breed | 4 | 45 | excitable | high |
| basset hound | 9 | 48 | calm | medium |
The oldest dog whose age was verified by Guinness World Records was Bluey, an Australian Cattle Dog who lived to be 29 years and 5 months. Given that, it seems quite implausible that the labrador retriever is actually 35 years old; more likely, the dog's age was calculated or recorded inaccurately (perhaps the dog is actually 3.5 years old). This error could also indicate broader accuracy issues with the age data in the dataset that merit further investigation.
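If you want to catch errors like this systematically, a rule-based range check works well. The sketch below is one possible approach; the 30-year bound is an assumption motivated by Bluey's record, not a value from this course.

```python
import pandas as pd

# Hypothetical examples including the suspicious row.
df = pd.DataFrame({
    "breed": ["toy poodle", "labrador retriever"],
    "age_yrs": [2.0, 35.0],
})

# The 30-year cap is an assumed plausibility bound; adjust it
# for your own domain.
MAX_PLAUSIBLE_AGE_YRS = 30
suspect_rows = df[(df["age_yrs"] < 0) | (df["age_yrs"] > MAX_PLAUSIBLE_AGE_YRS)]
print(suspect_rows)
```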
Data skew
Any sort of skew in your data, where certain groups or characteristics are under- or over-represented relative to their real-world prevalence, can introduce bias into your model.
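One way to look for skew is to compare feature distributions in your dataset against their estimated real-world prevalence. The pandas sketch below uses hypothetical breed counts and made-up real-world shares purely for illustration.

```python
import pandas as pd

# Hypothetical training set with a heavily skewed breed mix.
df = pd.DataFrame({
    "breed": ["labrador retriever"] * 80 + ["basset hound"] * 20,
})

# Share of each breed in the dataset.
dataset_share = df["breed"].value_counts(normalize=True)

# Hypothetical real-world shelter-intake shares; substitute real figures.
real_world_share = pd.Series({"labrador retriever": 0.5, "basset hound": 0.5})

# Large gaps between the two columns flag skew worth investigating.
print(pd.DataFrame({"dataset": dataset_share, "real_world": real_world_share}))
```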
When auditing model performance, it's important not only to look at results in aggregate, but also to break them out by subgroup. For example, in the case of our rescue-dog adoptability model, looking at overall accuracy alone is not sufficient to ensure fairness. We should also audit performance by subgroup to confirm the model predicts equally well for every breed, age group, and size group.
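For instance, a per-subgroup accuracy breakdown might look like the following pandas sketch; the labels and predictions are hypothetical, and in practice you'd compute this on your actual evaluation set.

```python
import pandas as pd

# Hypothetical labels and model predictions on a held-out set.
eval_df = pd.DataFrame({
    "breed": ["toy poodle", "toy poodle", "basset hound", "basset hound"],
    "label":      [1, 0, 1, 1],
    "prediction": [1, 1, 1, 0],
})

correct = eval_df["label"] == eval_df["prediction"]

# Overall accuracy can hide subgroup disparities...
print(f"overall accuracy: {correct.mean():.2f}")

# ...so also break accuracy out per breed (and likewise per age
# group, size group, and so on).
print(correct.groupby(eval_df["breed"]).mean())
```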
Later in this module, in the Evaluating for Bias section, we'll take a closer look at different methods for evaluating models by subgroup.