Fairness: Test Your Knowledge

  1. True or false: Historical bias occurs when a model is trained on old data.

  2. Engineers are training a regression model to predict the calorie content of meals based on a variety of feature data they've scraped from recipe websites around the world, including serving size, ingredients, and preparation techniques. Which of the following data issues are potential sources of bias that should be investigated further?

    Choose as many answers as you see fit.

  3. A sarcasm-detection model was trained on 80,000 text messages: 40,000 messages sent by adults (18 years and older) and 40,000 messages sent by minors (less than 18 years old). The model was then evaluated on a test set of 20,000 messages: 10,000 from adults and 10,000 from minors. The following confusion matrices show the results for each group (a positive prediction signifies a classification of "sarcastic"; a negative prediction signifies a classification of "not sarcastic"):

    Adults

    True Positives (TPs): 512     False Positives (FPs): 51
    False Negatives (FNs): 36     True Negatives (TNs): 9401
    Precision = TP/(TP + FP) = 512/563 ≈ 0.909
    Recall = TP/(TP + FN) = 512/548 ≈ 0.934

    Minors

    True Positives (TPs): 2147    False Positives (FPs): 96
    False Negatives (FNs): 2177   True Negatives (TNs): 5580
    Precision = TP/(TP + FP) = 2147/2243 ≈ 0.957
    Recall = TP/(TP + FN) = 2147/4324 ≈ 0.497

    Which of the following statements about the model's test-set performance are true?

    Choose as many answers as you see fit.

  4. Which of the following hypotheses could explain the discrepancies in subgroup performance on the test set for the sarcasm-detection model above?

    Choose as many answers as you see fit.

  5. Engineers are working on retraining the sarcasm model above to address inconsistencies in sarcasm-detection accuracy across age demographics, but the model has already been released into production. Which of the following stopgap strategies will help mitigate errors in the model's predictions?
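As a quick check on the arithmetic in question 3, the sketch below recomputes each age group's precision and recall from its confusion matrix. The `ConfusionMatrix` class and the `results` dictionary are hypothetical names introduced only for illustration; the counts are the ones given in the question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for one subgroup; positive = "sarcastic" (hypothetical helper)."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def precision(self) -> float:
        # Share of "sarcastic" predictions that were actually sarcastic.
        return self.tp / (self.tp + self.fp)

    @property
    def recall(self) -> float:
        # Share of actually sarcastic messages the model labeled "sarcastic".
        return self.tp / (self.tp + self.fn)

    @property
    def total(self) -> int:
        # Number of test messages in this subgroup.
        return self.tp + self.fp + self.fn + self.tn


# Counts copied from the adult and minor confusion matrices in question 3.
results = {
    "Adults": ConfusionMatrix(tp=512, fp=51, fn=36, tn=9401),
    "Minors": ConfusionMatrix(tp=2147, fp=96, fn=2177, tn=5580),
}

for group, cm in results.items():
    print(f"{group}: n={cm.total}, precision={cm.precision:.3f}, recall={cm.recall:.3f}")
```

Running it prints one line per group, and the rounded values agree with the precision and recall listed under each matrix. Evaluating each subgroup separately, rather than pooling all 20,000 messages, is what surfaces the gap between the two groups.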