Fairness: Counterfactual fairness

Thus far, our discussions of fairness metrics have assumed that our training and test examples contain comprehensive demographic data for the subgroups being evaluated. But often this isn't the case.

Suppose our admissions dataset doesn't contain complete demographic data. Instead, demographic-group membership is recorded for only a small percentage of examples, such as students who opted to self-identify their group. In this case, the breakdown of our candidate pool into accepted and rejected students now looks like this:

A candidate pool of 100 students, split into two groups: Rejected Candidates (80 student icons) and Accepted Candidates (20 student icons). All of the icons are shaded gray (signifying that their demographic group is unknown) except for six: in the Rejected group, two icons are shaded blue and two are shaded orange; in the Accepted group, one icon is shaded blue and one is shaded orange.
Figure 5. Candidate pool, with demographic-group membership unknown for nearly all candidates (icons shaded in gray).

It's not feasible here to evaluate model predictions for either demographic parity or equality of opportunity, because we don't have demographic data for 94% of our examples. However, for the 6% of examples that do contain demographic features, we can still compare pairs of individual predictions (a majority candidate vs. a minority candidate) and see if they have been treated equitably by the model.

For example, let's say that we've thoroughly reviewed the feature data available for two candidates (one in the majority group and one in the minority group, annotated with a star in the image below), and have determined that they are identically qualified for admission in all respects. If the model makes the same prediction for both of these candidates (i.e., either rejects both candidates or accepts both candidates), it is said to satisfy counterfactual fairness for these examples. Counterfactual fairness stipulates that two examples that are identical in all respects, except a given sensitive attribute (here, demographic group membership), should result in the same model prediction.

The same candidate pool as in the previous figure, except that one blue student icon (majority group) and one orange student icon (minority group) in the Rejected group are annotated with a star, indicating that these two candidates are identical aside from demographic group.
Figure 6. Counterfactual fairness is satisfied for the two identical examples (only varying in demographic group membership) annotated with a star, as the model makes the same decision for both (Rejected).
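
To make this check concrete, here's a minimal Python sketch of how a single counterfactual comparison might be implemented, assuming a hypothetical trained classifier with a scikit-learn-style predict() method and candidate features stored in a one-row pandas DataFrame. The column name demographic_group and the group label "minority" are placeholders for illustration, not fields from the example above.

```python
import pandas as pd

def is_counterfactually_fair(model, candidate: pd.DataFrame,
                             sensitive_col: str = "demographic_group",
                             counterfactual_group: str = "minority") -> bool:
    """Returns True if the model's prediction is unchanged when only the
    sensitive attribute is swapped to its counterfactual value."""
    # Prediction for the candidate as observed.
    original_prediction = model.predict(candidate)[0]

    # An identical candidate, differing only in demographic-group membership.
    counterfactual = candidate.copy()
    counterfactual[sensitive_col] = counterfactual_group
    counterfactual_prediction = model.predict(counterfactual)[0]

    return original_prediction == counterfactual_prediction
```

An equivalent pairwise version would simply run predict() on the two starred candidates in Figure 6 and compare the results; either way, the question being asked is whether changing only the sensitive attribute changes the model's decision.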

Benefits and drawbacks

As mentioned earlier, one key benefit of counterfactual fairness is that it can be used to evaluate predictions for fairness in many cases where using other metrics wouldn't be feasible. If a dataset doesn't contain a full set of feature values for the relevant group attributes under consideration, it won't be possible to evaluate fairness using demographic parity or equality of opportunity. However, if these group attributes are available for a subset of examples, and it's possible to identify comparable pairs of equivalent examples in different groups, practitioners can use counterfactual fairness as a metric to probe the model for potential biases in predictions.

Additionally, because metrics like demographic parity and equality of opportunity assess groups in aggregate, they may mask bias issues that affect individual predictions, which a counterfactual fairness evaluation can surface. For example, suppose our admissions model accepts qualified candidates from the majority group and the minority group in the same proportion, but the most qualified minority candidate is rejected while a majority candidate with the exact same credentials is accepted. A counterfactual fairness analysis can help identify these sorts of discrepancies so that they can be addressed.

On the flip side, the key drawback of counterfactual fairness is that it doesn't provide as holistic a view of bias in model predictions as aggregate metrics do. Identifying and remediating a handful of inequities in pairs of examples may not be sufficient to address systemic bias issues that affect entire subgroups of examples.

In cases where it's possible, practitioners can consider performing both an aggregate fairness analysis (using a metric like demographic parity or equality of opportunity) and a counterfactual fairness analysis to gain the widest range of insights into potential bias issues in need of remediation.
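
When group labels are available for at least a subset of examples, the aggregate and individual views can be computed side by side. The sketch below is one possible way to do this, assuming a pandas DataFrame of predictions with hypothetical columns group, prediction (1 for accepted, 0 for rejected), and pair_id, an identifier linking matched candidates who are identical except for group membership; none of these column names come from the admissions example above.

```python
import pandas as pd

def fairness_report(df: pd.DataFrame) -> None:
    """Prints per-group acceptance rates (aggregate view) plus any matched
    pairs whose members received different predictions (individual view)."""
    # Aggregate view: acceptance rate (share of positive predictions) per
    # group, computed only over examples with a known group label.
    labeled = df[df["group"].notna()]
    acceptance_rates = labeled.groupby("group")["prediction"].mean()
    print("Acceptance rate by group:")
    print(acceptance_rates)

    # Individual view: matched pairs (identical except for group) whose
    # predictions disagree violate counterfactual fairness.
    paired = df[df["pair_id"].notna()]
    conflicting = (
        paired.groupby("pair_id")["prediction"]
        .nunique()
        .loc[lambda counts: counts > 1]
    )
    print(f"\nPairs violating counterfactual fairness: {list(conflicting.index)}")

# Toy illustration (made-up data): acceptance rates are equal across groups,
# so the aggregate check looks fine, yet pair "P1" receives conflicting
# predictions.
toy = pd.DataFrame({
    "group":      ["majority", "majority", "minority", "minority"],
    "prediction": [1, 0, 0, 1],
    "pair_id":    ["P1", None, "P1", None],
})
fairness_report(toy)
```

In the toy data, both groups have a 50% acceptance rate, so an aggregate analysis alone would raise no flags, yet pair P1 still violates counterfactual fairness, illustrating why the two analyses complement each other.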

Exercise: Check your understanding

Exercise Figure. Two groups of circles: Negative Predictions and Positive Predictions. Negative Predictions consists of 50 circles: 39 gray, 8 blue, and 3 orange; one blue circle is labeled 'A', one orange circle is labeled 'A', and one blue circle is labeled 'C'. Positive Predictions consists of 15 circles: 10 gray, 3 blue, and 2 orange; one blue circle is labeled 'B', one orange circle is labeled 'B', and one blue circle is labeled 'C'. A legend below the diagram states that blue circles represent examples in the majority group, orange circles represent examples in the minority group, and gray circles represent examples whose group membership is unknown.
Figure 7. Negative and Positive predictions for a batch of examples, with three pairs of examples labeled as A, B, and C.

In the set of predictions in Figure 7 above, which of the following pairs of identical (excluding group membership) examples received predictions that violate counterfactual fairness?

Pair A
Pair A's predictions satisfy counterfactual fairness, as both the example in the majority group (blue) and the example in the minority group (orange) received the same prediction (Negative).
Pair B
Pair B's predictions satisfy counterfactual fairness, as both the example in the majority group (blue) and the example in the minority group (orange) received the same prediction (Positive).
Pair C
Pair C's predictions are for two examples that both belong to the majority group (blue). The fact that the model produced different predictions for these identical examples suggests that there may be broader performance issues with the model that should be investigated. However, this result does not violate counterfactual fairness, whose conditions only apply in cases where the two identical examples are each drawn from different groups.
None of these pairs violate counterfactual fairness
The predictions for Pairs A and B satisfy counterfactual fairness because in both cases, the example in the majority group and the example in the minority group receive the same prediction. Pair C's examples both belong to the same group (the majority group), so counterfactual fairness is not applicable in this case.

Summary

Demographic parity, equality of opportunity, and counterfactual fairness each provide different mathematical definitions of fairness for model predictions. And those are just three possible ways to quantify fairness. Some definitions of fairness are even mutually incompatible, meaning it may be impossible to satisfy them simultaneously for a given model's predictions.
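
For reference, the three criteria can be written as conditions on a model's prediction. The notation below is one common formulation (an addition to this summary, not something defined earlier in the course): Ŷ is the model's prediction, Y is the true label, A is the sensitive attribute, and x is the remaining feature values.

```latex
% Demographic parity: positive-prediction rates match across groups.
P(\hat{Y} = 1 \mid A = a) = P(\hat{Y} = 1 \mid A = b)

% Equality of opportunity: true-positive rates match across groups.
P(\hat{Y} = 1 \mid Y = 1, A = a) = P(\hat{Y} = 1 \mid Y = 1, A = b)

% Counterfactual fairness (simplified, as used in this course): identical
% examples that differ only in A receive the same prediction.
\hat{Y}(x, A = a) = \hat{Y}(x, A = b)
```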

So how do you choose the "right" fairness metric for your model? Consider the context in which the model is being used and the overarching goals you want to accomplish. For example, is the goal to achieve equal representation (in which case demographic parity may be the best metric) or to achieve equal opportunity (in which case equality of opportunity may be the better choice)?

To learn more about ML Fairness and explore these issues in more depth, see Fairness and Machine Learning: Limitations and Opportunities by Solon Barocas, Moritz Hardt, and Arvind Narayanan.