This module shows how logistic regression can be used for classification tasks, and explores how to evaluate the effectiveness of classification models.
Classification
Classification vs. Regression
- Sometimes, we use logistic regression for the probability outputs -- this is a regression in (0, 1)
- Other times, we'll threshold the value for a discrete binary classification
- The choice of threshold is important and can be tuned
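As a minimal sketch of the thresholding step described above: the helper name `classify` and the probability values are hypothetical, chosen only for illustration, and the default of 0.5 is an assumption rather than a recommended setting.

```python
def classify(probabilities, threshold=0.5):
    """Map each probability to 1 (positive) if it meets the threshold, else 0."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Hypothetical logistic regression outputs, for illustration only.
probs = [0.05, 0.40, 0.55, 0.92]

default_labels = classify(probs)                # threshold of 0.5
strict_labels = classify(probs, threshold=0.9)  # a stricter threshold
```

The same probabilities give different discrete labels under the two thresholds, which is why the threshold is worth tuning.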
Evaluation Metrics: Accuracy
- How do we evaluate classification models?
- One possible measure: Accuracy
- the fraction of predictions we got right
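Accuracy as defined above can be sketched in a few lines. The function name and the label lists are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels and predictions (1 = positive, 0 = negative).
y_true = [1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0]

acc = accuracy(y_true, y_pred)  # 4 of the 5 predictions are right
```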
Accuracy Can Be Misleading
- In many cases, accuracy is a poor or misleading metric
- Most often when different kinds of mistakes have different costs
- A typical case is class imbalance, when positives or negatives are extremely rare
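To illustrate the class-imbalance case, here is a small sketch with invented numbers: 10 positives among 1,000 examples, and a "model" that always predicts negative.

```python
# Hypothetical imbalanced data: 10 positives, 990 negatives.
y_true = [1] * 10 + [0] * 990

# A degenerate model that never predicts the positive class.
y_pred = [0] * 1000

accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
positives_found = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
```

Accuracy looks excellent here even though the model finds none of the positives, which is the sense in which the metric can mislead.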
True Positives and False Positives
- For class-imbalanced problems, useful to separate out different kinds of errors
True Positives: We correctly called wolf! We saved the town.
False Positives: Error. We called wolf falsely. Everyone is mad at us.
False Negatives: There was a wolf, but we didn't spot it. It ate all our chickens.
True Negatives: No wolf, no alarm. Everyone is fine.
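The four outcomes can be tallied with a simple loop. This is a sketch only; the function name and the "wolf" data below are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count the four outcome types for binary labels (1 = positive)."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            counts["TP"] += 1  # called wolf, and there was one
        elif pred == 1 and truth == 0:
            counts["FP"] += 1  # called wolf falsely
        elif pred == 0 and truth == 1:
            counts["FN"] += 1  # missed a real wolf
        else:
            counts["TN"] += 1  # no wolf, no alarm
    return counts


# Hypothetical nights: was a wolf present, and did we raise the alarm?
wolf_present = [1, 0, 1, 0, 0, 1]
cried_wolf = [1, 1, 0, 0, 0, 1]

counts = confusion_counts(wolf_present, cried_wolf)
```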
Evaluation Metrics: Precision and Recall
- Precision: (True Positives) / (All Positive Predictions)
- When the model said "positive" class, was it right?
- Intuition: Did the model cry "wolf" too often?
- Recall: (True Positives) / (All Actual Positives)
- Out of all the possible positives, how many did the model correctly identify?
- Intuition: Did it miss any wolves?
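A minimal sketch of both metrics follows. The function name and data are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here, not something the definitions above prescribe.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 = positive."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    predicted_pos = sum(y_pred)  # all positive predictions
    actual_pos = sum(y_true)     # all actual positives
    # Assumed convention: report 0.0 when the ratio is undefined.
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall


# Hypothetical labels and predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
```

In this example the model makes 3 positive predictions, 2 of them correct, and finds 2 of the 4 actual positives.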
Explore the options below.
Consider a classification model that separates email into two categories: "spam" or "not spam." If you raise the classification threshold, what will happen to precision?
- Definitely increase. Incorrect: raising the classification threshold typically increases precision; however, precision is not guaranteed to increase monotonically as we raise the threshold.
- Probably increase. Correct: in general, raising the classification threshold reduces false positives, thus raising precision.
- Probably decrease. Incorrect: in general, raising the classification threshold reduces false positives, thus raising precision.
- Definitely decrease. Incorrect: in general, raising the classification threshold reduces false positives, thus raising precision.
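The following sketch shows why precision is not guaranteed to rise with the threshold. The scores and labels are invented so that the highest-scoring example is a negative; `precision_at` is a hypothetical helper.

```python
def precision_at(threshold, scores, labels):
    """Precision when every score >= threshold is predicted positive."""
    predicted_pos = [y for s, y in zip(scores, labels) if s >= threshold]
    if not predicted_pos:
        return None  # undefined: no positive predictions at this threshold
    return sum(predicted_pos) / len(predicted_pos)


# Hypothetical spam scores and true labels (1 = spam).
scores = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45]
labels = [0, 1, 1, 1, 0, 0]

precisions = {t: precision_at(t, scores, labels) for t in (0.5, 0.6, 0.7)}
```

With this data, precision rises when the threshold moves from 0.5 to 0.6 and then falls at 0.7, because the stricter threshold drops a true positive while the high-scoring negative remains.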
A ROC Curve
Each point on the curve is the true positive rate (TPR) and false positive rate (FPR) at one decision threshold.
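As a sketch of how such points could be computed, the hypothetical helper below sweeps each distinct score as a threshold. It assumes both classes are present in the data, and the scores and labels are made up.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, one per threshold, from strictest to loosest.

    Assumes labels contain at least one positive (1) and one negative (0).
    """
    num_pos = sum(labels)
    num_neg = len(labels) - num_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / num_neg, tp / num_pos))
    return points


# Hypothetical model scores and true labels.
scores = [0.9, 0.8, 0.6, 0.3]
labels = [1, 0, 1, 0]

points = roc_points(scores, labels)
```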
Evaluation Metrics: AUC
- AUC: "Area under the ROC Curve"
- Interpretation:
- If we pick a random positive and a random negative, what's the probability that the model ranks them in the correct order?
- Intuition: gives an aggregate measure of performance across all possible classification thresholds
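The ranking interpretation can be computed directly by comparing every positive with every negative. This is a brute-force sketch for small data, not an efficient implementation; the helper name and data are hypothetical, and counting ties as half a win is an assumed convention.

```python
def auc_pairwise(scores, labels):
    """Fraction of (positive, negative) pairs the model ranks correctly."""
    pos_scores = [s for s, y in zip(scores, labels) if y == 1]
    neg_scores = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # assumed convention: a tie counts as half
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical model scores and true labels.
scores = [0.9, 0.8, 0.6, 0.3]
labels = [1, 0, 1, 0]

auc = auc_pairwise(scores, labels)  # 3 of the 4 pairs are ordered correctly
```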
Prediction Bias
- Logistic Regression predictions should be unbiased.
- average of predictions == average of observations
- Bias is a canary.
- Zero bias alone does not mean everything in your system is perfect.
- But it's a great sanity check.
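The sanity check above can be sketched as a difference of two averages. The sign convention (predictions minus observations), the function name, and the data are assumptions made for illustration.

```python
def prediction_bias(predictions, labels):
    """Average prediction minus average observed label (0 means unbiased)."""
    avg_prediction = sum(predictions) / len(predictions)
    avg_observation = sum(labels) / len(labels)
    return avg_prediction - avg_observation


# Hypothetical predicted probabilities and observed labels.
predictions = [0.1, 0.2, 0.7, 0.4]
labels = [0, 0, 1, 0]

bias = prediction_bias(predictions, labels)  # average 0.35 vs. average 0.25
```

A clearly nonzero value, as here, suggests the model over-predicts or under-predicts on average.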
Prediction Bias (continued)
- If you have bias, you have a problem.
- Incomplete feature set?
- Buggy pipeline?
- Biased training sample?
- Don't fix bias with a calibration layer; fix it in the model.
- Look for bias in slices of data -- this can guide improvements.
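One way to look for bias in slices is to compute it per group. In this sketch the slice key ("mobile" vs. "desktop"), the helper name, and all values are hypothetical.

```python
from collections import defaultdict


def bias_by_slice(rows):
    """rows: iterable of (slice_key, prediction, label) tuples.

    Returns {slice_key: average prediction - average label} for each slice.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # [sum of preds, sum of labels, count]
    for key, prediction, label in rows:
        entry = totals[key]
        entry[0] += prediction
        entry[1] += label
        entry[2] += 1
    return {key: (p - l) / n for key, (p, l, n) in totals.items()}


# Hypothetical examples tagged with a made-up slice key.
rows = [
    ("mobile", 0.2, 0),
    ("mobile", 0.4, 1),
    ("desktop", 0.9, 1),
    ("desktop", 0.5, 0),
]

slice_bias = bias_by_slice(rows)
overall_bias = (sum(r[1] for r in rows) - sum(r[2] for r in rows)) / len(rows)
```

In this made-up data the overall bias is about zero, yet one slice under-predicts and the other over-predicts, which is the kind of pattern that slicing can reveal.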