
Classification

This module shows how logistic regression can be used for classification tasks, and explores how to evaluate the effectiveness of classification models.

Classification vs. Regression

Sometimes, we use logistic regression for its probability outputs; this is a regression in (0, 1).
Other times, we threshold the value for a discrete binary classification.
The choice of threshold is important and can be tuned.
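A minimal sketch of the two uses, assuming the model emits a raw log-odds score that a sigmoid maps into (0, 1). The function names `sigmoid` and `classify` and the thresholds 0.5 and 0.8 are illustrative, not part of any library.

```python
import math

def sigmoid(z: float) -> float:
    """Map a raw model score (log-odds) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(probability: float, threshold: float = 0.5) -> int:
    """Turn a probability into a hard 0/1 label; the threshold is a tunable choice."""
    return 1 if probability >= threshold else 0

# Hypothetical raw scores from a logistic regression model.
scores = [-2.0, 0.0, 0.8, 3.0]
probs = [sigmoid(z) for z in scores]  # use these directly as probabilities...

# ...or threshold them; a stricter threshold yields fewer positive labels.
labels_at_05 = [classify(p, 0.5) for p in probs]
labels_at_08 = [classify(p, 0.8) for p in probs]

print([round(p, 3) for p in probs])
print(labels_at_05)  # [0, 1, 1, 1]
print(labels_at_08)  # [0, 0, 0, 1]
```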

Evaluation Metrics: Accuracy

How do we evaluate classification models?

One possible measure: Accuracy
- the fraction of predictions we got right
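As a sketch, accuracy for binary labels is simply the share of positions where prediction and label agree; `accuracy` here is a hypothetical helper and the labels are made up.

```python
def accuracy(labels: list[int], predictions: list[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)

labels      = [1, 0, 1, 1, 0]
predictions = [1, 0, 0, 1, 1]
print(accuracy(labels, predictions))  # 0.6 (3 of 5 correct)
```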

Accuracy Can Be Misleading

In many cases, accuracy is a poor or misleading metric
- Most often when different kinds of mistakes have different costs
- A typical case is class imbalance, when positives or negatives are extremely rare
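The following sketch, using an invented dataset with a 1% positive rate, shows how a model that never predicts the positive class still scores 99% accuracy while finding none of the positives.

```python
# Hypothetical dataset: 990 negatives and 10 positives (1% positive rate).
labels = [0] * 990 + [1] * 10

# A useless "model" that always predicts the majority (negative) class.
always_negative = [0] * len(labels)

correct = sum(1 for y, p in zip(labels, always_negative) if y == p)
acc = correct / len(labels)

# Count how many actual positives the model caught.
positives_found = sum(1 for y, p in zip(labels, always_negative) if y == 1 and p == 1)

print(f"accuracy = {acc:.3f}")                       # accuracy = 0.990
print(f"positives found = {positives_found} of 10")  # positives found = 0 of 10
```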

True Positives and False Positives

For class-imbalanced problems, it's useful to separate out the different kinds of errors.

True Positives: We correctly called wolf! We saved the town.
False Positives: Error: we called wolf falsely. Everyone is mad at us.
False Negatives: There was a wolf, but we didn't spot it. It ate all our chickens.
True Negatives: No wolf, no alarm. Everyone is fine.
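A sketch of tallying the four outcomes above, assuming label 1 means "wolf" and 0 means "no wolf"; `confusion_counts` and the sample data are hypothetical.

```python
def confusion_counts(labels, predictions):
    """Count TP, FP, FN, TN for binary labels where 1 means 'wolf'."""
    tp = fp = fn = tn = 0
    for y, p in zip(labels, predictions):
        if p == 1 and y == 1:
            tp += 1  # correctly called wolf
        elif p == 1 and y == 0:
            fp += 1  # called wolf falsely
        elif p == 0 and y == 1:
            fn += 1  # missed a real wolf
        else:
            tn += 1  # no wolf, no alarm
    return tp, fp, fn, tn

labels      = [1, 0, 1, 0, 0, 1]
predictions = [1, 1, 0, 0, 0, 1]
print(confusion_counts(labels, predictions))  # (2, 1, 1, 2)
```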

Evaluation Metrics: Precision and Recall

Precision: (True Positives) / (All Positive Predictions)

When the model predicted the "positive" class, was it right?
Intuition: Did the model cry "wolf" too often?

Recall: (True Positives) / (All Actual Positives)

Out of all the actual positives, how many did the model correctly identify?
Intuition: Did it miss any wolves?
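The two formulas translate directly into code. This sketch uses invented counts, and returning 0.0 when a denominator is zero is a convention chosen here, not a universal rule.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN).
    Returns 0.0 for a metric whose denominator is zero (a convention chosen here)."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

# Hypothetical counts: 8 wolves correctly flagged, 2 false alarms, 4 wolves missed.
p, r = precision_recall(tp=8, fp=2, fn=4)
print(f"precision = {p:.2f}, recall = {r:.2f}")  # precision = 0.80, recall = 0.67
```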


Explore the options below.

Consider a classification model that separates email into two categories: "spam" or "not spam." If you raise the classification threshold, what will happen to precision?

Definitely increase.

Raising the classification threshold typically increases precision; however, precision is not guaranteed to increase monotonically as we raise the threshold.

Probably increase.

In general, raising the classification threshold reduces false positives, thus raising precision.

Probably decrease.

In general, raising the classification threshold reduces false positives, thus raising precision.

Definitely decrease.

In general, raising the classification threshold reduces false positives, thus raising precision.
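The quiz's caveat, that precision usually but not always rises with the threshold, shows up in a small sketch. The scores and labels are invented, and an email is treated as spam when its score is at or above the threshold.

```python
def precision_at(threshold, scores, labels):
    """Precision when every example with score >= threshold is labeled 'spam'."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    # No positive predictions at all: precision is undefined.
    return tp / (tp + fp) if (tp + fp) > 0 else float("nan")

# Hypothetical spam scores (model probabilities) and true labels (1 = spam).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1,   0,   1,   1,   0,   1,   0,   0]

for t in [0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85]:
    print(f"threshold {t:.2f}: precision {precision_at(t, scores, labels):.3f}")
```

Precision climbs overall from about 0.57 to 1.0, but it dips at 0.45 and again at 0.65 and 0.75: each time the threshold drops a true positive before the next false positive, TP falls while FP stays the same.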

A ROC Curve

Each point is the true positive (TP) rate and false positive (FP) rate at one decision threshold.

ROC Curve showing TP Rate vs. FP Rate at different classification thresholds.
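A sketch of how the points on such a curve can be computed, assuming the positive class is predicted when score >= threshold. Here TP rate = TP / (all actual positives) and FP rate = FP / (all actual negatives); the data and `roc_points` are hypothetical.

```python
def roc_points(scores, labels, thresholds):
    """Return (FP rate, TP rate) at each threshold, predicting positive when score >= threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1,   0,   1,   1,   0,   1,   0,   0]
thresholds = [1.0, 0.85, 0.55, 0.35, 0.0]  # from strictest to most lenient

print(roc_points(scores, labels, thresholds))
# [(0.0, 0.0), (0.0, 0.25), (0.25, 0.75), (0.5, 1.0), (1.0, 1.0)]
```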

Evaluation Metrics: AUC

AUC: "Area under the ROC Curve"

Interpretation:

If we pick a random positive and a random negative, what's the probability my model ranks them in the correct order?

Intuition: gives an aggregate measure of performance across all possible classification thresholds
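Both readings of AUC can be checked on a toy example. This sketch, with invented data, computes AUC once as the pairwise ranking probability (ties counted as half) and once as the trapezoidal area under the ROC curve traced by using every distinct score as a threshold; the two agree here.

```python
from itertools import product

def auc_by_ranking(scores, labels):
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))

def auc_by_trapezoid(scores, labels):
    """Area under the ROC curve traced by using every distinct score as a threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # trapezoid between consecutive points
    return area

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1,   0,   1,   1,   0,   1,   0,   0]
print(auc_by_ranking(scores, labels))    # 0.75
print(auc_by_trapezoid(scores, labels))  # 0.75
```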

Prediction Bias

Logistic Regression predictions should be unbiased.

average of predictions == average of observations
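A minimal sketch of this check, computing prediction bias as the mean predicted probability minus the mean observed label. The numbers are invented.

```python
def prediction_bias(predictions, labels):
    """Mean predicted probability minus mean observed label; should be close to zero."""
    mean_pred = sum(predictions) / len(predictions)
    mean_label = sum(labels) / len(labels)
    return mean_pred - mean_label

# Hypothetical: 1% of emails are actually spam, but the model predicts 20% on average.
labels = [1] * 1 + [0] * 99
predictions = [0.2] * 100

bias = prediction_bias(predictions, labels)
print(f"prediction bias = {bias:+.3f}")  # prediction bias = +0.190
```

A bias of about +0.19 here would signal a problem, since the average of predictions is far from the average of observations.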

Bias is a canary.

Zero bias alone does not mean everything in your system is perfect.
But it's a great sanity check.

Prediction Bias (continued)

If you have bias, you have a problem.

Incomplete feature set?
Buggy pipeline?
Biased training sample?

Don't fix bias with a calibration layer; fix it in the model.
Look for bias in slices of data; this can guide improvements.
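A sketch of slicing, assuming a hypothetical `language` feature with values "en" and "fr". Here the overall bias is zero, yet each slice is off by 0.25 in opposite directions, which is the kind of problem an aggregate check can hide.

```python
from collections import defaultdict

def bias_by_slice(predictions, labels, slices):
    """Prediction bias (mean prediction - mean label) computed separately per slice key."""
    groups = defaultdict(lambda: ([], []))  # slice key -> (predictions, labels)
    for p, y, key in zip(predictions, labels, slices):
        groups[key][0].append(p)
        groups[key][1].append(y)
    return {
        key: sum(preds) / len(preds) - sum(ys) / len(ys)
        for key, (preds, ys) in groups.items()
    }

# Hypothetical data: the model overpredicts for "en" and underpredicts for "fr".
predictions = [0.75] * 4 + [0.25] * 4
labels      = [1, 1, 0, 0, 1, 1, 0, 0]
language    = ["en"] * 4 + ["fr"] * 4

print(bias_by_slice(predictions, labels, ["all"] * 8))  # {'all': 0.0}
print(bias_by_slice(predictions, labels, language))     # {'en': 0.25, 'fr': -0.25}
```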

Calibration Plots Show Bucketed Bias
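A calibration plot buckets examples by predicted probability and compares, per bucket, the average prediction with the average label. This sketch uses equal-width buckets (one possible bucketing choice; quantile buckets are another) and invented data; `calibration_buckets` is a hypothetical helper.

```python
def calibration_buckets(predictions, labels, num_buckets=4):
    """Group examples into equal-width probability buckets and return
    (mean prediction, mean label, count) for each non-empty bucket."""
    sums = [[0.0, 0.0, 0] for _ in range(num_buckets)]  # [sum_pred, sum_label, count]
    for p, y in zip(predictions, labels):
        # Bucket index from the probability; clamp p == 1.0 into the last bucket.
        b = min(int(p * num_buckets), num_buckets - 1)
        sums[b][0] += p
        sums[b][1] += y
        sums[b][2] += 1
    return [
        (s_pred / n, s_label / n, n)
        for s_pred, s_label, n in sums
        if n > 0
    ]

predictions = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels      = [0,   0,   1,   0,   1,   1,   0,   1]

for mean_pred, mean_label, n in calibration_buckets(predictions, labels):
    print(f"mean prediction {mean_pred:.2f}  mean label {mean_label:.2f}  n={n}")
```

The gap between mean prediction and mean label in each bucket is the bucketed bias; in this toy data the top bucket (mean prediction about 0.85, mean label 0.5) is overconfident.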
