Earlier, you encountered binary classification models
that could pick one of two possible choices, such as whether:
A given email is spam or not spam.
A given tumor is malignant or benign.
In this module, we'll investigate multi-class classification, which
can pick from multiple possibilities. For example:
Is this dog a beagle, a basset hound, or a bloodhound?
Is this flower a Siberian Iris, Dutch Iris, Blue Flag Iris, or Dwarf Bearded Iris?
Is that plane a Boeing 747, Airbus 320, Boeing 777, or Embraer 190?
Is this an image of an apple, bear, candy, dog, or egg?
Some real-world multi-class problems entail choosing from millions
of separate classes. For example, consider a multi-class classification
model that can identify the image of just about anything.
Multi-Class Neural Networks
More than two classes?
Logistic regression gives useful probabilities for binary-class problems.
spam / not-spam
click / not-click
What about multi-class problems?
apple, banana, car, cardiologist, ..., walk sign, zebra, zoo
red, orange, yellow, green, blue, indigo, violet
animal, vegetable, mineral
One-Vs-All Multi-Class
Create a unique output for each possible class
Train each output on a binary signal: "my class" vs. "all other classes"
This can be done within one deep network (one output node per class) or with separate binary models
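A minimal sketch of the one-vs-all idea in plain Python, assuming a model has already produced one logit per class (the logits, class names, and function names below are made up for illustration). Each output gets its own sigmoid and its own binary "my class vs. all other classes" target, so the outputs are independent and generally do not sum to 1.0.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function: maps a logit to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def one_vs_all_targets(label: int, num_classes: int) -> list[float]:
    """Binary training signal per output: 1.0 for 'my class', 0.0 for all others."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]

def one_vs_all_predict(logits: list[float]) -> list[float]:
    """Apply an independent sigmoid to each class's logit."""
    return [sigmoid(z) for z in logits]

# Hypothetical logits for the classes [apple, bear, candy].
logits = [2.0, -1.0, 0.5]
probs = one_vs_all_predict(logits)
print(one_vs_all_targets(label=0, num_classes=3))  # [1.0, 0.0, 0.0]
print([round(p, 3) for p in probs])
print(round(sum(probs), 3))  # no constraint forces this to be 1.0
```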
SoftMax Multi-Class
Add an additional constraint: require the outputs of all one-vs-all nodes to sum to 1.0
The additional constraint helps training converge more quickly
It also allows the outputs to be interpreted as probabilities
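The softmax constraint can be sketched as a function that turns the same kind of per-class logits into outputs that sum to 1.0. This is a minimal illustration, assuming the standard softmax definition; the function name and logits are hypothetical. Subtracting the largest logit before exponentiating is a common numerical-stability trick that leaves the result unchanged.

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Exponentiate each logit and normalize so the outputs sum to 1.0."""
    m = max(logits)  # shift by the max to avoid overflow in math.exp
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the classes [apple, bear, candy].
logits = [2.0, -1.0, 0.5]
probs = softmax(logits)
print([round(p, 3) for p in probs])
print(round(sum(probs), 6))  # 1.0
```

The largest logit still gets the largest output, but unlike the independent sigmoids of one-vs-all, raising one class's probability necessarily lowers the others.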
What to Use When?
Multi-Class, Single-Label Classification:
An example may be a member of only one class.
The constraint that classes are mutually exclusive is helpful structure.
It is useful to encode this constraint in the loss.
Use one softmax loss for all possible classes.
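For the single-label case, one softmax loss over all classes can be sketched as softmax cross-entropy: the negative log of the probability the softmax assigns to the one correct class. The function name and example logits are hypothetical; the computation uses log-sum-exp so that large logits do not overflow.

```python
import math

def softmax_cross_entropy(logits: list[float], label: int) -> float:
    """-log(softmax(logits)[label]), computed via a stable log-sum-exp.

    log softmax(z)[y] = z[y] - (m + log(sum_k exp(z_k - m))), with m = max(z).
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

# Hypothetical logits for [beagle, basset hound, bloodhound].
logits = [2.0, -1.0, 0.5]
print(round(softmax_cross_entropy(logits, label=0), 4))  # true class has the top logit: low loss
print(round(softmax_cross_entropy(logits, label=1), 4))  # true class has a low logit: high loss
```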
Multi-Class, Multi-Label Classification:
An example may be a member of more than one class.
No additional constraints on class membership to exploit.
Use one logistic regression loss for each possible class.
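For the multi-label case, one logistic regression loss per class can be sketched as an independent sigmoid cross-entropy for each output, summed over classes, with a multi-hot target vector. The names and values below are hypothetical; the per-class formula is the usual numerically stable rewrite of binary log loss.

```python
import math

def sigmoid_cross_entropy(logit: float, target: float) -> float:
    """Binary log loss for one class, computed stably from the logit.

    Equivalent to -(t*log(sigmoid(z)) + (1-t)*log(1-sigmoid(z))),
    rewritten as max(z, 0) - z*t + log(1 + exp(-|z|)).
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def multi_label_loss(logits: list[float], targets: list[float]) -> float:
    """Sum of independent per-class logistic losses; no sum-to-1 constraint."""
    return sum(sigmoid_cross_entropy(z, t) for z, t in zip(logits, targets))

# Hypothetical logits for [red, green, blue]; the example is tagged both red and blue.
logits = [1.5, -2.0, 0.3]
targets = [1.0, 0.0, 1.0]  # multi-hot: more than one class can be 1.0
print(round(multi_label_loss(logits, targets), 4))
```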
SoftMax Options
Full SoftMax
Brute force; calculates a probability for every class.
Candidate Sampling
Calculates probabilities for all the positive labels, but only for a random sample of negatives.
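The difference between full softmax and candidate sampling can be sketched by computing the same softmax cross-entropy twice: once over every class, and once over only the positive label plus a uniform random sample of negatives. Everything here is an illustrative assumption: the function names, the uniform sampler, and the random weights standing in for a trained output layer. With a uniform sampler, the log-probability correction that sampled-softmax methods usually apply is the same for every candidate, so this sketch omits it; a non-uniform sampler would need it. Libraries such as TensorFlow offer a production version (`tf.nn.sampled_softmax_loss`).

```python
import math
import random

def logit_for_class(hidden: list[float], weights: list[list[float]], cls: int) -> float:
    """Dot product of the last hidden layer with one class's output weights."""
    return sum(h * w for h, w in zip(hidden, weights[cls]))

def cross_entropy_from_logits(logits: list[float], true_index: int) -> float:
    """Softmax cross-entropy over the given logits, via a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[true_index]

def full_softmax_loss(hidden, weights, label):
    """Brute force: compute a logit for every class."""
    logits = [logit_for_class(hidden, weights, c) for c in range(len(weights))]
    return cross_entropy_from_logits(logits, label)

def sampled_softmax_loss(hidden, weights, label, num_sampled, rng):
    """Positive label plus `num_sampled` distinct negatives drawn uniformly."""
    num_classes = len(weights)
    if num_sampled >= num_classes:
        raise ValueError("num_sampled must be smaller than the number of classes")
    negatives = set()
    while len(negatives) < num_sampled:  # rejection sampling; cheap when num_sampled << num_classes
        c = rng.randrange(num_classes)
        if c != label:
            negatives.add(c)
    candidates = [label] + sorted(negatives)
    logits = [logit_for_class(hidden, weights, c) for c in candidates]
    return cross_entropy_from_logits(logits, 0)  # true label sits at index 0

# Hypothetical setup: random weights stand in for a trained output layer.
rng = random.Random(0)
num_classes, dim = 10_000, 8
weights = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(num_classes)]
hidden = [rng.gauss(0.0, 1.0) for _ in range(dim)]

full = full_softmax_loss(hidden, weights, label=42)
sampled = sampled_softmax_loss(hidden, weights, label=42, num_sampled=20, rng=rng)
print(f"full: {full:.4f}  sampled: {sampled:.4f}")
```

Here the sampled loss needs 21 dot products instead of 10,000. Because the candidates are a subset of all classes that always includes the true label, the sampled loss in this sketch is never larger than the full loss; it is a cheaper approximation used during training, not a replacement for full softmax at evaluation time.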