You may recall from the Feature cross exercises in the Categorical data module that the following classification problem is nonlinear:
Figure 1. Cartesian coordinate plane, divided into four quadrants, each filled with random dots in a shape resembling a square. The dots in the top-right and bottom-left quadrants are blue, and the dots in the top-left and bottom-right quadrants are orange.
"Nonlinear" means that you can't accurately predict a label with a model of the form $b + w_1x_1 + w_2x_2$. In other words, the "decision surface" is not a line.
However, if we perform a feature cross on our features $x_1$ and $x_2$, we can then represent the nonlinear relationship between the two features using a linear model, $b + w_1x_1 + w_2x_2 + w_3x_3$, where $x_3$ is the feature cross between $x_1$ and $x_2$:
Figure 2. The same Cartesian coordinate plane of blue and orange dots as in Figure 1. However, this time a white hyperbolic curve is plotted atop the grid, which separates the blue dots in the top-right and bottom-left quadrants (now shaded with a blue background) from the orange dots in the top-left and bottom-right quadrants (now shaded with an orange background).
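To make the idea concrete, here is a minimal sketch in plain Python. It builds a synthetic stand-in for the four-quadrant data (the sample size, seed, and value range are arbitrary assumptions) and scores two linear models: one that sees the cross `x3 = x1 * x2` and one that does not. The weights are hand-picked for illustration, not learned, and the helper names `linear_score` and `accuracy` are hypothetical.

```python
import random

def linear_score(b, weights, features):
    """Return b + sum(w_i * x_i) for a single example."""
    return b + sum(w * x for w, x in zip(weights, features))

def accuracy(b, weights, rows, labels):
    """Fraction of examples where (score > 0) agrees with the label."""
    hits = sum((linear_score(b, weights, row) > 0) == label
               for row, label in zip(rows, labels))
    return hits / len(rows)

rng = random.Random(0)  # fixed seed; the value itself is arbitrary
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(1000)]
# Blue (True) in the top-right and bottom-left quadrants, orange (False) elsewhere.
labels = [x1 * x2 > 0 for x1, x2 in points]

# Linear model over (x1, x2, x3) with x3 = x1 * x2.
# Hand-picked weights (an assumption): b = 0, w1 = w2 = 0, w3 = 1.
crossed = [(x1, x2, x1 * x2) for x1, x2 in points]
acc_crossed = accuracy(0.0, (0.0, 0.0, 1.0), crossed, labels)

# One arbitrary linear model without the cross, for comparison: w1 = w2 = 1.
acc_plain = accuracy(0.0, (1.0, 1.0), points, labels)

print(f"with x3 = x1 * x2: {acc_crossed:.3f}")
print(f"without the cross: {acc_plain:.3f}")
```

The model with the cross separates the two colors on this synthetic data, while the comparison model without it is right only about half the time. The second result is for one particular choice of weights; it illustrates the problem rather than proving that no line can do better.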
Now consider the following dataset:
Figure 3. Cartesian coordinate plane, divided into four quadrants. A circular cluster of blue dots is centered at the origin of the graph, and is surrounded by a ring of orange dots.
You may also recall from the Feature cross exercises that determining the correct feature crosses to fit a linear model to this data took a bit more effort and experimentation.
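As a rough illustration of what that experimentation can arrive at, the sketch below generates a synthetic version of the circular dataset and tries one candidate: crossing each feature with itself to get `x1 * x1` and `x2 * x2`. The radii, the bias of 2.25, and the weights of -1 are assumptions chosen by hand for this synthetic data, not values taken from the exercises, and `ring_points` is a hypothetical helper.

```python
import math
import random

def ring_points(rng, n, r_min, r_max):
    """Sample n points whose distance from the origin lies in [r_min, r_max]."""
    pts = []
    for _ in range(n):
        r = rng.uniform(r_min, r_max)
        theta = rng.uniform(0.0, 2.0 * math.pi)
        pts.append((r * math.cos(theta), r * math.sin(theta)))
    return pts

rng = random.Random(0)  # fixed seed; the value itself is arbitrary
blue = ring_points(rng, 500, 0.0, 1.0)    # cluster around the origin
orange = ring_points(rng, 500, 2.0, 3.0)  # surrounding ring
points = blue + orange
labels = [True] * len(blue) + [False] * len(orange)

# Candidate features: each input crossed with itself.
squared = [(x1 * x1, x2 * x2) for x1, x2 in points]

# Hand-picked linear model over the squared features (an assumption):
# score = 2.25 - x1^2 - x2^2, which is positive inside a circle of radius 1.5.
b, w1, w2 = 2.25, -1.0, -1.0
predictions = [b + w1 * s1 + w2 * s2 > 0 for s1, s2 in squared]

acc_squared = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(f"accuracy with squared features: {acc_squared:.3f}")
```

The model is still linear in its inputs; the circle only appears because the inputs themselves were transformed. Finding that transformation by hand is the step a neural network is meant to take over.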
But what if you didn't have to do all that experimentation yourself? Neural networks are a family of model architectures designed to find nonlinear patterns in data. During training of a neural network, the model automatically learns the optimal feature crosses to perform on the input data to minimize loss.
In the following sections, we'll take a closer look at how neural networks work.