Neural networks

You may recall from the Feature cross exercises in the Categorical data module, that the following classification problem is nonlinear:

Figure 1. Cartesian coordinate plane, divided into four
      quadrants, each filled with random dots in a shape resembling a
      square. The dots in the top-right and bottom-leftquadrants are blue,
      and the dots in the top-left and bottom-right quadrants are orange.
Figure 1. Nonlinear classification problem. A linear function cannot cleanly separate all the blue dots from the orange dots.

"Nonlinear" means that you can't accurately predict a label with a model of the form \(b + w_1x_1 + w_2x_2\). In other words, the "decision surface" is not a line.

However, if we perform a feature cross on our features $x_1$ and $x_2$, we can then represent the nonlinear relationship between the two features using a linear model: $b + w_1x_1 + w_2x_2 + w_3x_3$ where $x_3$ is the feature cross between $x_1$ and $x_2$:

Figure 2. The same Cartesian coordinate plane of blue and orange
      dots as in Figure 1.  However, this time a white hyperbolic curve is
      plotted atop the grid, which separates the blue dots in the top-right
      and bottom-left quadrants (now shaded with a blue background) from
      the orange dots in the top-left and bottom right quadrants (now
      shaded with an orange background).
Figure 2. By adding the feature cross x1x2, the linear model can learn a hyperbolic shape that separates the blue dots from the orange dots.

Now consider the following dataset:

Figure 3. Cartesian coordinate plane, divided into four quadrants.
      A circular cluster of blue dots is centered at the origin of the
      graph, and is surrounded by a ring of orange dots.
Figure 3. A more difficult nonlinear classification problem.

You may also recall from the Feature cross exercises that determining the correct feature crosses to fit a linear model to this data took a bit more effort and experimentation.

But what if you didn't have to do all that experimentation yourself? Neural networks are a family of model architectures designed to find nonlinear patterns in data. During training of a neural network, the model automatically learns the optimal feature crosses to perform on the input data to minimize loss.

In the following sections, we'll take a closer look at how neural networks work.