This course is out of date. We will remove this course in July 2024.
Modeling Approach
You and your friend Mel like unicorns. In fact, you like unicorns so much, you
decide to predict unicorn appearances using ... machine learning. You
have a dataset of 10,000 unicorn appearances. For each appearance, the
dataset contains the location, time of day, elevation, temperature,
humidity, population density, tree cover, presence of a rainbow,
and many other features.
You want to start developing your ML model. Which one of the following
approaches is a good way to start development?
Unicorns often appear at dawn and dusk. Therefore, use the feature
"time of day" to create a linear model.
Correct. A linear model that uses one or two highly predictive
features is an effective way to start.
Predicting unicorn appearances is a very hard problem.
Therefore, use a deep neural network with all available features.
Incorrect. Starting with a complex model will complicate debugging.
Start with a simple linear model but use all the features to
ensure the simple model has predictive power.
Incorrect. Even with a linear model, using many features makes
the resulting model complex and hard to debug.
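The recommended starting point can be sketched in a few lines. This is a minimal illustration: the sightings data below are invented, and "time of day" is reduced to the hour.

```python
import numpy as np

# Hypothetical data: hour of day and unicorn sightings at that hour.
hours = np.array([5.0, 6.0, 7.0, 12.0, 18.0, 19.0, 20.0])
sightings = np.array([8.0, 9.0, 7.0, 1.0, 6.0, 9.0, 8.0])

# A simple, debuggable starting point: one feature, one linear model.
# (A dawn/dusk effect is not truly linear in the hour; a transformed
# feature such as distance from noon would be a natural next step.)
w, b = np.polyfit(hours, sightings, deg=1)
predictions = w * hours + b
```

Because the model has only two parameters, any surprising prediction is easy to trace back to the data.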
Baselines
Using regression with mean square error (MSE) loss, you are predicting the
cost of a taxi ride using the ride's duration, distance, origin, and end.
You know:
- Mean ride cost is $15.
- Ride cost increases by a fixed amount per kilometer.
- Rides within the downtown area are charged extra.
- Rides start at a minimum cost of $3.
Determine whether the following baselines are useful.
Is this a useful baseline? Every ride costs $15.
Yes
Correct. The mean cost is a useful baseline.
No
Incorrect. Always predicting the mean
results in a lower MSE than always predicting any other
value. Therefore, testing a model against this baseline
provides a meaningful comparison.
It depends on what the standard deviation of the ride cost is.
Incorrect. Irrespective of the standard deviation,
the mean cost of the ride is a useful baseline because
always predicting the mean results in a lower MSE
when compared to always predicting any other value.
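The claim above is easy to verify directly: among constant predictions, the mean minimizes MSE. A quick sketch with made-up ride costs:

```python
import numpy as np

# Hypothetical ride costs in dollars (mean is $15).
costs = np.array([3.0, 8.5, 15.0, 21.0, 27.5])

def constant_mse(prediction):
    """MSE of always predicting the same constant value."""
    return float(np.mean((costs - prediction) ** 2))

# The mean beats every other constant prediction.
for other in (0.0, 10.0, 15.5, 20.0):
    assert constant_mse(costs.mean()) < constant_mse(other)
```

This holds regardless of the standard deviation, which is why the third answer option is incorrect.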
Is this a useful baseline? A trained model that uses only
duration and origin as features.
Yes
Incorrect. You should only use a trained model as a baseline after
the model is fully validated in production. Furthermore, the
trained model should itself be validated against a simpler baseline.
No
Correct. You should only use a trained model as a baseline after
the model is fully validated in production.
Is this a useful baseline? A ride's cost is the ride distance
(in kilometers) multiplied by the fare per kilometer.
Yes
Correct. Distance is the most important factor in
determining ride cost. Therefore, a baseline that relies on
distance is useful.
No
Incorrect. Distance is the most important factor in
determining ride cost. Therefore, a baseline that relies on
distance is useful.
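This heuristic baseline needs no training at all. A sketch, using a hypothetical per-kilometer fare (the real rate would come from the fare schedule):

```python
# Hypothetical fare; the real per-km rate would come from the fare schedule.
FARE_PER_KM = 2.50

def baseline_cost(distance_km: float) -> float:
    """Heuristic baseline: ride cost is distance times fare per kilometer."""
    return distance_km * FARE_PER_KM

# Example: a 6 km ride.
print(baseline_cost(6.0))  # 15.0
```

A trained model that cannot beat this baseline's MSE on held-out rides is not learning anything beyond the dominant distance signal.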
Is this a useful baseline? Every ride costs $1. The model must
always beat this baseline; if the model does not beat it, then
we can be certain that the model has a bug.
Yes
Incorrect. This is not a useful baseline because it is
always wrong. Comparing a model against a baseline that is always
wrong is not meaningful.
No
Correct. This baseline is not a useful test of your model.
Hyperparameters
The following questions describe problems in training a classifier.
Choose actions that could fix the problem described.
Training loss is 0.24 and validation loss is 0.36. Which two of the
following actions could reduce the difference between
training and validation loss?
Ensure the training and validation sets have the same statistical
properties.
Correct. If the training and validation sets have different
statistical properties, then the training data will
not help predict the validation data.
Use regularization to prevent overfitting.
Correct. If the training loss is smaller than the validation loss,
then your model is probably overfitting to the training data.
Regularization prevents overfitting.
Increase the number of training epochs.
Incorrect. If the training loss is smaller than the validation loss,
then your model is typically overfitting to the training data.
Increasing training epochs will only increase overfitting.
Decrease the learning rate.
Incorrect. Having a validation loss that is greater than the
training loss typically indicates overfitting. Changing the learning rate
does not reduce overfitting.
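One way to see how regularization narrows the train/validation gap: L2 (ridge) regularization shrinks the model's weights, trading a little training-set fit for lower variance. A self-contained sketch on synthetic data, where the data and penalty strength are arbitrary:

```python
import numpy as np

# Synthetic data: 100 samples, 20 features, only the first feature matters.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=100)

def ridge_weights(X, y, l2):
    """Closed-form ridge regression: solve (X^T X + l2*I) w = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + l2 * np.eye(n_features), X.T @ y)

w_plain = ridge_weights(X, y, l2=0.0)   # ordinary least squares
w_reg = ridge_weights(X, y, l2=10.0)    # L2-regularized

# Regularization shrinks the weight vector, curbing overfitting.
print(np.linalg.norm(w_reg) < np.linalg.norm(w_plain))  # True
```

The smaller weights make the regularized model less sensitive to noise in the 19 irrelevant features.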
You perform the correct actions described in the previous question,
and now your training and validation losses decrease from 1.0 to
roughly 0.24 after training for many epochs. Which one of the
following actions could reduce your training loss further?
Increase the depth and width of your neural network.
Correct. If your training loss stays constant at
0.24 after training for many epochs, then your model might lack
the predictive ability to further lower loss. Increasing the
model's depth and width could give the model the additional
predictive ability required to reduce the training loss further.
Increase the number of training epochs.
Incorrect. If your training loss stays at 0.24 after training for many
epochs, then continuing to train the model will probably not
cause the training loss to decrease significantly.
Increase the learning rate.
Incorrect. Given that training loss did not decrease for many
training epochs, increasing the learning rate will probably not lower the
final training loss. Instead, increasing the
learning rate could make your training unstable and prevent
your model from learning the data.
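The capacity point can be illustrated without a neural network: a model too simple for the data plateaus at a nonzero training loss, and adding capacity lowers it. Here a degree-1 polynomial stands in for a low-capacity model, on synthetic data:

```python
import numpy as np

# Synthetic data that a straight line cannot represent.
x = np.linspace(-1.0, 1.0, 50)
y = x ** 2

def training_mse(degree):
    """Fit a polynomial of the given degree and return its training MSE."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# The degree-1 model plateaus at a nonzero loss; more capacity fixes it.
print(training_mse(1) > training_mse(2))  # True
```

No amount of extra training epochs lowers the degree-1 loss, which mirrors why more epochs were the wrong answer above.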
You take the correct action in the previous question. Your model's
training loss decreased to 0.20. Assume you need to reduce your model's
training loss a little more. You add a few features that appear to
have predictive power. However, training loss continues to fluctuate
around 0.20. Which three of the following options could
reduce your training loss?
Increase the depth and width of your layers.
Correct. Your model might lack the capacity to learn the
predictive signals in the new features.
Increase the training epochs.
Incorrect. If your model's training loss is fluctuating around 0.20,
then increasing the number of training epochs will probably cause
the model's training loss to continue fluctuating around 0.20.
The features don't add information relative to
existing features. Try a different feature.
Correct. It is possible that the predictive signals encoded by the
features already exist in the features that you are using.
Decrease the learning rate.
Correct. It is possible that adding the new features made the
problem more complex. Specifically, fluctuation in loss indicates
that the learning rate is too high and your model is jumping around
the minimum. Decreasing your learning rate will let your model
converge to the minimum.
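The learning-rate explanation can be demonstrated with gradient descent on a toy quadratic loss; the loss function, starting point, and rates below are purely illustrative:

```python
def final_loss(learning_rate, steps=200):
    """Run gradient descent on loss(w) = w**2 starting from w = 5."""
    w = 5.0
    for _ in range(steps):
        w -= learning_rate * 2.0 * w  # gradient of w**2 is 2w
    return w ** 2

# A near-critical learning rate leaves w bouncing around the minimum;
# a smaller one converges cleanly.
print(final_loss(0.99) > final_loss(0.1))  # True
```

With the high rate, each step overshoots the minimum and flips the sign of `w`, so the loss hovers rather than decreasing, which is the fluctuation described above.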