Introduction
This module begins with a leading question. Choose one of the following answers:
If you had to prioritize improving one of the following areas
in your machine learning project, which would have the most
impact?
Improving the quality of your dataset
Data trumps all.
The quality and size of the dataset matter much more than which
shiny algorithm you use to build your model.
Applying a more clever loss function when training your model
True, a better loss function can help a model train faster, but
it's still a distant second to another item in this list.
And here's an even more leading question:
Take a guess: In your machine learning project, how much time
do you typically spend on data preparation and transformation?
More than half of the project time
Yes, ML practitioners spend the majority of their time
constructing datasets and doing feature engineering.
Less than half of the project time
Plan for more! By a commonly cited estimate, about 80% of the time on a
machine learning project is spent constructing datasets and transforming data.
In this module, you'll learn more about the characteristics of machine learning datasets and how to prepare your data to ensure high-quality results when training and evaluating your model.