Datasets, generalization, and overfitting

Introduction

This module begins with a leading question. Choose one of the answers that follow it:

If you had to prioritize improving one of the following areas in your machine learning project, which would have the most impact?
Improving the quality of your dataset
Applying a more clever loss function when training your model

And here's an even more leading question:

Take a guess: In your machine learning project, how much time do you typically spend on data preparation and transformation?
More than half of the project time
Less than half of the project time

In this module, you'll learn more about the characteristics of machine learning datasets, and how to prepare your data to ensure high-quality results when training and evaluating your model.