Production ML systems: Questions to ask

This lesson focuses on the questions you should ask about your data and model in production systems.

Is each feature helpful?

Continuously monitor your model so that you can remove features that contribute little or nothing to its predictive ability. Even a weak feature is a liability: if its input data abruptly changes, your model's behavior might also abruptly change in undesirable ways.

Also consider the following related question:

  • Does the usefulness of the feature justify the cost of including it?

It is always tempting to add more features to the model. For example, suppose you find a new feature whose addition makes your model's predictions slightly better. Slightly better predictions certainly seem better than slightly worse predictions; however, the extra feature adds to your maintenance burden.
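
One way to put a number on how much a feature contributes is permutation importance: shuffle one feature's values and measure how much the model's error grows. The lesson does not prescribe this technique. The sketch below is only an illustration, and `toy_model`, the synthetic data, and all function names are hypothetical.

```python
import random

def mean_squared_error(predict, rows, targets):
    """Average squared difference between predictions and targets."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(predict, rows, targets, feature_index, seed=0):
    """Return the increase in error after shuffling one feature column.

    A value near zero suggests the model barely relies on that feature.
    """
    baseline = mean_squared_error(predict, rows, targets)
    column = [r[feature_index] for r in rows]
    random.Random(seed).shuffle(column)
    # Rebuild each row with the shuffled value in place of the original one.
    shuffled_rows = [
        r[:feature_index] + (v,) + r[feature_index + 1:]
        for r, v in zip(rows, column)
    ]
    return mean_squared_error(predict, shuffled_rows, targets) - baseline

# Hypothetical model: feature 0 drives the prediction, feature 1 is ignored.
def toy_model(row):
    return 3.0 * row[0] + 0.0 * row[1]

# Synthetic data invented for this example.
rng = random.Random(42)
rows = [(rng.random(), rng.random()) for _ in range(200)]
targets = [3.0 * x0 for x0, _ in rows]

importance_useful = permutation_importance(toy_model, rows, targets, 0)
importance_unused = permutation_importance(toy_model, rows, targets, 1)
print(importance_useful, importance_unused)
```

A feature whose importance stays near zero is a candidate for removal. Whether a small but nonzero importance justifies the maintenance cost remains a judgment call.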

Is your data source reliable?

Some questions to ask about the reliability of your input data:

  • Is the signal always going to be available or is it coming from an unreliable source? For example:
    • Is the signal coming from a server that crashes under heavy load?
    • Is the signal coming from humans who go on vacation every August?
  • Does the system that computes your model's input data ever change? If so:
    • How often?
    • How will you know when that system changes?
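
To learn about upstream changes, you could compare each new batch against a reference batch. The sketch below assumes that batches arrive as lists of dictionaries. It checks two things: a fingerprint of field names and value types, and how far a field's mean has moved. The field name `temp_c`, the sample values, and both helper functions are hypothetical.

```python
import hashlib
import json
import statistics

def schema_fingerprint(records):
    """Hash the sorted (field name, value type) pairs seen in a batch."""
    fields = sorted(
        {(key, type(value).__name__) for rec in records for key, value in rec.items()}
    )
    return hashlib.sha256(json.dumps(fields).encode("utf-8")).hexdigest()

def mean_shift(reference, current, field):
    """Shift in a field's mean, in units of the reference standard deviation."""
    ref_values = [rec[field] for rec in reference]
    cur_values = [rec[field] for rec in current]
    ref_std = statistics.pstdev(ref_values) or 1.0  # avoid division by zero
    return abs(statistics.mean(cur_values) - statistics.mean(ref_values)) / ref_std

# Invented batches: the second silently switches units, the third renames the field.
reference = [{"temp_c": 20.0}, {"temp_c": 22.0}, {"temp_c": 21.0}]
changed_units = [{"temp_c": 68.0}, {"temp_c": 71.6}, {"temp_c": 69.8}]
renamed_field = [{"temperature": 20.0}, {"temperature": 22.0}, {"temperature": 21.0}]

schema_changed = schema_fingerprint(reference) != schema_fingerprint(renamed_field)
units_shift = mean_shift(reference, changed_units, "temp_c")
print(schema_changed, round(units_shift, 1))
```

The fingerprint catches structural changes such as renamed fields. The mean-shift check catches changes that keep the schema intact, such as a switch of units. The alert threshold for the shift is something you would need to choose for your own data.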

Consider creating your own copy of the data you receive from the upstream process. Then, only advance to the next version of the upstream data when you are certain that it is safe to do so.
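
A minimal sketch of that idea follows, assuming that upstream data arrives as versioned batches and that you supply your own safety check. The class `PinnedUpstream`, the check `no_missing_values`, and the sample records are all hypothetical.

```python
import copy

class PinnedUpstream:
    """Keep local copies of upstream data and serve only a pinned version."""

    def __init__(self):
        self._snapshots = {}
        self._pinned = None

    def ingest(self, version, records):
        """Store a private copy so later upstream edits cannot affect it."""
        self._snapshots[version] = copy.deepcopy(records)

    def advance(self, version, is_safe):
        """Pin `version` only if the caller-supplied check accepts it."""
        if version not in self._snapshots:
            raise KeyError(f"unknown version: {version}")
        if is_safe(self._snapshots[version]):
            self._pinned = version
            return True
        return False

    def current(self):
        """Return the records of the pinned version."""
        if self._pinned is None:
            raise RuntimeError("no version has been pinned yet")
        return self._snapshots[self._pinned]

# Hypothetical safety check: reject batches that contain missing values.
def no_missing_values(records):
    return all(value is not None for rec in records for value in rec.values())

store = PinnedUpstream()
store.ingest("v1", [{"clicks": 3}, {"clicks": 5}])
accepted_v1 = store.advance("v1", no_missing_values)

# The next upstream version has a missing value, so the pin stays on v1.
store.ingest("v2", [{"clicks": None}, {"clicks": 7}])
accepted_v2 = store.advance("v2", no_missing_values)
print(accepted_v1, accepted_v2, store.current())
```

The model keeps reading the last version that passed the check, so a bad upstream release does not reach it until someone decides it is safe.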

Is your model part of a feedback loop?

Sometimes a model can affect its own training data. For example, the results from some models, in turn, become (directly or indirectly) input features to that same model.

Sometimes a model can affect another model. For example, consider two models for predicting stock prices:

  • Model A, which is a bad predictive model.
  • Model B.

Since Model A is buggy, it mistakenly decides to buy shares of Stock X. Those purchases drive up the price of Stock X. Model B uses the price of Stock X as an input feature, so Model B can come to some false conclusions about the value of Stock X. Model B could, therefore, buy or sell shares of Stock X based on the buggy behavior of Model A. Model B's behavior, in turn, can affect Model A, possibly triggering a tulip mania or a slide in the price of Stock X.
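
The toy simulation below illustrates how one model's mistake can be amplified by another. Every rule in it is an assumption invented for illustration, not a description of real trading systems: Model A buys one unit each step, Model B buys one unit whenever the last price move was upward, and each unit bought raises the price by 1%. It shows only the A-to-B direction of the loop.

```python
def simulate(steps, model_b_active):
    """Return the price path of a hypothetical Stock X.

    Assumptions (all invented for illustration):
      - Model A is buggy and buys one unit every step.
      - Model B follows momentum: it buys one unit whenever the
        last price move was upward.
      - Every unit bought raises the price by 1%.
    """
    prices = [100.0]
    last_move = 0.0
    for _ in range(steps):
        units_bought = 1  # Model A's mistaken purchase
        if model_b_active and last_move > 0:
            units_bought += 1  # Model B reacts to the rise that A caused
        new_price = prices[-1] * (1.01 ** units_bought)
        last_move = new_price - prices[-1]
        prices.append(new_price)
    return prices

only_a = simulate(10, model_b_active=False)
a_and_b = simulate(10, model_b_active=True)
print(round(only_a[-1], 2), round(a_and_b[-1], 2))
```

With Model B active, the price ends higher than with Model A alone, even though Model B never had independent evidence about Stock X. It only reacted to price moves that Model A created.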

Exercise: Check your understanding

Which three of the following models are susceptible to a feedback loop?
  • A housing-value model that predicts house prices, using size (area in square meters), number of bedrooms, and geographic location as features.
  • A university-ranking model that rates schools in part by their selectivity (the percentage of students who applied that were admitted).
  • A face-attributes model that detects whether a person is smiling in a photo, which is regularly trained on a database of stock photography that is automatically updated monthly.
  • A book-recommendation model that suggests novels its users may like based on their popularity (i.e., the number of times the books have been purchased).
  • An election-results model that forecasts the winner of a mayoral race by surveying 2% of voters after the polls have closed.
  • A traffic-forecasting model that predicts congestion at highway exits near the beach, using beach crowd size as one of its features.