Production ML systems: Static versus dynamic training

Broadly speaking, you can train a model in either of two ways:

  • Static training (also called offline training) means that you train a model only once. You then serve that same trained model for a while.
  • Dynamic training (also called online training) means that you train a model continuously or at least frequently. You usually serve the most recently trained model.
Figure 2. Static training. Train once; serve the same built model multiple times, illustrated as raw dough producing three identical loaves of bread. (Images by Pexels and by fancycrave1.)

Figure 3. Dynamic training. Retrain frequently; serve the most recently built model, illustrated as raw dough producing slightly different loaves of bread each time. (Images by Pexels and by Couleur.)

Table 1. Primary advantages and disadvantages.

Static training
  • Advantage: Simpler. You only need to develop and test the model once.
  • Disadvantage: Sometimes staler. If the relationship between features and labels changes over time, your model's predictions will degrade.

Dynamic training
  • Advantage: More adaptable. Your model will keep up with any changes in the relationship between features and labels.
  • Disadvantage: More work. You must build, test, and release a new model continually.
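
The difference is easiest to see as two training loops. Below is a minimal Python sketch; fetch_training_data, train_model, evaluate, and deploy_model are hypothetical placeholders for your own data pipeline and serving stack, and the daily retraining schedule and quality bar are illustrative assumptions, not prescriptions.

```python
# Minimal sketch contrasting static and dynamic training.
# fetch_training_data, train_model, evaluate, and deploy_model are
# hypothetical placeholders for your own data pipeline and serving stack.
import time

RETRAIN_INTERVAL_SECONDS = 24 * 60 * 60  # illustrative: retrain daily
QUALITY_THRESHOLD = 0.90                 # illustrative quality bar


def static_training():
    """Train once; the deployed model serves until you manually replace it."""
    data = fetch_training_data()
    model = train_model(data)
    deploy_model(model)


def dynamic_training():
    """Retrain on a schedule and serve the most recently built healthy model."""
    while True:
        data = fetch_training_data()          # includes the newest examples
        model = train_model(data)
        if evaluate(model) >= QUALITY_THRESHOLD:
            deploy_model(model)               # promote only models that pass
        time.sleep(RETRAIN_INTERVAL_SECONDS)
```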

If your dataset truly isn't changing over time, choose static training because it is cheaper to create and maintain than dynamic training. However, datasets tend to change over time, even those with features that you think are as constant as, say, sea level. The takeaway: even with static training, you must still monitor your input data for change.

For example, consider a model trained to predict the probability that users will buy flowers. Because of time pressure, the model is trained only once using a dataset of flower buying behavior during July and August. The model works fine for several months but then makes terrible predictions around Valentine's Day because user behavior during that floral holiday period changes dramatically.
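
One lightweight way to monitor input data for change is to compare the distribution of each feature at serving time against the distribution the model was trained on. The sketch below uses a two-sample Kolmogorov-Smirnov test from SciPy; the feature name, alert threshold, and synthetic data are illustrative assumptions, not part of the course.

```python
# Minimal input-drift check: compare serving-time feature values against
# the training distribution with a two-sample Kolmogorov-Smirnov test.
# The feature name, alert threshold, and synthetic data are illustrative.
import numpy as np
from scipy.stats import ks_2samp

DRIFT_P_VALUE = 0.01  # alert when the distributions look this different


def drifted_features(training_features, serving_features):
    """Return the names of features whose serving distribution has shifted."""
    drifted = []
    for name, train_values in training_features.items():
        statistic, p_value = ks_2samp(train_values, serving_features[name])
        if p_value < DRIFT_P_VALUE:
            drifted.append(name)
    return drifted


# Example: July/August training data versus traffic around Valentine's Day.
rng = np.random.default_rng(0)
training = {"daily_flower_searches": rng.normal(10, 2, size=10_000)}
serving = {"daily_flower_searches": rng.normal(25, 5, size=1_000)}
print(drifted_features(training, serving))  # ['daily_flower_searches']
```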

For a more detailed exploration of static and dynamic training, see the Managing ML Projects course.

Exercises: Check your understanding

Which two of the following statements are true of static (offline) training?

  • The model stays up to date as new data arrives.
    Actually, if you train offline, the model has no way to incorporate new data as it arrives. This can lead to model staleness if the distribution you are trying to learn from changes over time.

  • You can verify the model before applying it in production.
    Yes, offline training gives ample opportunity to verify model performance before introducing the model in production.

  • Offline training requires less monitoring of training jobs than online training.
    In general, monitoring requirements at training time are more modest for offline training, which insulates you from many production considerations. However, the more frequently you train your model, the more you'll need to invest in monitoring. You'll also want to validate regularly to ensure that changes to your code (and its dependencies) don't adversely affect model quality.

  • Very little monitoring of input data needs to be done at inference time.
    Counterintuitively, you do need to monitor input data at serving time. If the input distributions change, your model's predictions may become unreliable. Imagine, for example, a model trained only on summertime clothing data suddenly being used to predict clothing-buying behavior in wintertime.

Which one of the following statements is true of dynamic (online) training?

  • The model stays up to date as new data arrives.
    This is the primary benefit of online training; you can avoid many staleness issues by allowing the model to train on new data as it comes in.

  • Very little monitoring of training jobs needs to be done.
    Actually, you must continuously monitor training jobs to ensure that they are healthy and working as intended. You'll also need supporting infrastructure, like the ability to roll a model back to a previous snapshot in case something goes wrong in training, such as a buggy job or corrupted input data.

  • Very little monitoring of input data needs to be done at inference time.
    Just as with a static, offline model, it is important to monitor the inputs to a dynamically updated model. You are less likely to be at risk from large seasonality effects, but sudden, large changes to the inputs (such as an upstream data source going down) can still cause unreliable predictions.
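
The answers above mention two pieces of supporting infrastructure for dynamic training: validating each newly trained model and rolling back to a previous snapshot when a training run goes wrong. Here is a minimal sketch of that promote-or-roll-back step; train_model, evaluate, save_snapshot, load_snapshot, and deploy_model are hypothetical placeholders, and the quality bar is an illustrative assumption.

```python
# Sketch of a promote-or-roll-back step for dynamic (online) training.
# train_model, evaluate, save_snapshot, load_snapshot, and deploy_model
# are hypothetical placeholders for your own training and serving stack.

MIN_ACCEPTABLE_QUALITY = 0.90  # illustrative quality bar on a holdout set


def retrain_and_promote(new_data, last_good_snapshot_id):
    """Train on fresh data; deploy only if quality holds, otherwise roll back."""
    candidate = train_model(new_data)
    quality = evaluate(candidate)  # e.g., accuracy or AUC on a holdout set

    if quality >= MIN_ACCEPTABLE_QUALITY:
        snapshot_id = save_snapshot(candidate)  # keep a rollback point
        deploy_model(candidate)
        return snapshot_id

    # A buggy job or corrupted input data can produce a bad candidate;
    # keep serving the last known-good model instead.
    deploy_model(load_snapshot(last_good_snapshot_id))
    return last_good_snapshot_id
```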