Help Your Future-Self Succeed with ML Monitoring and Retraining
It’s worth it, trust me 🙂
Seriously, your future self will thank you for setting up a well-defined pipeline around this stuff when it inevitably comes time to retrain the model. In the same way that a CI/CD (Continuous Integration / Continuous Deployment) pipeline is helpful for reproducibility, a CI/CT/CD/CM pipeline (the new letters are Continuous Training and Continuous Monitoring) will ensure consistency in your model, minimize the impact of concept drift, improve reproducibility, and increase portability (should you need to change where you train or run inference).
The first step is to make sure you have a system in place to monitor your model in production, to catch data format issues and model performance degradation (how often is it making a bad prediction?). Once you have metrics, you can set up automation that triggers when a threshold is passed. For example, is your random number generator model suddenly returning lots of ones? Maybe there’s a problem there. Probably time for retraining (and a larger dataset). If you care about model accuracy, you can use DVC (Data Version Control) so that you know when the dataset has changed and it’s time to retrain.
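As a minimal sketch of that threshold-trigger idea, here’s what a check for the runaway random-number-generator example might look like. (The function names and the 50% threshold are my own illustrative choices, not from any particular monitoring stack.)

```python
from collections import Counter

def fraction_of_ones(predictions):
    """Fraction of recent predictions that came back as 1."""
    counts = Counter(predictions)
    return counts[1] / len(predictions)

def retrain_needed(predictions, threshold=0.5):
    """Flag the model for retraining when 1s dominate the output.

    For a fair generator over 0-9 we'd expect roughly 10% ones,
    so crossing 50% is a loud, unambiguous alarm (the threshold
    here is illustrative, not a recommendation).
    """
    return fraction_of_ones(predictions) > threshold

# A healthy window of predictions vs. a suspicious one:
healthy = [3, 7, 1, 9, 0, 4, 1, 8, 2, 6]
broken = [1, 1, 1, 4, 1, 1, 1, 1, 0, 1]
print(retrain_needed(healthy))  # False: ~20% ones
print(retrain_needed(broken))   # True: ~80% ones
```

In a real system you’d compute this over a sliding window of production traffic and wire the `True` case up to an alert or a pipeline trigger rather than a print.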
Ideally, you’re retraining whenever your training data changes…but that’s also expensive. So setting a stricter model performance threshold for the conditions that warrant retraining, and only then pulling in a new dataset, is likely advisable if you’re on a budget. When you’re ready to train, it’s off to tools like Kubeflow, Vertex AI Pipelines, or Ray to orchestrate the training run and handle the metadata collection.
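The budget-conscious gating described above can be sketched as a simple decision function. Note that `launch_training_pipeline` is a hypothetical stand-in for kicking off whatever orchestrator you use, and the accuracy floor is an illustrative number, not a recommendation:

```python
def should_retrain(rolling_accuracy, accuracy_floor=0.90, data_changed=False):
    """Gate expensive retraining: go when performance clearly dips,
    or when the versioned dataset has changed (e.g., a new DVC revision).
    Both conditions are illustrative assumptions for this sketch.
    """
    return rolling_accuracy < accuracy_floor or data_changed

def launch_training_pipeline():
    """Hypothetical stand-in for triggering your orchestrator
    (Kubeflow, Vertex AI Pipelines, Ray, ...)."""
    print("training pipeline triggered")

# Accuracy dipped below the floor, so we pay for a training run:
if should_retrain(rolling_accuracy=0.87):
    launch_training_pipeline()
```

The design point is that the *decision* to train is cheap to evaluate on every monitoring tick, while the training run itself only happens when the gate opens.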
Similar to Continuous Delivery and other aspects of DevOps, you’ll need to get buy-in from your organization to really make Continuous Training worthwhile. It’s also important to remember that automation isn’t everything – you’re still going to want to have someone on call who can take a look at things and see if the model needs to be changed (and not just retrained).
By implementing automation, you ensure organizational continuity, continued model relevance, and a better experience for your customers. It’s usually worth the cost!
Building a Sandcastle on the Tideline: Embracing Change with MLOps
If the world hadn’t changed, my model would be amazing!
I see machine learning models a lot like vaccine boosters. Let me explain.
The way vaccine reformulation works is that we take a vaccine that took years to develop (model development) and slightly change it to address the current environment it operates in (new data). We make the changes needed to stand up against the currently dominant viral variant…only to spend a few months manufacturing (training) it. When the vaccine (model) finally emerges, it finds itself in a world very different from the one it was designed (trained) for. It still does an admirable job, and it’s better than nothing, but the variant it was designed for is likely no longer the dominant one. Imagine how effective the vaccine could be if the cycle time from modification to production were shorter! That long manufacturing/training lag leaves you with a model that is already out of date. This mismatch between what the model learned and the world it now operates in is called concept drift.
I used to think ML models were trained, “launched,” and then left to run like most software: you deal with operational issues, but for the most part the already-written code does the work. Wrong. ML models are not like regular software. They constantly need to be updated. If you’re not continuously monitoring for drift and retraining on new data, you’ll quickly fall behind. Of course, there are ways to soften the impact of concept drift (like adding extensions that pull in real, live data at inference time). But when it comes to making predictions and formulating ideas on research, you need a model that knows the latest way to be 😎
How do you solve this? Manually training a model is fine in the initial prototyping stage, but you can’t ship something like that. If you did, you’d be managing the model lifecycle by hand forever, which isn’t sustainable. Beyond an initial local experiment in a notebook, you need to be thinking about pipelines and automation from the beginning. Run the training pipeline whenever there is new data – this is called Continuous Training (CT). The automated process can run supervised training and update the model weights for you.
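A minimal sketch of the “retrain when there’s new data” trigger: fingerprint the dataset’s contents and kick off training only when the fingerprint changes. (DVC does this kind of content-addressing for you in practice; `maybe_retrain` and `train_fn` are hypothetical names for this sketch.)

```python
import hashlib
from pathlib import Path

def dataset_fingerprint(path):
    """Hash the raw bytes of a dataset file; a new hash means new data."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def maybe_retrain(path, last_fingerprint, train_fn):
    """Run the (hypothetical) training function only when the data changed,
    and return the current fingerprint to store for the next check."""
    current = dataset_fingerprint(path)
    if current != last_fingerprint:
        train_fn()
    return current
```

You’d call `maybe_retrain` from a scheduled job or a storage-event hook, persisting the returned fingerprint between runs.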
The industry throws the term model weights around a lot, and I think it deserves more of an explanation. The concept of model weights has roots in the theory of Hebbian plasticity in the brain (a form of synaptic plasticity summed up as “neurons that fire together, wire together”). The way the theory goes is that if two neurons are active together often enough, they strengthen the connection between themselves for higher-bandwidth transmission (just like we widen our roads when there is too much traffic at rush hour). This strengthened bond ends up re-arranging our neurons, and it’s how we learn! 🧠
Machine learning models do something similar. During supervised learning, training measures how often (and how badly) the model makes a wrong prediction against a labeled dataset. If I ask the model what Jane’s favorite color is, it had better get the right answer if that answer is explicitly spelled out in the data it was trained on. When the model gets an answer wrong, an optimizer (typically gradient descent) nudges each weight up or down in whatever direction shrinks the error, so connections that contribute to correct answers are effectively strengthened, much like our brains strengthen neural connections. This is where the term neural network comes from.
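To make “weights get nudged when the model is wrong” concrete, here’s a toy single-weight model trained with gradient descent. This is a sketch of the general idea, not any specific framework, and the learning rate and epoch count are arbitrary illustrative choices:

```python
def train_single_weight(data, w=0.0, lr=0.1, epochs=50):
    """Fit y = w * x by gradient descent on squared error.

    After each prediction, the weight is nudged in the direction
    that shrinks the error -- the "strengthen what works" step
    the Hebbian analogy above is gesturing at.
    """
    for _ in range(epochs):
        for x, y in data:
            prediction = w * x
            error = prediction - y
            w -= lr * error * x  # gradient of squared error w.r.t. w
    return w

# The true relationship is y = 2x, so the learned weight lands near 2.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
print(round(train_single_weight(data), 2))  # 2.0
```

A real network does this across millions of weights at once (via backpropagation), but each individual weight is following the same “wrong answer, adjust; right answer, reinforce” loop shown here.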
I’ll do a post soon that goes into detail on getting that magical ML retraining pipeline into place, but for now just remember that automation is key (always be training) and that models have to learn and stay up to date, just like you do.