I think of machine learning models as a lot like vaccine boosters. Let me explain.
The way vaccine reformulation works is that we take a vaccine that took years to develop (model development) and slightly change it to address the environment it currently operates in (new data). We make the changes needed to stand up to the currently dominant viral variant…and then spend a few months manufacturing it (training it). When the vaccine (model) finally emerges, it finds itself in a world very different from the one it was designed (trained) for. It still does an admirable job, and it’s better than nothing, but the variant it was designed for is likely no longer the dominant one. Imagine how effective the vaccine could be if the cycle time from modification to production were shortened! That long manufacturing/training time means the model is already out of date by the time it ships. The shift in the world that makes it stale is called concept drift: the relationship between what the model sees and what it should predict changes over time.
I used to think you just train and “launch” an ML model and then let it run like most software: deal with operational issues as they come up, but for the most part let the already-written software do the work. Wrong. AI models are not like regular software. They need regular updates. If you’re not monitoring for drift and retraining with new data, you’ll quickly fall behind. Of course, there are ways to soften the impact of concept drift (like allowing extensions that pull in real, live data). But when it comes to making predictions and formulating research ideas, you need a model that knows the latest way to be 😎
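To make “monitoring for drift” a bit more concrete, here is a minimal sketch of one simple approach: compare the model’s accuracy on recent predictions against the accuracy it had at training time. It assumes the true answers eventually arrive so predictions can be scored. The `DriftMonitor` class, the window size, and the tolerance are all hypothetical names and values I made up for illustration, not part of any library.

```python
from collections import deque


class DriftMonitor:
    """Flags possible drift when recent accuracy drops well below a baseline.

    Hypothetical helper for illustration only; the window and tolerance
    values are assumptions you would tune for your own use case.
    """

    def __init__(self, baseline_accuracy, window=100, tolerance=0.10):
        self.baseline_accuracy = baseline_accuracy  # accuracy measured at training time
        self.tolerance = tolerance                  # how much of a drop we accept
        self.outcomes = deque(maxlen=window)        # 1 = correct, 0 = wrong

    def record(self, prediction, actual):
        """Store whether one production prediction turned out to be right."""
        self.outcomes.append(1 if prediction == actual else 0)

    def recent_accuracy(self):
        """Accuracy over the most recent window, or None if nothing is recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def drift_suspected(self):
        """True once a full window of results falls below baseline minus tolerance."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough evidence yet
        return self.recent_accuracy() < self.baseline_accuracy - self.tolerance
```

In practice you would call `record` every time a real outcome comes in and treat `drift_suspected()` as a signal that it may be time to retrain. Real monitoring setups usually also watch the input data itself, but the idea is the same.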
How do you solve this? Manually training a model is fine in the initial prototyping stage, but you can’t ship something like that. If you did, you would be managing the model lifecycle by hand forever, which isn’t sustainable. Beyond an initial local experiment in a notebook, you need to be thinking about pipelines and automation from the beginning. Run the training pipeline automatically whenever there is new data. This is called Continuous Training (CT). The automated process can run supervised training and update the model weights for you.
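As a rough illustration of the “retrain whenever there is new data” idea, here is a minimal sketch of a single CT step. Everything here is an assumption for the sake of the example: `continuous_training_step` and `train_mean_model` are hypothetical names, and the “model” is just the mean of the data standing in for a real training job. A real pipeline would run on a scheduler or a data-arrival trigger and would validate the new model before shipping it.

```python
def continuous_training_step(model, dataset, new_data, train_fn, min_new_examples=1):
    """Retrain when enough new data has arrived; otherwise keep the current model.

    Returns the (possibly updated) model and the (possibly grown) dataset.
    """
    if len(new_data) < min_new_examples:
        return model, dataset  # nothing new to learn from yet

    updated_dataset = dataset + new_data      # fold the new examples in
    new_model = train_fn(updated_dataset)     # rerun the training job
    return new_model, updated_dataset


def train_mean_model(dataset):
    """Toy stand-in for a real training job: the 'model' is the mean of the data."""
    return sum(dataset) / len(dataset)


# Example: each call represents one run of the pipeline.
model, dataset = continuous_training_step(None, [], [1.0, 2.0, 3.0], train_mean_model)
model, dataset = continuous_training_step(model, dataset, [], train_mean_model)     # no new data, no retrain
model, dataset = continuous_training_step(model, dataset, [6.0], train_mean_model)  # new data, retrain
```

The point is that nobody has to remember to retrain: the decision lives in code, and a human only gets involved when something looks off.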
The industry throws the term model weights around a lot, and I think it deserves more of an explanation. The concept of model weights is loosely inspired by synapses in the brain, and in particular by the theory of Hebbian plasticity (a form of synaptic plasticity often summarized as “neurons that fire together, wire together”). The theory goes that if two neurons fire together often enough, the connection between them gets stronger, so signals pass between them more easily (just like widening a road when there is too much traffic at rush hour). Strengthening and weakening these connections is a big part of how we learn! 🧠 In a model, a weight is simply a number that says how strongly one input influences the output.
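Machine learning models do something similar. During supervised learning, the model makes a prediction for each training example and compares it against the known answer in the dataset. If I ask the model what Jane’s favorite color is, it had better get the right answer if that answer is explicitly spelled out in the dataset it was trained on. The gap between the prediction and the right answer is measured as an error (the loss). When the model gets an answer wrong, each weight is nudged, up or down, in whichever direction would have made the error smaller (this is gradient descent, and the error is passed back through the network by backpropagation). When it gets the answer right, the weights barely move. Repeat that over many examples and the connections that help get stronger while the ones that hurt get weaker, much like our brains strengthen neural connections. That resemblance to networks of neurons is where the term neural network comes from.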
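If it helps to see a weight update in action, here is a minimal sketch using the simplest possible “network”: one weight, where the prediction is `weight * x`. It uses plain gradient descent on squared error. The function name, learning rate, and toy data are my own assumptions for illustration. Notice that the weight moves in whichever direction shrinks the error, which can be up or down.

```python
def train_single_weight(examples, initial_weight=0.0, learning_rate=0.1, epochs=50):
    """Fit prediction = weight * x to (x, target) pairs with gradient descent."""
    weight = initial_weight
    for _ in range(epochs):
        for x, target in examples:
            prediction = weight * x
            error = prediction - target  # positive means we overshot
            # Gradient of 0.5 * error**2 with respect to weight is error * x,
            # so stepping against it makes the error smaller.
            weight -= learning_rate * error * x
    return weight


# Toy dataset where the right answer is always 2 * x.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

from_below = train_single_weight(examples, initial_weight=0.0)  # weight has to increase
from_above = train_single_weight(examples, initial_weight=5.0)  # weight has to decrease
print(from_below, from_above)  # both should end up very close to 2.0
```

A real neural network does the same thing for millions or billions of weights at once, with backpropagation working out each weight’s share of the error.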
I’ll do a post soon that goes into detail on getting that magical ML retraining pipeline into place, but for now just remember that automation is key (always be training) and that models have to learn and stay up to date, just like you do.
Last modified: March 11, 2024