This is part eight of a series on launching a data science project.
At this point in the data science process, we’ve launched a product into production. Now it’s time to kick back and hibernate for two months, right? Yeah, about that…
Just because you’ve got your project in production doesn’t mean you’re done. First of all, it’s important to keep checking the efficacy of your models. Shift happens: a model which was good at one point in time can become progressively worse as circumstances change. Some models are fairly stable and can last for years without significant modification; others sit atop unstable underlying trends, to the point that you might need to retrain them continuously. You might also find that your training and testing data was not truly indicative of real-world data, especially because the real world is a lot messier than what you trained against.
The best way to guard against unnoticed model shift is to take new production data and retrain the model. This works best if you keep track of your model’s predictions versus actual outcomes; that way, you can measure the model’s real-world efficacy, figuring out how frequently and by how much your model was wrong.
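As a minimal sketch of that bookkeeping (the function name and window size are my own illustrative choices, and it assumes you log predictions and actual outcomes as parallel sequences), you could compute a rolling mean absolute error, so drift shows up as the windowed error creeping upward over time:

```python
import numpy as np

def rolling_mae(predicted, actual, window=50):
    """Mean absolute error over a trailing window of observations.

    A creeping increase in the windowed error is a sign the model is
    drifting away from reality and may need retraining."""
    errors = np.abs(np.asarray(predicted, dtype=float) -
                    np.asarray(actual, dtype=float))
    # For each observation, average the absolute errors in the last `window` points.
    return np.array([errors[max(0, i - window + 1):i + 1].mean()
                     for i in range(len(errors))])
```

In production you would more likely push these numbers into a monitoring or alerting system than compute them ad hoc, but the idea is the same: keep a running scoreboard of predicted versus actual.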
Depending upon your choice of algorithm, you might be able to update the existing model with this new information in real time. Models like neural networks and online passive-aggressive algorithms allow for continuous training, and once you’ve created a process which automatically feeds production outcomes back into your continuously-training model, you have true machine learning. Other algorithms, however, require you to retrain from scratch. That’s not a show-stopper by any means, particularly if your underlying trends are fairly stable.
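Here’s a quick sketch of what that continuous-training loop can look like, assuming scikit-learn, a single made-up numeric feature (years of experience), and purely illustrative salary numbers. The key piece is `partial_fit`, which folds new observations into the existing model instead of starting over:

```python
import numpy as np
from sklearn.linear_model import PassiveAggressiveRegressor

rng = np.random.default_rng(42)

def fake_batch(n):
    """Generate a hypothetical batch: years of experience -> salary."""
    X = rng.uniform(0, 20, size=(n, 1))
    y = 50_000 + 3_000 * X.ravel() + rng.normal(0, 5_000, n)
    return X, y

model = PassiveAggressiveRegressor(random_state=42)

# Initial training on historical data.
X_hist, y_hist = fake_batch(200)
model.partial_fit(X_hist, y_hist)

# Later, as production outcomes arrive, update the model incrementally
# rather than retraining from scratch.
X_new, y_new = fake_batch(50)
model.partial_fit(X_new, y_new)
```

Wire that second `partial_fit` call up to your production feedback pipeline and you have the automated loop described above; algorithms without a `partial_fit`-style interface would instead get a scheduled full retrain.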
Regardless of model selection, efficacy, and whether you get to call what you’ve done machine learning, you will want to confer with your stakeholders and ensure that your model actually fits their needs. As I mentioned before, you can have the world’s best regression, but if the people with the sacks of cash want a recommendation engine, you’re not getting the goods. That doesn’t mean you should try to solve all the problems at once, though; instead, start with a Minimum Viable Product (MVP) and gauge interest. Develop a model which solves the single most pressing need, and from there, make incremental improvements. This could include relaxing some of the assumptions you made during initial model development, making more accurate predictions, improving the speed of your service, adding new functionality, or even using this model as an intermediate engine to derive some other result.
Using our data platform survey results, assuming the key business personnel were fine with the core idea, some of the specific things we could do to improve our product would be:
- Make the model more accurate. Our MAE was about $19-20K, and reducing that error makes our model more useful for others. One way to do this would be to survey more people. What we have is a nice starting point, but there are too many gaps to go much deeper than a national level.
- Introduce intra-regional cost of living. We all know that $100K in Manhattan, NY and $100K in Manhattan, KS are quite different. We would want to take into account cost of living, assuming we have enough data points to do this.
- Use this as part of a product helping employers find the market rate for a new data professional, where we’d ask questions about the job location, relative skill levels, etc. and gin up a reasonable market estimate.
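On the first of those points, mean absolute error is simply the average magnitude of the misses, which is why it reads naturally in dollars. A quick sketch with made-up salary figures:

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between outcomes and predictions."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical salaries (USD) versus model predictions.
actual = [95_000, 110_000, 72_000]
predicted = [88_000, 125_000, 80_000]
mae = mean_absolute_error(actual, predicted)  # (7000 + 15000 + 8000) / 3 = 10000.0
```

An MAE around $19-20K, as in our case, means the model’s salary estimates miss by roughly that much on average, which frames how much room for improvement more survey data could buy.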
There are plenty of other things we could do over time to add value to our model, but I think that’s a nice stopping point.
What’s Old Is New Again
Once we get to this phase, the iterative nature of this process becomes clear.
On the micro level, we bounce around within and between steps in the process. On the macro level, we iterate through this process over and over again as we develop and refine our models. There’s a definite end game (mostly when the sacks of cash empty), but how long that takes and how many times you cycle through the process will depend upon how accurate and how useful your models are.
In wrapping up this series, if you want to learn more, check out my Links and Further Information on the topic.