So you’ve taken the time to pore over some data and put together a machine learning model. It performs well at first, but over time its error rate begins to climb. What’s going on?
Turns out that life isn’t static. It changes. And in the world of online retail, it changes quickly and constantly. This is why frequent changes to predictive models are necessary.
To answer how frequent those changes should be, ask yourself two questions:
How frequently does my data change?
In the online retail world, data change is a constant. Models should be re-evaluated as their accuracy begins to fall off or as more diverse training data becomes available. In practice, this may mean that the basis of the model needs to change. Is the problem better served by a simple regression, or by a more complex decision tree or random forest? Model accuracy and performance have to be considered all while data is changing under your feet.
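One way to notice that accuracy is falling off is to track it over a rolling window of recent predictions. The sketch below is a minimal illustration, not a production monitor: the `AccuracyMonitor` class, the window size, and the accuracy floor are all hypothetical choices made for the example.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent predictions (hypothetical helper)."""

    def __init__(self, window_size=500, min_accuracy=0.90):
        # Both defaults are illustrative assumptions, not recommendations.
        self.outcomes = deque(maxlen=window_size)
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual):
        """Store whether one prediction matched the observed outcome."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Return accuracy over the current window, or None if it is empty."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_review(self):
        """True once the window is full and accuracy is below the floor."""
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.accuracy() < self.min_accuracy


# Made-up (predicted, actual) pairs: two hits followed by two misses.
monitor = AccuracyMonitor(window_size=4, min_accuracy=0.75)
for predicted, actual in [(1, 1), (0, 0), (1, 0), (0, 1)]:
    monitor.record(predicted, actual)

print(monitor.accuracy(), monitor.needs_review())  # 0.5 True
```

Because the window only holds recent outcomes, an older stretch of good predictions cannot mask a recent decline.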
How accurate do I want my machine learning models to be?
The answer, of course, is that models should be as accurate as possible. To be sure, there are dangers such as overfitting, which is why techniques such as cross-validation are used to guard against errant models. A related consideration is model performance: how quickly can a prediction be made? In the world of online retail, anything non-performant will leave a shopper waiting. And that’s never good.
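To make the cross-validation idea concrete, here is a small k-fold sketch using only the Python standard library. Everything in it is an assumption made for illustration: the data is synthetic, the single feature stands in for something like monthly orders, and the "model" is a toy threshold rather than anything you would deploy.

```python
import random
from statistics import mean


def fit_threshold(rows):
    """Toy 'model': midpoint between the mean feature value of each class."""
    positives = [x for x, label in rows if label == 1]
    negatives = [x for x, label in rows if label == 0]
    return (mean(positives) + mean(negatives)) / 2


def accuracy(threshold, rows):
    """Share of rows where (x > threshold) matches the label."""
    return sum(int(x > threshold) == label for x, label in rows) / len(rows)


def k_fold_scores(rows, k=5, seed=0):
    """Return one held-out accuracy per fold."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k roughly equal slices
    scores = []
    for i, held_out in enumerate(folds):
        # Train on every fold except the one being held out.
        training = [row for j, fold in enumerate(folds) if j != i for row in fold]
        scores.append(accuracy(fit_threshold(training), held_out))
    return scores


# Synthetic data: label 0 clusters around 2, label 1 clusters around 8.
rng = random.Random(42)
rows = [(rng.gauss(2, 1), 0) for _ in range(100)]
rows += [(rng.gauss(8, 1), 1) for _ in range(100)]

scores = k_fold_scores(rows)
print([round(s, 2) for s in scores], round(mean(scores), 2))
```

Each score comes from data the model never saw during fitting, so a large gap between training accuracy and these held-out scores is a hint that the model is overfitting.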
Take the example of classifying whether a shopper is one of your most important customers (while the scenario is simplified, the concepts are germane). We’ll call the designation “VIP” and build a model that examines user interaction with the site to determine whether a shopper falls into this classification.
So the model is built and performs well, but you notice that over time, more and more shoppers are being given this classification. What’s wrong? It could be that predictions are being made by a model trained on data that is no longer representative of site behavior. VIP customers of the past may have had to meet a lower bar of activity, and when site traffic and sales took off, that bar became too easy to clear and the machine learning model was made irrelevant. If you’re using an outdated predictive model to offer incentives to customers, you could be incentivizing the wrong behavior.
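The VIP scenario can be sketched in a few lines. This is a deliberately simplified stand-in for a real model: the order counts are made up, and the helper names `vip_share` and `top_share_threshold` are hypothetical. It shows how a bar learned from last year’s activity hands out the VIP label far more often once activity grows, and how refitting on current data restores the intended share.

```python
def vip_share(order_counts, threshold):
    """Fraction of shoppers whose order count meets the VIP threshold."""
    return sum(count >= threshold for count in order_counts) / len(order_counts)


def top_share_threshold(order_counts, share=0.20):
    """Order count that puts a shopper in roughly the top `share` of shoppers."""
    ranked = sorted(order_counts, reverse=True)
    cutoff_index = max(int(len(ranked) * share) - 1, 0)
    return ranked[cutoff_index]


# Made-up yearly order counts per shopper, before and after traffic took off.
last_year = [1, 1, 2, 2, 3, 3, 4, 5, 6, 12]
this_year = [2, 3, 4, 5, 6, 7, 8, 10, 12, 20]

stale = top_share_threshold(last_year)  # bar learned from old data
print(stale, vip_share(last_year, stale), vip_share(this_year, stale))  # 6 0.2 0.6

fresh = top_share_threshold(this_year)  # bar refit on current data
print(fresh, vip_share(this_year, fresh))  # 12 0.2
```

Watching the share of shoppers labeled VIP over time is a cheap early warning: if it drifts well away from what you intended, the model is probably due for retraining.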
Additionally, are you comparing different clustering algorithms? Do they perform differently? Does a k-means model work better, or is a Gaussian mixture model more effective? Do ensemble models provide better accuracy?
Data Science, Simplified
All of this is fairly complex and isn’t usually in the realm of what most online retailers consider their sweet spot. Leveraging a solution like Granulytic allows you to focus on your business while we focus on your data and apply the right machine learning models to get the best results.
Want to see it in action? Schedule a demo today!