The ART of (genetic) segmentation
I saw a post or two recently on genetic algorithms (including Alberto's and Walter's) and it made me want to post about a recent Fair Isaac use of genetic algorithms called "Adaptive Random Trees" or ART.
So how does ART work, how does it apply something as esoteric as genetic algorithms to something as practical as making cross-sell offers? Let's lay out a typical scenario. I have a population of customers and I want to predict something about them - say their likelihood of accepting a specific cross-sell offer. A typical predictive analytic approach is to develop a model that uses historical data about who has, and who has not, accepted offers to calculate some kind of score or measure of responsiveness - an equation if you will - that can be applied to each customer to calculate a value that represents their likely responsiveness. Often when developing these models an analyst discovers that different segments of the customer base behave differently. This means that more than one model is required - typically each segment will need its own model. For a given population this may quickly result in an unwieldy number of segments and hence models. Even though building a predictive model for each segment improve the accuracy of the prediction, and thus the likely precision of the targeting of that population, it costs too much and takes too long to do. Enter ART.
ART uses genetic algorithms to develop both the segments that should have their own models and the models themselves. It does this by:
- Taking an initial model
- Generating random decision trees that segment the population and building a simple model for each segment
- Comparing each segmented model system (treeplus models) in terms of their ability to make accurate predictions as measured by divergence. Good trees typically have models for each segment that incorporate particular characteristics that are meaningful for predicting behavior within the segment
- Then it starts to iterate
- Take the best trees and keep them (so you never lose the best results found so far)
- Select trees to be the "parents" of the next generation, favoring better trees in a Darwinian fashion
- Take part of each parent tree and combine to make a new generation of trees
- Mutate these new trees by adding in a certain amount of random variation
- Re-assess the trees or segmented model systems
The process continues until the new trees are not producing any better results than the previous generation. At this point you have both an optimal segmentation tree and initial models to calculate a prediction for the members of each segment. All of this without human intervention in the process - the process is initiated and bounded by someone with strong analytic skills and the final result is easy to modify and streamline to match production needs but the genetic algorithm does the work in between. The end result is best expressed as a set of business rules representing a decision tree and a set of predictive models for each segment. A true EDM solution.