In this series of articles, our goal is to demystify machine learning and some of the common buzzwords around it, and to show how it is relevant and practical to consumer insights.
This third installment is all about understanding decision trees, how they are used, and the implications for consumer research.
Decision trees, a predictive modeling approach in machine learning, use observations about an item to draw conclusions about that item’s target value.
Just in case you missed the first installment in this series, machine learning is a branch of artificial intelligence that automates analytical model building, where systems can learn from data and identify patterns.
You don’t need to know how to build the models yourself to harness the power of these and other ML techniques (hint: call us).
So what are decision trees really?
Decision trees are a non-parametric supervised learning method used for both classification (predicting a category) and regression (predicting a continuous value) tasks. Non-parametric simply means that fewer assumptions are made about the population the data comes from; in particular, the data is not required to follow a normal distribution.
That is not meant to imply that such models lack parameters entirely, but that the number and type of parameters are flexible rather than fixed in advance. Non-parametric methods are also well suited to ordinal data, where values have a meaningful order but the distances between them are not defined.
For example, survey responses that rate preference on a scale from Dislike to Like are ordinal data, as are responses on any other Likert scale.
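This matters for decision trees in particular: a split such as "is the rating above Neutral?" depends only on the order of the scale points, not on the distances between them. The minimal sketch below uses a hypothetical five-point scale and made-up responses to show that two different order-preserving encodings produce exactly the same split.

```python
# Hypothetical 5-point Likert scale and made-up survey responses
LIKERT_ORDER = ["Dislike", "Somewhat dislike", "Neutral", "Somewhat like", "Like"]
responses = ["Like", "Neutral", "Dislike", "Somewhat like", "Like", "Somewhat dislike"]


def encode(labels, codes):
    """Map each Likert label to a numeric code that preserves the scale's order."""
    lookup = dict(zip(LIKERT_ORDER, codes))
    return [lookup[label] for label in labels]


def split(values, threshold):
    """A tree-style binary split: True sends the record down the 'above threshold' branch."""
    return [v > threshold for v in values]


# Two different encodings that keep the same order of categories
evenly_spaced = encode(responses, [1, 2, 3, 4, 5])
unevenly_spaced = encode(responses, [0, 10, 11, 50, 100])

# "Above Neutral" with each encoding (Neutral is 3 in one, 11 in the other)
branch_a = split(evenly_spaced, 3)
branch_b = split(unevenly_spaced, 11)
print(branch_a == branch_b)  # → True
```

Because only the ordering is used, the tree never has to assume the gap between "Neutral" and "Somewhat like" equals the gap between "Somewhat like" and "Like".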
Supervised learning is the machine learning task of learning to predict an output from a labeled data set, one in which the correct outcome is already known for each record. This contrasts with unsupervised learning, which seeks to uncover hidden structure or patterns within an unlabeled data set.
The primary goal of a decision tree algorithm is to build a model that classifies records and predicts the value of a variable or outcome of interest by learning a series of simple rules inferred from the data. The most common rule takes the form of an if/then statement.
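A fitted tree can be read directly as nested if/then rules, where each path from the top of the tree to an end point is one rule. The sketch below hand-codes a small tree for a repeat-purchase question; the field names (`visits_last_90_days`, `satisfaction`) and the thresholds are hypothetical, chosen for illustration, whereas a real algorithm would learn them from data.

```python
def predict_repeat_purchase(visits_last_90_days: int, satisfaction: int) -> str:
    """Hand-written decision tree; fields and thresholds are hypothetical."""
    if visits_last_90_days >= 3:       # root split on visit frequency
        if satisfaction >= 4:          # satisfaction on an assumed 1-5 scale
            return "likely repeat"     # IF frequent AND satisfied THEN likely repeat
        return "at risk"               # IF frequent AND not satisfied THEN at risk
    return "unlikely repeat"           # IF infrequent THEN unlikely repeat


for visits, score in [(5, 5), (5, 2), (1, 5)]:
    print(visits, score, predict_repeat_purchase(visits, score))
```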
Decision tree algorithms are considered a powerful class of models because they can achieve high accuracy while remaining clear and interpretable (e.g. "we believe with a high degree of certainty that our customers will behave in this way").
As consumers, we make decision-tree-style choices at every point in the day. With effective research, it’s possible to better understand where and how consumers navigate those choices.
A tree can be as simple or as complex as the situation requires. Whatever its size, a decision tree gives users a classification system that predicts an outcome of interest, for example, how likely a given consumer segment is to make a purchase.
How does it work in practice?
There are several methods used to build the actual classification system. All of them more or less accomplish the same thing: they classify and then predict.
The choice of a particular algorithm depends largely on whether you are trying to predict a continuous variable (e.g. a rating score treated as a number) or a categorical variable (e.g. gender, income bracket, etc.). It also depends on the complexity of the target variable itself: a binary variable (Yes/No) is less complex than a three-level categorical variable (Yes/No/Maybe).
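One place the continuous-versus-categorical distinction shows up is at the end points (leaves) of the tree. In a classification tree a leaf typically predicts the most common category among the training records that reach it; in a regression tree it typically predicts their average. A minimal sketch, using made-up leaf contents and a hypothetical `leaf_prediction` helper:

```python
from collections import Counter
from statistics import mean


def leaf_prediction(targets, task):
    """Value a leaf predicts for the training records that reach it.

    task="classification": categorical target, predict the most common category.
    task="regression": continuous target, predict the average value.
    """
    if task == "classification":
        return Counter(targets).most_common(1)[0][0]
    if task == "regression":
        return mean(targets)
    raise ValueError(f"unknown task: {task!r}")


# Hypothetical contents of a single leaf
print(leaf_prediction(["Yes", "No", "Yes", "Yes"], "classification"))      # → Yes (binary target)
print(leaf_prediction(["Yes", "No", "Maybe", "Maybe"], "classification"))  # → Maybe (three levels)
print(leaf_prediction([3.5, 4.0, 4.5], "regression"))                     # → 4.0 (continuous)
```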
One widely used decision tree algorithm is the Classification and Regression (C&R) Tree. Like decision trees generally, the C&R Tree algorithm generates a tree that lets you predict or classify future observations.
The method uses recursive partitioning: it repeatedly splits the records into segments that are as similar as possible on the target, either to predict the values of a continuous dependent variable (regression) or to predict the category of a categorical dependent variable (classification) from one or more continuous and/or categorical predictor variables.
A C&R Tree node is considered “pure” if all cases in the node fall into a single category. Input fields can be numeric or categorical, but every split is binary: each node divides into exactly two branches.
For example, we may be interested in predicting who will or will not be a repeat purchaser or renew their subscription.
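Recursive partitioning with binary splits can be sketched in plain Python. This is a simplified illustration, not the C&R Tree implementation in any particular product: it assumes Gini impurity as the measure of how "pure" a node is (0.0 means every case falls into one category), considers only numeric predictors, and runs on a tiny made-up data set of customers labeled by whether they became repeat purchasers.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure node, larger when categories are mixed."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, features):
    """Try every binary split 'feature <= threshold'; keep the lowest weighted impurity."""
    best = None
    for f in features:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # not a real split
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best


def build_tree(rows, labels, features, depth=0, max_depth=3):
    """Recursive partitioning: split once, then repeat on each half until nodes are pure."""
    majority = Counter(labels).most_common(1)[0][0]
    if gini(labels) == 0.0 or depth == max_depth:
        return majority  # leaf node
    split = best_split(rows, labels, features)
    if split is None:
        return majority
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], features, depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], features, depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow the if/then branches from the root down to a leaf."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


# Made-up customers: purchases last year, satisfaction (1-5), and whether they repurchased
rows = [
    {"purchases": 1, "satisfaction": 2},
    {"purchases": 2, "satisfaction": 5},
    {"purchases": 5, "satisfaction": 4},
    {"purchases": 6, "satisfaction": 1},
    {"purchases": 3, "satisfaction": 4},
    {"purchases": 1, "satisfaction": 3},
]
repeat_purchase = ["no", "yes", "yes", "no", "yes", "no"]

tree = build_tree(rows, repeat_purchase, ["purchases", "satisfaction"])
print(tree)  # → {'feature': 'satisfaction', 'threshold': 3, 'left': 'no', 'right': 'yes'}
print(predict(tree, {"purchases": 4, "satisfaction": 5}))  # → yes
```

On this toy data a single split on satisfaction already produces two pure nodes, so the recursion stops after one level; messier real data would keep splitting until nodes are pure or `max_depth` (an arbitrary stopping rule assumed here) is reached.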
Another, similar tree-building algorithm is CHAID (Chi-squared Automatic Interaction Detection), which uses chi-square statistics to identify the optimal splits and, unlike C&R Tree, allows a node to split into more than two branches. Perhaps a topic to dive deeper into later!
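To give a flavor of the statistic involved, the sketch below computes a Pearson chi-square test for one hypothetical predictor (region, three categories) against a yes/no outcome (subscription renewed). The counts are made up. It is a heavily simplified illustration of the idea, not the CHAID algorithm itself: real CHAID also merges categories that do not differ significantly and adjusts p-values (e.g. with a Bonferroni correction) before choosing a split.

```python
from math import exp


def chi_square(table):
    """Pearson chi-square statistic for a contingency table (rows = predictor categories)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: region (3 categories) vs. renewed subscription (yes, no)
region_vs_renewal = [
    [30, 10],  # North
    [20, 20],  # South
    [10, 30],  # West
]

stat = chi_square(region_vs_renewal)
dof = (len(region_vs_renewal) - 1) * (len(region_vs_renewal[0]) - 1)  # (3-1)*(2-1) = 2
# With exactly 2 degrees of freedom, the chi-square p-value has the closed form exp(-stat / 2).
p_value = exp(-stat / 2)
print(round(stat, 2), dof, p_value < 0.05)  # → 20.0 2 True
```

A significant result like this suggests region is worth splitting on, and because CHAID is not limited to binary splits, each region (or each group of merged regions) could become its own branch.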
All of this raises the question: how and when do I use it?
Decision tree algorithms have numerous use cases in the world of consumer insights, and they are probably used more often than you might think.
Among the more common applications are:
Segmentation: Identify consumers who are likely to be influenced
Stratification: Assign consumer segments into various categories (e.g. low, medium, high levels of loyalty)
Prediction: Create rules to predict a related outcome (e.g. likelihood of purchase versus no purchase)
Consumer journey mapping: Use classifications and predictions to map out a specific consumer journey