The Battle Between Machine Learning vs. Statistics Over Consumer Insights

Naira Musallam, PhD • 13 Sep 2017

With consumers providing so many data points through any number of information-gathering techniques, it is imperative that companies take a strategic approach to analysis, especially now that demographics alone no longer suffice.

Furthermore, to survive a competitive environment and lead the future, effective consumer research should get at the “why” behind consumer behaviors and preferences.

All of this raises the question: how? Researchers have long debated the effectiveness of two approaches: machine learning and classical statistics. The relationship between the two has not been without its hardships, with each camp making the case that it is the proper strategy for maximizing your ROI from the data collected from consumers.

Over a series of blog posts, we will help dispel some myths about a lot of buzzwords in the field. The first topic we’re tackling is machine learning vs. statistics. What is machine learning? What is classical statistics? Are they different? If so, how? When do I use each? And which one is more effective in helping me understand my consumers?

Machine Learning vs. Statistics

First things first, let us cover some working definitions for both. Machine learning and statistics are fields that employ various analysis techniques for the purpose of understanding data. Machine learning is a type of artificial intelligence (A.I.) that allows software applications to learn and predict outcomes without being explicitly programmed. You would mainly use machine learning to generate a prediction about your whole customer base from existing datasets.

Statistics, on the other hand, is defined as a branch of mathematics dealing with the collection, classification, analysis, and interpretation of data. It is powerful for drawing inferences about your customers from a sample of a larger population. While machine learning is concerned with identifying patterns in existing datasets, the primary goals of classical statistics are to describe the data by reducing it to its most meaningful level and to draw inferences about the larger population from only a portion of your customers.

For these reasons, the two tend to focus on solving slightly different business needs. Machine learning rules when there is a need for an individualized prediction about a certain consumer behavior or trend. Statistics wins the day when there is a need to answer a big strategic question such as “why”, “how”, and “for whom”. For example, machine learning is deployed when you’re interested in generating a list of recommended items for consumers based on past behavior. Statistics is optimal when you want to test a hypothesis around why consumers are buying specific products, or why behaviors are trending a certain way.
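
To make the recommendation example concrete, here is a minimal sketch of the idea, not a production recommender. It assumes a toy co-purchase approach: items that often show up in the same basket as things a customer already bought get suggested first. The customer names, items, and the recommend function are all hypothetical, made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories: one set of items per customer.
baskets = {
    "ana": {"coffee", "filters", "mug"},
    "ben": {"coffee", "filters"},
    "cy": {"coffee", "mug", "tea"},
    "dee": {"tea", "honey"},
}


def co_purchase_counts(baskets):
    """Count how often each ordered pair of items appears in the same basket."""
    counts = Counter()
    for items in baskets.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def recommend(customer, baskets, top_n=2):
    """Rank items the customer has not bought by co-purchase with items they have."""
    counts = co_purchase_counts(baskets)
    owned = baskets[customer]
    scores = Counter()
    for (a, b), n in counts.items():
        if a in owned and b not in owned:
            scores[b] += n
    # Sort by score (descending), then by name, so ties resolve predictably.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


print(recommend("ben", baskets))  # ['mug', 'tea']
```

Notice that nothing here asks why Ben might want a mug. The pattern in past behavior is the whole story, which is exactly the trade-off described above.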

What makes one technique more effective than the other? The answer is that it depends on what you are hoping to achieve. While a deep academic analysis is beyond the scope of this blog, here are three key differentiators.

Assumptions, Assumptions, Assumptions

The bell curve. We all saw it by day 3 of Statistics 101. It takes many of us back to that unpleasant time in an introductory statistics class where the lecturer talked about things that we’d just as soon forget. Do you remember what a t-test is, what a p-value means, or what significance testing is? At the heart of it all is the ability to infer something about the population from only a sample. To do that, we make assumptions about things such as the independence of observations and the distribution of the population.

In our case, the sample might be the group of customers who responded to last month’s satisfaction survey or the brand health tracker from last quarter. The soundness of those assumptions, and how well this sample represents the larger population, will greatly affect how accurate your predictions about the larger consumer base actually are.
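
As a rough illustration of inferring from a sample, the sketch below estimates the average satisfaction of the whole customer base from a handful of survey responses. The scores are invented, and the interval uses a normal approximation, which itself leans on the assumptions just discussed (independent responses, a representative sample). With a sample this small, a t-based interval would be slightly wider.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical 1-10 satisfaction scores from a sample of survey respondents.
scores = [8, 7, 9, 6, 8, 7, 10, 5, 8, 7, 9, 6, 8, 7, 8, 9, 6, 7, 8, 7]


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    m = mean(sample)
    standard_error = stdev(sample) / sqrt(n)
    # z is the critical value for a two-sided interval (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return m - z * standard_error, m + z * standard_error


low, high = mean_confidence_interval(scores)
print(f"sample mean = {mean(scores):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

The interval only means what it claims to mean if the respondents look like the customers who did not respond.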

On the other hand, when you apply machine learning to your analysis, it is largely free of those assumptions. The focus is on the existing dataset at hand, such as recent purchase behavior or brand perceptions, and the patterns it can reveal. Few such assumptions are made because machine learning users are typically not interested in inferring something about the population from the sample. The population of interest is actually the sample. The idea is that the more data you have, the more patterns will be revealed. Over time, with more data, the predictive models will improve.

Data Quantity vs. Data Quality

The second big differentiator between machine learning and statistics is the importance of sampling techniques. Statistics is concerned with inferring something about all of your customers based on data from a survey of only a sample of the entire customer base. This is why you may hear statisticians discussing how important proper sampling is to the final outcome (e.g. see literally anything about political polling).

Machine learning typically assumes that the samples are independent and identically distributed and that they already represent the entire population. The result is that machine learning techniques end up being way more pragmatic and cheaper to run at scale.

Keep in mind, however, that what you gain in scalability you may lose in accuracy. Google’s epic failure in 2013 to predict the number of flu cases from Google search terms is a classic example. While the underlying machine learning algorithms were relatively sound, ignoring factors such as uncertainty and sampling technique led to spectacularly inaccurate estimates over time.

Exploring vs. Confirming: Different Ways of Learning

Data analysis techniques are classified as either exploratory or confirmatory. As the labels imply, exploratory analysis seeks to identify interesting or useful patterns, whereas confirmatory analysis tests specific hypotheses in the dataset that can either be confirmed or refuted.

You’re either looking for new trends in consumer data that you aren’t aware of or checking to see if customers are engaging with your products the way that you intended.

Machine learning algorithms are mainly exploratory and attempt to generalize decision making, again because machine learning practitioners are less concerned with hypothesis testing.

Statisticians focus primarily on hypothesis testing, asking questions like: Are women more likely to purchase organic food than men? Are millennials more conscious about environmentally friendly products than other generations?
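
For a sense of what a confirmatory test of the organic food question might look like, here is a minimal sketch of a two-proportion z-test, one common way to compare purchase rates between two groups. The survey counts are hypothetical, and the test assumes two independent, reasonably large random samples.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for the difference between two independent proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Under the null hypothesis both groups share one purchase rate.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical survey counts: 120 of 300 women and 90 of 300 men bought organic.
z, p_value = two_proportion_z_test(120, 300, 90, 300)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value would suggest the gap between the two groups is unlikely to be a fluke of sampling. It says nothing about why the gap exists, which is where further research comes in.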

Both have their place in solving business challenges, depending on the context. Companies need to take a step back to evaluate which method is the best for that particular problem before getting caught up in the buzzwords of the moment. Or feel free to just reach out to us!

So What?

Given the choice between machine learning and classic statistics, which should be used? Of course, the answer is it depends. It is becoming clear that both fields can benefit from each other and both fields can assist in better understanding consumers.

The team at SightX has extensive experience in data analytics and has helped companies of all sizes make data-driven, consumer-focused decisions. We are genuinely excited about the potential for big, meaningful impact that we can have in the world of consumer research.

We admit, “machine learning” has a sexy ring to it, but trendy buzzwords do not a smart business decision make. Blindly following trends won’t benefit anyone. Big data doesn’t mean smart data. We want to contribute intelligent tools to the consumer research space to help free up time for thinking within companies.

Meet the author

Naira Musallam, PhD

Naira is the co-founder of SightX and our in-house expert for all things research, statistics, and psychology. She received her doctorate from Columbia University, and served as faculty at both Columbia and NYU. She has over 15 years of experience in data analysis and research across multiple sectors in various industries.
