Find Segments in your Data – A guide on segmentation analysis techniques

The purpose of this blog is to provide an understanding of how you can mathematically analyze your survey results. If you are interested in software that will analyze this for you, read here on how StatGenius segments your respondent data.

As you already know, segmentation analysis is one of the most common (and most useful) methods of analyzing survey data. What you might not know is that survey researchers have three methods of analyzing survey and respondent data.

The method researchers use most often is descriptive segmentation, but you should start becoming familiar with cluster analysis as well. Cluster analysis is the traditional statistical method of segmentation in market research, and it is actually a family of methods with several variations. The two that we discuss are hierarchical and non-hierarchical (or partitioning) methods.

Descriptive Segmentation: uses crosstabs and tables to describe response findings along pre-defined segments

Hierarchical Clustering: builds a tree of nested clusters step by step, either by merging the most similar groups from the bottom up or by splitting groups from the top down

Non-hierarchical Clustering: divides respondents directly into a chosen number of clusters and keeps reassigning them until the groups stabilize, without building a hierarchy


Cross Tabulation Segment Analysis (Descriptive Analysis)

The most basic (and because of that, the most common) way to segment your data is using basic cross tabulation analysis. Respondents are divided into predetermined groups, e.g. age or income groups, and their differences are studied across a variety of questions. Because this is based on predetermined segment groups, this analysis approach is often referred to as a priori segmentation.
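
As a rough illustration of what a crosstab computes, here is a minimal sketch in plain Python. The age groups, the survey question, and the responses are all made up for this example, and the function name crosstab is just a label chosen here. In practice a spreadsheet pivot table or a survey tool would produce the same kind of table.

```python
from collections import Counter

# Hypothetical survey data: (age group, answer to "Would you recommend us?")
responses = [
    ("18-34", "Yes"), ("18-34", "Yes"), ("18-34", "No"),
    ("35-54", "Yes"), ("35-54", "No"), ("35-54", "No"),
    ("55+", "Yes"), ("55+", "Yes"), ("55+", "Yes"), ("55+", "No"),
]

def crosstab(rows):
    """Return the percentage of each answer within each predefined segment."""
    counts = Counter(rows)
    segments = sorted({seg for seg, _ in rows})
    answers = sorted({ans for _, ans in rows})
    table = {}
    for seg in segments:
        # Row total: how many respondents fall into this segment.
        total = sum(counts[(seg, ans)] for ans in answers)
        table[seg] = {
            ans: round(100 * counts[(seg, ans)] / total, 1) for ans in answers
        }
    return table

table = crosstab(responses)
for seg, row in table.items():
    print(seg, row)
```

Each row shows how one predetermined segment answered, which is exactly the comparison a crosstab is meant to support. Note that the segments (the age groups) had to be chosen before the analysis started.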

Advantages of Descriptive Segmentation:

Descriptive segmentation is the easiest method of analyzing your segments, and best of all, no statistician or advanced software is needed to complete your research. This saves you time and money, and the clarity provided by the crosstabs helps to deliver clean data that is easily understandable and defensible. Results are easy to read and explain, and useful in any type of presentation.

Disadvantages of Descriptive Segmentation:

The issue with descriptive segmentation analysis using crosstabs is that the approach requires the use of predetermined segment groups. This technique should not be used when you are seeking to validate or learn about new segments. Many times, we have seen researchers attempt to shortcut the segmentation process using crosstabs when they should have used a more advanced segmentation method. Do not fall for this! Cross tabulation should only be used in the most basic segmentation cases.

We will now discuss the two most common advanced methods – both of which can be used when the researcher does not know how their respondents segment themselves.

Hierarchical clustering

Hierarchical cluster analysis is a statistical segmentation technique that sorts similar objects into groups called “clusters”. These clusters are statistically distinct from each other, while the data points (your respondents) within each cluster are similar to each other according to the distance measure that the method uses.

The basic idea here is that the researcher starts with each observation as its own cluster. The individual observations are located in an n-dimensional space (where n is the number of attributes that are used). The distances between observations (i.e. responses) are measured, and the data points closest to one another are joined together to form a new cluster. The process continues until all observations have been merged into a single cluster, and the optimal number of clusters is then determined by using standard measures of fit.
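
The merging process described above can be sketched in a few lines of plain Python. This is a minimal illustration under some assumptions that are not part of the description: it uses straight-line (Euclidean) distance, it measures the gap between two clusters by their two closest members (known as single linkage), and it simply stops at a requested number of clusters instead of merging all the way down and choosing the number with a measure of fit. The ratings data is hypothetical.

```python
import math

def agglomerative(points, n_clusters):
    """Bottom-up clustering: repeatedly merge the two closest clusters."""
    # Start with every observation as its own cluster (stored by index).
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Single linkage (an assumption): distance between closest members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest distance between them.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        # Join the closest pair into a single new cluster.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]

# Hypothetical respondents rated on two attributes (n = 2 dimensions),
# e.g. importance of price and importance of quality.
ratings = [(1, 2), (2, 1), (1.5, 1.5), (8, 9), (9, 8), (8.5, 9.5)]
segments = agglomerative(ratings, n_clusters=2)
print(segments)
```

With this made-up data, the first three respondents end up in one cluster and the last three in the other. The nested pairwise loop is slow for large surveys, which is one reason a statistical package is normally used for this step.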

Advantages of Hierarchical Clustering:

Unlike a priori segmentation (used in crosstabs as described above), this method will help you (the researcher) discover new ways to segment your target audience, or validate your existing assumptions about segments. The method is flexible and dynamic in how your survey results guide the analysis.

Hierarchical clustering is also easy to understand and implement. In a dendrogram (the tree diagram that this method produces), clusters are easily recognizable as individual “branches”, and the insights gained from this are easily consumed by business stakeholders.

Disadvantages of Hierarchical Clustering:

The real disadvantage of using this method is that it requires the use of statistical software, such as SPSS. This increases the cost of running the analysis, and the analysis can become extremely time-consuming as the number of branches and variables that a researcher wants to analyze grows.

Researchers must be certain to use only data with a low level of error, as errors in the data can significantly decrease the reliability of the results. For data with more errors in it, non-hierarchical clustering is a better tool.

Non-Hierarchical Clustering (k-Means Clustering)

The most common method of non-hierarchical clustering segmentation is called “k-means clustering”. The technique selects random observations as “seeds”, and then it finds the observations that are closest to each seed, thus defining the individual clusters. Each cluster's center is then recalculated from its members, and the assignment is repeated until the clusters stop changing. There are a lot of statistics that are used for understanding the optimum number of seeds, the distance from each segment, etc., but these are typically taken care of with the aid of a statistical software package such as SPSS.
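
To make the seed-and-assign idea concrete, here is a minimal sketch of the standard k-means loop in plain Python. It is only an illustration, not how any particular statistical package implements it. The function name k_means, its parameters, the fixed random seed, and the ratings data are all assumptions made for this example.

```python
import math
import random

def k_means(points, k, seed=0, max_iter=100):
    """Assign each point to one of k clusters using the basic k-means loop."""
    rng = random.Random(seed)
    # Pick k random observations as the initial "seeds" (cluster centers).
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign every observation to its closest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clusters are stable
        labels = new_labels
        # Move each center to the average of the observations assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return labels, centers

# Hypothetical respondents rated on two attributes.
ratings = [(1, 2), (2, 1), (1.5, 1.5), (8, 9), (9, 8), (8.5, 9.5)]
labels, centers = k_means(ratings, k=2)
print(labels)
print(centers)
```

Unlike the hierarchical approach, the number of clusters (k) has to be chosen up front. Because the seeds are random, different seeds can give different results on less clearly separated data, so analysts typically run the procedure several times and compare.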

Advantages of Non-Hierarchical Clustering:

Similar to Hierarchical Clustering, this technique helps you (the researcher) discover new ways to segment your target audience, or validate your existing assumptions about segments. The method is flexible and dynamic with how your survey results guide the analysis.

Non-hierarchical clustering is considered to be more reliable and faster than hierarchical, because it avoids the ‘arbitrary’ hierarchies that may result.

Disadvantages of Non-Hierarchical Clustering:

The greatest disadvantage of this segmentation method is that a statistician is almost always needed, both to enter the control parameters into a statistical package and to interpret the results. However, new AI-driven software packages such as StatGenius automate non-hierarchical clustering so that any manager or professional can complete their research without help with the mathematics.


Choosing the Right Segmentation Method

When it comes time for you (the researcher) to choose the best segmentation method, there are multiple considerations to think about. While the three most common methods of segmentation analysis are discussed above, no single approach is the best in every situation. The right choice depends on the number and nature of the variables included in the data and on the goal of your final analysis. Of course, the problem is that it is hard to identify the variables that will be useful in the analysis. This is typically why, in the past, a statistician was relied on. However, new AI-driven software packages such as StatGenius guide non-statisticians through completing a segmentation project on their own, with a complete self-service solution.
