Hierarchical & Cluster Analysis

Cluster analysis is a family of statistical tools that group data observations into relatively homogeneous groups that differ from each other. The procedure is based on a set of individual characteristics for each object/consumer to be allocated into groups. Depending on the purpose of the analysis, different parameters of the cluster solution are taken into account. There is therefore no single standard approach to cluster analysis, and no strict mathematical criterion of solution quality. The decision on the optimal number of groups is made by the researcher, based on the logic of interpreting the results and on their further applicability.

In cluster analysis there is no predefined dependent variable. Instead, one is created in the process of the analysis: it records the allocation of each case/sample to a certain group. This cluster membership variable is the key output of cluster analysis.

Hierarchical clustering is a specific approach within cluster analysis that can be used when knowledge of the data is limited. The method reveals different levels of proximity between observations and accordingly either unites smaller clusters into bigger ones or splits larger clusters into smaller ones. The specific approach is selected based on the objective of the exercise: if there is a need to extract several large groups, it is better to go top-down (the divisive approach); if a number of small clusters is required, the bottom-up (agglomerative) approach is more effective.

Unlike K-means clustering, the researcher does not have to specify the number of groups prior to the procedure: hierarchical analysis reveals the structure of the data, and the number of clusters can be selected afterwards, once the structure has been examined with the help of several statistical indicators.

As with other cluster techniques, continuous variables serve as the best input for producing a cluster solution; categorical variables can also be used as inputs, but are less desirable since the classification is based on distances between data points.

Real World Examples:

Hierarchical cluster analysis can be applied in business to consumer and object-based data (like other grouping techniques). The difference from other methods is that this approach is the best option when little or nothing is known about the nature of the data or groups.

The examples below show possible applications of cluster analysis (different techniques may be used) to solve real problems:

Developing positioning for new brands requires a good understanding of consumer needs and values. This knowledge guides the development of a brand that resonates with the audience, generates consumer appeal and drives purchase. As one of the steps along this way, companies often conduct consumer segmentation. The results of such a study allow us to understand the size, and therefore the potential, of various consumer groups, as well as their specific features in terms of values, socio-demographic characteristics, purchasing behavior, category needs, brand preferences etc. Input variables can include the intensity of specific behavior (e.g. reading books, spending time with friends etc.), adherence to specific values and beliefs etc.

We will consider an example with input variables measured on a scale. Suppose we have collected data for 5 respondents on 2 parameters measured on a 9-point scale: 1) frequency of reading books (1 pt – don't read books; 9 pt – read books every day); 2) frequency of spending time with friends (1 pt – don't spend time with friends; 9 pt – spend all my free time with friends). We have the following paired responses:

respondent #    frequency of reading books    spending time with friends

We can clearly identify 2 groups/patterns of behavior: group 1 – respondents who spend time with friends and spend little time reading books; group 2 – respondents who mostly read books and spend very little time with their friends. If we run hierarchical cluster analysis, we will see the structure of the data based on the respondents' distances from each other on the input variables. Graphically, this structure is represented with a dendrogram (a tree showing different levels of grouping of respondents/data points, starting from individual cases and ending with a single group). The graph below shows that respondents 2 and 3 are closer to each other than either is to respondent 1; however, all three form a single group at the next step. Respondents 4 and 5 are united into a single group immediately. The lower cases are grouped on the Y axis (height), the more homogeneous the groups. It is up to the researcher to decide the cut-off level and thus pick the appropriate number of clusters.
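A minimal sketch of this example in Python using scipy, with hypothetical responses chosen to match the pattern described above (the actual survey data is not reproduced here):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

# hypothetical 9-pt-scale responses: (reading books, time with friends);
# respondents 1-3 are "readers", respondents 4-5 are "friends" people
responses = np.array([[9, 3],   # respondent 1
                      [7, 1],   # respondent 2
                      [7, 2],   # respondent 3
                      [2, 8],   # respondent 4
                      [2, 9]])  # respondent 5

Z = linkage(responses, method="average")          # agglomerative clustering
labels = fcluster(Z, t=2, criterion="maxclust")   # cut the tree at 2 clusters
# dendrogram(Z)  # draws the tree when a plotting backend is available
```

Cutting the tree at two clusters puts respondents 1–3 in one group and 4–5 in the other, matching the two behavioral patterns.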

Detecting the competitive environment for products based on organoleptic data. For example, suppose we are a big company producing yoghurts and conduct sensory profiling for a set of fruit yoghurts. Attributes collected include milkiness of taste, sweetness, intensity of fruit taste, notes of caramel flavor etc. Cluster analysis would allow us to understand which yoghurts have the closest taste profiles and therefore compete directly, and which ones are distant in terms of taste.

Imagine we want to group 32 cars based on 2 characteristics: horsepower (hp) and qsec (acceleration).

Below is an extract of a spreadsheet file containing the data related to this case:

car                   hp      qsec
Mazda RX4            -0.54   -0.78
Mazda RX4 Wag        -0.54   -0.46
Datsun 710           -0.78    0.43
Hornet 4 Drive       -0.54    0.89
Hornet Sportabout     0.41   -0.46
Duster 360            1.43   -1.12
Merc 240D            -1.24    1.20
Merc 230             -0.75    2.83
Merc 280             -0.35    0.25

Note: columns should store variables; the first row contains variable names.

Once we have run the analysis, apart from the dendrogram (which we analyzed in the first example), there are several outputs to analyze and interpret. Let's have a look at them:

Agglomerative (or divisive) coefficient: this coefficient measures the strength of the clustering structure and takes values in [0; 1]. The closer the value is to 1, the better the structure of the data.

> hc$dc

[1] 0.9248927

In our case the coefficient value is close to 1, so we can consider the clustering structure good.
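The R output above comes from the cluster package (`hc$dc` is the divisive coefficient reported by its divisive routine). The agglomerative counterpart can be sketched in Python from a scipy linkage matrix, following Kaufman & Rousseeuw's definition; this is an illustrative reimplementation, not the package's code:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

def agglomerative_coefficient(X):
    # Kaufman & Rousseeuw's AC: for each observation, take the height at
    # which it first merges, divided by the height of the final merge;
    # AC is the mean of (1 - that ratio). Values near 1 = strong structure.
    Z = linkage(X, method="average")
    n = X.shape[0]
    h_max = Z[-1, 2]                  # height of the last (final) merge
    first = np.full(n, np.nan)
    for a, b, height, _ in Z:
        for child in (int(a), int(b)):
            if child < n:             # child is an original observation
                first[child] = height
    return float(np.mean(1 - first / h_max))

# hypothetical data with two tight, well-separated groups
X = np.array([[0, 0], [0, 0.1], [0.1, 0],
              [10, 10], [10, 10.1], [10.1, 10]])
ac = agglomerative_coefficient(X)     # close to 1 for this structure
```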

Now let's move on to the criteria for selecting the optimal number of clusters:

Elbow method: this takes into account the variance explained by the cluster solution at each number of clusters. When adding another cluster does not increase the explanatory power of the solution, that is the point to stop. In our case, after 3 clusters, the benefit of adding a 4th cluster is very moderate. Graphically, the curve 'bends' at 3 clusters, so this is the optimal number.
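A minimal sketch of the elbow calculation in Python using scipy, with simulated data standing in for the car measurements (which are not fully reproduced here):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def wss(X, labels):
    # total within-cluster sum of squares for a given cluster assignment
    return sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
               for c in np.unique(labels))

# simulated stand-in data: three well-separated groups
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.3, size=(10, 2)) for m in (0, 3, 6)])

Z = linkage(X, method="ward")
curve = [wss(X, fcluster(Z, t=k, criterion="maxclust")) for k in range(1, 7)]
# plot k vs curve and look for the bend: the drop flattens after k = 3
```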

Silhouette method: this method evaluates how well each data point matches its own cluster and how well it is separated from the closest neighboring cluster. A score close to 1 indicates that the point is allocated to an appropriate cluster; a score close to 0 indicates that the point lies on the border between two clusters, while negative scores suggest it may be allocated to the wrong cluster and re-classification is needed. In our case, this method supports the conclusion we reached at the previous step – 3 clusters provide the highest score.
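The same selection can be sketched in Python using scikit-learn's silhouette_score on cuts of a scipy tree (again with simulated stand-in data):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import silhouette_score

# simulated stand-in data: three well-separated groups
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.3, size=(10, 2)) for m in (0, 3, 6)])

Z = linkage(X, method="average")
scores = {k: silhouette_score(X, fcluster(Z, t=k, criterion="maxclust"))
          for k in range(2, 7)}
best_k = max(scores, key=scores.get)  # number of clusters with highest score
```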

Number of observations in clusters: we need to check that the groups are more or less balanced in size. The output below shows that cluster #3 contains only 1 car, so at the next step we need to check whether this is a reasonable division.


1 2 3

16 15 1
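In Python, the same counts can be obtained from the label vector (the labels below are hypothetical, constructed to match the sizes reported above):

```python
import numpy as np

# hypothetical 3-cluster labels matching the sizes reported above
labels = np.array([1] * 16 + [2] * 15 + [3] * 1)
clusters, sizes = np.unique(labels, return_counts=True)
# a cluster of size 1 is often an outlier rather than a real segment
```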

Cluster means: once we have decided on the number of clusters, it is important to check that the cluster solution makes sense from an interpretative point of view, so we look at a basic description of the clusters based on the input variables used in the procedure. The more the groups differ from each other, the better.

variable    Group 1    Group 2    Group 3

In our case:

Cluster #1 stands for low horsepower and acceleration above average.

Cluster #2 is characterized by high horsepower score and has the lowest acceleration level among all groups.

Cluster #3 shows low horsepower and high acceleration.
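Per-cluster means like these can be computed with, e.g., pandas. The snippet below uses a few of the scaled car records from the extract above, with hypothetical cluster labels attached purely for illustration:

```python
import pandas as pd

# a few scaled records from the extract, with hypothetical cluster labels
df = pd.DataFrame(
    {"hp":      [-0.54, -0.54, -0.78,  1.43, -1.24, -0.75],
     "qsec":    [-0.78, -0.46,  0.43, -1.12,  1.20,  2.83],
     "cluster": [1, 1, 1, 2, 3, 3]},
    index=["Mazda RX4", "Mazda RX4 Wag", "Datsun 710",
           "Duster 360", "Merc 240D", "Merc 230"])

# average of each input variable within each cluster
means = df.groupby("cluster")[["hp", "qsec"]].mean()
```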