
An introduction to segmentation and cluster analysis

Ray Poynter
November 29, 2021

Segmentation is a process where people (e.g. customers, users, voters) are divided into groups so that their needs and wants can be understood and met more accurately. For example, people come in all sorts of shapes and sizes, so clothes are sold in a small range of sizes (for example Small, Medium, Large, and Extra Large). This sort of segmentation is an alternative to either a) producing just one size and expecting everybody to wear it, or b) tailoring every garment to the wearer (with all the associated costs and delays that would entail). By segmenting the buyers of clothes into groups, manufacturers create a market where most people are able to buy clothes that (more or less) fit them.

The fundamental idea of segmentation is to group together people who are similar to each other in some way and different from the people in other groups. In food retailing or manufacturing, for example, we might divide people into three groups (Vegans, Vegetarians, and Others) to better understand each group and to better meet their needs.

Why segment people?

Organisations tend to segment people because:

A. The different groups want different things

B. The different groups respond to different media/messages

C. Or, both A and B

Three key rules for segments

When we create segments, we need to ensure they are:

1. Identifiable - we need to be able to allocate our customers/viewers/users/members etc. into the correct segments, so, for example, we can target media to reach the right group.

2. Viable - we need the group to be big enough to warrant us treating them as a separate group.

3. Distinctive - the segments need to behave differently, or need to be likely to behave differently if we were to supply a new product or service.
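Of the three rules, viability is the easiest to check mechanically once people have been allocated to segments. Below is a minimal sketch in Python. It assumes a hypothetical minimum share of 10% of the sample. In practice the threshold depends on the size and commercial value of the group, and identifiability and distinctiveness need judgement rather than a single number. The function names and data are illustrative only.

```python
from collections import Counter

def segment_shares(assignments):
    """Return each segment's share of the sample (0.0 to 1.0)."""
    counts = Counter(assignments)
    total = len(assignments)
    return {segment: count / total for segment, count in counts.items()}

def non_viable_segments(assignments, min_share=0.10):
    """List segments smaller than min_share (a hypothetical threshold)."""
    shares = segment_shares(assignments)
    return sorted(seg for seg, share in shares.items() if share < min_share)

# Hypothetical allocation of 20 customers to segments A, B and C
assignments = ["A"] * 12 + ["B"] * 7 + ["C"] * 1
print(segment_shares(assignments))       # {'A': 0.6, 'B': 0.35, 'C': 0.05}
print(non_viable_segments(assignments))  # ['C']
```

Segment C holds only 5% of this invented sample, so under the assumed threshold it would be merged into another segment or dropped.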

Not all Segmentations use Cluster Analysis

This post is mostly going to talk about using cluster analysis to create segments, but it is important to remember that not all segmentation projects use cluster analysis. You might decide to segment your customers as Regular, Occasional, or Rare. You might divide them into those who buy online, those who buy in-store, and hybrid shoppers who do both. These are segmentations, but they do not use cluster analysis.

If a single variable is the key characteristic of a market, then creating segments directly from that variable, using the data you already have, is often the best option. When you need to group people on multiple characteristics, especially attitudes, beliefs, and preferences, cluster analysis is often the best solution.
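A segmentation such as Regular / Occasional / Rare can be built from one variable with simple rules, without any clustering. The sketch below assumes that variable is the number of purchases in the last 12 months. It uses hypothetical cut-offs; real ones would come from knowledge of the market.

```python
def frequency_segment(purchases_per_year):
    """Assign a segment from a single variable: purchases in the last 12 months.

    The cut-offs (12+ = Regular, 3-11 = Occasional, under 3 = Rare)
    are hypothetical and would be set from business knowledge.
    """
    if purchases_per_year >= 12:
        return "Regular"
    if purchases_per_year >= 3:
        return "Occasional"
    return "Rare"

# Hypothetical customers and their purchase counts
customers = {"c1": 20, "c2": 5, "c3": 1}
segments = {cid: frequency_segment(n) for cid, n in customers.items()}
print(segments)  # {'c1': 'Regular', 'c2': 'Occasional', 'c3': 'Rare'}
```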

What is Cluster Analysis?

Cluster analysis is a statistical process that uses data from the target group in combination with an algorithm to suggest segments (clusters). There is no perfect way of running cluster analysis, but there are some key things to be aware of:

• The need to use variables that differentiate between people

• The need to use segmenting variables that are answered by everybody

• The need to use variables that relate to the outcomes of interest

• The difference between segmenting and describing variables

Variables that differentiate

To create clusters, we need to ask questions that elicit different answers from different sorts of people. If we ask whether the food in a restaurant should be safe, almost everybody will say yes; if we ask whether it should be tasty, almost everybody will again say yes. These two questions are not helpful. If we ask whether the food should be spicy, we will get different answers from different people, so this question might be useful.
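One rough way to spot questions that fail to differentiate is to measure how spread out the answers are. Below is a minimal sketch. It assumes 1-5 agreement ratings and a hypothetical cut-off of one standard deviation. The data are invented to mirror the safe / tasty / spicy example. A low spread is a warning sign, not an automatic rule, and a sensible cut-off depends on the answer scale.

```python
from statistics import pstdev

# Hypothetical 1-5 agreement ratings from six respondents
responses = {
    "food_should_be_safe":  [5, 5, 5, 5, 5, 4],
    "food_should_be_tasty": [5, 5, 4, 5, 5, 5],
    "food_should_be_spicy": [1, 5, 2, 4, 5, 1],
}

def differentiating_variables(data, min_sd=1.0):
    """Keep variables whose population standard deviation is at least min_sd
    (a hypothetical cut-off for 'answers differ enough between people')."""
    return [name for name, values in data.items() if pstdev(values) >= min_sd]

for name, values in responses.items():
    print(f"{name}: sd = {pstdev(values):.2f}")
print(differentiating_variables(responses))  # ['food_should_be_spicy']
```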

Variables that are answered by everybody

If some questions are not answered by everybody (for example, in a clothing study we might not ask men questions about bras), then we can’t use those questions in the cluster analysis. If we do use them, we cluster people not on their answers but on the criteria we used to decide who was asked (in this example, we would cluster people into male and female).

In the study, we can ask some questions of only some groups of participants, but we can’t use them as segmenting variables (we would use them as describing variables).
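Questions that were not asked of everybody can be screened out before clustering. Below is a sketch that splits the variables into candidate segmenting variables and describing-only variables. It assumes each respondent is a dict with the same keys and that None marks a question the person was not asked. Both assumptions, the variable names, and the data are hypothetical. Passing this screen is necessary but not sufficient: a variable answered by everybody still needs to differentiate between people and to relate to the outcome of interest.

```python
# Hypothetical survey data: None marks a question the respondent was not asked
respondents = [
    {"likes_bright_colours": 4, "prefers_loose_fit": 2, "bra_comfort": 3},
    {"likes_bright_colours": 2, "prefers_loose_fit": 5, "bra_comfort": None},
    {"likes_bright_colours": 5, "prefers_loose_fit": 1, "bra_comfort": 4},
]

def split_variables(rows):
    """Separate variables answered by everybody (candidate segmenting variables)
    from those with gaps (usable only as describing variables).
    Assumes every row has the same keys."""
    names = list(rows[0].keys())
    segmenting = [n for n in names if all(row[n] is not None for row in rows)]
    describing = [n for n in names if n not in segmenting]
    return segmenting, describing

segmenting, describing = split_variables(respondents)
print(segmenting)  # ['likes_bright_colours', 'prefers_loose_fit']
print(describing)  # ['bra_comfort']
```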

Variables need to relate to the outcomes of interest

If we want to group people in terms of their relationship with personal finance, we will want to ask questions to do with things like risk, ethics, level of desired activity etc. But we don’t want to segment them on variables that do not relate to the outcome. For example, if we knew everybody’s blood group, we would not use it as a segmenting variable, as there is no reason to believe that people’s blood group impacts their financial decisions.

The difference between segmenting variables and describing variables

Segmenting variables are relevant to the outcome, create differences in the responses and are answered by everybody. Describing variables are used to tell us more about the clusters we find.

In most cases demographics (e.g. age and gender) are used as describing variables, not segmenting variables. If we conduct a study of car drivers and find that there is an ‘amateur engineer’ group that loves taking the engine out of the car, that is interesting. If we find that the group is 90% male, that is a finding. But if we had included gender in the segmenting variables, it would not be a finding, because it could simply be the result of the clustering algorithm putting the men in one group and the women in another.

Note that describing variables do not need to be asked of everybody, which allows them to dig deeper. Consider adding some open-ended questions as describing variables; these can help bring your clusters to life.
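The amateur engineer example can be made concrete by profiling clusters on a describing variable after the clustering is done. The sketch below assumes the cluster labels came from an algorithm run only on attitudinal questions, so gender played no part in forming them. It also assumes respondents who were not asked the describing question are recorded as None and are left out of that cluster's base. The cluster names and data are hypothetical.

```python
from collections import Counter, defaultdict

def profile(clusters, describing):
    """Percentage breakdown of a describing variable within each cluster.
    Respondents with no answer (None) are excluded from that cluster's base."""
    by_cluster = defaultdict(Counter)
    for cluster, value in zip(clusters, describing):
        if value is not None:
            by_cluster[cluster][value] += 1
    result = {}
    for cluster, counts in by_cluster.items():
        base = sum(counts.values())
        result[cluster] = {v: round(100 * c / base) for v, c in counts.items()}
    return result

# Hypothetical cluster labels (from attitudinal questions only) and gender answers
clusters = ["amateur_engineer"] * 10 + ["just_drives"] * 10
gender = ["male"] * 9 + ["female"] + ["male"] * 5 + ["female"] * 4 + [None]
print(profile(clusters, gender))
# {'amateur_engineer': {'male': 90, 'female': 10}, 'just_drives': {'male': 56, 'female': 44}}
```

Because gender was not an input to the clustering, a 90% male cluster here is a genuine finding rather than an artefact of the algorithm.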

Personas versus Segments

There is a lot of overlap between what some people mean by personas and what some people mean by segments. However, the use of personas tends to be more focused on thinking about how to tailor services to different groups of people. Segmentation tends to be focused on estimating the volume (or commercial value) of different groups of people. In many cases the best approach is to start with a segmentation and, once the segments are determined, switch to a more persona-oriented mindset.

Running the Algorithm

There are lots of algorithms, none of them perfect, all of them iterative. No algorithm can prove it has found the best solution. The key determinant, therefore, is ‘has it found a useful solution?’ The decisions about the best treatment of the data (for example whether to run a factor analysis before running the clustering algorithm), the number of clusters found, and the choice of algorithm, are made in the context of what the end client is looking for and in terms of what will be useful.
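The article does not name a particular algorithm. As one illustration, here is a minimal sketch of k-means, a widely used clustering algorithm, in plain Python. It assumes numeric segmenting variables on comparable scales and does not include a prior factor analysis. It shows two of the points above. First, the algorithm is iterative. Second, different starting points can lead to different solutions, so we run it several times and keep the best run found, which is still not proof of the best possible solution. The data, function names, and the 20-restart default are hypothetical.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points]

def kmeans(points, k, iterations=100, seed=None):
    """One run of k-means (Lloyd's algorithm) from randomly chosen starting centroids.
    Returns (labels, centroids, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        labels = assign(points, centroids)
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Move the centroid to the mean of its members
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    labels = assign(points, centroids)
    wss = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, wss

def best_of_restarts(points, k, restarts=20):
    """Run k-means from several random starts and keep the run with the lowest
    within-cluster sum of squares: the best run found, not a proven optimum."""
    runs = [kmeans(points, k, seed=s) for s in range(restarts)]
    return min(runs, key=lambda run: run[2])

# Hypothetical respondents scored on two attitude dimensions (e.g. 1-10 scales)
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8), (1, 8), (1.5, 9), (2, 8.5)]
for k in (2, 3, 4):
    labels, centroids, wss = best_of_restarts(points, k)
    print(f"k={k}: within-cluster SS = {wss:.2f}, labels = {labels}")
```

The within-cluster sum of squares tends to fall as k rises, so that figure alone does not settle the number of clusters. That choice, like the choice of algorithm, still comes down to which solution is useful. For real projects, an established library such as scikit-learn's KMeans (whose n_init parameter controls the number of restarts) is the practical option.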

If you’re wanting to discover more about Segmentation, contact us today!
