Customer Segmentation Using Machine Learning

Customer segmentation is the process of dividing a customer base into groups of individuals that are similar in certain ways relevant to marketing, such as age, gender, interests, and spending habits. It enables companies to target specific groups with tailored promotions, products, or services that are most likely to resonate with them. Machine learning has become a popular tool for automating the process of customer segmentation, providing a more efficient and effective way to identify patterns and relationships within customer data.

There are several methods for using machine learning to perform customer segmentation, including:

  • Clustering algorithms: These algorithms divide customers into groups based on their characteristics and behaviour. For example, k-means clustering partitions a dataset into k clusters, where k is chosen in advance.
  • Decision trees: These algorithms use a tree-like model to identify the most important variables that influence customer behaviour. By using decision trees, companies can determine which customers are most likely to respond to certain marketing campaigns or products.
  • Neural networks: These algorithms can be used to model complex relationships between customers and their behaviour. Neural networks can identify patterns in customer data that are not easily recognizable through traditional methods.
  • Association rule learning: This method finds the relationships between customer attributes and behaviours, such as buying habits and product preferences. Association rule learning can help companies understand which products are frequently purchased together and target customers accordingly.
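
As a minimal sketch of the clustering approach listed above, the snippet below runs k-means on a tiny, made-up table of customers using scikit-learn. The two features and all of the values are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data (made up): [annual_income_k, annual_spend_k] for six customers.
X = np.array([
    [20, 2], [22, 3], [25, 2],      # lower income, lower spend
    [90, 30], [95, 28], [100, 35],  # higher income, higher spend
], dtype=float)

# k must be chosen up front; k=2 here because the toy data has two obvious groups.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(labels)                   # cluster index assigned to each customer
print(kmeans.cluster_centers_)  # mean point of each cluster
```

The cluster indices themselves are arbitrary; what matters is which customers end up sharing an index.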

Advantages of Machine Learning for Customer Segmentation

  • One key benefit of using machine learning for customer segmentation is its ability to process large volumes of data quickly, in some setups close to real time. This lets companies spot new trends and patterns in customer behaviour sooner and make better-informed marketing decisions. Models can also be retrained as new data arrives, which helps keep the picture of customer behaviour up to date.
  • Another advantage is that it reduces the need for manual data analysis, which can be time-consuming and error-prone, particularly with large datasets. Automating the analysis can give companies more consistent and repeatable results.

We will now perform unsupervised clustering on customer records from a grocery store's database.

Importing Libraries

Loading Data
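
A minimal sketch of the loading step with pandas might look like the following. The file name and separator are not given in the text, so an in-memory CSV with made-up rows stands in for the real file, and the column names are assumptions.

```python
import io
import pandas as pd

# Stand-in for the real file: a few made-up rows in CSV form.
# With a real file this would be pd.read_csv(<path>, sep=<separator>),
# where both the path and the separator depend on your own data file.
raw = io.StringIO(
    "ID,Year_Birth,Education,Marital_Status,Income,Dt_Customer\n"
    "1,1970,Graduation,Married,52000,15-01-2013\n"
    "2,1985,PhD,Single,,30-06-2014\n"
    "3,1990,Basic,Together,38000,02-11-2012\n"
)
data = pd.read_csv(raw)

print("Number of rows:", len(data))
print(data.head())
```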

Data Cleaning

In this section, we will carry out the following tasks:

  • Data Cleaning
  • Feature Engineering

To work out which cleaning steps are needed, let's first examine the information in the data.

From the summary of the data, we can note the following:

  • Income has missing values (only 2194 values are non-null).
  • Dt Customer, which represents the date a customer was entered into the database, is not parsed as a DateTime.
  • The data frame contains several categorical features (those shown with dtype: object). We will need to encode them into numeric form later on.

To handle the missing values, we will start by removing the rows where Income is missing.
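
A short sketch of this step, assuming the column is named `Income` and using made-up rows in place of the real records:

```python
import numpy as np
import pandas as pd

# Made-up rows; "Income" is the column name used in the text.
data = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Income": [52000.0, np.nan, 61000.0, np.nan],
})

print(data.isna().sum())  # number of missing values per column

# Keep only the rows where Income is present.
data = data.dropna(subset=["Income"]).reset_index(drop=True)
print("Rows remaining:", len(data))
```

Dropping rows is the simplest option; imputing the missing incomes would be an alternative if losing rows were a concern.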

The next step is to build a feature from "Dt Customer" that shows how long a customer has been registered in the company's database. To keep things simple, we measure this relative to the most recently enrolled customer in the records.

To do so, we first find the newest and oldest enrolment dates in the data.

We then create a feature ("Customer For How Much Time") that counts the number of days each customer has been shopping at the store, relative to the last recorded date.
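
The date handling described above can be sketched as follows. The column names `Dt_Customer` and `Customer_For`, the day-month-year format, and the three dates are all assumptions made for illustration.

```python
import pandas as pd

# Made-up enrolment dates stored as strings.
data = pd.DataFrame({"Dt_Customer": ["15-01-2013", "30-06-2014", "02-11-2012"]})

# Parse the strings into real datetimes (day-month-year is an assumed format).
data["Dt_Customer"] = pd.to_datetime(data["Dt_Customer"], format="%d-%m-%Y")

newest = data["Dt_Customer"].max()
oldest = data["Dt_Customer"].min()
print("Newest enrolment date:", newest.date())
print("Oldest enrolment date:", oldest.date())

# Days enrolled, measured against the newest enrolment date in the data.
data["Customer_For"] = (newest - data["Dt_Customer"]).dt.days
print(data)
```

With this definition, the most recently enrolled customer gets a value of 0 and every other customer gets a positive number of days.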

To understand the data better, we will now look at the unique values in the categorical features.

Next, we will carry out the following steps to engineer some new features:

  • Extract a customer's "Age" from "Year Birth," which holds the year of their birth.
  • Add a new feature called "Spent" that gives the customer's total spending across all categories over a two-year period.
  • Create the feature "Living With" from "Marital Status" to capture whether the customer lives with a partner.
  • Create the feature "Children" to show the total number of kids and teenagers living in the household.
  • Add a "Family Size" feature to describe the household more fully.
  • Create an "Is Parent" feature that indicates whether or not the customer is a parent.
  • Simplify "Education" into three categories by merging its values.
  • Finally, drop some of the redundant features.
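
The steps above can be sketched as below. Every column name (`Kidhome`, `Teenhome`, the `Mnt...` spending columns, and the new feature names), the category mappings, and the rows themselves are assumptions chosen for illustration; adjust them to match the actual data.

```python
import pandas as pd

# Made-up rows with assumed column names.
data = pd.DataFrame({
    "Year_Birth": [1970, 1985, 1990],
    "Education": ["Graduation", "PhD", "Basic"],
    "Marital_Status": ["Married", "Single", "Together"],
    "Kidhome": [1, 0, 2],
    "Teenhome": [1, 0, 0],
    "MntWines": [200, 50, 10],
    "MntFruits": [20, 5, 3],
    "MntMeatProducts": [150, 30, 12],
    "MntFishProducts": [30, 8, 2],
    "MntSweetProducts": [10, 4, 1],
    "MntGoldProds": [25, 6, 5],
})

REFERENCE_YEAR = 2021  # the text computes ages as of 2021
data["Age"] = REFERENCE_YEAR - data["Year_Birth"]

# Total spending across all product categories.
spend_cols = [c for c in data.columns if c.startswith("Mnt")]
data["Spent"] = data[spend_cols].sum(axis=1)

# Collapse marital status into "Partner" vs "Alone" (assumed mapping).
data["Living_With"] = data["Marital_Status"].map({
    "Married": "Partner", "Together": "Partner",
    "Single": "Alone", "Divorced": "Alone", "Widow": "Alone",
})

data["Children"] = data["Kidhome"] + data["Teenhome"]
adults = data["Living_With"].map({"Alone": 1, "Partner": 2})
data["Family_Size"] = adults + data["Children"]
data["Is_Parent"] = (data["Children"] > 0).astype(int)

# Merge education levels into three groups (assumed mapping).
data["Education"] = data["Education"].map({
    "Basic": "Undergraduate", "2n Cycle": "Undergraduate",
    "Graduation": "Graduate",
    "Master": "Postgraduate", "PhD": "Postgraduate",
})

# Drop columns that the new features make redundant.
data = data.drop(columns=["Year_Birth", "Marital_Status"])
print(data[["Age", "Spent", "Living_With", "Children", "Family_Size", "Is_Parent"]])
```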

Let's have a look at the statistics for the data now that we have some additional features.

The statistics above show large gaps between the mean and maximum values of both Income and Age.

Note that the maximum age is 128 years. The data is old, and ages were computed relative to the present year (i.e. 2021).

We need a broader view of the data, so we will plot a few of the selected features.

Figure: Relational plot of a few selected features (a subset of the data).

Clearly, the Income and Age features contain a few outliers. We will remove them from the data.
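
One simple way to do this is to cap both features with fixed thresholds. The cut-off values and the rows below are illustrative assumptions, not values taken from the data.

```python
import pandas as pd

# Made-up rows: one implausible age and one extreme income.
data = pd.DataFrame({
    "Age": [45, 128, 33, 61],
    "Income": [52000, 48000, 666666, 70000],
})

MAX_AGE = 90          # assumed cut-off
MAX_INCOME = 600_000  # assumed cut-off

before = len(data)
data = data[(data["Age"] < MAX_AGE) & (data["Income"] < MAX_INCOME)]
print(f"Removed {before - len(data)} outlier rows; {len(data)} remain")
```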

Let's now examine the correlations between the features (leaving out the categorical features at this stage).
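
A sketch of the correlation step, using made-up rows and assumed column names. Only the numeric columns are kept, so the categorical `Education` column is excluded.

```python
import pandas as pd

# Made-up rows with one categorical column.
data = pd.DataFrame({
    "Income": [30000, 45000, 60000, 80000, 95000],
    "Spent": [100, 300, 650, 1200, 1500],
    "Children": [2, 2, 1, 0, 0],
    "Education": ["Graduate", "Graduate", "Postgraduate", "Postgraduate", "Undergraduate"],
})

# Keep numeric columns only, then compute pairwise Pearson correlations.
numeric = data.select_dtypes(include="number")
corr = numeric.corr()
print(corr.round(2))
```

The resulting matrix can be passed to a heatmap plotting function to make it easier to read.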

The data is now reasonably clean and the new features are in place. We will move on to the next phase: preprocessing the data.

Data Preprocessing

In this section, we will preprocess the data so that it is ready for clustering.

The data is preprocessed using the following steps:

  • Label encoding the categorical features
  • Scaling the features using the standard scaler
  • Creating a subset data frame for dimensionality reduction
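
These steps can be sketched with scikit-learn's `LabelEncoder` and `StandardScaler`. The column names, the choice of which columns go into the subset, and the rows are assumptions for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Made-up rows with two categorical and two numeric features.
data = pd.DataFrame({
    "Education": ["Graduate", "Postgraduate", "Undergraduate", "Graduate"],
    "Living_With": ["Partner", "Alone", "Partner", "Alone"],
    "Income": [52000.0, 61000.0, 38000.0, 70000.0],
    "Spent": [435.0, 900.0, 60.0, 1200.0],
})

# Work on a copy holding only the features intended for clustering.
features = data[["Education", "Living_With", "Income", "Spent"]].copy()

# 1. Label-encode every non-numeric column.
for col in features.select_dtypes(exclude="number").columns:
    features[col] = LabelEncoder().fit_transform(features[col])

# 2. Scale every feature to zero mean and unit variance.
scaler = StandardScaler()
scaled = pd.DataFrame(scaler.fit_transform(features), columns=features.columns)
print(scaled.round(2))
```

Scaling matters here because distance-based clustering would otherwise be dominated by features with large numeric ranges, such as income.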

Dimensionality Reduction

Dimensionality reduction is a technique used in machine learning and data science to reduce the number of features or dimensions in a dataset, while retaining as much information as possible. The goal is to simplify the data while preserving its structure and relationships between variables.

Principal Component Analysis (PCA) is a statistical technique for analysing the structure of high-dimensional data sets. It finds the directions (principal components) along which the data varies the most. Keeping only the first few components reduces the dimensionality of the data and makes it easier to visualize and interpret.

This section covers the following steps:

  • Dimensionality reduction with PCA
  • Plotting the reduced data frame
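
A minimal PCA sketch with scikit-learn is shown below. Random numbers stand in for the scaled customer features, and reducing to three components is an assumption (it matches the 3-D view of the clusters used later in the text).

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Random stand-in for the scaled feature table (50 customers, 6 features).
rng = np.random.default_rng(0)
scaled = pd.DataFrame(rng.normal(size=(50, 6)),
                      columns=[f"feature_{i}" for i in range(6)])

# Project the data onto its first three principal components.
pca = PCA(n_components=3)
pca_data = pd.DataFrame(pca.fit_transform(scaled),
                        columns=["col1", "col2", "col3"])

print(pca_data.describe().T)
print("Explained variance ratio:", pca.explained_variance_ratio_.round(3))
```

The three resulting columns can then be drawn as a 3-D scatter plot to inspect the reduced data.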

Clustering

We will now cluster the data using agglomerative clustering. Agglomerative clustering is a hierarchical technique that repeatedly merges samples, and then groups of samples, until the desired number of clusters is reached.

The steps involved in the clustering:

  • Use the elbow method to determine the number of clusters to form
  • Cluster the data with agglomerative clustering
  • Examine the resulting clusters in a scatter plot
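
The first two steps can be sketched as follows. This version computes the elbow curve directly from the inertia of scikit-learn's `KMeans` instead of using a plotting helper, and it runs on synthetic 3-D points with four well-separated groups; both choices are assumptions made to keep the example self-contained.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

# Synthetic stand-in for the PCA-reduced data: four groups of 25 points each.
rng = np.random.default_rng(42)
centers = np.array([[0, 0, 0], [6, 6, 0], [0, 6, 6], [6, 0, 6]], dtype=float)
X = np.vstack([c + rng.normal(scale=0.5, size=(25, 3)) for c in centers])

# Elbow method: record the within-cluster sum of squares (inertia) for each k
# and look for the k after which the improvement levels off.
inertias = {}
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_
for k, value in inertias.items():
    print(k, round(value, 1))

# Fit agglomerative clustering with the chosen number of clusters.
model = AgglomerativeClustering(n_clusters=4)
labels = model.fit_predict(X)
print("Cluster sizes:", np.bincount(labels))
```

The resulting labels can be attached to the original data frame as a new column so that each customer carries its cluster number into the later analysis.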

Figure: Distortion Score Elbow for KMeans Clustering (x-axis: k, y-axis: distortion score).

According to the elbow plot above, four clusters are the best choice for this data. Next, we fit the agglomerative clustering model to obtain the final clusters.

To examine the clusters that were formed, let's look at their 3-D distribution.

Evaluating the Model

Because this clustering is unsupervised, there is no labelled target to score the model against. The aim of this section is therefore to study the patterns in the clusters that were formed and work out what characterises each one.

To do that, we will use exploratory data analysis to look at the data cluster by cluster and draw conclusions.
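
A sketch of this kind of per-cluster exploration with pandas is shown below. The `Clusters` column name and all of the values are made up for illustration and do not reproduce the results discussed in the text.

```python
import pandas as pd

# Made-up customers that already carry a cluster label.
data = pd.DataFrame({
    "Clusters": [0, 0, 1, 1, 2, 2, 3, 3],
    "Income": [55000, 60000, 82000, 90000, 28000, 31000, 40000, 43000],
    "Spent": [700, 900, 1500, 1700, 60, 90, 500, 650],
})

# How many customers fall into each cluster?
sizes = data["Clusters"].value_counts().sort_index()
print(sizes)

# Average income and spending per cluster.
summary = data.groupby("Clusters")[["Income", "Spent"]].mean()
print(summary)
```

A scatter plot of spending against income, coloured by cluster, shows the same information visually.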

The customers appear to be fairly evenly distributed across the clusters.

The income versus spending plot shows the cluster pattern:

  • group 0: high spending and average income
  • group 1: high spending and high income
  • group 2: low spending and low income
  • group 3: high spending and low income

Next, we will look at the detailed distribution of the clusters across the product categories in the data: Wines, Fruits, Meat, Fish, Sweets, and Gold.

It is evident from the plot above that cluster 1 is our largest group of customers, closely followed by cluster 0. We can explore what each cluster spends on to inform targeted marketing strategies.

Next, let's look at how our past campaigns performed.
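
One way to summarise campaign response is to count how many campaigns each customer accepted and tabulate that count by cluster. The sketch below assumes five 0/1 columns named `AcceptedCmp1` to `AcceptedCmp5` and a `Clusters` column; these names and all of the values are hypothetical.

```python
import pandas as pd

# Made-up customers: a 1 means the customer accepted that campaign's offer.
data = pd.DataFrame({
    "Clusters":     [0, 0, 1, 1, 2, 3],
    "AcceptedCmp1": [0, 0, 1, 0, 0, 0],
    "AcceptedCmp2": [0, 0, 0, 0, 0, 0],
    "AcceptedCmp3": [1, 0, 1, 0, 0, 0],
    "AcceptedCmp4": [0, 0, 1, 0, 0, 1],
    "AcceptedCmp5": [0, 0, 0, 0, 0, 0],
})

# Total number of campaigns accepted by each customer.
campaign_cols = [f"AcceptedCmp{i}" for i in range(1, 6)]
data["Total_Promos"] = data[campaign_cols].sum(axis=1)

# Rows: number of campaigns accepted; columns: cluster; cells: customer counts.
table = pd.crosstab(data["Total_Promos"], data["Clusters"])
print(table)
```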

The campaigns have not drawn a strong response so far, with very few participants overall. Moreover, no customer took part in all five of them. Better-targeted and better-planned campaigns may be needed to boost sales.

Unlike the campaigns, the deals on offer did well. They performed best with clusters 0 and 3. However, cluster 1, which contains our star customers, is not much interested in the deals, and nothing seems to attract cluster 2 strongly.

Profiling

Now that the clusters have been formed and their purchasing patterns examined, let's look at who is in each cluster. We will profile the clusters to determine who our star customers are and who needs more attention from the retail store's marketing team.

To do that, we will plot some of the features that describe customers' personal traits, broken down by cluster, and draw our conclusions from the results.
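
Alongside the plots, a compact numeric profile per cluster can be built with a grouped aggregation. The column names and values below are made up for illustration and do not reproduce the profiles reported in the text.

```python
import pandas as pd

# Made-up customers with a cluster label and a few personal features.
data = pd.DataFrame({
    "Clusters":    [0, 0, 1, 1, 2, 2, 3, 3],
    "Family_Size": [3, 4, 1, 2, 3, 2, 5, 3],
    "Is_Parent":   [1, 1, 0, 0, 1, 0, 1, 1],
    "Age":         [55, 60, 35, 62, 33, 38, 58, 52],
})

# One row per cluster: size, family size range, share of parents, median age.
profile = data.groupby("Clusters").agg(
    customers=("Age", "size"),
    min_family=("Family_Size", "min"),
    max_family=("Family_Size", "max"),
    parent_share=("Is_Parent", "mean"),
    median_age=("Age", "median"),
)
print(profile)
```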

Cluster Number 0:

  • Definitely parents
  • Families of at least two and at most four members
  • Single parents are a subset of this group
  • Most have a teenager at home
  • Relatively older

Cluster Number 1:

  • Definitely not parents
  • At most two members in the family
  • A slight majority of couples over single people
  • Span all ages
  • A high-income group

Cluster Number 2:

  • The majority are parents
  • At most three members in the family
  • Most have one kid (typically not a teenager)
  • Relatively younger

Cluster Number 3:

  • Definitely parents
  • Families of at least two and at most five members
  • The majority have a teenager at home
  • Relatively older
  • A lower-income group

In this walkthrough, we performed unsupervised clustering using dimensionality reduction followed by agglomerative clustering. We arrived at four clusters and used them to profile customers according to their family structures, income levels, and spending habits. These profiles can be used to plan better marketing strategies.

In conclusion, customer segmentation is a critical aspect of marketing strategy, and machine learning has become an increasingly popular tool for automating it. By using machine learning algorithms to process large volumes of customer data, companies can identify new trends and patterns sooner, target specific customer segments with tailored promotions, and make better-informed marketing decisions. Because it can process data quickly, reduce the need for manual analysis, and be retrained as new data arrives, machine learning is a powerful tool for customer segmentation.