Bank customer churn prediction using ANN

Rohan Paris
8 min read · May 30, 2022


Customer attrition, also known as customer churn, customer turnover, or customer defection, is the loss of clients or customers. A high churn rate means that a large number of customers no longer want to purchase goods and services from the business. The customer churn rate is calculated by dividing the number of customers lost during a period by the number of active customers at the start of that period. For example, if you started last month with 1,000 customers and lost 50 of them, your monthly churn rate is 5 percent.
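
As a quick illustration, the churn-rate arithmetic from the example above looks like this in Python (the numbers are the illustrative ones from the paragraph):

#Churn rate = customers lost during the period / active customers at the start
customers_at_start = 1000
customers_lost = 50
churn_rate = customers_lost / customers_at_start * 100
print(churn_rate)  #5.0 (percent)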

The goal of building a customer churn predictive model is to retain the customers at the highest risk of churn by proactively engaging with them. For example, offering a gift voucher or promotional pricing can lock them in for an additional year or two and extend their lifetime value to the company.

Table of contents:
1. The Project goal
2. Dataset description and initial analysis
3. EDA
4. Data preprocessing
5. Model building using ANN
6. Test data predictions and performance metrics
7. Conclusion

1. The Project goal

This project aims to get familiar with deep learning concepts and apply them to the ‘Churn for Bank Customers’ dataset from Kaggle. We will use an Artificial Neural Network (ANN) to predict which bank customers are most likely to churn.

2. Dataset description and initial analysis

2.1. Data collection

The dataset can be downloaded from Kaggle using this link.
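
A minimal loading-and-inspection sketch, assuming the file has been downloaded from Kaggle as Churn_Modelling.csv into the working directory, with pandas imported as in the libraries section below:

#Load the dataset downloaded from Kaggle
df_data = pd.read_csv('Churn_Modelling.csv')

#Quick look at shape, dtypes, and missing values
print(df_data.shape)
df_data.info()
df_data.head()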

2.2. Data description:

The dataset contains 10,000 records and 14 columns (13 features plus the target), with no missing values.

  • RowNumber — corresponds to the record (row) number and has no effect on the output.
  • CustomerId — contains random values and has no effect on customers leaving the bank.
  • Surname — the surname of a customer has no impact on their decision to leave the bank.
  • CreditScore — can have an effect on customer churn, since a customer with a higher credit score is less likely to leave the bank.
  • Geography — a customer’s location can affect their decision to leave the bank.
  • Gender — it’s interesting to explore whether gender plays a role in a customer leaving the bank.
  • Age — this is certainly relevant since older customers are less likely to leave their bank than younger ones.
  • Tenure — refers to the number of years that the customer has been a client of the bank. Normally, longer-tenured clients are more loyal and less likely to leave a bank.
  • Balance — A very good indicator of customer churn, as people with a higher balance in their accounts are less likely to leave the bank compared to those with lower balances.
  • NumOfProducts — refers to the number of products that a customer has purchased through the bank.
  • HasCrCard — denotes whether or not a customer has a credit card. This column is also relevant since people with a credit card are less likely to leave the bank.
  • IsActiveMember — active customers are less likely to leave the bank.
  • EstimatedSalary — as with balance, people with lower salaries are more likely to leave the bank compared to those with higher salaries.
  • Exited — whether or not the customer left the bank. This is the target feature: 0 = not churned, 1 = churned.

2.3. Libraries used:

#Computing libraries
import pandas as pd
import numpy as np

#Visualizations library
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use("seaborn-whitegrid")
%matplotlib inline

#Model building libraries
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

#Deep learning libraries
import tensorflow as tf

#DL library to use forward and backward propagation
from tensorflow.keras.models import Sequential

#DL library to build input/hidden/output layers
from tensorflow.keras.layers import Dense

#DL library to prevent overfitting
from tensorflow.keras.layers import Dropout

#DL library to use activation function
from tensorflow.keras.layers import LeakyReLU, ReLU

#DL library to use optimizer
from tensorflow.keras.optimizers import Adam

#Performance metrics
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

import warnings
warnings.filterwarnings('ignore')

3. EDA

This is an imbalanced dataset as the number of non-churned customers is greater than the number of churned customers.

[Figure: distribution of the target feature]

The number of male customers is slightly greater than the number of female customers for the given bank.

[Figure: distribution of the target feature by gender]

Around 50% of the customers are from France, and the numbers of customers from Spain and Germany are almost equal.
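
A short sketch of the countplots behind these observations (assuming df_data from the loading step above):

#Class balance of the target feature
sns.countplot(x='Exited', data=df_data)
plt.show()

#Churn broken down by gender and by geography
sns.countplot(x='Gender', hue='Exited', data=df_data)
plt.show()
sns.countplot(x='Geography', hue='Exited', data=df_data)
plt.show()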

4. Data preprocessing

The Geography and Gender features contain categorical values. These values are not ordinal, so we will apply one-hot encoding to them.

geog = pd.get_dummies(df_data['Geography'], drop_first=True)
geog.head()
gen = pd.get_dummies(df_data['Gender'], drop_first=True)
gen.head()
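
The encoded columns then need to be joined back and the original categorical columns dropped. One way to do this (a sketch, which also drops the identifier columns RowNumber, CustomerId, and Surname, since they carry no predictive signal):

#Drop identifiers and the original categorical columns, then attach the encoded ones
df_data = df_data.drop(['RowNumber', 'CustomerId', 'Surname', 'Geography', 'Gender'], axis=1)
df_data = pd.concat([df_data, geog, gen], axis=1)
df_data.head()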

5. Model building using ANN

We will start by first separating the dependent and independent features.

#Separate independent and dependent features
X = df_data.loc[:, df_data.columns!='Exited']
y = df_data['Exited']

Next, we will split the data for training and validation.

# Break off validation set from training data
X_train, X_valid, y_train, y_valid = train_test_split(X, y, train_size=0.7, test_size=0.3, random_state=0)
# summarize
print('Train', X_train.shape, y_train.shape)
print('Test', X_valid.shape, y_valid.shape)
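
Since the classes are imbalanced (see the EDA section), a stratified split is also worth considering, as it preserves the churned/non-churned ratio in both sets. A sketch:

#Optional: stratified split keeps the class ratio equal in train and validation
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, train_size=0.7, stratify=y, random_state=0)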

Feature scaling is required for an ANN because training relies on gradient-based optimizers, which converge faster and more reliably when all features are on a similar scale. So we will standardize the data using the standard scaler. We apply ‘fit_transform’ only on the training data and ‘transform’ on the validation data to prevent data leakage.

# Feature Scaling
scaler = StandardScaler()
X_train_sc = scaler.fit_transform(X_train)
X_valid_sc = scaler.transform(X_valid)

Once our data is ready, the next step is to build a neural network. In this experiment, we will build a neural network with one input layer, three hidden layers, and one output layer.

[Figure: sample neural network]

Defining ANN:

  • Input Layer — As the training data has 11 features, the input layer will have 11 neurons.
  • Hidden Layers — the number of hidden layers and neurons per layer is a design choice found by trial and error. I have chosen 3 hidden layers.
  • Output Layer — as this is a binary classification problem, a single neuron with a sigmoid activation will work in the output layer.

#Initialise ANN
classifier = Sequential()

#Add input layer
classifier.add(Dense(units=11,activation='relu'))

#Add first hidden layer
classifier.add(Dense(units=7, activation='relu'))
classifier.add(Dropout(0.2))

#Add second hidden layer
classifier.add(Dense(units=7, activation='relu'))
classifier.add(Dropout(0.2))

#Add third hidden layer
classifier.add(Dense(units=7, activation='relu'))
classifier.add(Dropout(0.2))

#Add output layer
classifier.add(Dense(units=1, activation='sigmoid'))

The activation function determines the output of each neuron in the network. These functions are attached to every neuron and decide whether it should be activated, based on whether the neuron's input is relevant for the model's prediction.

The sigmoid function is used on the output layer for a binary classification problem, as it squashes the output to a probability between 0 and 1, which can then be thresholded to a class label.

The ReLU (Rectified Linear Unit) function is used as the activation function on the hidden layers. It computes max(0, z), i.e. it outputs z when z is positive and 0 otherwise.

Dropout is used to prevent overfitting. During training it randomly deactivates a fraction of the neurons in the given hidden layer (20% here), which stops the network from relying too heavily on any single neuron.
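
For intuition, both activations are simple enough to sketch in NumPy (illustrative only; Keras uses its own implementations internally):

#ReLU: outputs z when positive, 0 otherwise
def relu(z):
    return np.maximum(0, z)

#Sigmoid: squashes any real number into (0, 1)
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

print(relu(np.array([-2.0, 0.0, 3.0])))     #[0. 0. 3.]
print(sigmoid(np.array([-2.0, 0.0, 3.0])))  #approx. [0.119 0.5 0.953]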

#optimizer
opt = Adam(learning_rate=0.01)

#compile
classifier.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])

The optimizer updates the weights and biases during back-propagation. The Adam optimizer combines momentum, which smooths noisy gradients, with per-parameter adaptive learning rates.

The binary cross-entropy loss function is used for binary classification problems: L = −(1/N) Σ [y·log(ŷ) + (1 − y)·log(1 − ŷ)]. Here y is the actual value and ŷ is the predicted value.
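
A small NumPy sketch of this loss, just to make the formula concrete (Keras computes it internally via the 'binary_crossentropy' loss passed to compile above):

#Binary cross-entropy: -mean(y*log(ŷ) + (1-y)*log(1-ŷ))
def binary_crossentropy(y_true, y_pred):
    eps = 1e-7  #clip to avoid log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.1, 0.8, 0.4])
print(binary_crossentropy(y_true, y_pred))  #approx. 0.34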

#Early stopping
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    min_delta=0.0001,
    patience=20,
    verbose=1,
    mode='auto',
    baseline=None,
    restore_best_weights=False,
)

Early stopping is used to stop training when there is no significant improvement in the monitored quantity (here, the validation loss). The parameters used are as follows:

  • monitor — Quantity to be monitored
  • min_delta — Minimum change in the monitored quantity to qualify as an improvement, i.e. an absolute change of less than min_delta, will count as no improvement.
  • patience — Number of epochs with no improvement after which training will be stopped.
  • verbose — Verbosity mode, 0 or 1. Mode 0 is silent, and mode 1 displays messages when the callback takes an action.
  • mode — One of {“auto”, “min”, “max”}. In min mode, training will stop when the quantity monitored has stopped decreasing; in “max” mode it will stop when the quantity monitored has stopped increasing; in “auto” mode, the direction is automatically inferred from the name of the monitored quantity.

#Fit the model
model_history = classifier.fit(X_train_sc, y_train, validation_split=0.33, batch_size=10, epochs=1000, callbacks=[early_stop])
model_history.history.keys()

#Summarize history for accuracy
plt.plot(model_history.history['accuracy'])
plt.plot(model_history.history['val_accuracy'])
plt.title('Model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

#Summarize history for loss
plt.plot(model_history.history['loss'])
plt.plot(model_history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

6. Test data predictions and performance metrics

Predictions are made on the test data: predicted probabilities greater than 0.5 are classified as customers likely to churn, and those below 0.5 as satisfied customers.

The model scores an accuracy of 85%.
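
The report below assumes y_pred has been produced along these lines (a sketch; classifier and X_valid_sc come from the earlier steps):

#Predicted probabilities -> class labels using the 0.5 threshold
y_prob = classifier.predict(X_valid_sc)
y_pred = (y_prob > 0.5).astype(int).ravel()

print(accuracy_score(y_valid, y_pred))
print(confusion_matrix(y_valid, y_pred))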

print(classification_report(y_valid, y_pred))

7. Conclusion

In this post, we have seen how to build an ANN and use it to predict which of the bank's customers are likely to churn.

Future work on this project will focus on reducing the number of false negatives: customers who are actually likely to churn but whom the model classified as satisfied.

Please find the code used in this project at the GitHub link below.
