Credit Card Approval Using Machine Learning

Credit scorecards are widely used in the financial industry as a risk control measure. They use the personal information and data submitted by credit card applicants to estimate the likelihood of future defaults and credit card debt. Based on this estimate, the bank decides whether to approve the application. Credit scores provide an objective way to quantify the level of risk involved.

Credit card approval is a crucial process in the banking industry. Traditionally, banks rely on manual evaluation of creditworthiness, which can be time-consuming and prone to errors. However, with the advent of Machine Learning (ML) algorithms, the credit card approval process has been significantly streamlined.

Machine Learning algorithms have the ability to analyze large volumes of data and extract patterns, making them invaluable in credit card approval. By training ML models on historical data that includes information about applicants, their financial behavior, and credit history, banks can predict creditworthiness more accurately and efficiently.

Benefits of Credit Card Approval Using Machine Learning

  • Enhanced Accuracy: Machine learning algorithms have the ability to analyze vast amounts of data and identify patterns that may not be apparent to human analysts. By incorporating various data points, including credit history, income, employment, and spending patterns, machine learning models can make more accurate predictions regarding an individual's creditworthiness. This leads to better-informed credit card approval decisions, reducing the risk of defaults and improving overall portfolio performance.
  • Faster Processing: Traditional credit card approval processes can be time-consuming, involving manual reviews, paperwork, and extensive documentation. Machine learning streamlines this process by automating many of the tasks. By leveraging algorithms and predictive models, financial institutions can expedite credit card approvals, providing customers with faster access to credit facilities.
  • Personalized Offerings: Machine learning enables lenders to personalize credit card offerings based on individual profiles and preferences. By analyzing customer data and behavior, machine learning algorithms can identify specific needs, spending patterns, and risk profiles. This allows lenders to tailor credit card features, such as interest rates, credit limits, rewards programs, and promotional offers, to match the unique requirements of each customer.
  • Risk Mitigation: The use of machine learning algorithms in credit card approval helps mitigate risks associated with lending. By accurately assessing creditworthiness and identifying high-risk applicants, financial institutions can make informed decisions on interest rates, credit limits, and terms of repayment. This not only protects lenders from potential losses but also ensures responsible lending practices and safeguards the financial well-being of customers.

Challenges of Credit Card Approval Using Machine Learning

  • Data Privacy and Security: The use of machine learning in credit card approval requires access to vast amounts of sensitive customer data. It is crucial for financial institutions to implement robust data privacy and security measures to protect this information from unauthorized access or misuse. Strict compliance with data protection regulations and encryption techniques is essential to ensure the confidentiality and integrity of customer data.
  • Model Interpretability and Transparency: Machine learning algorithms can be complex, making it challenging to interpret and explain the decisions they make. This lack of interpretability can pose challenges in terms of regulatory compliance and consumer trust. Efforts must be made to develop transparent models that provide clear explanations for credit card approval decisions, ensuring fairness and accountability.
  • Bias and Fairness: Machine learning algorithms are susceptible to bias as they learn from historical data that may contain inherent biases. This can lead to discriminatory practices in credit card approval, impacting certain demographic groups unfairly. It is important to continuously monitor and evaluate machine learning models to ensure fairness and mitigate any bias that may arise.

For a better understanding, we will implement this in code and try to determine whether an applicant is a 'good' or a 'bad' client.

Data Definition

There are two .csv files:

1. application_record.csv:

  • ID: A unique identifier for each client.
  • CODE_GENDER: Gender of the client.
  • FLAG_OWN_CAR: Indicates whether the client owns a car.
  • FLAG_OWN_REALTY: Indicates whether the client owns any property.
  • CNT_CHILDREN: Number of children the client has.
  • AMT_INCOME_TOTAL: Annual income of the client.
  • NAME_INCOME_TYPE: Category of the client's income.
  • NAME_EDUCATION_TYPE: Education level of the client.
  • NAME_FAMILY_STATUS: Marital status of the client.
  • NAME_HOUSING_TYPE: Housing situation of the client.
  • DAYS_BIRTH: Birth date of the client, represented as a count of days backward from the current day. (0 indicates the current day, and -1 indicates yesterday)
  • DAYS_EMPLOYED: Start date of employment, represented as a count of days backward from the current day. A positive value means the person is currently unemployed.
  • FLAG_MOBIL: Indicates whether the client has a mobile phone.
  • FLAG_WORK_PHONE: Indicates whether the client has a work phone.
  • FLAG_PHONE: Indicates whether the client has a personal phone.
  • FLAG_EMAIL: Indicates whether the client has an email.
  • OCCUPATION_TYPE: Occupation of the client.
  • CNT_FAM_MEMBERS: The family size of the client.
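The day-count encoding of DAYS_BIRTH and DAYS_EMPLOYED can be converted into more readable features. The following is a minimal sketch; the helper names and the 365.25-day year are assumptions made for illustration and are not part of the dataset.

```python
DAYS_PER_YEAR = 365.25  # assumption: average year length, leap years included


def age_in_years(days_birth):
    """Convert DAYS_BIRTH (a zero or negative day offset) to an age in years."""
    return -days_birth / DAYS_PER_YEAR


def years_employed(days_employed):
    """Convert DAYS_EMPLOYED to years worked; None for unemployed clients.

    A positive value marks a client who is currently unemployed,
    so there is no employment start date to measure from.
    """
    if days_employed > 0:
        return None
    return -days_employed / DAYS_PER_YEAR


print(round(age_in_years(-12005), 1))   # a client born 12,005 days ago
print(years_employed(-4542))            # employed for 4,542 days
print(years_employed(1000))             # any positive value: unemployed
```

Returning None for unemployed clients is one possible choice; replacing it with 0 or a separate indicator column would work as well, depending on the model.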

2. credit_record.csv:

  • ID: A unique identifier for each client.
  • MONTHS_BALANCE: The record month, represented as a count backward from the current month. (0 indicates the current month, -1 indicates the previous month, and so on)
  • STATUS: The status of the client's credit for a particular month. The value is a digit from 0 to 5 or one of the letters C or X: 0 represents 1-29 days past due, 1 represents 30-59 days past due, 2 represents 60-89 days past due, 3 represents 90-119 days past due, 4 represents 120-149 days past due, 5 represents more than 150 days past due or written off as bad debt, C represents paid off that month, and X indicates no loan for the month.

Code:

Importing Libraries

Reading the Dataset
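The original listing for this step is not available, so the following is only a sketch of how the two files could be loaded with pandas. The function name load_tables is hypothetical, and the tiny in-memory CSV strings are made-up stand-ins so that the example runs without the real files.

```python
import io

import pandas as pd


def load_tables(application_src, record_src):
    """Read both tables; each source may be a file path or a file-like object."""
    data = pd.read_csv(application_src)
    record = pd.read_csv(record_src)
    return data, record


# Made-up rows standing in for the real files. With the real data this would be:
#   data, record = load_tables("application_record.csv", "credit_record.csv")
app_csv = io.StringIO("ID,CODE_GENDER,AMT_INCOME_TOTAL\n1,F,112500.0\n2,M,270000.0\n")
rec_csv = io.StringIO("ID,MONTHS_BALANCE,STATUS\n1,0,C\n1,-1,0\n2,0,X\n")

data, record = load_tables(app_csv, rec_csv)
print(data.shape, record.shape)
```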


Feature Engineering

Here we will aim to extract the most relevant information from the available data and represent it in a way that the machine learning algorithm can effectively learn from it.

We combine the information from two DataFrames, data and begin_month, on the 'ID' column. begin_month holds the minimum value of 'MONTHS_BALANCE' for each unique 'ID' in the record DataFrame, that is, the earliest month on record for that client. The merge adds it to the data DataFrame as a new 'begin_month' column.
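One way to express this step in pandas is sketched below. The toy rows are invented for illustration, and the choice of a left join (which keeps applicants that have no credit history) is an assumption rather than something the text specifies.

```python
import pandas as pd

# Toy stand-ins for the two tables (invented values, for illustration only).
data = pd.DataFrame({"ID": [1, 2, 3], "AMT_INCOME_TOTAL": [112500.0, 270000.0, 90000.0]})
record = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "MONTHS_BALANCE": [0, -1, -2, 0, -5],
})

# Earliest recorded month per client = smallest (most negative) MONTHS_BALANCE.
begin_month = (
    record.groupby("ID")["MONTHS_BALANCE"]
    .min()
    .rename("begin_month")
    .reset_index()
)

# Left join: every applicant is kept; clients without records get NaN.
data = data.merge(begin_month, how="left", on="ID")
print(data)
```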

Target Variable

Typically, target risk users are expected to account for approximately 3% of all users. Here, we treat users who have been overdue for 60 days or more (STATUS 2 through 5) as the target risk users. These samples are labeled '1', and the remaining samples are labeled '0'.

Now we will create the target variable.
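A minimal sketch of this labeling rule follows. It assumes that STATUS is stored as text and that a client counts as a risk user if any recorded month has a STATUS of 2 through 5. The column names overdue_60 and target are hypothetical, and the rows are invented.

```python
import pandas as pd

# Toy credit history (invented values). STATUS is kept as text
# because it mixes digits with the letters C and X.
record = pd.DataFrame({
    "ID":     [1, 1, 1, 2, 2, 3],
    "STATUS": ["0", "C", "X", "1", "3", "2"],
})

RISKY_STATUSES = {"2", "3", "4", "5"}  # 60 or more days overdue

# Flag each monthly row, then mark a client as risky if any month was flagged.
record["overdue_60"] = record["STATUS"].isin(RISKY_STATUSES)
target = (
    record.groupby("ID")["overdue_60"]
    .any()
    .astype(int)
    .rename("target")
    .reset_index()
)
print(target)
```

Client 1 never goes beyond 29 days overdue and is labeled 0, while clients 2 and 3 each have at least one month at STATUS 2 or higher and are labeled 1.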




"No" appears 45,318 times which accounts for approximately 98.55% of the total values.

"Yes" appears 667 times which accounts for approximately 1.45% of the total values.

Features

We will now proceed with exploratory data analysis of the features, examining each one and transforming it where needed.



The ivtable DataFrame will contain the remaining columns from the original DataFrame, excluding the ones specified in namelist.

Next, we define a calc_iv function that calculates the Information Value (IV) and Weight of Evidence (WoE) of a feature.

A second helper converts a categorical feature into dummy variables in a DataFrame.

A third helper creates categorical bins based on a numerical column in a DataFrame.
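The two DataFrame helpers described above could look roughly like the sketch below. The names to_dummies and add_bins, their parameters, and the '_bin' suffix are hypothetical; the original helper code is not shown in this text.

```python
import pandas as pd


def to_dummies(df, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    dummies = pd.get_dummies(df[column], prefix=column, dtype=int)
    return pd.concat([df.drop(columns=column), dummies], axis=1)


def add_bins(df, column, n_bins, labels=None, equal_frequency=False):
    """Add a '<column>_bin' column holding the bin each numeric value falls into."""
    # pd.cut makes equal-width bins, pd.qcut makes equal-frequency bins.
    cutter = pd.qcut if equal_frequency else pd.cut
    out = df.copy()
    out[column + "_bin"] = cutter(out[column], n_bins, labels=labels)
    return out


# Invented example rows.
df = pd.DataFrame({"housing": ["own", "rent", "own"], "income": [10.0, 20.0, 30.0]})
encoded = to_dummies(df, "housing")
binned = add_bins(df, "income", 3, labels=["low", "medium", "high"])
print(encoded)
print(binned)
```

Equal-width bins are sensitive to outliers such as very high incomes, which is why the sketch also offers the equal-frequency option.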

Binary Features

Binary features, also known as binary variables or binary indicators, are categorical variables that can take only two distinct values, typically represented as 0 and 1. These features are used to indicate the presence or absence of a particular characteristic or attribute within the dataset.

We will look at each binary feature and its properties.
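Before a binary feature can be used by most models, its two values have to be mapped to 0 and 1. The sketch below assumes that gender is stored as 'F'/'M' and car ownership as 'Y'/'N'; these raw codes and the direction of the mapping are assumptions made for illustration.

```python
import pandas as pd

# Toy rows; the 'F'/'M' and 'Y'/'N' codes are assumed raw encodings.
data = pd.DataFrame({
    "CODE_GENDER": ["F", "M", "F"],
    "FLAG_OWN_CAR": ["Y", "N", "N"],
    "FLAG_WORK_PHONE": [0, 1, 0],   # already numeric, left untouched
})

BINARY_MAPS = {
    "CODE_GENDER": {"F": 0, "M": 1},
    "FLAG_OWN_CAR": {"N": 0, "Y": 1},
}

for column, mapping in BINARY_MAPS.items():
    # Series.map yields NaN for any value missing from the mapping,
    # which makes unexpected codes easy to spot afterwards.
    data[column] = data[column].map(mapping)

# Share of each value for one binary feature.
print(data["FLAG_OWN_CAR"].value_counts(normalize=True))
```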

Gender

Having a Car or Not

Owning Realty or Not

Having a Phone or Not

Having an Email or Not

Having a Work Phone or Not

Continuous Variables

Continuous variables, also known as quantitative or numerical variables, are measurements that can take any value within a specific range. Unlike binary features, which have only two possible values, continuous variables can have an infinite number of possible values within a given interval. Now we will look at the various continuous variables and their properties.

Number of Children

Annual Income

Working Years

Family Size

Categorical Features

Categorical features, also known as qualitative or nominal variables, represent characteristics or attributes that fall into distinct categories or groups. Unlike continuous variables, which have a range of numerical values, categorical features have a finite number of discrete values or labels. Now we will look at the various categorical features and their properties.

Income Types

House Type

Education

IV and WOE

Weight of Evidence (WoE):

woe_i = ln(P(yi) / P(ni)) = ln((yi / ys) / (ni / ns))

Where:

  • woe_i is the WoE for a specific category i.
  • P(yi) is the proportion of "Good" (non-default) observations in category i.
  • P(ni) is the proportion of "Bad" (default) observations in category i.
  • yi is the number of "Good" observations in category i.
  • ys is the total number of "Good" observations.
  • ni is the number of "Bad" observations in category i.
  • ns is the total number of "Bad" observations.
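The WoE definition can be checked with a small worked example. The sketch below uses only the standard library; the function name and the counts per age group are made up for illustration.

```python
import math


def weight_of_evidence(good_counts, bad_counts):
    """Return {category: WoE} from per-category counts of Good and Bad clients."""
    total_good = sum(good_counts.values())   # ys
    total_bad = sum(bad_counts.values())     # ns
    woe = {}
    for category, y_i in good_counts.items():
        n_i = bad_counts[category]
        # Undefined when a category has no Good or no Bad observations;
        # such bins are usually merged or smoothed before this step.
        woe[category] = math.log((y_i / total_good) / (n_i / total_bad))
    return woe


good = {"young": 300, "middle": 500, "senior": 200}   # made-up counts
bad = {"young": 30, "middle": 10, "senior": 10}
woe = weight_of_evidence(good, bad)
for category, value in woe.items():
    print(f"{category}: {value:+.3f}")
```

A negative WoE (here "young") means the category holds a larger share of the Bad clients than of the Good clients, a positive WoE means the opposite, and a value near zero means the category does not separate the two groups.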

Information Value (IV):

IV = Σ_i (P(yi) - P(ni)) * ln(P(yi) / P(ni))

Where:

  • P(yi) is the proportion of "Good" observations in category i (the number of "Good" observations in category i, yi, divided by the total number of "Good" observations, ys).
  • P(ni) is the proportion of "Bad" observations in category i (the number of "Bad" observations in category i, ni, divided by the total number of "Bad" observations, ns).

The IV value measures the variable's ability to predict.

Relationship between IV value and predictive power

IV            Ability to predict
< 0.02        Almost no predictive power
0.02 to 0.1   Weak predictive power
0.1 to 0.3    Moderate predictive power
0.3 to 0.5    Strong predictive power
> 0.5         Predictive power is too strong; the variable needs to be checked
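The IV formula and the thresholds above can be combined into a short sketch. This is not the calc_iv function from the original code, which is not shown here. The function names, the made-up counts, and the way boundary values are assigned to a band are all assumptions.

```python
import math


def information_value(good_counts, bad_counts):
    """Sum (P(yi) - P(ni)) * ln(P(yi) / P(ni)) over all categories."""
    total_good = sum(good_counts.values())
    total_bad = sum(bad_counts.values())
    iv = 0.0
    for category, y_i in good_counts.items():
        p_good = y_i / total_good
        p_bad = bad_counts[category] / total_bad
        iv += (p_good - p_bad) * math.log(p_good / p_bad)
    return iv


def iv_strength(iv):
    """Map an IV value to its band (boundary handling is an assumption)."""
    if iv < 0.02:
        return "almost none"
    if iv < 0.1:
        return "weak"
    if iv < 0.3:
        return "moderate"
    if iv <= 0.5:
        return "strong"
    return "too strong, check the variable"


good = {"young": 300, "middle": 500, "senior": 200}   # made-up counts
bad = {"young": 30, "middle": 10, "senior": 10}
iv = information_value(good, bad)
print(f"IV = {iv:.4f} ({iv_strength(iv)})")
```

Each term of the sum is non-negative, because the difference and the logarithm always share the same sign, so the IV grows with every category that separates the two groups.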

Age Group (agegp) has the highest IV, 0.0659351, which by the thresholds above still indicates only weak predictive power. Other variables, such as Work Phone (wkphone), Number of Children (ChldNo), Phone (phone), Income Type (inctp), Email (email), Car Ownership (Car), and Occupation Type (occyp), have very low IV values, suggesting they have little or no predictive power.

Splitting the Dataset

Now we will split the dataset into a training and testing set.
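A split of this kind is commonly done with scikit-learn's train_test_split, as in the sketch below. The synthetic data, the 30% test share, and the use of stratification are assumptions; the text does not state which settings were used.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))          # synthetic feature matrix
y = np.array([1] * 20 + [0] * 180)     # imbalanced labels, 10% positive

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.3,      # assumed hold-out share
    stratify=y,         # keep the positive rate equal in both parts
    random_state=42,    # fixed seed for reproducibility
)
print(y_train.mean(), y_test.mean())
```

Stratifying matters when the risky class is rare, as it is here: without it, a random split can leave the test set with very few positive samples.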



Modeling

We will then proceed to train and evaluate different machine learning algorithms, including logistic regression, decision trees, random forests, support vector machines (SVM), and gradient boosting methods. Each algorithm has its own strengths and characteristics, which makes it important to compare their performance and choose the one that best fits our credit card approval prediction task.
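The comparison described above follows one pattern: fit each model on the training set and score it on the test set. The sketch below shows that pattern with three scikit-learn models on synthetic data. The data and hyperparameters are assumptions, so the scores it prints are unrelated to the results reported in the sections below.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the engineered credit features (not the real data).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit every model on the same split and record its test accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Report the models from best to worst.
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

The boosting libraries covered below also provide scikit-learn-style classifiers with fit and predict methods, so they can be added to the same dictionary once installed.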

1. Logistic Regression

Logistic Regression (LR) achieved an accuracy score of 0.61215. This indicates that the model's ability to correctly predict credit card approval is moderate.

2. Decision Tree

Decision Tree Classifier (DTC) performed better with an accuracy score of 0.82897. This suggests that the model is more effective in capturing the patterns and relationships in the data for credit card approval prediction.

3. Random Forest

Random Forest Classifier (RFC) demonstrated a higher accuracy score of 0.89459. This indicates that the ensemble of decision trees in the random forest model improved the predictive performance compared to the single decision tree.

4. SVM

Support Vector Machines (SVM) had a lower accuracy score of 0.59367, indicating that they may not be as effective in capturing the complexities of the credit card approval prediction task in this case.

5. LightGBM

Light GBM achieved a high accuracy score of 0.90356, suggesting that the gradient boosting algorithm used in this model improved the prediction accuracy compared to the preceding models.

6. XGBoost

XGBoost performed with a high accuracy score of 0.93789. This indicates that the extreme gradient boosting algorithm employed in XGBoost captured the intricate patterns in the data and made highly accurate predictions for credit card approval.

7. CatBoost

CatBoost, however, achieved an accuracy score of only 0.50081, the lowest of the seven models. The model did not perform well in this context and requires further investigation or parameter tuning to improve its predictive capabilities.

The XGBoost model exhibited the highest accuracy among the models considered (0.93789), followed by Light GBM (0.90356) and the Random Forest Classifier (0.89459). These models appear to be the most suitable for predicting credit card approval.

Conclusion

Credit card approval using machine learning offers numerous benefits, including enhanced accuracy, faster processing, personalized offerings, and risk mitigation. By leveraging machine learning algorithms, financial institutions can streamline the approval process, provide customized credit card solutions, and make informed lending decisions. However, it is crucial to address challenges related to data privacy, model interpretability, and fairness to ensure responsible and ethical implementation of machine learning in credit card approval. With proper consideration and oversight, machine learning has the potential to revolutionize the lending landscape, benefiting both consumers and lenders alike.





