PySpark-Bank-Churn

Surname: the customer's surname; it has no effect on the decision to leave the bank.
CreditScore: can have an effect on churn, since a customer with a higher credit score is less likely to leave the bank.
Geography: a customer’s location can affect their decision to leave the bank.
Gender: it’s interesting to explore whether gender plays a role in a customer leaving the bank.
Age: this is certainly relevant, since older customers are less likely to leave their bank than younger ones.
Tenure: the number of years that the customer has been a client of the bank. Normally, longer-standing clients are more loyal and less likely to leave a bank.
NumOfProducts: refers to the number of products that a customer has purchased through the bank.
HasCrCard: denotes whether or not a customer has a credit card. This column is also relevant, since people with a credit card are less likely to leave the bank.
IsActiveMember: active customers are less likely to leave the bank.
Balance: also a very good indicator of customer churn, as people with a higher balance in their accounts are less likely to leave the bank than those with lower balances.
EstimatedSalary: as with balance, people with lower salaries are more likely to leave the bank than those with higher salaries.
Exited (dependent variable): whether or not the customer left the bank.

Acknowledgements

As we know, it is much more expensive to sign up a new client than to keep an existing one.

It is advantageous for banks to know what leads a client towards the decision to leave the company.

Churn prevention allows companies to develop loyalty programs and retention campaigns to keep as many customers as possible.