What is business and data understanding?
This phase covers two kinds of understanding: the analyst needs to understand the bank's business, and the bank needs to understand how machine learning works, so that an unexpected result does not lead to a misunderstanding.
Before starting, it is important that the bank understands that a solution does not always exist, and that even if one is found, there is some probability that the result for a given data point is incorrect. The model will reach a certain level of accuracy, and the bank needs to decide whether that level is acceptable. To do that, it is necessary to understand both the bank's business and the data the bank has provided. It is also helpful to know whether the customer can provide more data, in case the data does not cover all possible situations or is unbalanced (e.g., if 80% of the records are Unemployed, the model will tend to conclude that an Unemployed person is more likely not to repay the loan).
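The balance check mentioned above can be sketched in a few lines of Python. This is only an illustration: the records, the column name "employment", and the 0.7 threshold are made-up assumptions, not values taken from the bank's data.

```python
from collections import Counter


def class_shares(records, column):
    """Return the fraction of records that carry each value of `column`."""
    counts = Counter(row[column] for row in records)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def dominant_values(shares, threshold=0.7):
    """List the values whose share exceeds `threshold` (an assumed cut-off)."""
    return [value for value, share in shares.items() if share > threshold]


# Made-up records, for illustration only.
sample = [
    {"employment": "Unemployed"},
    {"employment": "Unemployed"},
    {"employment": "Unemployed"},
    {"employment": "Unemployed"},
    {"employment": "Employed"},
]

shares = class_shares(sample, "employment")
print(shares)
print(dominant_values(shares))
```

If a value dominates the data in this way, that is a signal to ask the bank whether more records for the under-represented cases can be provided.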
It may be necessary to talk with an expert at the bank to understand the meaning of the data and to assess which fields can help and which should be excluded. Some data exploration and verification is done at the beginning to get familiar with the data and to check whether it is complete and whether it contains input errors or missing values.
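A first verification pass of this kind might look like the sketch below. The column name "income" and the rule that an income must be a non-negative number are assumptions chosen for the example; the real validity rules would come from the bank's expert.

```python
def profile_column(records, column, is_valid=None):
    """Count missing and invalid entries for one column.

    A value counts as missing when it is absent, None, or an empty string.
    `is_valid` is an optional check applied to the non-missing values.
    """
    missing = 0
    invalid = 0
    for row in records:
        value = row.get(column)
        if value is None or value == "":
            missing += 1
        elif is_valid is not None and not is_valid(value):
            invalid += 1
    return {"rows": len(records), "missing": missing, "invalid": invalid}


def is_valid_income(value):
    """Assumed rule: an income is a number that is not negative."""
    return isinstance(value, (int, float)) and value >= 0


# Made-up records, for illustration only.
sample = [
    {"income": 32000},
    {"income": None},
    {"income": -500},  # likely an input error
    {"income": ""},
    {"income": 41000},
]

report = profile_column(sample, "income", is_valid_income)
print(report)
```

Running the same profile over every column gives a quick picture of which fields are complete enough to use and which need to be discussed with the bank.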
The data then needs to be prepared: it may need some cleaning or light manipulation before it can be submitted to the model. The variables that the model will use to reach the data mining goals are selected and reviewed, in case it is necessary to create derived attributes from the existing ones.
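As a rough illustration of this preparation step, the sketch below cleans a text field and creates one derived attribute. The field names and the loan-to-income ratio are hypothetical, and dropping unusable rows is only one possible cleaning choice; which attributes are actually worth deriving depends on the bank's data.

```python
def prepare(records):
    """Clean the records and add a derived loan-to-income attribute.

    Rows without a usable income or loan amount are dropped here; filling
    them in instead would be an equally valid choice in some projects.
    """
    prepared = []
    for row in records:
        income = row.get("income")
        loan = row.get("loan_amount")
        if income is None or loan is None or income <= 0:
            continue
        prepared.append({
            # Normalise spelling so " Employed " and "employed" match.
            "employment": row["employment"].strip().lower(),
            # Derived attribute built from two existing ones.
            "loan_to_income": loan / income,
            "repaid": row["repaid"],
        })
    return prepared


# Made-up records, for illustration only.
sample = [
    {"employment": " Employed ", "income": 40000, "loan_amount": 10000, "repaid": True},
    {"employment": "Employed", "income": None, "loan_amount": 8000, "repaid": True},
    {"employment": "UNEMPLOYED", "income": 20000, "loan_amount": 10000, "repaid": False},
]

prepared = prepare(sample)
for row in prepared:
    print(row)
```

Only the selected and derived variables are passed on, which keeps the model's input limited to the attributes that were judged useful.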