How can modeling help in data understanding while working on banking projects?
It may be necessary to talk with a domain expert at the bank to understand what the data mean and to assess which attributes can help and which should be excluded. Some data exploration and verification is done at the beginning to get familiar with the data and to check whether they are complete or whether they contain input errors or missing values.
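The verification step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names, the sample rows, and the set of missing-value markers are all hypothetical and do not come from the bank's data.

```python
import csv
import io

# Hypothetical sample; column names and values are invented for illustration.
SAMPLE = """age,income,job
35,42000,teacher
,38000,clerk
41,,manager
-5,51000,?
"""

# Assumed markers for a missing value; real data may use others.
MISSING_MARKERS = {"", "?", "NA"}


def count_missing(rows, columns):
    """Count, per column, the cells that are empty or hold a missing marker."""
    counts = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            if (row[col] or "").strip() in MISSING_MARKERS:
                counts[col] += 1
    return counts


def find_invalid_numbers(rows, column):
    """Return indices of rows whose value in `column` is negative or not a number."""
    bad = []
    for i, row in enumerate(rows):
        value = (row[column] or "").strip()
        if value in MISSING_MARKERS:
            continue  # missing values are reported separately
        try:
            if float(value) < 0:
                bad.append(i)
        except ValueError:
            bad.append(i)
    return bad


reader = csv.DictReader(io.StringIO(SAMPLE))
rows = list(reader)
missing = count_missing(rows, reader.fieldnames)
suspicious_ages = find_invalid_numbers(rows, "age")
print(missing)
print(suspicious_ages)
```

A report like this gives the bank's expert something concrete to comment on, for example whether a negative age is a typing error or a code with a special meaning.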
The data then need to be prepared: they may need cleaning or some manipulation before they can be submitted to the model. The variables that will be used in the model to reach the data mining goals are selected, and it is decided whether any derived attributes need to be created from the existing ones.
The model will be built with Weka, a machine learning software package that contains various algorithms. Two different kinds of algorithms will be used: a decision tree (J48) and a Multilayer Perceptron.
The decision tree is a tree-like model. J48 can split on numeric attributes by choosing thresholds, but the resulting tree is often easier to read when the attributes are nominal, so it may be useful to convert Income into bins rather than keeping it as a continuous numeric variable.
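As a minimal sketch of what binning Income could look like, the following Python code uses equal-width bins. The number of bins, the labels, and the income figures are assumptions made for illustration; other binning schemes (for example equal-frequency bins or thresholds chosen by the bank) are equally possible.

```python
def equal_width_bins(values, n_bins):
    """Assign each value a bin index from 0 to n_bins - 1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    indices = []
    for v in values:
        if width == 0:
            idx = 0  # all values are identical, so everything goes in one bin
        else:
            # The maximum value would land in bin n_bins, so clamp it to the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
        indices.append(idx)
    return indices


# Hypothetical yearly incomes and bin labels.
incomes = [12000, 18000, 25000, 40000, 52000, 90000]
labels = ["low", "medium", "high"]

binned = [labels[i] for i in equal_width_bins(incomes, len(labels))]
print(binned)
```

Equal-width bins are sensitive to outliers: one very high income stretches the range and pushes most customers into the lowest bin, so the resulting distribution should be checked before the bins are used.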
The Multilayer Perceptron is a feed-forward neural network that is trained with backpropagation and then used to classify instances.
Both the decision tree and the Multilayer Perceptron are machine learning techniques, and each name usually refers to the way the model is represented.
In the evaluation phase, the accuracy and generality of the model are measured so that they can be compared with what the customer expects. If the bank is not satisfied, it is possible to go back to the Business Understanding phase, find the reasons why the model is deficient, collect more data and try again.
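One simple way to look at accuracy and generality together is to compare accuracy on the training data with accuracy on held-out data. The sketch below assumes hypothetical "yes"/"no" labels, hypothetical predictions, and a made-up target accuracy of 0.75 standing in for the customer's expectation.

```python
def accuracy(actual, predicted):
    """Fraction of instances whose predicted label equals the actual label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical labels and predictions, invented for illustration.
train_actual = ["yes", "no", "yes", "no", "yes", "no", "yes", "no"]
train_predicted = ["yes", "no", "yes", "no", "yes", "no", "yes", "yes"]
test_actual = ["yes", "no", "no", "yes"]
test_predicted = ["yes", "no", "yes", "yes"]

TARGET_ACCURACY = 0.75  # assumed customer expectation

train_acc = accuracy(train_actual, train_predicted)
test_acc = accuracy(test_actual, test_predicted)

# A large gap between training and held-out accuracy suggests poor generality.
gap = train_acc - test_acc
meets_target = test_acc >= TARGET_ACCURACY
print(train_acc, test_acc, gap, meets_target)
```

Accuracy alone can be misleading when one class is rare, so in practice a confusion matrix or per-class measures would usually be inspected as well.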
Once the model has been evaluated and the bank is satisfied, it needs to be deployed so that it can make predictions for future customers. In this phase, it is necessary to anticipate changes in the data, to avoid using the model incorrectly because its input has changed. To do that, it could be helpful to build a second model that checks whether new data follow the same pattern as the earlier data; if they do not, a new model needs to be built.
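A lighter alternative to a full second model is a statistical check on the input distribution. The sketch below uses the population stability index (PSI) on a binned attribute; this is a swapped-in technique, not the second model described above. The category names, the counts, and the retraining threshold are assumptions for illustration.

```python
import math
from collections import Counter


def proportions(labels, categories, eps=1e-6):
    """Share of each category; eps replaces zero shares so the logarithm is defined."""
    counts = Counter(labels)
    total = len(labels)
    return {c: max(counts[c] / total, eps) for c in categories}


def psi(expected_labels, actual_labels):
    """Population stability index between a reference sample and a new sample."""
    categories = sorted(set(expected_labels) | set(actual_labels))
    e = proportions(expected_labels, categories)
    a = proportions(actual_labels, categories)
    return sum((a[c] - e[c]) * math.log(a[c] / e[c]) for c in categories)


# Hypothetical binned Income values at training time and after deployment.
training_income = ["low"] * 50 + ["medium"] * 30 + ["high"] * 20
similar_income = ["low"] * 25 + ["medium"] * 15 + ["high"] * 10
shifted_income = ["low"] * 10 + ["medium"] * 20 + ["high"] * 70

RETRAIN_THRESHOLD = 0.25  # assumed cut-off; a commonly quoted rule of thumb

psi_similar = psi(training_income, similar_income)
psi_shifted = psi(training_income, shifted_income)
needs_new_model = psi_shifted > RETRAIN_THRESHOLD
print(psi_similar, psi_shifted, needs_new_model)
```

A check like this only looks at one attribute at a time, so it can miss changes in the relationship between attributes; that is the case where a second model, as suggested above, would be more informative.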