In the tests, the number of bins used to discretize each variable was recorded.
Some variables were not used by the algorithm. Some of them (CustomerID, Fictional Surname, Country, Postcode) had already been removed from the dataset.
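As a rough illustration of this preprocessing step, the sketch below drops the identifier-like columns and discretizes the numeric variables while recording the number of bins. It assumes pandas and a hypothetical file name, and the default bin count is only a placeholder.

```python
import pandas as pd

# Hypothetical input file; the real dataset name is not given in the text.
df = pd.read_csv("credit_data.csv")

# Identifier-like columns carry no predictive signal, so they are dropped up front.
df = df.drop(columns=["CustomerID", "Fictional Surname", "Country", "Postcode"])

# Discretize each numeric variable and record how many bins were used,
# so the same binning can be reported and reused in later tests.
bins_per_variable = {}
for col in df.select_dtypes(include="number").columns:
    n_bins = 5  # placeholder; in the tests the bin count is chosen per variable
    df[col] = pd.cut(df[col], bins=n_bins, labels=False)
    bins_per_variable[col] = n_bins
```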
To reduce the remaining variables further and obtain a more general model, a backward selection method was used: the search starts with all the variables and removes some of them along the way. Since not every attribute is used by every model, the variables that each model did not use were recorded, and at the end some tests were run without those variables to obtain a more general model.
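A minimal sketch of backward selection along these lines is shown below, assuming scikit-learn and a decision tree as the underlying model; the stopping tolerance and cross-validation setup are illustrative, not the values used in the actual tests.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def backward_selection(X, y, min_features=3):
    """Start with all columns of X and drop the least useful ones one at a time."""
    features = list(X.columns)
    best = cross_val_score(DecisionTreeClassifier(), X[features], y, cv=5).mean()
    while len(features) > min_features:
        scores = {}
        for f in features:
            trial = [c for c in features if c != f]
            scores[f] = cross_val_score(
                DecisionTreeClassifier(), X[trial], y, cv=5).mean()
        # Feature whose removal hurts the score the least.
        least_useful = max(scores, key=scores.get)
        # Stop if removing it would noticeably degrade the result (0.005 is arbitrary).
        if scores[least_useful] + 0.005 < best:
            break
        features.remove(least_useful)
        best = max(best, scores[least_useful])
    return features
```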
Most of the models did not use Gender, Loan amount, or Region, so these were excluded as they did not seem helpful. Age and Year at address were also excluded because the difference in the results was not significant, and dropping them helps keep the model more general. In this way a form of pruning was adopted: nodes that have little impact on the model are removed to avoid over-fitting.
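The tool used in the tests may implement node removal differently; as one possible illustration, the sketch below uses scikit-learn's cost-complexity pruning, where a larger ccp_alpha removes more low-impact nodes from a decision tree.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def prune_tree(X_train, y_train):
    # Compute the sequence of effective pruning strengths for this data.
    path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
        X_train, y_train)
    candidate_alphas = [max(a, 0.0) for a in path.ccp_alphas]  # guard tiny negatives

    # Larger ccp_alpha removes more low-impact nodes; keep the value with the
    # best cross-validated score to limit over-fitting.
    best_alpha = max(
        candidate_alphas,
        key=lambda a: cross_val_score(
            DecisionTreeClassifier(random_state=0, ccp_alpha=a),
            X_train, y_train, cv=5).mean(),
    )
    return DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(
        X_train, y_train)
```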
For the tests, one hyperparameter was considered at a time and adjusted until the result was satisfactory; then the next hyperparameter was considered.
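A minimal sketch of this one-hyperparameter-at-a-time approach is given below; the hyperparameter names, candidate values, and decision-tree model are assumptions for illustration only.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def tune_one_at_a_time(X, y):
    """Tune each hyperparameter in turn while holding the others fixed."""
    params = {"max_depth": None, "min_samples_leaf": 1}
    search_space = {
        "max_depth": [3, 5, 8, 12, None],
        "min_samples_leaf": [1, 5, 10, 25],
    }
    for name, candidates in search_space.items():
        def score(value):
            trial = {**params, name: value}
            model = DecisionTreeClassifier(random_state=0, **trial)
            return cross_val_score(model, X, y, cv=5).mean()
        # Fix the other hyperparameters, pick the best value for this one,
        # then move on to the next hyperparameter.
        params[name] = max(candidates, key=score)
    return params
```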