How to find a way to predict who the people that will not pay back a loan received to avoid to give it to them are?
To do that the bank has provided some data about the loan customers and it is requested to build a model based on these past experiences. Given “True” the result of paid, and “False” the defaulted, for the purpose of the project is better the prediction of a false negative, rather than a false positive, that means, it is better to say that someone cannot repay and he can, rather than he can repay and he cannot.
Define any terminology that you will use in the report (for example, model, variable, task, etc.).
Task: given some input and output, the task if what you want to obtain from these. Example: predict who can repay a loan.
Variable/attribute: some characteristics that vary and can be measured.
Value: the quote that a variable can take, usually it is nominal or numeric.
Data/Dataset: a group of measurements of value associated with variables.
Data point: a single example of data.
Numeric variables: the numeric variables could be continuous or discrete. Continuous if could be any number in a range (for example income), discrete if could have only certain values (for example CCJs cannot be 1,5).
Nominal variables: can be quality for input (for example own home) or a label/category for output (for example paid and defaulted).
Model/Data model: it is a representation of the data, built using a data mining technique that has a set of parameters. A model can be interpreted by a software (in this case Weka).
Learning process: it is a process that can perform a specific task, generalizing from a specific dataset to general data.