Each leaf contains two numbers: one gives the number of training examples that reach that leaf, and the other the number of training errors among them.

The second one is a feedforward neural network algorithm, called MultilayerPerceptron in Weka. This algorithm has one or more hidden layers (otherwise it would be equivalent to logistic regression) and is trained with backpropagation, which propagates the error from the output layer back toward the input layer to adjust the weights. Each hidden unit applies an activation function to the weighted sum of its inputs (in theory it could be a different function for each node, but in practice all the hidden units use the same one). Training searches for the set of weights that is most likely to produce the correct output for a given input.
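As a rough sketch of the forward pass described above (this is not Weka's implementation; the layer sizes, the weight values, and the choice of the sigmoid activation are arbitrary assumptions for illustration):

```python
import math

def sigmoid(x):
    # A common activation function, mapping any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, hidden_weights, output_weights):
    # Each hidden unit applies the activation function to the
    # weighted sum of the inputs.
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs)))
              for ws in hidden_weights]
    # The output unit does the same over the hidden activations.
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))

# Toy network: 2 inputs, 2 hidden units, 1 output (made-up weights).
y = forward([1.0, 0.5], [[0.4, -0.2], [0.3, 0.8]], [0.7, -0.5])
print(y)
```

Backpropagation then measures how far `y` is from the desired output and updates every weight in the direction that reduces that error.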

These are the hyperparameters that were tested:

Hidden layers: configures the number of hidden layers the network uses and the number of units in each of them. Choosing too few layers can underfit the model, while choosing too many risks overfitting it and increases the time needed to build the model.

Learning rate: a value that represents the size of the step the algorithm takes down the gradient of the error curve. Increasing it makes learning faster, since fewer steps are needed, but less precise; decreasing it makes learning slower but lets the algorithm approach the optimum more closely. For example, if the optimum is 5.235 and the learning rate is 0.1, the algorithm can only get as close as 5.2 or 5.3; with a learning rate of 0.01 it can reach 5.23 or 5.24.
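The granularity intuition in the 5.235 example can be sketched with fixed-size steps (a simplification: in real gradient descent the step is also scaled by the gradient, so this only illustrates why a coarse step cannot settle arbitrarily close to the optimum):

```python
def fixed_step_search(target, lr, start=0.0, max_steps=1000):
    # Move toward `target` in fixed steps of size `lr`, stopping once a
    # full step can no longer bring us closer.
    w = start
    for _ in range(max_steps):
        if abs(target - w) < lr:
            break
        w += lr if target > w else -lr
    return w

print(fixed_step_search(5.235, 0.1))   # stops near 5.2: one 0.1-sized step short
print(fixed_step_search(5.235, 0.01))  # stops near 5.23: a finer approximation
```

The smaller rate lands closer to 5.235 but needs roughly ten times as many steps to get there, which is the trade-off described above.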

Momentum: a value between 0 and 1 that helps the search jump out of a local minimum (a solution that only looks best because little of the error surface has been explored) by adding a fraction of the previous weight update to the current one, which increases the effective step size while the descent keeps moving in the same direction.
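A minimal sketch of the momentum update on a one-dimensional toy error curve (the function, learning rate, and starting point are made up for illustration; they are chosen so that the curve has a shallow local minimum near w = 0.96 and a deeper one near w = -1.04):

```python
def gradient(w):
    # Derivative of the toy error curve f(w) = w**4 - 2*w**2 + 0.3*w,
    # which has a shallow local minimum near w = 0.96 and a deeper
    # global minimum near w = -1.04.
    return 4 * w**3 - 4 * w + 0.3

def descend(momentum, lr=0.05, start=1.5, steps=500):
    w, velocity = start, 0.0
    for _ in range(steps):
        # Momentum update: keep a fraction of the previous step and
        # add the new gradient step to it.
        velocity = momentum * velocity - lr * gradient(w)
        w += velocity
    return w

print(descend(momentum=0.0))  # plain descent settles in the shallow minimum near 0.96
print(descend(momentum=0.9))  # accumulated velocity carries the search past it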