randomness happens in two ways
- the data used to build each tree is randomly selected (bootstrap sample)
- bootstrap sample: a sample of N rows drawn with replacement from the original training set. It has the same number of rows as the original, but because the draws are random and with replacement, some original rows may be missing while others occur multiple times.
- the features considered at each split are also randomly selected
- instead of finding the best split across all possible features, a random subset of features is chosen and the best split is found within that smaller subset of features.
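The two sources of randomness above can be sketched with the standard library alone. This is an illustrative toy (the variable names, `N`, and `max_features` values are made up, not from the notes): `random.choices` draws row indices with replacement (the bootstrap sample), and `random.sample` picks the feature subset considered at one split.

```python
import random

rng = random.Random(0)      # fixed seed so the sketch is reproducible

N = 10                      # rows in the original training set
n_features = 6
max_features = 2            # size of the random feature subset per split

# Bootstrap sample: N draws WITH replacement, so some row indices
# repeat while others never appear.
bootstrap = rng.choices(range(N), k=N)
missing = set(range(N)) - set(bootstrap)

# Feature subset for one split: drawn WITHOUT replacement.
split_features = rng.sample(range(n_features), k=max_features)

print(len(bootstrap))       # still N rows, like the original set
print(sorted(missing))      # rows that happened not to be drawn
print(sorted(split_features))
```

Note the asymmetry: rows are sampled with replacement (duplicates allowed), while the per-split feature subset is sampled without replacement.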
Model Complexity
- n_estimators: no. of trees
- max_features: no. of features in the subset that are randomly considered at each split
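Both knobs map directly onto scikit-learn's `RandomForestClassifier`. A minimal sketch, assuming scikit-learn is installed; the dataset and parameter values are illustrative, not from the notes:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data: 200 samples, 8 features.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# n_estimators: number of trees in the forest.
# max_features: size of the random feature subset tried at each split.
forest = RandomForestClassifier(n_estimators=100, max_features=2,
                                random_state=0)
forest.fit(X, y)

print(len(forest.estimators_))  # one fitted tree per n_estimators
```

`max_features` also accepts `'sqrt'` or `'log2'` to scale the subset size with the number of features instead of fixing it.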