Random variation is introduced in two ways:
-   bootstrap sample: a bootstrap sample has N rows, just like the original training set, but because rows are drawn randomly with replacement, some rows from the original dataset may be missing while others occur multiple times.
-   feature subsetting: instead of finding the best split across all possible features, a random subset of features is chosen at each node and the best split is found within that smaller subset.
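A minimal NumPy sketch of both sources of randomness (the array sizes and the sqrt(d) subset size are illustrative assumptions, not values from these notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy training set: N rows, d features.
N, d = 6, 4
X = rng.normal(size=(N, d))

# 1. Bootstrap sample: draw N row indices *with replacement*, so some
#    original rows may be missing while others occur multiple times.
row_idx = rng.integers(0, N, size=N)
X_boot = X[row_idx]

# 2. Feature subsetting: at each split, only a random subset of the
#    features is considered (sqrt(d) is a common choice for classification).
n_candidates = int(np.sqrt(d))
feature_idx = rng.choice(d, size=n_candidates, replace=False)

print("rows drawn:", row_idx)
print("candidate features at this split:", feature_idx)
```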
Prediction
  1. regression: mean of individual tree predictions
  2. classification: each tree predicts a probability for every class; these probabilities are averaged across trees and the class with the highest average probability is predicted (soft voting)
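To make the aggregation concrete, here is a small sketch (assuming scikit-learn and a synthetic dataset from make_classification) that reproduces the classifier's soft voting by hand; for regression the forest simply averages the trees' numeric predictions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# Each tree predicts class probabilities; averaging them across trees and
# taking the argmax reproduces the forest's own prediction (soft voting).
per_tree = np.stack([t.predict_proba(X[:5]) for t in forest.estimators_])
mean_proba = per_tree.mean(axis=0)
manual_pred = forest.classes_[mean_proba.argmax(axis=1)]

assert (manual_pred == forest.predict(X[:5])).all()
```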
Pros
Cons
Model Complexity
-   n_estimators: the number of trees (default is 10); should be larger for large datasets, since ensembles that average over more trees reduce overfitting, though more trees also increase computational cost (see the sketch after this list for these parameters in use)
-   max_features: the number of features in the random subset considered at each split
-   learning is sensitive to max_features; it has a strong effect on performance
-   max_features = 1 → trees are very different and deep, since a split cannot pick the most informative feature
-   max_features ≈ no. of features → similar trees with fewer levels
-   max_depth: controls the depth of each tree (default: None, i.e. nodes are split until all leaves are pure or contain fewer than min_samples_split samples, which is two by default)
-   n_jobs: the number of CPU cores used for training; with four cores, training is roughly four times as fast as with one
-   n_jobs = -1 uses all the cores on your system
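Putting the hyperparameters above together, a minimal scikit-learn sketch (the dataset and the specific parameter values are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,     # more trees: less variance, more compute
    max_features="sqrt",  # size of the random feature subset at each split
    max_depth=None,       # grow until leaves are pure or below min_samples_split
    n_jobs=-1,            # use all CPU cores
    random_state=0,
).fit(X_train, y_train)

print("test accuracy:", forest.score(X_test, y_test))
```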