Bias & Variance
-   high bias → underfitting; high variance → overfitting; regularization and ensembling are two standard ways to reduce variance

L1 norm: sum of the absolute values of a vector's components; as a regularizer (Lasso) it encourages sparse weights
L2 norm: square root of the sum of squared components; as a regularizer (Ridge) it shrinks all weights smoothly
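A quick NumPy check of both norms (the example vector is arbitrary):

```python
# A minimal sketch: computing L1 and L2 norms of a weight vector with NumPy.
import numpy as np

w = np.array([3.0, -4.0, 0.0])

l1 = np.sum(np.abs(w))        # L1 norm: sum of absolute values -> 7.0
l2 = np.sqrt(np.sum(w ** 2))  # L2 norm: root of sum of squares -> 5.0

# np.linalg.norm computes the same quantities directly.
assert np.isclose(l1, np.linalg.norm(w, ord=1))
assert np.isclose(l2, np.linalg.norm(w, ord=2))
```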
Ensembles: combine multiple individual models into one model that typically has lower variance (bagging/averaging) or lower bias (boosting) than any single member
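A minimal sketch of the ensemble idea with scikit-learn's BaggingClassifier; the synthetic dataset and the choice of 100 trees are illustrative:

```python
# Bagging many high-variance trees: each tree is trained on a bootstrap
# sample, and the ensemble averages their predictions.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A single deep tree tends to overfit; 100 bagged trees reduce variance.
single = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
bagged = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                           random_state=0).fit(X_train, y_train)

print('single tree test accuracy:', single.score(X_test, y_test))
print('bagged trees test accuracy:', bagged.score(X_test, y_test))
```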

2. Random Forest

Idea: build many decision trees, each on a random variation of the training data; single trees overfit, but averaging many decorrelated trees reduces variance while keeping bias low
Random variation happens in two ways (see the sketch after this list)
-   bootstrap sample: a sample of N rows drawn with replacement from the original training set, so it has the same size as the original data but some rows are missing and others occur multiple times
-   feature subset: instead of searching for the best split across all features, a random subset of features is chosen and the best split is found within that smaller subset
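A minimal sketch of these two sources of randomness in NumPy; the array shapes, the max_features value, and the variable names are illustrative, not scikit-learn internals:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))   # toy training set: N=6 rows, 4 features
N, n_features = X.shape
max_features = 2

# 1) bootstrap sample: N row indices drawn WITH replacement, so some rows
#    repeat and others are left out entirely.
boot_rows = rng.integers(0, N, size=N)
X_boot = X[boot_rows]

# 2) at each split, only a random subset of max_features features is
#    considered when searching for the best split.
candidate_features = rng.choice(n_features, size=max_features, replace=False)

print('bootstrap rows:', boot_rows)  # duplicates are visible here
print('features considered at this split:', candidate_features)
```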
Prediction (a quick check follows this list)
  1. regression: mean of the individual tree predictions
  2. classification: each tree predicts a probability for every class; the probabilities are averaged across trees, and the class with the highest averaged probability is the prediction
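A quick check of the regression rule, assuming scikit-learn (the synthetic dataset is illustrative): the forest prediction is exactly the mean of the per-tree predictions, and the classifier analogously averages each tree's class probabilities.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=5, random_state=0)
rf = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# Stack each fitted tree's predictions and average across trees.
per_tree = np.stack([tree.predict(X[:3]) for tree in rf.estimators_])
assert np.allclose(per_tree.mean(axis=0), rf.predict(X[:3]))
```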
Pros
-   strong performance on many problems without heavy parameter tuning
-   no need for careful feature normalization; handles mixed feature types
-   training parallelizes easily across trees
Cons
-   the resulting ensemble is hard for humans to interpret
-   tends to perform poorly on very high-dimensional, sparse data such as text
Model Complexity (key hyperparameters; a wiring-up example follows this list)
-   n_estimators: the number of trees (default 10 in scikit-learn before v0.22, 100 since); should be larger for large datasets, since averaging over more trees reduces overfitting, at the cost of extra computation
-   max_features: the number of features in the random subset considered at each split
-   learning is sensitive to max_features; it has a strong effect on performance
-   max_features = 1 → trees are very different from each other and need many levels, since a split often cannot pick the most informative feature
-   max_features ≈ number of features → similar trees with fewer levels
-   max_depth: controls the depth of each tree (default: None, i.e. nodes are split until all leaves are pure or contain fewer than min_samples_split samples, which is two by default)
-   n_jobs: how many cores to use in parallel during training
-   trees are built independently, so with four cores training is roughly four times as fast as with one
-   n_jobs = -1 uses all the cores on your system
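A minimal sketch wiring these hyperparameters into scikit-learn's RandomForestClassifier; the dataset and the specific values are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(
    n_estimators=100,     # more trees -> less overfitting, more compute
    max_features='sqrt',  # size of the random feature subset per split
    max_depth=None,       # grow each tree until its leaves are pure (default)
    n_jobs=-1,            # use all available cores
    random_state=0,
).fit(X_train, y_train)

print('test accuracy:', rf.score(X_test, y_test))
```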