- using the model to make prediction only requires modest memory and is fast
- doesn't require careful features normalization to perform well.
- handle a mixture of feature types. (binary, continuous, categorical types)
cons
- the models are difficult for people to interpret.
- requires careful tuning of the learning rate and other parameters.
- the training process can require significant computation
- like decision trees, not recommended for text classification and other problems with very high dimensional sparse features, for accuracy and computational cost reasons.