Pros:
(1) achieves excellent accuracy and makes fast predictions without using a lot of memory.
(2) doesn't require normalization of features.
(3) handles a mixture of feature types (binary, continuous, categorical).
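
Point (2) can be checked with a small experiment. The sketch below assumes that "this method" is gradient boosted decision trees and uses scikit-learn's GradientBoostingClassifier as a stand-in; the synthetic dataset and the per-feature scale factors are made up for illustration. Tree splits depend only on the ordering of values within each feature, so multiplying features by different positive constants should leave the predictions essentially unchanged.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration, not from the notes)
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Put each feature on a very different scale (arbitrary positive factors)
scale = np.array([1.0, 10.0, 100.0, 1000.0, 2.0, 5.0, 50.0, 7.0])

# Same model settings, trained once on raw and once on rescaled features
raw = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
scaled = GradientBoostingClassifier(random_state=0).fit(X_train * scale, y_train)

acc_raw = raw.score(X_test, y_test)
acc_scaled = scaled.score(X_test * scale, y_test)

# Fraction of test points on which the two models agree
agreement = np.mean(raw.predict(X_test) == scaled.predict(X_test * scale))

print(f"accuracy on raw features:      {acc_raw:.3f}")
print(f"accuracy on rescaled features: {acc_scaled:.3f}")
print(f"prediction agreement:          {agreement:.3f}")
```

A model that relies on distances or on the magnitude of its weights (for example k-nearest neighbors or a regularized linear model) would generally not pass this check without normalization.
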
This method does have several downsides:
(1) difficult for people to interpret.
(2) requires careful tuning of the learning rate and other parameters. 
(3) like decision trees, not recommended for text classification and other problems with very high-dimensional sparse features, for accuracy and computational cost reasons.
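
Downside (2) usually means searching over the learning rate together with the other parameters that interact with it. The sketch below again assumes the method is gradient boosted decision trees and uses scikit-learn's GradientBoostingClassifier with GridSearchCV; the dataset and the grid values are illustrative assumptions, not recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data (an assumption for illustration, not from the notes)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# The learning rate interacts with the number and depth of trees,
# so the three are searched jointly rather than one at a time.
param_grid = {
    "learning_rate": [0.01, 0.1, 1.0],
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
}

search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    cv=3,  # 3-fold cross-validation for each parameter combination
)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_

print("best parameters:", best_params)
print(f"best mean cross-validated accuracy: {best_score:.3f}")
```

Even this small grid fits 12 parameter combinations on 3 folds each, which gives a sense of why tuning is listed as a cost of the method.
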