We now consider the case where the problem is linearly separable. Even in this simple case, the choice of separating hyperplane is not obvious. There are infinitely many separating hyperplanes; they all achieve the same training performance (the empirical risk is identical, namely zero), but their generalization performance can differ greatly. To resolve this ambiguity, it has been shown that there exists a unique optimal hyperplane, defined as the one that maximizes the margin, that is, the distance between the separating hyperplane and the closest samples.
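
The point that several separators share the same empirical risk yet differ in margin can be illustrated with a small sketch. The dataset, the two candidate hyperplanes, and the helper names `empirical_risk` and `geometric_margin` are all hypothetical, chosen only for illustration; this does not compute the optimal hyperplane, it merely compares two hand-picked ones.

```python
import math

# Hypothetical 2-D toy dataset: two linearly separable classes.
X = [(1.0, 1.0), (2.0, 0.5), (1.5, 2.0), (4.0, 4.0), (5.0, 3.5), (4.5, 5.0)]
y = [-1, -1, -1, 1, 1, 1]


def decision(w, b, x):
    """Value of w.x + b for a 2-D point x."""
    return w[0] * x[0] + w[1] * x[1] + b


def empirical_risk(w, b, X, y):
    """Fraction of training points misclassified by the hyperplane w.x + b = 0."""
    errors = sum(1 for x, t in zip(X, y) if t * decision(w, b, x) <= 0)
    return errors / len(X)


def geometric_margin(w, b, X, y):
    """Smallest signed distance from a training point to the hyperplane."""
    norm = math.hypot(w[0], w[1])
    return min(t * decision(w, b, x) / norm for x, t in zip(X, y))


# Two hand-picked separating hyperplanes (assumptions, not optimal solutions).
candidates = {
    "A": ((1.0, 1.0), -6.0),  # x1 + x2 = 6
    "B": ((1.0, 0.0), -2.2),  # x1 = 2.2
}

for name, (w, b) in candidates.items():
    risk = empirical_risk(w, b, X, y)
    margin = geometric_margin(w, b, X, y)
    print(f"hyperplane {name}: empirical risk = {risk:.2f}, margin = {margin:.3f}")
```

Both hyperplanes classify every training point correctly, so the empirical risk cannot distinguish them; only the margin does, and it is markedly larger for hyperplane A than for hyperplane B.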

There are theoretical reasons for this choice. Vapnik showed that the capacity of the class of separating hyperplanes decreases as the margin increases.
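
One commonly cited form of this result can be written as follows. The notation is an assumption introduced here, not taken from the text: samples $x_i \in \mathbb{R}^n$ with labels $y_i \in \{-1, +1\}$, a hyperplane $(w, b)$, a radius $R$ of a ball containing all samples, the margin $\gamma$, and the VC dimension $h$ as the measure of capacity.

```latex
\[
  \gamma(w, b) \;=\; \min_{i} \frac{y_i \,\bigl(w^{\top} x_i + b\bigr)}{\lVert w \rVert},
  \qquad
  h \;\le\; \min\!\left( \left\lceil \frac{R^{2}}{\gamma^{2}} \right\rceil,\; n \right) + 1 .
\]
```

Under these assumptions, the bound on $h$ shrinks as $\gamma$ grows, which is the sense in which a larger margin corresponds to a lower-capacity class of separators.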