We now consider the case where the
problem is linearly separable. Even in this simple case, the choice of
separating hyperplane is not obvious. Indeed, there are infinitely many
separating hyperplanes: their learning performance is identical (the empirical
risk is the same for all of them), but their generalization performance can be
very different. To resolve this problem, it has been shown that there exists a
unique optimal hyperplane, defined as the hyperplane that maximizes the margin,
that is, the distance between the separating hyperplane and the samples closest
to it.
There are
theoretical reasons for this choice. Vapnik showed that the capacity of the
class of separating hyperplanes decreases as the margin increases.
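
The point that several hyperplanes can share the same empirical risk while differing in margin can be illustrated with a small numerical sketch. The data set, the three candidate hyperplanes (A, B, C) and the helper names `empirical_error` and `geometric_margin` below are hypothetical and chosen only for illustration; they do not come from the original text.

```python
import numpy as np

# Toy linearly separable data in the plane (hypothetical, for illustration only).
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],   # class +1
              [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # class -1
y = np.array([1, 1, 1, -1, -1, -1])


def empirical_error(w, b, X, y):
    """Fraction of samples misclassified by the rule sign(w.x + b)."""
    return float(np.mean(np.sign(X @ w + b) != y))


def geometric_margin(w, b, X, y):
    """Smallest signed distance from a sample to the hyperplane w.x + b = 0."""
    return float(np.min(y * (X @ w + b)) / np.linalg.norm(w))


# Three candidate separating hyperplanes (w, b), all assumed for this example.
candidates = {
    "A": (np.array([1.0, 1.0]), -2.5),
    "B": (np.array([1.0, 0.0]), -1.5),
    "C": (np.array([1.0, 1.0]), -1.2),
}

# Map each candidate name to (empirical error, geometric margin).
results = {
    name: (empirical_error(w, b, X, y), geometric_margin(w, b, X, y))
    for name, (w, b) in candidates.items()
}

for name, (err, margin) in results.items():
    print(f"hyperplane {name}: empirical error = {err:.2f}, margin = {margin:.3f}")
```

All three candidates classify every sample correctly, so the empirical risk cannot distinguish them, yet their margins differ. The maximum-margin criterion would prefer A among these three.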