Understanding Support Vector Machines: Large Margins and Kernels
Support Vector Machines (SVMs) offer a powerful approach to classification, sometimes providing a cleaner and more effective way to learn complex non-linear decision boundaries than logistic regression or neural networks.
Large Margin Classification
SVMs can be viewed as large-margin classifiers. The core idea is to find a decision boundary that not only separates the classes but also maximizes the distance to the nearest training examples. This margin is crucial for generalization.
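To see this in practice, here is a minimal sketch using scikit-learn's `SVC` with a linear kernel on toy data (the points below are invented for illustration). For a linear SVM, the margin width works out to $2 / \lVert \theta \rVert$, which the sketch computes from the fitted weights:

```python
import numpy as np
from sklearn.svm import SVC

# Two small, linearly separable clusters (toy data for illustration).
X = np.array([[1.0, 2.0], [2.0, 3.0], [2.0, 1.0],
              [6.0, 5.0], [7.0, 7.0], [8.0, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# For a linear kernel, coef_ holds the weight vector; the full margin
# (distance between the two supporting hyperplanes) is 2 / ||w||.
w = clf.coef_[0]
margin = 2.0 / np.linalg.norm(w)

print("support vectors:\n", clf.support_vectors_)
print("margin width:", margin)
```

The points reported in `support_vectors_` are the training examples closest to the boundary; they alone determine where the boundary sits, which is what gives the method its name.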
Optimization Objective
We can derive the SVM optimization objective by modifying the cost function of logistic regression. Instead of the standard logistic regression cost, SVM uses a modified cost function:
$$ J(\theta) = C \sum_{i=1}^m \left[ y^{(i)}\, \text{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)})\, \text{cost}_0(\theta^T x^{(i)}) \right] + \frac{1}{2} \sum_{j=1}^n \theta_j^2 $$
Here, $C$ is a regularization parameter (inversely related to $\lambda$ in logistic regression), and the $\text{cost}_1$ and $\text{cost}_0$ functions are adjusted to penalize misclassifications differently for each class. The hypothesis for SVM is simplified: predict 1 if $\theta^T x \ge 0$, and 0 otherwise.
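The sketch below makes the objective concrete, assuming the usual hinge-style surrogates $\text{cost}_1(z) = \max(0,\, 1 - z)$ and $\text{cost}_0(z) = \max(0,\, 1 + z)$ (the section above does not pin these down exactly, so treat the definitions as an assumption):

```python
import numpy as np

def cost1(z):
    # Penalty for y = 1 examples: zero once theta^T x >= 1.
    return np.maximum(0.0, 1.0 - z)

def cost0(z):
    # Penalty for y = 0 examples: zero once theta^T x <= -1.
    return np.maximum(0.0, 1.0 + z)

def svm_objective(theta, X, y, C):
    """J(theta) as written above.

    X has shape (m, n+1) with a leading column of ones; theta[0] is the
    intercept, which is conventionally left out of the regularizer.
    """
    z = X @ theta
    data_term = C * np.sum(y * cost1(z) + (1 - y) * cost0(z))
    reg_term = 0.5 * np.sum(theta[1:] ** 2)
    return data_term + reg_term

def predict(theta, X):
    # The simplified hypothesis: 1 if theta^T x >= 0, else 0.
    return (X @ theta >= 0).astype(int)
```

Note how $C$ multiplies only the data term: a large $C$ makes every violation of the margin expensive (behaving like small $\lambda$), while a small $C$ lets the regularizer dominate.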
Large Margin Intuition
Imagine trying to separate two groups of points with a line. A large-margin classifier aims to find the widest possible strip between the two groups and places the decision boundary down its middle, so that the nearest points on either side are as far from the boundary as possible.
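To make the "widest strip" picture concrete, the toy sketch below (plain NumPy, with hypothetical boundary parameters chosen for illustration) computes each point's perpendicular distance to a boundary $\theta^T x + b = 0$, which is $|\theta^T x + b| / \lVert \theta \rVert$; the margin is twice the smallest of these distances:

```python
import numpy as np

theta = np.array([1.0, -1.0])  # hypothetical boundary parameters
b = 0.0

X = np.array([[2.0, 0.5], [3.0, 1.0],    # points on one side
              [0.5, 2.0], [1.0, 3.0]])   # points on the other

# Perpendicular distance from each point to the line theta^T x + b = 0.
distances = np.abs(X @ theta + b) / np.linalg.norm(theta)

print("distances to boundary:", distances)
print("margin (twice the closest distance):", 2 * distances.min())
```

A large-margin classifier would adjust $\theta$ and $b$ to make that closest distance as large as possible while still separating the two groups.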