Support Vector Machine (SVM)

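SVM stands for Support Vector Machine.
Definitions:
SVM can handle data that is not linearly separable by using a kernel.
- The kernel trick computes inner products in the higher-dimensional feature space directly from the original inputs, so the data never has to be projected explicitly, which saves computation.
Two types of SVM
1. Hard SVM - No misclassification allowed - Similar to the Maximal Margin Classifier, with a kernel
2. Soft SVM - Allows some misclassification - Similar to the Support Vector Classifier (SVC), with a kernel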
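
To make the kernel trick above concrete, here is a minimal sketch. It assumes a degree-2 homogeneous polynomial kernel $K(x, z) = (x \cdot z)^2$ on 2-D points; the kernel choice, the feature map `phi`, and the sample points are illustrative assumptions, not something these notes prescribe. It shows that the kernel value equals the inner product of the explicitly projected points.

```python
import math

def dot(u, v):
    """Plain inner product of two equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit degree-2 feature map for a 2-D point (assumed for illustration)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: K(x, z) = (x . z)^2."""
    return dot(x, z) ** 2

# Hypothetical sample points.
x = (1.0, 2.0)
z = (3.0, 4.0)

explicit = dot(phi(x), phi(z))  # project to 3-D first, then take the inner product
implicit = poly_kernel(x, z)    # same quantity, computed without leaving 2-D

print(explicit, implicit)       # both equal 121 up to floating-point rounding
```

The saving grows with the dimension of the feature space: for some kernels (for example the RBF kernel) the explicit projection is infinite-dimensional, while the kernel itself stays cheap to evaluate.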

Intuition of the math behind SVM

If the support vectors on the two sides of the boundary are $x_1$ and $x_2$ respectively, then we need to find the hyperplane that maximizes the distance between them, $(x_2 - x_1)$. If the equation of the hyperplane is $y = W X + b$, then for the support vectors the equations are

$$W x_{1} + b = -1 \quad \text{(i)}$$

$$W x_{2} + b = +1 \quad \text{(ii)}$$

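Subtracting (i) from (ii) and then dividing both sides by $\lVert W \rVert$, we get

$$\begin{aligned} W (x_{2} - x_{1}) & = 2 \\ \frac{W}{\lVert W \rVert} (x_{2} - x_{1}) & = \frac{2}{\lVert W \rVert} \end{aligned}$$

The left-hand side is the projection of $(x_{2} - x_{1})$ onto the unit normal $\frac{W}{\lVert W \rVert}$ of the hyperplane, which is the width of the margin between the two support vectors. Maximizing the margin therefore means maximizing $\frac{2}{\lVert W \rVert}$, or equivalently minimizing $\lVert W \rVert$.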
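
The relation between the margin and $\lVert W \rVert$ can be checked numerically. The sketch below uses hypothetical values ($W = (3, 4)$, $b = 0$) and two hand-picked points that lie exactly on the hyperplanes (i) and (ii); none of these numbers come from the notes.

```python
import math

def dot(u, v):
    """Plain inner product of two equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))

def margin_width(w):
    """Distance between the hyperplanes W x + b = -1 and W x + b = +1."""
    return 2.0 / math.sqrt(dot(w, w))

# Hypothetical hyperplane parameters, so that ||W|| = 5.
w, b = (3.0, 4.0), 0.0

# Points chosen along the direction of W, so their distance is the margin itself.
x1 = (-3 / 25, -4 / 25)   # satisfies W x1 + b = -1
x2 = (3 / 25, 4 / 25)     # satisfies W x2 + b = +1

gap = math.dist(x1, x2)   # geometric distance between the two points
width = margin_width(w)   # 2 / ||W||

# Doubling W halves the margin, which is why we minimize ||W||.
w_doubled = (6.0, 8.0)
print(gap, width, margin_width(w_doubled))
```

Here `gap` and `width` agree (about 0.4), and the doubled weight vector gives half that margin.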

Equation of SVM

$$\begin{aligned} W x_{i} + b & \geq +1 \quad \text{if } y_{i} = +1 \\ W x_{i} + b & \leq -1 \quad \text{if } y_{i} = -1 \end{aligned}$$

In combination,

$$y_{i} (W x_{i} + b) \geq 1$$

Finally, we need to find the $(W, b)$ that minimizes the Euclidean norm $\lVert W \rVert$ subject to $y_{i} (W x_{i} + b) \geq 1$ for every training point $(x_{i}, y_{i})$.
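
As a small illustration of this constrained problem, the sketch below checks the margin constraint for a few candidate $(W, b)$ pairs and keeps the feasible one with the smallest norm. It writes the hyperplane as $W x + b$, as in the intuition section. The toy data and the candidate list are made up for the example, and picking from a fixed list only stands in for a real solver, which would search over all $(W, b)$.

```python
import math

def dot(u, v):
    """Plain inner product of two equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))

def norm(w):
    """Euclidean norm ||W||."""
    return math.sqrt(dot(w, w))

def is_feasible(w, b, X, y):
    """True if every point satisfies y_i * (W x_i + b) >= 1."""
    return all(yi * (dot(w, xi) + b) >= 1 for xi, yi in zip(X, y))

# Hypothetical, linearly separable toy data with labels in {+1, -1}.
X = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
y = [+1, +1, -1, -1]

# Hypothetical candidate solutions (W, b).
candidates = [
    ((1.0, 1.0), -2.0),    # feasible, but ||W|| is larger than necessary
    ((0.5, 0.5), -1.0),    # feasible, with a smaller ||W||
    ((0.25, 0.25), -0.5),  # smallest ||W||, but violates the constraint
]

feasible = [(w, b) for w, b in candidates if is_feasible(w, b, X, y)]
best_w, best_b = min(feasible, key=lambda wb: norm(wb[0]))
print(best_w, best_b)
```

The third candidate has the smallest norm but is rejected because the point $(2, 2)$ falls inside the margin, so the second candidate is selected.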

Equation of Soft SVM

Soft SVM is used to make the model more robust and better at generalizing. We allow some amount of error on the training data so that the model doesn't overfit it.

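$$\min_{W, b} \; \frac{1}{2} \lVert W \rVert^{2} + C \sum_{i} \max\left(0,\; 1 - y_{i} (W x_{i} + b)\right)$$

where $C$ is the penalty on the error. The error of point $i$ is its hinge loss: it is zero when the point satisfies $y_{i} (W x_{i} + b) \geq 1$ and grows linearly the further the point falls on the wrong side of its margin. The squared norm $\frac{1}{2} \lVert W \rVert^{2}$ is the usual, smoother stand-in for $\lVert W \rVert$. A large $C$ pushes the model towards Hard SVM, while a small $C$ tolerates more errors in exchange for a wider margin.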
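
A common concrete choice for the error term is the hinge loss $\max(0, 1 - y_{i}(W x_{i} + b))$ with a single penalty constant $C$. The sketch below assumes that choice, uses $\frac{1}{2}\lVert W \rVert^{2}$ as the margin term, and minimizes the objective with plain subgradient descent. The toy data, learning rate, and epoch count are arbitrary illustrative values, and real libraries use more sophisticated solvers.

```python
def dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def objective(w, b, X, y, C):
    """Soft-margin objective: 0.5 * ||W||^2 + C * sum of hinge losses."""
    hinge = sum(max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y))
    return 0.5 * dot(w, w) + C * hinge

def train_soft_svm(X, y, C=1.0, lr=0.01, epochs=1000):
    """Full-batch subgradient descent on the soft-margin objective."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)   # gradient of 0.5 * ||W||^2 is W itself
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:   # point is inside or beyond its margin
                for j, xij in enumerate(xi):
                    grad_w[j] -= C * yi * xij
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Class label from the sign of W x + b."""
    return 1 if dot(w, x) + b >= 0 else -1

# Hypothetical toy data with labels in {+1, -1}.
X = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (0.0, 0.0), (-1.0, 0.0), (0.0, -1.0)]
y = [+1, +1, +1, -1, -1, -1]

w, b = train_soft_svm(X, y, C=1.0)
print([predict(w, b, xi) for xi in X])
```

Lowering `C` makes margin violations cheaper, so the fitted boundary is allowed to ignore a few awkward training points instead of bending to fit them.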

Basic Assumption of SVM

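SVM makes no assumption about the distribution of the data. Hard SVM does, however, require the classes to be perfectly separable (after the kernel mapping, if one is used).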

Advantages of SVM

SVM is effective for high-dimensional data
Can be used for unstructured data such as text and images
With a suitable kernel function, complex non-linear problems can be solved
With soft SVM, the risk of overfitting is lower
Memory efficient: a linear SVM needs only (W, b) during inference, and a kernel SVM needs only the support vectors

Disadvantages of SVM

Long training time on large datasets
It is difficult to choose a good kernel function
Sensitive to outliers, especially Hard SVM
Cannot handle missing data directly
Handles imbalanced datasets poorly
