Naive Bayes
- Naive Bayes is derived from Bayes' theorem
- Naive Bayes is called "naive" because it assumes all the features are independent of each other, which is rarely the case in real life
For example,
Given msg = "Dear Friends", predict if it is Spam or Not-Spam:
P(Spam | "Dear Friends") = P("Dear Friends" | Spam) · P(Spam) / P("Dear Friends")
∝ P("Dear" | Spam) · P("Friends" | Spam) · P(Spam)
On the 2nd line, the denominator P("Dear Friends") is ignored because it is the same for every class, so it does not change which class scores highest
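The scoring above can be sketched in a few lines. All the word counts and priors below are made-up illustration data, not from the notes; the point is only the shape of the computation (prior times product of per-word likelihoods, denominator skipped):

```python
# Toy spam example: score a message under each class using Bayes'
# theorem with the naive independence assumption.
# Counts and priors are invented for illustration.

train_counts = {
    # class -> {word: count of that word in the class's training messages}
    "Spam":     {"dear": 8, "friends": 6, "offer": 10},
    "Not-Spam": {"dear": 5, "friends": 9, "meeting": 7},
}
priors = {"Spam": 0.4, "Not-Spam": 0.6}  # assumed P(class)

def score(msg_words, cls):
    """Unnormalized posterior: P(cls) * prod over words of P(word | cls).
    The denominator P(msg) is skipped because it is identical for
    every class and does not change the argmax."""
    counts = train_counts[cls]
    total = sum(counts.values())
    p = priors[cls]
    for w in msg_words:
        p *= counts.get(w, 0) / total
    return p

msg = ["dear", "friends"]
scores = {c: score(msg, c) for c in priors}
prediction = max(scores, key=scores.get)
```

Note that with `counts.get(w, 0)`, an unseen word zeroes the whole product, which is exactly the outlier problem discussed later in these notes.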
Advantages of Naive Bayes
- Works well with a large number of features
- Works well with large training datasets
- Converges faster than many other classifiers
- Less prone to overfitting
- Good at handling outliers (once smoothing is applied)
- Good at handling missing data
Disadvantages of Naive Bayes
- Performs poorly when features are correlated, since the independence assumption is violated
Impact of missing value on Naive Bayes
Naive Bayes handles missing data well: when estimating a feature's conditional probability, it simply ignores the rows where that feature is missing, so missing values have no impact on the probability estimates and hence no impact on the model
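A minimal sketch of that per-feature behavior, assuming missing values are stored as `None` (the rows and feature name are invented for illustration):

```python
# Estimating P(feature=value | class) while ignoring rows where the
# feature is missing (None). Data is made up for illustration.

rows = [
    # (weather, label)
    ("sunny", "play"), ("rainy", "play"), (None, "play"),
    ("sunny", "no"),   (None, "no"),      ("rainy", "no"),
]

def cond_prob(value, label):
    # Only rows of this class where the feature is observed count;
    # missing entries drop out of numerator and denominator alike.
    observed = [w for w, y in rows if y == label and w is not None]
    return observed.count(value) / len(observed)

p = cond_prob("sunny", "play")  # 1 of the 2 observed "play" rows
```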
Impact of outliers on Naive Bayes
Basic Naive Bayes is not good at handling outliers: if a feature value appears at test time that was never seen in the training set, its conditional probability is 0, which drives the whole product to 0. In practice this is handled by adding an artificial count to every feature value (Laplace / add-one smoothing).
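The zero-frequency fix can be sketched as add-one smoothing; the word counts and vocabulary size below are assumptions for illustration:

```python
# Laplace (add-one) smoothing: a word unseen in a class's training
# data gets a small nonzero probability instead of zeroing out the
# whole product. Counts and vocabulary size are made up.

counts = {"dear": 8, "friends": 6, "offer": 10}  # word counts for one class
vocab_size = 5   # assumed size of the full training vocabulary
total = sum(counts.values())

def p_word(word, alpha=1):
    # (count + alpha) / (total + alpha * |V|)
    return (counts.get(word, 0) + alpha) / (total + alpha * vocab_size)

unsmoothed = counts.get("lottery", 0) / total   # 0.0 -> kills the product
smoothed = p_word("lottery")                    # small but nonzero
```

Setting `alpha` to values other than 1 gives the more general add-alpha smoothing.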
Problems that can be solved using Naive Bayes
- Sentiment Analysis
- Spam Classification
- Twitter Sentiment Classification
- Document Categorization
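For the spam-classification use case above, an end-to-end sketch with scikit-learn's `MultinomialNB` (which applies add-alpha smoothing by default); the tiny dataset here is invented:

```python
# Bag-of-words spam classifier with scikit-learn. The training
# texts and labels are made-up toy data.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = [
    "win a free lottery offer now",   # spam
    "free offer claim your prize",    # spam
    "meeting scheduled for monday",   # ham
    "see you at lunch tomorrow",      # ham
]
labels = ["spam", "spam", "ham", "ham"]

vec = CountVectorizer()              # word-count features
X = vec.fit_transform(texts)
clf = MultinomialNB(alpha=1.0)       # alpha=1.0 -> Laplace smoothing
clf.fit(X, labels)

pred = clf.predict(vec.transform(["free lottery prize offer"]))[0]
```

The same pipeline shape (vectorizer + `MultinomialNB`) covers the other listed problems, such as sentiment analysis and document categorization, with different labels.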