Naive Bayes Classifier - Machine Learning

Naive Bayes is a supervised machine learning algorithm used for classification problems.

It is based on Bayes Theorem, a fundamental concept in probability theory.

The algorithm predicts the class of a data point by calculating probabilities and choosing the class with the highest probability.

In simple words:

Naive Bayes answers the question:

"Given these features, which class is most likely?"

Why is it Called Naive Bayes?

The algorithm assumes that all input features are independent of each other once the class is known (conditional independence).

For example, suppose we want to predict whether a person buys a car based on:

Age
Income
Education

Within each class (buys a car or does not), Naive Bayes assumes:

Age does not affect Income
Income does not affect Education
Education does not affect Age

This assumption is usually not true in real life.

Because of this strong assumption, the algorithm is called:

Naive Bayes
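Written as a formula, the assumption says that the joint likelihood of the features splits into one term per feature once the class is fixed. The symbols below (C for the class, x_1 to x_n for the features) are notation chosen here for illustration and do not come from the original text:

```latex
P(x_1, x_2, \dots, x_n \mid C) = \prod_{i=1}^{n} P(x_i \mid C)
\quad\Longrightarrow\quad
P(C \mid x_1, \dots, x_n) \propto P(C) \prod_{i=1}^{n} P(x_i \mid C)
```

For the car example, x_1, x_2, and x_3 would be Age, Income, and Education, and C would be whether the person buys a car.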

Real-Life Example

Suppose you receive an email.

The email contains words:

Free
Offer
Winner
Prize

You want to classify the email as:

Spam
Not Spam

Naive Bayes calculates:

Probability(Spam | Email)

and

Probability(Not Spam | Email)

Then predicts the class with the higher probability.

Bayes Theorem

Naive Bayes is built on Bayes Theorem:

P(A|B) = P(B|A) × P(A) / P(B)

Where:

Term	Meaning
P(A|B)	Posterior Probability
P(B|A)	Likelihood
P(A)	Prior Probability
P(B)	Evidence
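The four terms can be tied together in a minimal Python sketch for a two-class problem. The function name `bayes_posterior` and its parameter names are illustrative assumptions. The evidence P(B) is expanded with the law of total probability, P(B) = P(B|A)P(A) + P(B|not A)P(not A).

```python
def bayes_posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """Return P(A|B) for a binary event A using Bayes' theorem."""
    prior_not_a = 1.0 - prior_a
    # Evidence P(B) via the law of total probability.
    evidence = (likelihood_b_given_a * prior_a
                + likelihood_b_given_not_a * prior_not_a)
    if evidence == 0:
        raise ValueError("P(B) is zero, so P(A|B) is undefined")
    # Posterior = likelihood * prior / evidence
    return likelihood_b_given_a * prior_a / evidence


# Illustrative inputs: P(A) = 0.4, P(B|A) = 0.8, P(B|not A) = 0.1
posterior = bayes_posterior(0.4, 0.8, 0.1)
print(round(posterior, 3))  # 0.842
```

With these inputs the numerator is 0.32 and the evidence is 0.38, so the posterior is about 0.842.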

Understanding the Formula

Suppose:

A = Spam
B = Word "Free"

Then:

P(A)

Probability that an email is Spam.

Example:

40 out of 100 emails are spam

P(Spam) = 40/100 = 0.4

P(B|A)

Probability that a spam email contains the word "Free".

Suppose:

32 spam emails contain "Free"

P(Free∣Spam) = 32/40 = 0.8

P(A|B)

Probability that an email is spam given that it contains the word "Free".

This is what we want to calculate.

Working Example

Suppose we have:

Email Type	Count
Spam	40
Not Spam	60

Total Emails: 100

Step 1: Calculate Prior Probabilities

Spam

P(Spam)=40/100=0.4

Not Spam

P(NotSpam)=60/100=0.6

Step 2: Calculate Likelihood

Suppose:

Word	Spam Emails Containing the Word
Free	32

P(Free∣Spam)=32/40=0.8

Suppose:

Word	Not Spam Emails Containing the Word
Free	6

P(Free∣NotSpam)=6/60=0.1

Step 3: Apply Bayes Theorem

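We compare the classes using the numerator of Bayes Theorem. The evidence P(Free) is the same for both classes, so it can be dropped when we only need to know which class scores higher.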

For Spam:

P(Spam∣Free)∝P(Free∣Spam)×P(Spam)

Substitute values:

0.8×0.4=0.32

For Not Spam:

P(NotSpam∣Free)∝P(Free∣NotSpam)×P(NotSpam)

0.1×0.6=0.06

Step 4: Compare Probabilities

Class	Score (unnormalized)
Spam	0.32
Not Spam	0.06

Since:

0.32 > 0.06

Prediction:

Spam
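Steps 1 to 4 can be reproduced in a few lines of plain Python. This is a sketch using only the counts from the worked example. The variable names are illustrative, and the final normalization step is an addition that turns the two scores into probabilities that sum to 1.

```python
# Counts taken from the worked example above.
class_counts = {"Spam": 40, "Not Spam": 60}
free_counts = {"Spam": 32, "Not Spam": 6}  # emails containing "Free"

total = sum(class_counts.values())

scores = {}
for label, count in class_counts.items():
    prior = count / total                    # Step 1: prior probability
    likelihood = free_counts[label] / count  # Step 2: likelihood of "Free"
    scores[label] = likelihood * prior       # Step 3: numerator of Bayes Theorem

prediction = max(scores, key=scores.get)     # Step 4: highest score wins

# Dividing each score by their sum recovers the posterior probabilities.
evidence = sum(scores.values())
posteriors = {label: s / evidence for label, s in scores.items()}

print({k: round(v, 2) for k, v in scores.items()})      # {'Spam': 0.32, 'Not Spam': 0.06}
print(prediction)                                       # Spam
print({k: round(v, 3) for k, v in posteriors.items()})  # {'Spam': 0.842, 'Not Spam': 0.158}
```

The prediction depends only on which score is larger, which is why the comparison works without dividing by P(Free).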

Another Example Using Student Data

Suppose we want to predict whether a student will pass an exam.

Training Data:

Study Hours	Result
High	Pass
High	Pass
Medium	Pass
Low	Fail
Low	Fail

Step 1: Prior Probability

Pass:

P(Pass)=3/5=0.6

Fail:

P(Fail)=2/5=0.4

Step 2: Likelihood

For a student with:

Study Hours = High

Likelihood:

P(High∣Pass)=2/3≈0.667

P(High∣Fail)=0/2=0

Step 3: Calculate Posterior

Pass:

0.667×0.6≈0.4

Fail:

0×0.4=0

Prediction:

Pass
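The same calculation can be derived directly from the training table. The sketch below counts occurrences with `collections.Counter`. The helper name `score` is an illustrative assumption.

```python
from collections import Counter

# Training data from the table above: (study_hours, result)
data = [
    ("High", "Pass"),
    ("High", "Pass"),
    ("Medium", "Pass"),
    ("Low", "Fail"),
    ("Low", "Fail"),
]

class_counts = Counter(result for _, result in data)
pair_counts = Counter(data)  # counts of (study_hours, result) pairs


def score(hours, result):
    """Unnormalized posterior: P(hours | result) * P(result)."""
    prior = class_counts[result] / len(data)
    likelihood = pair_counts[(hours, result)] / class_counts[result]
    return likelihood * prior


scores = {result: score("High", result) for result in class_counts}
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

Because no failing student has Study Hours = High, the Fail score is exactly zero whatever its prior is. Library implementations commonly avoid such hard zeros with additive smoothing. For example, scikit-learn's `MultinomialNB` has an `alpha` parameter for this purpose.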

How Naive Bayes Makes Predictions

Training Data
       ↓
Calculate Prior Probabilities
       ↓
Calculate Likelihood Probabilities
       ↓
Apply Bayes Theorem
       ↓
Compute Posterior Probabilities
       ↓
Select Highest Probability Class
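The pipeline above can be sketched as a small from-scratch classifier for categorical features. Everything here is illustrative: the function names `train` and `predict`, the second feature ("attended class"), and the toy rows are hypothetical. The `alpha` additive-smoothing constant is also an assumption of this sketch, added so that unseen feature values do not produce zero likelihoods.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count what is needed for the priors and per-feature likelihoods."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature_index, label) -> value counts
    feature_values = defaultdict(set)      # feature_index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, feature_counts, feature_values


def predict(model, row, alpha=1.0):
    """Return (best_label, posteriors) for one row.

    alpha is an additive-smoothing constant (an assumption of this sketch);
    alpha=0 gives the raw frequency estimates.
    """
    class_counts, feature_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior probability
        for i, value in enumerate(row):
            seen = feature_counts[(i, label)][value]
            n_values = len(feature_values[i])
            score *= (seen + alpha) / (count + alpha * n_values)  # likelihood
        scores[label] = score
    evidence = sum(scores.values())
    if evidence == 0:
        raise ValueError("all class scores are zero; use alpha > 0")
    posteriors = {label: s / evidence for label, s in scores.items()}
    return max(posteriors, key=posteriors.get), posteriors


# Hypothetical toy data: (study_hours, attended_class)
rows = [("High", "Yes"), ("High", "No"), ("Medium", "Yes"),
        ("Low", "No"), ("Low", "No")]
labels = ["Pass", "Pass", "Pass", "Fail", "Fail"]

model = train(rows, labels)
label, posteriors = predict(model, ("High", "Yes"))
print(label, {k: round(v, 3) for k, v in posteriors.items()})
```

Multiplying many small probabilities can underflow on real datasets, so practical implementations usually add log-probabilities instead. Plain products are kept here to mirror the steps in the diagram.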

Why Naive Bayes Works Well

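Even though the feature independence assumption is often incorrect, Naive Bayes still performs surprisingly well in practice. Classification only needs the correct class to receive the highest score; the probabilities themselves do not have to be exact.

It is also practical because: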

Probability calculations are simple
Less data is required
Fast training
Fast prediction

Advantages

Easy to understand
Fast training
Fast prediction
Works well with small datasets
Excellent for text classification
Handles high-dimensional data efficiently

Limitations

Assumes feature independence
Probability estimates may not be accurate
Can struggle with highly correlated features
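A small sketch of why correlated features hurt the probability estimates. It reuses the spam numbers from the worked example (P(Spam) = 0.4, P(Free|Spam) = 0.8, P(Free|Not Spam) = 0.1). It then imagines a hypothetical scenario in which the same "Free" evidence reaches the model as several perfectly correlated features. The function name `spam_posterior` is illustrative.

```python
def spam_posterior(n_copies):
    """P(Spam | evidence) when the same 'Free' evidence is counted n_copies times."""
    spam_score = 0.4 * (0.8 ** n_copies)      # P(Spam) * P(Free|Spam)^n
    not_spam_score = 0.6 * (0.1 ** n_copies)  # P(Not Spam) * P(Free|Not Spam)^n
    return spam_score / (spam_score + not_spam_score)


for n in (1, 2, 3):
    print(n, round(spam_posterior(n), 3))
# 1 0.842
# 2 0.977
# 3 0.997
```

No new information arrives with the duplicated features, yet the model grows more confident each time because it treats every copy as independent evidence. The predicted class is unchanged, but the probability is overstated.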

Applications

Spam Detection

Spam / Not Spam

Sentiment Analysis

Positive / Negative

Document Classification

Sports
Politics
Technology
Business

Medical Diagnosis

Disease Prediction

Python Implementation

Example 1: Simple Naive Bayes Classification

from sklearn.naive_bayes import GaussianNB

# Training Data
X = [
    [1],
    [2],
    [3],
    [8],
    [9],
    [10]
]

y = [
    "Fail",
    "Fail",
    "Fail",
    "Pass",
    "Pass",
    "Pass"
]

# Create Model
model = GaussianNB()

# Train Model
model.fit(X, y)

# Predict
prediction = model.predict([[7]])

print("Prediction:", prediction[0])

Output:

Prediction: Pass
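To see the posterior probabilities behind this prediction, scikit-learn classifiers expose `predict_proba`, and `classes_` gives the order of its columns. The sketch below repeats the same training data so that it runs on its own.

```python
from sklearn.naive_bayes import GaussianNB

# Same training data as Example 1
X = [[1], [2], [3], [8], [9], [10]]
y = ["Fail", "Fail", "Fail", "Pass", "Pass", "Pass"]

model = GaussianNB().fit(X, y)

# One row of probabilities per input, one column per class in model.classes_
probabilities = model.predict_proba([[7]])[0]
for label, p in zip(model.classes_, probabilities):
    print(f"{label}: {p:.4f}")
```

The value 7 lies much closer to the Pass group than to the Fail group, so nearly all of the probability mass goes to Pass.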

Example 2: Email Spam Classification

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = [
    "Free prize winner",
    "Claim your free offer",
    "Meeting at 10 AM",
    "Project discussion tomorrow"
]

labels = [
    "Spam",
    "Spam",
    "Not Spam",
    "Not Spam"
]

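# Convert each email into a vector of word counts
vectorizer = CountVectorizer()

X = vectorizer.fit_transform(emails)

# Train a multinomial Naive Bayes model on the word counts
model = MultinomialNB()

model.fit(X, labels)

# Vectorize the new email with the same vocabulary;
# words not seen during training (here "available") are ignored
test_email = vectorizer.transform(
    ["Free offer available"]
)

prediction = model.predict(test_email)

print(prediction[0])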

Output:

Spam

Important Points

Naive Bayes is a probabilistic classification algorithm.
It is based on Bayes Theorem.
It assumes all features are independent.
Prediction is based on posterior probabilities.
The class with the highest probability is selected.
It is extremely fast and memory efficient.
Widely used in spam filtering and text classification.
It works surprisingly well despite its naive assumption.

Keywords

Naive Bayes Classifier, Bayes Theorem, Probabilistic Classification, Prior Probability, Posterior Probability, Likelihood, Spam Detection, Machine Learning Classification, Bayesian Learning, Feature Independence Assumption