Bernoulli Naive Bayes - Machine Learning

In the previous tutorial, we learned about Multinomial Naive Bayes, which works with:

Word Counts
Word Frequencies
Term Frequencies

Example:

free appears 5 times
offer appears 2 times
winner appears 3 times

Multinomial Naive Bayes uses these counts directly.

However, sometimes we do not care how many times a word appears.

We only care whether the word is present or absent.

Example:

Email 1:

free free free offer winner

Email 2:

free offer winner

For Bernoulli Naive Bayes:

Both emails are treated the same.

Because it only checks:

Word Present = 1
Word Absent = 0

This is why Bernoulli Naive Bayes is sometimes called a:

Binary Naive Bayes Classifier

Here "binary" describes the 0/1 features, not the number of classes.
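The two emails above can be compared with a minimal plain-Python sketch. The variable names are illustrative and not taken from any library. It shows that the emails differ in their word counts but contain the same set of words.

```python
from collections import Counter

email_1 = "free free free offer winner"
email_2 = "free offer winner"

# Count-based view: how many times each word occurs.
counts_1 = Counter(email_1.split())
counts_2 = Counter(email_2.split())

# Presence-based view: only which words occur at all.
present_1 = set(email_1.split())
present_2 = set(email_2.split())

print(counts_1 == counts_2)    # False: the counts differ
print(present_1 == present_2)  # True: the same words are present
```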

Why Do We Need Bernoulli Naive Bayes?

Suppose we want to detect Spam Emails.

Training emails:

Email	Text	Class
E1	free offer	Spam
E2	free winner	Spam
E3	meeting schedule	Not Spam
E4	project meeting	Not Spam

Instead of storing word counts such as (illustrative values):

free = 3
offer = 2
winner = 1

Bernoulli Naive Bayes converts features into:

Present = 1
Absent = 0

Binary Representation

Vocabulary:

free
offer
winner
meeting
schedule
project

Email:

free offer

becomes:

free	offer	winner	meeting	schedule	project
1	1	0	0	0	0

Email:

meeting schedule

becomes:

free	offer	winner	meeting	schedule	project
0	0	0	1	1	0
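The conversion above can be sketched in a few lines of Python. The helper name `to_binary_vector` is hypothetical, and the vocabulary order is assumed to be the one listed in this section.

```python
VOCABULARY = ["free", "offer", "winner", "meeting", "schedule", "project"]

def to_binary_vector(text, vocabulary=VOCABULARY):
    """Return 1 for each vocabulary word present in the text, else 0."""
    words = set(text.lower().split())
    return [1 if word in words else 0 for word in vocabulary]

print(to_binary_vector("free offer"))        # [1, 1, 0, 0, 0, 0]
print(to_binary_vector("meeting schedule"))  # [0, 0, 0, 1, 1, 0]
```

Splitting on whitespace is a simplification. A real tokenizer would also handle punctuation.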

Core Idea

Bernoulli Naive Bayes calculates:

Probability that a word is present
Probability that a word is absent

This differs from Multinomial Naive Bayes, which models how many times each word occurs.
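Written compactly, the core idea is a product over all vocabulary words. The symbols are introduced here for illustration only: x_i is 1 if vocabulary word i is present and 0 otherwise, p_ic is P(word i present | class c), and V is the vocabulary size.

```latex
P(x \mid c) = \prod_{i=1}^{V} p_{ic}^{\,x_i} \, (1 - p_{ic})^{\,1 - x_i}
```

A present word contributes p_ic and an absent word contributes 1 - p_ic, so every vocabulary word affects the score.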

Example Dataset

Training Data:

Email	Class
free offer	Spam
free winner	Spam
meeting schedule	Not Spam
project meeting	Not Spam

We want to classify:

free winner

Step 1: Calculate Prior Probabilities

Total emails: 4

Spam emails: 2

Not Spam emails: 2

Therefore:

P(Spam) = 2/4 = 0.5

P(Not Spam) = 2/4 = 0.5
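The same priors can be computed from the label list. This is a small sketch using only the standard library, with illustrative variable names.

```python
from collections import Counter

labels = ["Spam", "Spam", "Not Spam", "Not Spam"]

class_counts = Counter(labels)
total = len(labels)

# Prior = fraction of training emails that belong to each class.
priors = {label: count / total for label, count in class_counts.items()}

print(priors)  # {'Spam': 0.5, 'Not Spam': 0.5}
```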

Step 2: Create Binary Features

Vocabulary:

free
offer
winner
meeting
schedule
project

Training data becomes:

Email	free	offer	winner	meeting	schedule	project	Class
free offer	1	1	0	0	0	0	Spam
free winner	1	0	1	0	0	0	Spam
meeting schedule	0	0	0	1	1	0	Not Spam
project meeting	0	0	0	1	0	1	Not Spam

Step 3: Calculate Word Probabilities

For Spam emails:

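Total Spam emails: 2

P(free | Spam)

The word "free" appears in both Spam emails.

Count = 2

Using Laplace Smoothing (add 1 to the count, and add 2 to the denominator because each word has two possible outcomes, present or absent):

P(free | Spam) = (count + 1) / (total Spam emails + 2) = (2 + 1) / (2 + 2) = 3/4 = 0.75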

P(winner | Spam)

The word "winner" appears in one Spam email.

P(winner | Spam) = (1 + 1) / (2 + 2) = 2/4 = 0.5
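The smoothed estimates for every vocabulary word can be reproduced with a short sketch. The function name `presence_probability` and its `alpha` parameter are hypothetical and chosen only for this illustration.

```python
spam_emails = ["free offer", "free winner"]
vocabulary = ["free", "offer", "winner", "meeting", "schedule", "project"]

def presence_probability(word, emails, alpha=1):
    """Smoothed P(word present | class), given the emails of one class."""
    containing = sum(1 for email in emails if word in email.split())
    # "+ 2 * alpha" because each feature has two outcomes: present or absent.
    return (containing + alpha) / (len(emails) + 2 * alpha)

for word in vocabulary:
    print(word, presence_probability(word, spam_emails))
```

For the Spam class this gives 0.75 for "free", 0.5 for "offer" and "winner", and 0.25 for each word that never occurs in a Spam email.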

Step 4: Calculate Absence Probabilities

Bernoulli Naive Bayes also considers missing words.

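For example, consider:

P(meeting absent | Spam)

The word "meeting" appears in no Spam email, so with Laplace smoothing:

P(meeting | Spam) = (0 + 1) / (2 + 2) = 0.25

Therefore:

P(meeting absent | Spam) = 1 - 0.25 = 0.75

This is a major difference from Multinomial Naive Bayes, where words that do not occur in a document contribute nothing to its score.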

Step 5: Classify New Email

New email:

free winner

Binary representation:

Word	Value
free	1
offer	0
winner	1
meeting	0
schedule	0
project	0

Where:

1 = Word Present
0 = Word Absent

So the email:

free winner

is represented as:

[1, 0, 1, 0, 0, 0]

Step 6: Spam Score

For Spam:

P(Spam) × P(free | Spam) × P(offer absent | Spam) × P(winner | Spam) × P(meeting absent | Spam) × P(schedule absent | Spam) × P(project absent | Spam)

Substituting values:

0.5 × 0.75 × 0.5 × 0.5 × 0.75 × 0.75 × 0.75 ≈ 0.0395
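The Spam score can be checked numerically with a small sketch. The probabilities are the smoothed values for the toy dataset, and the dictionary name is illustrative.

```python
import math

# Smoothed P(word present | Spam) for the toy dataset.
p_present_spam = {
    "free": 0.75, "offer": 0.5, "winner": 0.5,
    "meeting": 0.25, "schedule": 0.25, "project": 0.25,
}
prior_spam = 0.5
new_email = {"free", "winner"}

# Present words contribute p; absent words contribute (1 - p).
factors = [
    p if word in new_email else 1 - p
    for word, p in p_present_spam.items()
]
spam_score = prior_spam * math.prod(factors)

print(f"{spam_score:.5f}")  # 0.03955
```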

Step 7: Not Spam Score

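Similarly, for Not Spam:

P(Not Spam) × P(free | Not Spam) × P(offer absent | Not Spam) × P(winner | Not Spam) × P(meeting absent | Not Spam) × P(schedule absent | Not Spam) × P(project absent | Not Spam)

Because "meeting" appears in both Not Spam emails, P(meeting | Not Spam) = (2 + 1) / (2 + 2) = 0.75, so P(meeting absent | Not Spam) = 0.25.

Substituting values:

0.5 × 0.25 × 0.75 × 0.25 × 0.25 × 0.5 × 0.5 ≈ 0.0015

Step 8: Compare Scores

Class	Score
Spam	0.0395
Not Spam	0.0015

Since:

0.0395 > 0.0015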

Prediction:

Spam

Difference Between Multinomial and Bernoulli Naive Bayes

Feature	Multinomial NB	Bernoulli NB
Input	Word Counts	Binary Features
Uses Frequency	Yes	No
Uses Presence/Absence	No	Yes
Suitable For	Text Counts	Binary Text Features
Word Repetition Matters	Yes	No

Example

Document:

free free free offer winner

Multinomial NB sees:

Word	Count
free	3
offer	1
winner	1

Bernoulli NB sees:

Word	Present?
free	1
offer	1
winner	1

Frequency information is ignored.

How Bernoulli Naive Bayes Works

Training Documents
        ↓
Create Vocabulary
        ↓
Convert Words to Binary Features
        ↓
Calculate Prior Probabilities
        ↓
Calculate Presence/Absence Probabilities
        ↓
Apply Bayes Theorem
        ↓
Choose Class with Highest Score
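The pipeline above can be sketched from scratch in plain Python. This is a minimal illustration under stated assumptions, not a production implementation: the function names `train` and `score` are hypothetical, tokenization is a simple whitespace split, and words outside the training vocabulary are ignored.

```python
from collections import defaultdict
import math

def train(emails, labels, alpha=1):
    """Estimate class priors and smoothed presence probabilities."""
    vocabulary = sorted({word for email in emails for word in email.split()})
    by_class = defaultdict(list)
    for email, label in zip(emails, labels):
        by_class[label].append(set(email.split()))

    priors, presence = {}, {}
    for label, docs in by_class.items():
        priors[label] = len(docs) / len(emails)
        presence[label] = {
            word: (sum(word in doc for doc in docs) + alpha) / (len(docs) + 2 * alpha)
            for word in vocabulary
        }
    return priors, presence

def score(email, priors, presence):
    """Return prior times the product of presence/absence factors per class."""
    words = set(email.split())
    return {
        label: priors[label] * math.prod(
            p if word in words else 1 - p
            for word, p in presence[label].items()
        )
        for label in priors
    }

emails = ["free offer", "free winner", "meeting schedule", "project meeting"]
labels = ["Spam", "Spam", "Not Spam", "Not Spam"]

priors, presence = train(emails, labels)
scores = score("free winner", priors, presence)
prediction = max(scores, key=scores.get)

print("Prediction:", prediction)  # Prediction: Spam
```

With a large vocabulary, the product of many small factors can underflow, so practical implementations usually sum logarithms instead of multiplying probabilities.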

Python Implementation

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

# Training data
emails = [
    "free offer",
    "free winner",
    "meeting schedule",
    "project meeting"
]

labels = [
    "Spam",
    "Spam",
    "Not Spam",
    "Not Spam"
]

# Binary features
vectorizer = CountVectorizer(binary=True)

X = vectorizer.fit_transform(emails)

# Bernoulli NB model
model = BernoulliNB()

# Train
model.fit(X, labels)

# Test email
test_email = vectorizer.transform(
    ["free winner"]
)

prediction = model.predict(test_email)

print("Prediction:", prediction[0])

Output:

Prediction: Spam
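To relate the scikit-learn model to the hand calculation, the learned probabilities can be inspected. This sketch assumes scikit-learn is installed and relies on the fitted attributes `feature_log_prob_` (log of P(word present | class)), `classes_`, and the vectorizer's `vocabulary_` mapping.

```python
import math

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

emails = ["free offer", "free winner", "meeting schedule", "project meeting"]
labels = ["Spam", "Spam", "Not Spam", "Not Spam"]

vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(emails)
model = BernoulliNB(alpha=1.0).fit(X, labels)  # alpha=1.0 is Laplace smoothing

# Row of feature_log_prob_ that belongs to the Spam class.
spam_row = list(model.classes_).index("Spam")
log_probs = model.feature_log_prob_[spam_row]

# vocabulary_ maps each word to its column index.
p_free = math.exp(log_probs[vectorizer.vocabulary_["free"]])
p_meeting = math.exp(log_probs[vectorizer.vocabulary_["meeting"]])

print(round(p_free, 2))     # 0.75
print(round(p_meeting, 2))  # 0.25
```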

Applications

Spam Detection
Email Filtering
Sentiment Analysis
Document Classification
Text Classification with Binary Features

Advantages

Simple and fast
Works well with binary features
Suitable for small datasets
Handles sparse text data efficiently

Limitations

Ignores word frequency information
Assumes feature independence
May perform worse than Multinomial NB when frequency matters

Important Points

Bernoulli Naive Bayes uses binary features.
Features are represented as 0 (Absent) or 1 (Present).
Word frequency is ignored.
Both presence and absence of words are considered.
Laplace smoothing prevents zero probabilities.
Commonly used for text classification with binary (present/absent) features; the number of classes does not have to be two.
Works best when word occurrence is more important than word frequency.
