Bernoulli Naive Bayes

In the previous tutorial, we learned about Multinomial Naive Bayes, which works with:

Word Counts
Word Frequencies
Term Frequencies

Example:

free appears 5 times
offer appears 2 times
winner appears 3 times

Multinomial Naive Bayes uses these counts directly.

However, sometimes we do not care how many times a word appears.

We only care whether the word is present or absent.

Example:

Email 1:

free free free offer winner

Email 2:

free offer winner

Bernoulli Naive Bayes treats both emails the same, because it only checks:

Word Present = 1
Word Absent = 0

This is why Bernoulli Naive Bayes is called a:

Binary Naive Bayes Classifier

Here "binary" describes the features (present or absent), not the number of classes.

Why Do We Need Bernoulli Naive Bayes?

Suppose we want to detect Spam Emails.

Training emails:

Email | Text             | Class
E1    | free offer       | Spam
E2    | free winner      | Spam
E3    | meeting schedule | Not Spam
E4    | project meeting  | Not Spam

Instead of counting how often each word occurs, for example in the two Spam emails:

free = 2
offer = 1
winner = 1

Bernoulli Naive Bayes converts features into:

Present = 1
Absent = 0

Binary Representation

Vocabulary:

free
offer
winner
meeting
schedule
project

Email:

free offer

becomes:

free | offer | winner | meeting | schedule | project
1    | 1     | 0      | 0       | 0        | 0

Email:

meeting schedule

becomes:

free | offer | winner | meeting | schedule | project
0    | 0     | 0      | 1       | 1        | 0
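The encoding above can be sketched in a few lines of Python. The helper name `to_binary_vector` and the fixed `VOCABULARY` list are illustrative assumptions, not part of any library; the sketch simply splits on whitespace.

```python
# Vocabulary in the same order as the tutorial.
VOCABULARY = ["free", "offer", "winner", "meeting", "schedule", "project"]

def to_binary_vector(text, vocabulary=VOCABULARY):
    """Return 1 for each vocabulary word present in the text, else 0."""
    words = set(text.lower().split())
    return [1 if word in words else 0 for word in vocabulary]

print(to_binary_vector("free offer"))        # [1, 1, 0, 0, 0, 0]
print(to_binary_vector("meeting schedule"))  # [0, 0, 0, 1, 1, 0]

# Repeating a word does not change the vector.
print(to_binary_vector("free free free offer winner")
      == to_binary_vector("free offer winner"))  # True
```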

Core Idea

Bernoulli Naive Bayes calculates:

Probability that a word is present
Probability that a word is absent

This is different from Multinomial Naive Bayes, which uses word frequencies.

Example Dataset

Training Data:

Email            | Class
free offer       | Spam
free winner      | Spam
meeting schedule | Not Spam
project meeting  | Not Spam

We want to classify:

free winner

Step 1: Calculate Prior Probabilities

Total emails:

4

Spam emails:

2

Not Spam emails:

2

Therefore:

P(Spam) = 2/4 = 0.5
P(Not Spam) = 2/4 = 0.5
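As a minimal sketch, the priors are just class counts divided by the total number of emails. The function name `class_priors` is a hypothetical helper introduced here for illustration.

```python
from collections import Counter

labels = ["Spam", "Spam", "Not Spam", "Not Spam"]

def class_priors(labels):
    """Return P(class) as the fraction of training emails in each class."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}

priors = class_priors(labels)
print(priors)  # {'Spam': 0.5, 'Not Spam': 0.5}
```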

Step 2: Create Binary Features

Vocabulary:

free
offer
winner
meeting
schedule
project

Training data becomes:

Email            | free | offer | winner | meeting | schedule | project | Class
free offer       | 1    | 1     | 0      | 0       | 0        | 0       | Spam
free winner      | 1    | 0     | 1      | 0       | 0        | 0       | Spam
meeting schedule | 0    | 0     | 0      | 1       | 1        | 0       | Not Spam
project meeting  | 0    | 0     | 0      | 1       | 0        | 1       | Not Spam

Step 3: Calculate Word Probabilities

For Spam emails:

Total Spam emails:

2

P(free | Spam)

The word "free" appears in both Spam emails.

Count = 2

Using Laplace smoothing (add 1 to the count, and add 2 to the denominator because each word has two possible outcomes, present or absent):

P(free | Spam) = (count + 1) / (total Spam emails + 2) = (2 + 1) / (2 + 2) = 3/4 = 0.75

P(winner | Spam)

The word "winner" appears in one Spam email.

P(winner | Spam) = (1 + 1) / (2 + 2) = 2/4 = 0.5
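The smoothed estimate can be written as a small function. This is a sketch under the tutorial's setup: `presence_probability` and its `alpha` parameter are names chosen here for illustration, with `alpha=1` corresponding to Laplace smoothing.

```python
def presence_probability(word, class_emails, alpha=1):
    """Smoothed P(word present | class) from the emails of one class."""
    # Count emails that contain the word at least once (not total occurrences).
    emails_with_word = sum(1 for email in class_emails if word in email.split())
    # Two outcomes per word (present / absent), hence 2 * alpha in the denominator.
    return (emails_with_word + alpha) / (len(class_emails) + 2 * alpha)

spam_emails = ["free offer", "free winner"]

print(presence_probability("free", spam_emails))     # 0.75
print(presence_probability("winner", spam_emails))   # 0.5
print(presence_probability("meeting", spam_emails))  # 0.25
```

Because of the smoothing, a word that never appears in a class still gets a non-zero probability.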

Step 4: Calculate Absence Probabilities

Bernoulli Naive Bayes also considers missing words.

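For example, the word "meeting" appears in no Spam email, so with Laplace smoothing:

P(meeting | Spam) = (0 + 1) / (2 + 2) = 0.25

Therefore:

P(meeting absent | Spam) = 1 - 0.25 = 0.75

This is a major difference from Multinomial Naive Bayes, which only multiplies in the probabilities of words that actually occur in the document.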
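Put as code, each word contributes one factor: its presence probability if the word is in the email, and one minus that probability if it is not. The helper name `feature_likelihood` is an assumption made for this sketch.

```python
def feature_likelihood(present, p_present):
    """Bernoulli factor for one word: p if present, 1 - p if absent."""
    return p_present if present else 1 - p_present

p_meeting_given_spam = 0.25  # smoothed P(meeting | Spam) from the tutorial

print(feature_likelihood(1, p_meeting_given_spam))  # 0.25
print(feature_likelihood(0, p_meeting_given_spam))  # 0.75
```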

Step 5: Classify New Email

New email:

free winner

Binary representation:

Word     | Value
free     | 1
offer    | 0
winner   | 1
meeting  | 0
schedule | 0
project  | 0

Where:

1 = Word Present
0 = Word Absent

So the email:

free winner

is represented as:

[1, 0, 1, 0, 0, 0]

Step 6: Spam Score

For Spam:

P(Spam) × P(free | Spam) × P(offer absent | Spam) × P(winner | Spam) × P(meeting absent | Spam) × P(schedule absent | Spam) × P(project absent | Spam)

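Substituting values:

0.5 × 0.75 × 0.5 × 0.5 × 0.75 × 0.75 × 0.75 ≈ 0.0396

Step 7: Not Spam Score

Similarly, for Not Spam:

P(Not Spam) × P(free | Not Spam) × P(offer absent | Not Spam) × P(winner | Not Spam) × P(meeting absent | Not Spam) × P(schedule absent | Not Spam) × P(project absent | Not Spam)

The word "meeting" appears in both Not Spam emails, so P(meeting | Not Spam) = (2 + 1) / (2 + 2) = 0.75 and P(meeting absent | Not Spam) = 1 - 0.75 = 0.25.

Substituting values:

0.5 × 0.25 × 0.75 × 0.25 × 0.25 × 0.5 × 0.5 ≈ 0.0015

Step 8: Compare Scores

Class    | Score
Spam     | 0.0396
Not Spam | 0.0015

Since:

0.0396 > 0.0015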

Prediction:

Spam
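The whole walkthrough (priors, smoothed presence probabilities, and the product over present and absent words) can be sketched in plain Python. The function names `train` and `score` are illustrative assumptions, not part of any library. The sketch multiplies raw probabilities for readability; a practical implementation would usually sum log-probabilities instead to avoid underflow on large vocabularies.

```python
from math import prod

VOCABULARY = ["free", "offer", "winner", "meeting", "schedule", "project"]
TRAINING = [
    ("free offer", "Spam"),
    ("free winner", "Spam"),
    ("meeting schedule", "Not Spam"),
    ("project meeting", "Not Spam"),
]

def train(training, vocabulary, alpha=1):
    """Return {class: (prior, {word: P(word present | class)})}."""
    emails_by_class = {}
    for text, label in training:
        emails_by_class.setdefault(label, []).append(set(text.split()))
    model = {}
    for label, emails in emails_by_class.items():
        prior = len(emails) / len(training)
        presence = {
            word: (sum(word in email for email in emails) + alpha)
                  / (len(emails) + 2 * alpha)
            for word in vocabulary
        }
        model[label] = (prior, presence)
    return model

def score(text, model, vocabulary):
    """Return the unnormalized Bernoulli NB score of the text for each class."""
    words = set(text.split())
    scores = {}
    for label, (prior, presence) in model.items():
        # Every vocabulary word contributes: p if present, 1 - p if absent.
        factors = [presence[w] if w in words else 1 - presence[w]
                   for w in vocabulary]
        scores[label] = prior * prod(factors)
    return scores

model = train(TRAINING, VOCABULARY)
scores = score("free winner", model, VOCABULARY)
for label, value in scores.items():
    print(label, round(value, 4))
print("Prediction:", max(scores, key=scores.get))  # Prediction: Spam
```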

Difference Between Multinomial and Bernoulli Naive Bayes

Feature                 | Multinomial NB | Bernoulli NB
Input                   | Word Counts    | Binary Features
Uses Frequency          | Yes            | No
Uses Presence/Absence   | No             | Yes
Suitable For            | Text Counts    | Binary Text Features
Word Repetition Matters | Yes            | No

Example

Document:

free free free offer winner

Multinomial NB sees:

Word   | Count
free   | 3
offer  | 1
winner | 1

Bernoulli NB sees:

Word   | Present?
free   | 1
offer  | 1
winner | 1

Frequency information is ignored.
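The two views of the same document can be reproduced with the standard library. This is only a sketch of the feature representations, not of either classifier.

```python
from collections import Counter

document = "free free free offer winner"

# Multinomial view: how many times each word occurs.
counts = Counter(document.split())

# Bernoulli view: only whether each word occurs at all.
presence = {word: 1 for word in counts}

print(dict(counts))  # {'free': 3, 'offer': 1, 'winner': 1}
print(presence)      # {'free': 1, 'offer': 1, 'winner': 1}
```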

How Bernoulli Naive Bayes Works

1. Training Documents
2. Create Vocabulary
3. Convert Words to Binary Features
4. Calculate Prior Probabilities
5. Calculate Presence/Absence Probabilities
6. Apply Bayes Theorem
7. Choose Class with Highest Score

Python Implementation

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

# Training data
emails = [
    "free offer",
    "free winner",
    "meeting schedule",
    "project meeting",
]

labels = [
    "Spam",
    "Spam",
    "Not Spam",
    "Not Spam",
]

# Binary features: 1 if the word is present, 0 otherwise
vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(emails)

# Bernoulli NB model (default alpha=1.0 is Laplace smoothing)
model = BernoulliNB()

# Train
model.fit(X, labels)

# Test email, encoded with the same vocabulary
test_email = vectorizer.transform(["free winner"])

prediction = model.predict(test_email)

print("Prediction:", prediction[0])

Output:

Prediction: Spam
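To connect the library result with the hand calculation, the fitted model can be inspected. The sketch below assumes scikit-learn and NumPy are installed and relies on `feature_log_prob_`, which stores the log of P(word present | class), and on `predict_proba`, which returns the class scores normalized to sum to 1.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

emails = ["free offer", "free winner", "meeting schedule", "project meeting"]
labels = ["Spam", "Spam", "Not Spam", "Not Spam"]

vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(emails)
model = BernoulliNB(alpha=1.0).fit(X, labels)

# Exponentiate the stored log-probabilities to get P(word present | class).
presence = np.exp(model.feature_log_prob_)
spam_row = list(model.classes_).index("Spam")
for word, column in sorted(vectorizer.vocabulary_.items()):
    print(word, round(float(presence[spam_row, column]), 2))

# Normalized class probabilities for the test email.
proba = model.predict_proba(vectorizer.transform(["free winner"]))[0]
for cls, p in zip(model.classes_, proba):
    print(cls, round(float(p), 4))
```

The printed Spam probabilities should agree with the smoothed values computed by hand, for example 0.75 for "free" and 0.25 for "meeting".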

Applications

Spam Detection
Email Filtering
Sentiment Analysis
Document Classification
Binary Text Classification

Advantages

  • Simple and fast

  • Works well with binary features

  • Suitable for small datasets

  • Handles sparse text data efficiently

Limitations

  • Ignores word frequency information

  • Assumes feature independence

  • May perform worse than Multinomial NB when frequency matters

Important Points

  • Bernoulli Naive Bayes uses binary features.

  • Features are represented as 0 (Absent) or 1 (Present).

  • Word frequency is ignored.

  • Both presence and absence of words are considered.

  • Laplace smoothing prevents zero probabilities.

  • Commonly used for text classification with binary (presence/absence) features.

  • Works best when word occurrence is more important than word frequency.

Keywords

Bernoulli Naive Bayes, Binary Classification, Presence Absence Features, Binary Features, Text Classification, Spam Detection, Laplace Smoothing, Naive Bayes Algorithm, Machine Learning Classification, Bernoulli Distribution
