Bernoulli Naive Bayes
In the previous tutorial, we learned about Multinomial Naive Bayes, which works with:
Word Counts
Word Frequencies
Term Frequencies
Example:
free appears 5 times
offer appears 2 times
winner appears 3 times
Multinomial Naive Bayes uses these counts directly.
However, sometimes we do not care how many times a word appears.
We only care whether the word is present or absent.
Example:
Email 1:
free free free offer winner
Email 2:
free offer winner
For Bernoulli Naive Bayes:
Both emails are treated the same.
Because it only checks:
Word Present = 1
Word Absent = 0
This is why Bernoulli Naive Bayes is sometimes called a:
Binary Naive Bayes Classifier
Here "binary" describes the features (present or absent), not the number of classes.
Why Do We Need Bernoulli Naive Bayes?
Suppose we want to detect Spam Emails.
Training emails:
| Email | Text | Class |
|---|---|---|
| E1 | free offer | Spam |
| E2 | free winner | Spam |
| E3 | meeting schedule | Not Spam |
| E4 | project meeting | Not Spam |
Instead of storing word counts such as:
free = 3
offer = 2
winner = 1
Bernoulli Naive Bayes converts features into:
Present = 1
Absent = 0
Binary Representation
Vocabulary:
free
offer
winner
meeting
schedule
project
Email:
free offer
becomes:
| free | offer | winner | meeting | schedule | project |
|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 0 | 0 |
Email:
meeting schedule
becomes:
| free | offer | winner | meeting | schedule | project |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 1 | 0 |
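The conversion shown in these tables can be sketched in a few lines of plain Python. This is only an illustration: it assumes lowercase, whitespace-separated tokens and the vocabulary order used above, and the function name `to_binary_vector` is hypothetical, not part of any library.

```python
# Vocabulary in the same order as the tables above.
VOCABULARY = ["free", "offer", "winner", "meeting", "schedule", "project"]


def to_binary_vector(text, vocabulary=VOCABULARY):
    """Return 1 for each vocabulary word present in text, else 0.

    Assumes simple whitespace tokenization (illustrative only).
    """
    words = set(text.lower().split())
    return [1 if word in words else 0 for word in vocabulary]


print(to_binary_vector("free offer"))        # [1, 1, 0, 0, 0, 0]
print(to_binary_vector("meeting schedule"))  # [0, 0, 0, 1, 1, 0]
```

Because the words of an email are collected into a set, repeating a word does not change the resulting vector.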
Core Idea
Bernoulli Naive Bayes calculates:
Probability that a word is present
Probability that a word is absent
This is different from Multinomial Naive Bayes, which uses word frequencies.
Example Dataset
Training Data:
| Text | Class |
|---|---|
| free offer | Spam |
| free winner | Spam |
| meeting schedule | Not Spam |
| project meeting | Not Spam |
We want to classify:
free winner
Step 1: Calculate Prior Probabilities
Total emails:
4
Spam emails:
2
Not Spam emails:
2
Therefore:
P(Spam) = 2/4 = 0.5
P(Not Spam) = 2/4 = 0.5
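As a quick check, the priors can be computed directly from the class labels. The sketch below assumes the four training labels listed above; the variable names are illustrative.

```python
from collections import Counter

# Class labels of the four training emails above.
labels = ["Spam", "Spam", "Not Spam", "Not Spam"]

class_counts = Counter(labels)
total = len(labels)

# Prior = fraction of training emails that belong to each class.
priors = {label: count / total for label, count in class_counts.items()}

print(priors)  # {'Spam': 0.5, 'Not Spam': 0.5}
```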
Step 2: Create Binary Features
Vocabulary:
free
offer
winner
meeting
schedule
project
Training data becomes:
| Email | free | offer | winner | meeting | schedule | project | Class |
|---|---|---|---|---|---|---|---|
| free offer | 1 | 1 | 0 | 0 | 0 | 0 | Spam |
| free winner | 1 | 0 | 1 | 0 | 0 | 0 | Spam |
| meeting schedule | 0 | 0 | 0 | 1 | 1 | 0 | Not Spam |
| project meeting | 0 | 0 | 0 | 1 | 0 | 1 | Not Spam |
Step 3: Calculate Word Probabilities
For Spam emails:
Total Spam emails:
2
P(free | Spam)
The word "free" appears in both Spam emails.
Count = 2
Using Laplace Smoothing:
P(free|Spam) = (count + 1)/(total Spam emails + 2) = (2 + 1)/(2 + 2) = 3/4 = 0.75
The 2 in the denominator accounts for the two possible values of each feature (present or absent).
P(winner | Spam)
The word "winner" appears in one Spam email.
Count = 1
P(winner|Spam) = (1 + 1)/(2 + 2) = 2/4 = 0.5
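The smoothed estimate can be sketched as a small helper. This is a minimal illustration under the same assumptions as the worked example (whitespace tokens, smoothing constant of 1); the name `presence_probability` and its `alpha` parameter are hypothetical.

```python
def presence_probability(word, documents, alpha=1):
    """Smoothed P(word present | class) from the documents of one class.

    documents: list of strings that all belong to the same class.
    alpha: smoothing constant (1 gives the Laplace smoothing used here).
    """
    docs_with_word = sum(1 for doc in documents if word in doc.split())
    # 2 * alpha: one pseudo-count for "present" and one for "absent".
    return (docs_with_word + alpha) / (len(documents) + 2 * alpha)


spam_emails = ["free offer", "free winner"]

print(presence_probability("free", spam_emails))    # 0.75
print(presence_probability("winner", spam_emails))  # 0.5
```

A word that never occurs in a class still receives a non-zero probability, which keeps the final product from collapsing to zero.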
Step 4: Calculate Absence Probabilities
Bernoulli Naive Bayes also considers missing words.
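For example:
P(meeting absent | Spam)
The word "meeting" appears in no Spam emails, so with Laplace Smoothing:
P(meeting|Spam) = (0 + 1)/(2 + 2) = 1/4 = 0.25
Therefore:
P(meeting absent|Spam) = 1 - 0.25 = 0.75
This is a major difference from Multinomial Naive Bayes, which only uses the words that appear in a document.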
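The rule for a single feature can be written as one small function. This is a sketch, not library code: `feature_likelihood` is a hypothetical name, and the value 0.25 is P(meeting|Spam) from the example above.

```python
def feature_likelihood(x, p_present):
    """Likelihood of one binary feature value x (0 or 1) for a class.

    p_present is the smoothed P(word present | class).
    Equivalent to p**x * (1 - p)**(1 - x), the Bernoulli probability mass function.
    """
    return p_present if x == 1 else 1 - p_present


p_meeting_spam = 0.25  # P(meeting|Spam) from the example above

print(feature_likelihood(1, p_meeting_spam))  # 0.25
print(feature_likelihood(0, p_meeting_spam))  # 0.75
```

Every vocabulary word contributes one such factor to the class score, whether or not it occurs in the email.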
Step 5: Classify New Email
New email:
free winner
Binary representation:
| Word | Value |
|---|---|
| free | 1 |
| offer | 0 |
| winner | 1 |
| meeting | 0 |
| schedule | 0 |
| project | 0 |
Where:
1 = Word Present
0 = Word Absent
So the email:
free winner
is represented as:
[1, 0, 1, 0, 0, 0]
Step 6: Spam Score
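For Spam:
P(Spam) × P(free|Spam) × P(offer absent|Spam) × P(winner|Spam) × P(meeting absent|Spam) × P(schedule absent|Spam) × P(project absent|Spam)
Substituting values:
0.5 × 0.75 × 0.5 × 0.5 × 0.75 × 0.75 × 0.75 ≈ 0.0396
Step 7: Not Spam Score
Similarly, for Not Spam:
P(Not Spam) × P(free|Not Spam) × P(offer absent|Not Spam) × P(winner|Not Spam) × P(meeting absent|Not Spam) × P(schedule absent|Not Spam) × P(project absent|Not Spam)
Substituting values:
0.5 × 0.25 × 0.75 × 0.25 × 0.25 × 0.5 × 0.5 ≈ 0.0015
Here P(meeting absent|Not Spam) = 1 - 3/4 = 0.25, because "meeting" appears in both Not Spam emails.
Step 8: Compare Scores
| Class | Score |
|---|---|
| Spam | 0.0396 |
| Not Spam | 0.0015 |
Since:
0.0396 > 0.0015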
Prediction:
Spam
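The whole hand calculation can be reproduced with a short from-scratch sketch. It assumes the same training emails, vocabulary, and Laplace smoothing as the worked example; `class_score` and the constant names are hypothetical, and the code is meant for checking the arithmetic rather than as a production implementation.

```python
VOCABULARY = ["free", "offer", "winner", "meeting", "schedule", "project"]

TRAINING = [
    ("free offer", "Spam"),
    ("free winner", "Spam"),
    ("meeting schedule", "Not Spam"),
    ("project meeting", "Not Spam"),
]


def class_score(text, label, training=TRAINING, vocabulary=VOCABULARY, alpha=1):
    """Unnormalized Bernoulli NB score: prior times every feature likelihood."""
    class_docs = [set(doc.split()) for doc, lab in training if lab == label]
    prior = len(class_docs) / len(training)
    present = set(text.split())

    score = prior
    for word in vocabulary:
        docs_with_word = sum(1 for doc in class_docs if word in doc)
        p_present = (docs_with_word + alpha) / (len(class_docs) + 2 * alpha)
        # Present words contribute p, absent words contribute (1 - p).
        score *= p_present if word in present else 1 - p_present
    return score


scores = {label: class_score("free winner", label) for label in ("Spam", "Not Spam")}
for label, score in scores.items():
    print(f"{label}: {score:.4f}")
print("Prediction:", max(scores, key=scores.get))
```

With a large vocabulary, a product of many small probabilities can underflow, so practical implementations usually add log probabilities instead of multiplying.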
Difference Between Multinomial and Bernoulli Naive Bayes
| Feature | Multinomial NB | Bernoulli NB |
|---|---|---|
| Input | Word Counts | Binary Features |
| Uses Frequency | Yes | No |
| Uses Presence/Absence | No | Yes |
| Suitable For | Text Counts | Binary Text Features |
| Word Repetition Matters | Yes | No |
Example
Document:
free free free offer winner
Multinomial NB sees:
| Word | Count |
|---|---|
| free | 3 |
| offer | 1 |
| winner | 1 |
Bernoulli NB sees:
| Word | Present? |
|---|---|
| free | 1 |
| offer | 1 |
| winner | 1 |
Frequency information is ignored.
How Bernoulli Naive Bayes Works
Training Documents
↓
Create Vocabulary
↓
Convert Words to Binary Features
↓
Calculate Prior Probabilities
↓
Calculate Presence/Absence Probabilities
↓
Apply Bayes Theorem
↓
Choose Class with Highest Score
Python Implementation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

# Training data
emails = [
    "free offer",
    "free winner",
    "meeting schedule",
    "project meeting",
]
labels = [
    "Spam",
    "Spam",
    "Not Spam",
    "Not Spam",
]

# Binary features: binary=True records presence (1) or absence (0), not counts
vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(emails)

# Bernoulli NB model (alpha=1.0 by default, i.e. Laplace smoothing)
model = BernoulliNB()

# Train
model.fit(X, labels)

# Test email: reuse the fitted vectorizer so the vocabulary stays the same
test_email = vectorizer.transform(["free winner"])
prediction = model.predict(test_email)
print("Prediction:", prediction[0])
Output:
Prediction: Spam
Applications
Spam Detection
Email Filtering
Sentiment Analysis
Document Classification
Binary Text Classification
Advantages
- Simple and fast
- Works well with binary features
- Suitable for small datasets
- Handles sparse text data efficiently
Limitations
- Ignores word frequency information
- Assumes feature independence
- May perform worse than Multinomial NB when frequency matters
Important Points
- Bernoulli Naive Bayes uses binary features.
- Features are represented as 0 (Absent) or 1 (Present).
- Word frequency is ignored.
- Both presence and absence of words are considered.
- Laplace smoothing prevents zero probabilities.
- Commonly used for binary text classification problems.
- Works best when word occurrence is more important than word frequency.
Keywords
Bernoulli Naive Bayes, Binary Classification, Presence Absence Features, Binary Features, Text Classification, Spam Detection, Laplace Smoothing, Naive Bayes Algorithm, Machine Learning Classification, Bernoulli Distribution