Bayes Classification: Examples
Example 1: Spam Email Detection
Problem Statement
A company has analyzed 1000 emails.
- 400 emails are Spam
- 600 emails are Not Spam
The word "Offer" appears:
- In 280 Spam emails
- In 120 Not Spam emails
A new email arrives containing the word "Offer".
Predict whether the email is Spam or Not Spam using Bayes Classification.
Step 1: Calculate Prior Probabilities
P(Spam) = 400 / 1000
= 0.4
P(NotSpam) = 600 / 1000
= 0.6
Step 2: Calculate Likelihoods
P(Offer | Spam) = 280 / 400
= 0.7
P(Offer | NotSpam) = 120 / 600
= 0.2
Step 3: Calculate Scores
Spam Score
= P(Offer | Spam) × P(Spam)
= 0.7 × 0.4
= 0.28
NotSpam Score
= P(Offer | NotSpam) × P(NotSpam)
= 0.2 × 0.6
= 0.12
Step 4: Compare Scores
Spam Score = 0.28
NotSpam Score = 0.12
0.28 > 0.12
Prediction
Spam
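The four steps above can be sketched in Python. This is a minimal illustration rather than a reference implementation: the function name `bayes_scores` and its dictionary arguments are assumptions introduced here, and only the counts come from the example.

```python
def bayes_scores(class_counts, feature_counts):
    """Return the unnormalized score P(feature | class) * P(class) per class.

    class_counts:   number of training items in each class
    feature_counts: number of items in each class that contain the feature
    (Both argument names are hypothetical, chosen for this sketch.)
    """
    total = sum(class_counts.values())
    scores = {}
    for label, n_class in class_counts.items():
        prior = n_class / total                       # Step 1: prior
        likelihood = feature_counts[label] / n_class  # Step 2: likelihood
        scores[label] = likelihood * prior            # Step 3: score
    return scores


# Counts from Example 1
scores = bayes_scores(
    class_counts={"Spam": 400, "NotSpam": 600},
    feature_counts={"Spam": 280, "NotSpam": 120},
)
prediction = max(scores, key=scores.get)  # Step 4: pick the larger score
print(scores, prediction)
```

Because the function works from raw counts, the same call pattern should apply to any of the single-feature examples in this document by swapping in their counts.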
Example 2: Disease Diagnosis
Problem Statement
In a hospital with 1000 patients:
- 300 patients have Flu
- 700 patients do not have Flu
Among Flu patients:
- 240 have Fever
Among Non-Flu patients:
- 70 have Fever
A patient arrives with Fever.
Predict whether the patient has Flu.
Step 1: Prior Probabilities
P(Flu) = 300 / 1000
= 0.3
P(NoFlu) = 700 / 1000
= 0.7
Step 2: Likelihoods
P(Fever | Flu) = 240 / 300
= 0.8
P(Fever | NoFlu) = 70 / 700
= 0.1
Step 3: Calculate Scores
Flu Score
= P(Fever | Flu) × P(Flu)
= 0.8 × 0.3
= 0.24
NoFlu Score
= P(Fever | NoFlu) × P(NoFlu)
= 0.1 × 0.7
= 0.07
Step 4: Compare Scores
Flu Score = 0.24
NoFlu Score = 0.07
0.24 > 0.07
Prediction
Patient has Flu
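The scores in Step 3 are the numerators of Bayes' theorem, so they do not sum to 1. Dividing each score by their total gives the posterior probability of each class given Fever. A short sketch using the numbers from this example (the variable names are illustrative, not part of the original material):

```python
# Scores from Step 3 of Example 2
flu_score = 0.8 * 0.3     # P(Fever | Flu) * P(Flu)
noflu_score = 0.1 * 0.7   # P(Fever | NoFlu) * P(NoFlu)

# The sum of the scores is P(Fever), the overall probability of the evidence
evidence = flu_score + noflu_score

# Normalizing turns each score into a posterior probability
p_flu_given_fever = flu_score / evidence
p_noflu_given_fever = noflu_score / evidence

print(f"P(Flu | Fever)   = {p_flu_given_fever:.3f}")
print(f"P(NoFlu | Fever) = {p_noflu_given_fever:.3f}")
```

Normalizing does not change which class wins; it only turns the comparison into probabilities, roughly 0.774 for Flu and 0.226 for No Flu here.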
Example 3: Student Pass Prediction
Problem Statement
A university collected data from 1000 students.
- 700 students passed
- 300 students failed
Among students who passed:
- 630 studied more than 5 hours daily
Among students who failed:
- 60 studied more than 5 hours daily
A new student studies more than 5 hours daily.
Predict whether the student will Pass or Fail.
Step 1: Prior Probabilities
P(Pass) = 700 / 1000
= 0.7
P(Fail) = 300 / 1000
= 0.3
Step 2: Likelihoods
P(Study | Pass) = 630 / 700
= 0.9
P(Study | Fail) = 60 / 300
= 0.2
Step 3: Calculate Scores
Pass Score
= P(Study | Pass) × P(Pass)
= 0.9 × 0.7
= 0.63
Fail Score
= P(Study | Fail) × P(Fail)
= 0.2 × 0.3
= 0.06
Step 4: Compare Scores
Pass Score = 0.63
Fail Score = 0.06
0.63 > 0.06
Prediction
Pass
Example 4: Weather Prediction
Problem Statement
Historical records for 1000 days show:
- 400 rainy days
- 600 non-rainy days
Among rainy days:
- 300 were cloudy
Among non-rainy days:
- 180 were cloudy
Today is cloudy.
Predict whether it will rain.
Step 1: Prior Probabilities
P(Rain) = 400 / 1000
= 0.4
P(NoRain) = 600 / 1000
= 0.6
Step 2: Likelihoods
P(Cloudy | Rain) = 300 / 400
= 0.75
P(Cloudy | NoRain) = 180 / 600
= 0.30
Step 3: Calculate Scores
Rain Score
= P(Cloudy | Rain) × P(Rain)
= 0.75 × 0.4
= 0.30
NoRain Score
= P(Cloudy | NoRain) × P(NoRain)
= 0.30 × 0.6
= 0.18
Step 4: Compare Scores
Rain Score = 0.30
NoRain Score = 0.18
0.30 > 0.18
Prediction
Rain
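As a sanity check, the same answer can be read directly from the counts: of the 300 + 180 = 480 cloudy days on record, 300 were rainy. The sketch below (variable names are illustrative) compares the normalized Rain score with that fraction.

```python
# Counts from Example 4
rainy_days, dry_days = 400, 600
cloudy_and_rainy, cloudy_and_dry = 300, 180
total_days = rainy_days + dry_days

# Scores as computed in Step 3: likelihood * prior
rain_score = (cloudy_and_rainy / rainy_days) * (rainy_days / total_days)
norain_score = (cloudy_and_dry / dry_days) * (dry_days / total_days)

# Posterior via Bayes: normalize the Rain score
posterior_bayes = rain_score / (rain_score + norain_score)

# Posterior via direct counting: rainy share of all cloudy days
posterior_counts = cloudy_and_rainy / (cloudy_and_rainy + cloudy_and_dry)

print(posterior_bayes, posterior_counts)
```

Both routes give 0.625 up to floating-point rounding, because the class size cancels when the likelihood is multiplied by the prior.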
Example 5: Loan Approval
Problem Statement
A bank has data for 1000 customers.
- 800 loans approved
- 200 loans rejected
Among approved customers:
- 560 have high salaries
Among rejected customers:
- 40 have high salaries
A new customer has a high salary.
Predict whether the loan should be approved.
Step 1: Prior Probabilities
P(Approve) = 800 / 1000
= 0.8
P(Reject) = 200 / 1000
= 0.2
Step 2: Likelihoods
P(HighSalary | Approve) = 560 / 800
= 0.7
P(HighSalary | Reject) = 40 / 200
= 0.2
Step 3: Calculate Scores
Approve Score
= P(HighSalary | Approve) × P(Approve)
= 0.7 × 0.8
= 0.56
Reject Score
= P(HighSalary | Reject) × P(Reject)
= 0.2 × 0.2
= 0.04
Step 4: Compare Scores
Approve Score = 0.56
Reject Score = 0.04
0.56 > 0.04
Prediction
Loan Approved
Example 6: Naive Bayes with Multiple Features
Problem Statement
An email contains two words:
- Offer
- Free
Given:
P(Spam) = 0.4
P(NotSpam) = 0.6
P(Offer | Spam) = 0.7
P(Free | Spam) = 0.8
P(Offer | NotSpam) = 0.2
P(Free | NotSpam) = 0.1
Predict whether the email is Spam.
Step 1: Calculate Spam Score
Spam Score
= P(Spam)
× P(Offer | Spam)
× P(Free | Spam)
= 0.4 × 0.7 × 0.8
= 0.224
Step 2: Calculate NotSpam Score
NotSpam Score
= P(NotSpam)
× P(Offer | NotSpam)
× P(Free | NotSpam)
= 0.6 × 0.2 × 0.1
= 0.012
Step 3: Compare Scores
Spam Score = 0.224
NotSpam Score = 0.012
0.224 > 0.012
Prediction
Spam
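The products above rely on the naive assumption that Offer and Free are conditionally independent given the class. With many features, a product of small probabilities can underflow, so a common approach is to add logarithms instead of multiplying. The sketch below applies that idea to this example; the data layout and the helper name `log_score` are assumptions made for illustration.

```python
import math

# Probabilities given in Example 6
priors = {"Spam": 0.4, "NotSpam": 0.6}
likelihoods = {
    "Spam": {"Offer": 0.7, "Free": 0.8},
    "NotSpam": {"Offer": 0.2, "Free": 0.1},
}


def log_score(label, words):
    """Log prior plus log likelihoods, i.e. the log of the product score."""
    total = math.log(priors[label])
    for word in words:
        total += math.log(likelihoods[label][word])
    return total


email_words = ["Offer", "Free"]
log_scores = {label: log_score(label, email_words) for label in priors}
prediction = max(log_scores, key=log_scores.get)

# Exponentiating recovers the product scores from Steps 1 and 2
product_scores = {label: math.exp(s) for label, s in log_scores.items()}
print(prediction, product_scores)
```

Since the logarithm is monotonic, comparing log scores picks the same class as comparing the products.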
Example 7: Disease Diagnosis — Prediction is No Flu
Problem Statement
In a hospital with 1000 patients:
- 300 patients have Flu
- 700 patients do not have Flu
Among Flu patients:
- 60 have Body Pain
Among Non-Flu patients:
- 350 have Body Pain
A patient arrives with Body Pain.
Predict whether the patient has Flu.
Step 1: Prior Probabilities
P(Flu) = 300 / 1000
= 0.3
P(NoFlu) = 700 / 1000
= 0.7
Step 2: Likelihoods
P(BodyPain | Flu) = 60 / 300
= 0.2
P(BodyPain | NoFlu) = 350 / 700
= 0.5
Step 3: Calculate Scores
Flu Score
= P(BodyPain | Flu) × P(Flu)
= 0.2 × 0.3
= 0.06
NoFlu Score
= P(BodyPain | NoFlu) × P(NoFlu)
= 0.5 × 0.7
= 0.35
Step 4: Compare Scores
Flu Score = 0.06
NoFlu Score = 0.35
0.35 > 0.06
Prediction
No Flu
Even though the patient has Body Pain, Body Pain appears more often among patients who do not have Flu in this dataset (50% of them, versus 20% of Flu patients). So the Bayes classifier predicts No Flu.
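Examples 2 and 7 share the same priors and differ only in the symptom observed. A small sketch makes the contrast explicit; the `predict` helper and the dictionary layout are hypothetical, while the probabilities are the ones computed in those two examples.

```python
# Priors shared by Examples 2 and 7
priors = {"Flu": 0.3, "NoFlu": 0.7}

# Likelihoods P(symptom | class) from Examples 2 and 7
likelihoods = {
    "Fever": {"Flu": 0.8, "NoFlu": 0.1},
    "BodyPain": {"Flu": 0.2, "NoFlu": 0.5},
}


def predict(symptom):
    """Return (predicted class, scores) for a single observed symptom."""
    scores = {c: likelihoods[symptom][c] * priors[c] for c in priors}
    return max(scores, key=scores.get), scores


for symptom in likelihoods:
    label, scores = predict(symptom)
    print(symptom, "->", label, scores)
```

With identical priors, the prediction flips from Flu to No Flu purely because the likelihoods of the observed symptom differ between the two classes.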