Classification
Classification is one of the most important supervised learning techniques in Machine Learning where the goal is to predict the category (class label) of a given input data.
In simple words:
Classification means assigning data into predefined classes or groups.
For example:
-
Email → Spam or Not Spam
-
Student result → Pass or Fail
-
Tumor → Benign or Malignant
-
Transaction → Fraud or Genuine
What is Supervised Learning?
In supervised learning:
-
We already know the correct outputs (labels)
-
The model learns from labeled training data
-
Then it predicts labels for new unseen data
Example dataset:
| Study Hours | Result |
|---|---|
| 2 | Fail |
| 4 | Pass |
| 6 | Pass |
The algorithm learns the relationship between study hours and result.
Types of Classification
1. Binary Classification
Only two classes are present.
Examples:
-
Yes / No
-
True / False
-
Spam / Not Spam
Algorithms:
-
Logistic Regression
-
SVM
-
Naive Bayes
2. Multi-Class Classification
More than two classes are present.
Examples:
-
Digit recognition (0–9)
-
Animal classification
-
Disease prediction
Algorithms:
-
Decision Tree
-
Random Forest
-
KNN
-
Neural Networks
3. Multi-Label Classification
One data point can belong to multiple classes simultaneously.
Example:
A movie can be:
-
Action
-
Comedy
-
Thriller
all at the same time.
How Classification Works
Basic steps:
-
Collect dataset
-
Split into training and testing data
-
Train the classification algorithm
-
Learn patterns from data
-
Predict labels for new data
-
Evaluate performance
General Workflow of Classification
Input Data
↓
Training Algorithm
↓
Model Learning
↓
Prediction
↓
Class Label Output
Important Terminologies
1. Feature
Input variables used for prediction.
Example:
-
Age
-
Salary
-
Height
2. Label / Target
Output category.
Example:
-
Pass/Fail
-
Spam/Not Spam
3. Training Data
Data used to train the model.
4. Testing Data
Data used to evaluate the model.
Classification Algorithms
Classification algorithms are mainly divided into:
A) Linear Classifiers
These separate classes using a straight line (or hyperplane).
Examples:
-
Logistic Regression
-
Linear SVM
-
Perceptron
-
SGD Classifier
B) Non-Linear Classifiers
These can create curved or complex boundaries.
Examples:
-
KNN
-
Decision Tree
-
Kernel SVM
-
Naive Bayes
C) Ensemble Classifiers
These combine multiple models to improve accuracy.
Examples:
-
Random Forest
-
AdaBoost
-
Bagging
-
Voting Classifier
Decision Boundary
A decision boundary is the line or curve that separates different classes.
Example:
Class A | Boundary | Class B
Linear classifiers create straight boundaries.
Non-linear classifiers create curved boundaries.
Evaluation Metrics for Classification
Common metrics:
| Metric | Meaning |
|---|---|
| Accuracy | Correct predictions |
| Precision | Correct positive predictions |
| Recall | Ability to find positives |
| F1-Score | Balance of precision & recall |
| Confusion Matrix | Detailed prediction analysis |
Real-Life Applications of Classification
Healthcare
-
Disease prediction
-
Cancer detection
Finance
-
Fraud detection
-
Credit scoring
Email Systems
-
Spam filtering
Social Media
-
Sentiment analysis
E-commerce
-
Product recommendation
Advantages of Classification
-
Easy prediction of categories
-
Works well in many real-world problems
-
High accuracy with proper data
-
Supports automation
Challenges in Classification
-
Overfitting
-
Imbalanced data
-
Noisy datasets
-
Feature selection issues
Summary
Classification is a supervised ML technique used to predict categorical outputs. It is widely used in healthcare, finance, security, and recommendation systems. Different classifiers are chosen depending on whether the data is linear, non-linear, or requires ensemble learning.
Keywords
Classification in Machine Learning, Supervised Learning, Binary Classification, Multi Class Classification, Linear Classifiers, Non Linear Classifiers, Ensemble Classifiers, Decision Boundary, Classification Algorithms, Model Evaluation