Simple Linear Regression
Simple Linear Regression is a supervised machine learning algorithm used to predict a continuous numerical value using one independent variable (input feature).
It tries to find the best straight-line relationship between:
- Input Variable (X)
- Output Variable (Y)
Real-Life Example
Suppose we want to predict:
- Student marks based on study hours
- House price based on area
- Salary based on years of experience
Example Dataset
| Study Hours | Marks |
|---|---|
| 1 | 10 |
| 2 | 20 |
| 3 | 30 |
| 4 | 40 |
| 5 | 50 |
The relationship is:
More study hours → Higher marks
Goal of Simple Linear Regression
The goal is to find the best-fit straight line: the line that minimizes the total squared difference between the actual and predicted output values (the least-squares criterion).
Equation of Simple Linear Regression
y = mx + b
Where:
- y → Predicted output
- x → Input feature
- m → Slope of the line
- b → Intercept
Understanding the Equation
Slope (m)
Slope shows:
How much y changes when x increases by one unit
Intercept (b)
Intercept is:
The value of y when x = 0
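To make the roles of the slope and intercept concrete, here is a minimal sketch. The function name `predict_line` and the values m = 10 and b = 0 are illustrative assumptions chosen to match the example dataset; they are not part of any library.

```python
def predict_line(x, m, b):
    """Return the value of the line y = m*x + b at the point x."""
    return m * x + b


m, b = 10, 0  # assumed slope and intercept, for illustration only

# Intercept: the predicted value when x = 0
print(predict_line(0, m, b))  # 0

# Slope: the change in y when x increases by one unit (from 3 to 4)
print(predict_line(4, m, b) - predict_line(3, m, b))  # 10
```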
Best Fit Line
Simple Linear Regression tries to draw the straight line that lies as close as possible to the data points overall. The line does not have to pass through every point.
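The best-fit line can also be computed by hand. The sketch below assumes "best" means minimizing the sum of squared errors (ordinary least squares, the criterion scikit-learn's `LinearRegression` uses) and applies the standard closed-form formulas with NumPy to the example dataset. The variable names are illustrative.

```python
import numpy as np

# Example dataset: study hours and marks
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([10, 20, 30, 40, 50], dtype=float)

x_mean, y_mean = x.mean(), y.mean()

# Least-squares slope: sum of co-deviations divided by sum of squared x deviations
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)

# Least-squares intercept: the fitted line passes through (x_mean, y_mean)
b = y_mean - m * x_mean

print("Slope:", m)
print("Intercept:", b)
```

For this dataset the slope should come out close to 10 and the intercept close to 0, because the marks are exactly 10 times the study hours.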
Visualization
Data Points → Actual values
Line → Predicted relationship
Practical Example Using Python
Problem Statement
Predict student marks based on study hours.
Step 1: Import Libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
Step 2: Create Dataset
data = {
    "Hours": [1, 2, 3, 4, 5, 6, 7, 8],
    "Marks": [10, 20, 30, 40, 50, 60, 70, 80]
}
df = pd.DataFrame(data)
print(df)
Output
   Hours  Marks
0      1     10
1      2     20
2      3     30
...
Step 3: Visualize Dataset
plt.scatter(df["Hours"], df["Marks"])
plt.xlabel("Study Hours")
plt.ylabel("Marks")
plt.title("Study Hours vs Marks")
plt.show()
Observation
The points follow a straight-line pattern.
Step 4: Define Features and Target
X = df[["Hours"]]
y = df["Marks"]
Where:
- X → Input feature
- y → Target variable
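The double brackets in `df[["Hours"]]` matter: scikit-learn expects the features as a 2D table (rows × columns), while the target can be 1D. A quick sketch that checks the shapes, rebuilding the same dataset so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    "Hours": [1, 2, 3, 4, 5, 6, 7, 8],
    "Marks": [10, 20, 30, 40, 50, 60, 70, 80],
})

X = df[["Hours"]]  # double brackets return a DataFrame (2D)
y = df["Marks"]    # single brackets return a Series (1D)

print(X.shape)  # (8, 1)
print(y.shape)  # (8,)
```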
Step 5: Split Dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    random_state=42
)
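It can help to confirm how many rows ended up in each set. The following sketch repeats the split on the same dataset and prints the sizes:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "Hours": [1, 2, 3, 4, 5, 6, 7, 8],
    "Marks": [10, 20, 30, 40, 50, 60, 70, 80],
})
X = df[["Hours"]]
y = df["Marks"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,    # fraction of rows held out for testing
    random_state=42   # fixed seed so the split is reproducible
)

print(len(X_train), len(X_test))  # 6 2
```

With 8 rows and `test_size=0.2`, scikit-learn rounds the test set up to 2 rows and keeps the remaining 6 for training.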
Step 6: Train the Model
model = LinearRegression()
model.fit(X_train, y_train)
Step 7: Predict Test Values
predictions = model.predict(X_test)
print(predictions)
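Printing the predictions alone does not show how good they are. One possible way to score them is with the mean absolute error and the R² score from `sklearn.metrics`. This is a sketch, and the choice of these two metrics is an assumption rather than part of the original walkthrough:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "Hours": [1, 2, 3, 4, 5, 6, 7, 8],
    "Marks": [10, 20, 30, 40, 50, 60, 70, 80],
})
X = df[["Hours"]]
y = df["Marks"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Average absolute gap between actual and predicted marks (lower is better)
mae = mean_absolute_error(y_test, predictions)
# Share of the variance in marks explained by the model (1.0 is a perfect fit)
r2 = r2_score(y_test, predictions)

print("MAE:", mae)
print("R2:", r2)
```

Because this toy dataset is perfectly linear, the error should be essentially zero and R² essentially 1. Real data will rarely fit this well.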
Step 8: Predict New Value
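# Pass a DataFrame with the same column name used in training;
# a plain list such as [[9]] triggers a feature-name warning in scikit-learn
new_hours = pd.DataFrame({"Hours": [9]})
new_prediction = model.predict(new_hours)
print("Predicted Marks:", new_prediction[0])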
Expected Output
Predicted Marks: 90.0 (approximately; floating-point rounding may add trailing digits)
Step 9: Visualize Regression Line
plt.scatter(df["Hours"], df["Marks"])
plt.plot(df["Hours"],
         model.predict(X),
         color="red")
plt.xlabel("Study Hours")
plt.ylabel("Marks")
plt.title("Simple Linear Regression")
plt.show()
What the Model Learned
The model learned:
Study Hours ↑ → Marks ↑
This relationship is represented using a straight line.
Model Coefficients
Slope
print(model.coef_)
Intercept
print(model.intercept_)
Understanding Predictions
Suppose:
- Slope = 10
- Intercept = 0
Then:
Marks = 10 × Hours + 0
If Hours = 9:
Marks = 10 × 9 + 0 = 90
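The same arithmetic can be checked against a fitted model. The sketch below refits the model on the full dataset (an assumption made to keep the example short), reads the learned slope and intercept, and compares the manual calculation with `model.predict`:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    "Hours": [1, 2, 3, 4, 5, 6, 7, 8],
    "Marks": [10, 20, 30, 40, 50, 60, 70, 80],
})

model = LinearRegression()
model.fit(df[["Hours"]], df["Marks"])

slope = model.coef_[0]         # coef_ is an array with one entry per feature
intercept = model.intercept_   # a single number for a single target

# Manual prediction using y = m*x + b
manual = slope * 9 + intercept

# Prediction from the model for the same input
from_model = model.predict(pd.DataFrame({"Hours": [9]}))[0]

print("Manual:", manual)
print("Model: ", from_model)
```

Both values should agree and be close to 90, up to floating-point rounding.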
Assumptions of Linear Regression
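1. A linear relationship exists between X and Y
2. The data has few extreme outliers (a practical requirement rather than a formal assumption)
3. Errors (residuals) are approximately normally distributed and have roughly constant variance
4. Observations are independent of each other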
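Assumptions about the errors are usually checked by looking at the residuals (actual minus predicted values). The sketch below uses made-up noisy data, generated with a fixed random seed purely for illustration, and prints a few simple residual summaries. The noise level and the half-by-half comparison are assumptions, not a formal statistical test.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: marks = 10 * hours plus random noise (illustration only)
rng = np.random.default_rng(0)
hours = np.arange(1, 21, dtype=float).reshape(-1, 1)
marks = 10 * hours.ravel() + rng.normal(0, 5, size=20)

model = LinearRegression().fit(hours, marks)

# Residual = actual value minus predicted value
residuals = marks - model.predict(hours)

# For a least-squares fit with an intercept, the mean residual is about zero
print("Mean residual:", residuals.mean())

# Rough constant-variance check: compare the spread in the two halves of the data
print("Residual std (first half): ", residuals[:10].std())
print("Residual std (second half):", residuals[10:].std())
```

If the residual spread grows or shrinks strongly across the range of X, or the residuals show a curved pattern, a straight line may not be an appropriate model.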
Real-World Applications
| Application | Prediction |
|---|---|
| Education | Student marks |
| Real Estate | House prices |
| Business | Sales forecasting |
| Finance | Profit prediction |
Advantages
- Simple and easy to understand
- Fast training
- Easy visualization
- Works well for linear data
Limitations
- Cannot model complex nonlinear patterns
- Sensitive to outliers
- Assumes linear relationship
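The sensitivity to outliers can be seen with a small experiment. In the sketch below, one value in the study-hours dataset is replaced with a made-up outlier (200 marks, a hypothetical value), and the line is refitted with `np.polyfit`, used here as a compact least-squares fit.

```python
import numpy as np

hours = np.arange(1, 9, dtype=float)  # 1 to 8 study hours
marks = 10 * hours                    # perfectly linear marks

# Copy the data and replace the last value with a hypothetical outlier
marks_outlier = marks.copy()
marks_outlier[-1] = 200

# A degree-1 polynomial fit returns [slope, intercept]
slope_clean, intercept_clean = np.polyfit(hours, marks, 1)
slope_out, intercept_out = np.polyfit(hours, marks_outlier, 1)

print("Without outlier:", slope_clean, intercept_clean)
print("With outlier:   ", slope_out, intercept_out)
```

A single extreme point roughly doubles the slope here (from about 10 to about 20), which shifts the predictions for every other student as well.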
Important Interview Points
1. Simple Linear Regression uses one independent variable.
2. It predicts continuous numerical values.
3. The regression line is represented by: y=mx+b
4. Slope represents rate of change.
5. Linear Regression works best when data has a linear relationship.
Summary
Simple Linear Regression is a supervised learning algorithm used to predict continuous numerical values using one input feature. It models the relationship between input and output variables using a straight-line equation and is widely used for prediction and forecasting tasks.