
Multiple Linear Regression — Complete Mathematical Example

Problem Statement

We want to predict y from two input variables, X1 and X2.

The Multiple Linear Regression equation is:

ŷ = b0 + b1X1 + b2X2

Where:

ŷ  = Predicted value
b0 = Intercept
b1 = Coefficient of X1
b2 = Coefficient of X2
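
To make the equation concrete, here is a minimal sketch of it as a Python function. The function name `predict` and the coefficient values are hypothetical, chosen only to show the arithmetic; the real coefficients are derived in the steps that follow.

```python
def predict(b0: float, b1: float, b2: float, x1: float, x2: float) -> float:
    """Evaluate y_hat = b0 + b1*X1 + b2*X2 for a single observation."""
    return b0 + b1 * x1 + b2 * x2


# Hypothetical coefficients (not the fitted ones): 1 + 2*4 + 3*5
print(predict(b0=1.0, b1=2.0, b2=3.0, x1=4.0, x2=5.0))  # 24.0
```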

Dataset

y    X1  X2
140  60  22
155  62  25
159  67  24
179  70  20
192  71  15
200  72  14
212  75  14
215  78  11

Step 1: Create Additional Columns

We need to calculate:

X1², X2², X1Y, X2Y, X1X2

y    X1  X2  X1²   X2²  X1Y    X2Y   X1X2
140  60  22  3600  484  8400   3080  1320
155  62  25  3844  625  9610   3875  1550
159  67  24  4489  576  10653  3816  1608
179  70  20  4900  400  12530  3580  1400
192  71  15  5041  225  13632  2880  1065
200  72  14  5184  196  14400  2800  1008
212  75  14  5625  196  15900  2968  1050
215  78  11  6084  121  16770  2365  858

Step 2: Calculate Column Sums

n = 8
Σy = 140 + 155 + 159 + 179 + 192 + 200 + 212 + 215
Σy = 1452
ΣX1 = 60 + 62 + 67 + 70 + 71 + 72 + 75 + 78
ΣX1 = 555
ΣX2 = 22 + 25 + 24 + 20 + 15 + 14 + 14 + 11
ΣX2 = 145
ΣX1² = 3600 + 3844 + 4489 + 4900 + 5041 + 5184 + 5625 + 6084
ΣX1² = 38767
ΣX2² = 484 + 625 + 576 + 400 + 225 + 196 + 196 + 121
ΣX2² = 2823
ΣX1Y = 8400 + 9610 + 10653 + 12530 + 13632 + 14400 + 15900 + 16770
ΣX1Y = 101895
ΣX2Y = 3080 + 3875 + 3816 + 3580 + 2880 + 2800 + 2968 + 2365
ΣX2Y = 25364
ΣX1X2 = 1320 + 1550 + 1608 + 1400 + 1065 + 1008 + 1050 + 858
ΣX1X2 = 9859
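
The product columns and their sums are easy to get wrong by hand, so a quick check can help. The following is a minimal sketch using only the Python standard library; the dictionary keys are illustrative labels, not notation from the formulas.

```python
y = [140, 155, 159, 179, 192, 200, 212, 215]
X1 = [60, 62, 67, 70, 71, 72, 75, 78]
X2 = [22, 25, 24, 20, 15, 14, 14, 11]

n = len(y)

# Each entry sums one column of the extended table from Step 1.
sums = {
    "y": sum(y),
    "X1": sum(X1),
    "X2": sum(X2),
    "X1^2": sum(a * a for a in X1),
    "X2^2": sum(b * b for b in X2),
    "X1*Y": sum(a * t for a, t in zip(X1, y)),
    "X2*Y": sum(b * t for b, t in zip(X2, y)),
    "X1*X2": sum(a * b for a, b in zip(X1, X2)),
}

print("n =", n)
for name, value in sums.items():
    print(f"sum of {name} = {value}")
```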

Step 3: Calculate Means

mean_y = Σy / n
mean_y = 1452 / 8
mean_y = 181.5
mean_X1 = ΣX1 / n
mean_X1 = 555 / 8
mean_X1 = 69.375
mean_X2 = ΣX2 / n
mean_X2 = 145 / 8
mean_X2 = 18.125

Step 4: Calculate Regression Sums

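Regression sums are the sums of squares and cross-products of the variables measured as deviations from their means. Lowercase letters denote those deviations (x1 = X1 - mean_X1, x2 = X2 - mean_X2, and y inside these sums stands for y - mean_y). Σx1² and Σx2² measure the spread of each input, while Σx1y, Σx2y, and Σx1x2 measure how pairs of variables vary together.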

We need:

Σx1², Σx2², Σx1y, Σx2y, Σx1x2

Calculate Σx1²

Formula:

Σx1² = ΣX1² - ((ΣX1)² / n)

Substitute values:

Σx1² = 38767 - ((555)² / 8)
Σx1² = 38767 - (308025 / 8)
Σx1² = 38767 - 38503.125
Σx1² = 263.875

Calculate Σx2²

Formula:

Σx2² = ΣX2² - ((ΣX2)² / n)

Substitute values:

Σx2² = 2823 - ((145)² / 8)
Σx2² = 2823 - (21025 / 8)
Σx2² = 2823 - 2628.125
Σx2² = 194.875

Calculate Σx1y

Formula:

Σx1y = ΣX1Y - ((ΣX1)(Σy) / n)

Substitute values:

Σx1y = 101895 - ((555)(1452) / 8)
Σx1y = 101895 - (805860 / 8)
Σx1y = 101895 - 100732.5
Σx1y = 1162.5

Calculate Σx2y

Formula:

Σx2y = ΣX2Y - ((ΣX2)(Σy) / n)

Substitute values:

Σx2y = 25364 - ((145)(1452) / 8)
Σx2y = 25364 - (210540 / 8)
Σx2y = 25364 - 26317.5
Σx2y = -953.5

Calculate Σx1x2

Formula:

Σx1x2 = ΣX1X2 - ((ΣX1)(ΣX2) / n)

Substitute values:

Σx1x2 = 9859 - ((555)(145) / 8)
Σx1x2 = 9859 - (80475 / 8)
Σx1x2 = 9859 - 10059.375
Σx1x2 = -200.375
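
Each formula in this step is a computational shortcut for a sum of products of deviations from the mean. The sketch below, with hypothetical helper names `centered_cross_sum` and `shortcut_cross_sum`, computes both forms for the dataset so the two can be compared side by side.

```python
y = [140, 155, 159, 179, 192, 200, 212, 215]
X1 = [60, 62, 67, 70, 71, 72, 75, 78]
X2 = [22, 25, 24, 20, 15, 14, 14, 11]


def mean(values):
    return sum(values) / len(values)


def centered_cross_sum(u, v):
    """Definitional form: sum of (u_i - mean_u) * (v_i - mean_v)."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def shortcut_cross_sum(u, v):
    """Shortcut form used in Step 4: sum(u*v) - sum(u)*sum(v)/n."""
    return sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v) / len(u)


pairs = {
    "x1^2": (X1, X1),
    "x2^2": (X2, X2),
    "x1*y": (X1, y),
    "x2*y": (X2, y),
    "x1*x2": (X1, X2),
}

for name, (u, v) in pairs.items():
    print(name, centered_cross_sum(u, v), shortcut_cross_sum(u, v))
```

Passing the same list twice gives a sum of squares, which is why one helper covers all five quantities.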

Step 5: Calculate b1

Formula:

b1 = (((Σx2²)(Σx1y)) - ((Σx1x2)(Σx2y))) / (((Σx1²)(Σx2²)) - ((Σx1x2)²))

Substitute values:

b1 = (((194.875)(1162.5)) - ((-200.375)(-953.5))) / (((263.875)(194.875)) - ((-200.375)²))

Calculate numerator:

(194.875)(1162.5) = 226542.1875
(-200.375)(-953.5) = 191057.5625
Numerator = 226542.1875 - 191057.5625
Numerator = 35484.625

Calculate denominator:

(263.875)(194.875) = 51422.640625
(-200.375)² = 40150.140625
Denominator = 51422.640625 - 40150.140625
Denominator = 11272.5

Now calculate b1:

b1 = 35484.625 / 11272.5
b1 ≈ 3.148

Step 6: Calculate b2

Formula:

b2 = (((Σx1²)(Σx2y)) - ((Σx1x2)(Σx1y))) / (((Σx1²)(Σx2²)) - ((Σx1x2)²))

Substitute values:

b2 = (((263.875)(-953.5)) - ((-200.375)(1162.5))) / (((263.875)(194.875)) - ((-200.375)²))

Calculate numerator:

(263.875)(-953.5) = -251604.8125
(-200.375)(1162.5) = -232935.9375
Numerator = -251604.8125 - (-232935.9375)
Numerator = -18668.875

Denominator is the same as before:

Denominator = 11272.5

Now calculate b2:

b2 = -18668.875 / 11272.5
b2 ≈ -1.656
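
The formulas for b1 and b2 are the Cramer's-rule solution of two simultaneous equations (the normal equations for the centered data):

(Σx1²)(b1) + (Σx1x2)(b2) = Σx1y
(Σx1x2)(b1) + (Σx2²)(b2) = Σx2y

As a sanity check, the sketch below recomputes both coefficients from the Step 4 regression sums and substitutes them back into those two equations. The short variable names (`s11`, `s12`, and so on) are illustrative.

```python
# Regression sums from Step 4
s11, s22 = 263.875, 194.875   # Σx1², Σx2²
s1y, s2y = 1162.5, -953.5     # Σx1y, Σx2y
s12 = -200.375                # Σx1x2

# Shared denominator of the b1 and b2 formulas
det = s11 * s22 - s12 ** 2

b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
print(round(b1, 3), round(b2, 3))  # 3.148 -1.656

# Substitute back: each left-hand side should reproduce the right-hand side.
lhs1 = s11 * b1 + s12 * b2
lhs2 = s12 * b1 + s22 * b2
print(abs(lhs1 - s1y) < 1e-9, abs(lhs2 - s2y) < 1e-9)  # True True
```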

Step 7: Calculate b0

Formula:

b0 = mean_y - (b1)(mean_X1) - (b2)(mean_X2)

Substitute values:

b0 = 181.5 - (3.148)(69.375) - (-1.656)(18.125)

Calculate:

(3.148)(69.375) = 218.3925
(-1.656)(18.125) = -30.015

So:

b0 = 181.5 - 218.3925 - (-30.015)
b0 = 181.5 - 218.3925 + 30.015
b0 = -6.8775

This value carries the rounding error in b1 and b2. Using the unrounded coefficients (b1 ≈ 3.147893, b2 ≈ -1.656143), b0 is approximately:

b0 = -6.867
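
Because b1 and b2 are multiplied by means of roughly 69 and 18, rounding them to three decimals shifts the intercept noticeably. The following sketch, which assumes the Step 3 means and Step 4 regression sums as inputs and uses a hypothetical helper `intercept`, computes b0 both ways.

```python
mean_y, mean_X1, mean_X2 = 181.5, 69.375, 18.125

# Regression sums from Step 4
s11, s22, s12 = 263.875, 194.875, -200.375
s1y, s2y = 1162.5, -953.5

det = s11 * s22 - s12 ** 2
b1 = (s22 * s1y - s12 * s2y) / det   # unrounded slope for X1
b2 = (s11 * s2y - s12 * s1y) / det   # unrounded slope for X2


def intercept(slope1: float, slope2: float) -> float:
    """b0 = mean_y - b1*mean_X1 - b2*mean_X2."""
    return mean_y - slope1 * mean_X1 - slope2 * mean_X2


b0_rounded = intercept(round(b1, 3), round(b2, 3))  # slopes rounded first
b0_exact = intercept(b1, b2)                        # slopes kept at full precision

print(f"b0 from rounded slopes:   {b0_rounded:.4f}")
print(f"b0 from unrounded slopes: {b0_exact:.4f}")  # -6.8675
```

Carrying full precision through the calculation and rounding only the final results avoids this drift.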

Step 8: Final Regression Equation

ŷ = b0 + b1X1 + b2X2
ŷ = -6.867 + 3.148X1 - 1.656X2

Step 9: Prediction Example

Suppose:

X1 = 70
X2 = 20

Use the equation:

ŷ = -6.867 + 3.148X1 - 1.656X2

Substitute values:

ŷ = -6.867 + (3.148)(70) - (1.656)(20)
ŷ = -6.867 + 220.36 - 33.12
ŷ = 180.373

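Final Prediction

Predicted y = 180.373

or approximately:

Predicted y ≈ 180.37

This uses the coefficients rounded to three decimals. With unrounded coefficients the prediction is about 180.362.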
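
The dataset happens to contain a row with X1 = 70 and X2 = 20, whose observed y is 179, so the prediction can be compared with a real observation. A minimal sketch, assuming the rounded coefficients from Step 8 (the function name `predict` is illustrative):

```python
# Rounded coefficients from the final regression equation
b0, b1, b2 = -6.867, 3.148, -1.656


def predict(x1: float, x2: float) -> float:
    """Evaluate the fitted equation for one (X1, X2) pair."""
    return b0 + b1 * x1 + b2 * x2


y_hat = predict(70, 20)
observed = 179  # the dataset row with X1 = 70, X2 = 20

print(f"predicted = {y_hat:.3f}")                       # predicted = 180.373
print(f"observed - predicted = {observed - y_hat:.3f}")  # observed - predicted = -1.373
```

The model overshoots this observation by a little over one unit, which is expected: a least-squares fit minimizes the total squared error over all rows rather than passing through any single point.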

Interpretation

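b1 = 3.148

For every 1-unit increase in X1, the predicted y increases by 3.148 units, assuming X2 remains constant.

b2 = -1.656

For every 1-unit increase in X2, the predicted y decreases by 1.656 units, assuming X1 remains constant.

b0 = -6.867

When X1 and X2 are both zero, the predicted value of y is -6.867. That point lies far outside the observed data (X1 ranges from 60 to 78 and X2 from 11 to 25), so b0 is best read as the constant that anchors the fitted plane rather than as a realistic prediction.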
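
The "holding the other variable constant" reading can be checked numerically: changing one input by 1 while fixing the other should change the prediction by exactly that input's coefficient. A short sketch, assuming the rounded coefficients above and an arbitrary starting point of X1 = 70, X2 = 20:

```python
b0, b1, b2 = -6.867, 3.148, -1.656


def predict(x1: float, x2: float) -> float:
    return b0 + b1 * x1 + b2 * x2


base = predict(70, 20)

# Raise X1 by one unit, keep X2 fixed: the change equals b1.
print(round(predict(71, 20) - base, 3))  # 3.148

# Raise X2 by one unit, keep X1 fixed: the change equals b2.
print(round(predict(70, 21) - base, 3))  # -1.656
```

The same differences appear from any starting point, because the model is linear in both inputs.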

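Formulas

Regression sums:

Σx1² = ΣX1² - ((ΣX1)² / n)
Σx2² = ΣX2² - ((ΣX2)² / n)
Σx1y = ΣX1Y - ((ΣX1)(Σy) / n)
Σx2y = ΣX2Y - ((ΣX2)(Σy) / n)
Σx1x2 = ΣX1X2 - ((ΣX1)(ΣX2) / n)

Coefficients:

b1 = (((Σx2²)(Σx1y)) - ((Σx1x2)(Σx2y))) / (((Σx1²)(Σx2²)) - ((Σx1x2)²))
b2 = (((Σx1²)(Σx2y)) - ((Σx1x2)(Σx1y))) / (((Σx1²)(Σx2²)) - ((Σx1x2)²))
b0 = mean_y - (b1)(mean_X1) - (b2)(mean_X2)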

Python Code

import pandas as pd
from sklearn.linear_model import LinearRegression

# Dataset
data = {
    "y":  [140, 155, 159, 179, 192, 200, 212, 215],
    "X1": [60, 62, 67, 70, 71, 72, 75, 78],
    "X2": [22, 25, 24, 20, 15, 14, 14, 11]
}

# Create DataFrame
df = pd.DataFrame(data)

# Independent variables
X = df[["X1", "X2"]]

# Dependent variable
y = df["y"]

# Create model
model = LinearRegression()

# Train model
model.fit(X, y)

# Print coefficients
print("Scikit-Learn Results:")
print("Intercept b0:", model.intercept_)
print("Coefficient b1 for X1:", model.coef_[0])
print("Coefficient b2 for X2:", model.coef_[1])

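# Prediction (pass a DataFrame so the column names match those used in fit)
new_data = pd.DataFrame({"X1": [70], "X2": [20]})
prediction = model.predict(new_data)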

print("\nScikit-Learn Prediction:")
print("Predicted y =", prediction[0])

Expected Values

Scikit-Learn Results:
Intercept b0: -6.867487247726785
Coefficient b1 for X1: 3.147893102683522
Coefficient b2 for X2: -1.6561432690175197

Scikit-Learn Prediction:
Predicted y = 180.36216455976935

Full Code

import pandas as pd
from sklearn.linear_model import LinearRegression

# Step 1: Create the dataset
data = {
    "y":  [140, 155, 159, 179, 192, 200, 212, 215],
    "X1": [60, 62, 67, 70, 71, 72, 75, 78],
    "X2": [22, 25, 24, 20, 15, 14, 14, 11]
}

df = pd.DataFrame(data)

print("Dataset:")
print(df)

# Step 2: Create additional columns
df["X1_square"] = df["X1"] ** 2
df["X2_square"] = df["X2"] ** 2
df["X1_y"] = df["X1"] * df["y"]
df["X2_y"] = df["X2"] * df["y"]
df["X1_X2"] = df["X1"] * df["X2"]

print("\nDataset with additional columns:")
print(df)

# Step 3: Calculate sums
n = len(df)

sum_y = df["y"].sum()
sum_X1 = df["X1"].sum()
sum_X2 = df["X2"].sum()

sum_X1_square = df["X1_square"].sum()
sum_X2_square = df["X2_square"].sum()

sum_X1_y = df["X1_y"].sum()
sum_X2_y = df["X2_y"].sum()
sum_X1_X2 = df["X1_X2"].sum()

print("\nSums:")
print("n =", n)
print("Σy =", sum_y)
print("ΣX1 =", sum_X1)
print("ΣX2 =", sum_X2)
print("ΣX1² =", sum_X1_square)
print("ΣX2² =", sum_X2_square)
print("ΣX1Y =", sum_X1_y)
print("ΣX2Y =", sum_X2_y)
print("ΣX1X2 =", sum_X1_X2)

# Step 4: Calculate means
mean_y = sum_y / n
mean_X1 = sum_X1 / n
mean_X2 = sum_X2 / n

print("\nMeans:")
print("mean_y =", mean_y)
print("mean_X1 =", mean_X1)
print("mean_X2 =", mean_X2)

# Step 5: Calculate regression sums
reg_x1_square = sum_X1_square - ((sum_X1 ** 2) / n)
reg_x2_square = sum_X2_square - ((sum_X2 ** 2) / n)

reg_x1_y = sum_X1_y - ((sum_X1 * sum_y) / n)
reg_x2_y = sum_X2_y - ((sum_X2 * sum_y) / n)

reg_x1_x2 = sum_X1_X2 - ((sum_X1 * sum_X2) / n)

print("\nRegression Sums:")
print("Σx1² =", reg_x1_square)
print("Σx2² =", reg_x2_square)
print("Σx1y =", reg_x1_y)
print("Σx2y =", reg_x2_y)
print("Σx1x2 =", reg_x1_x2)

# Step 6: Calculate b1 and b2 manually
denominator = (reg_x1_square * reg_x2_square) - (reg_x1_x2 ** 2)

b1 = (
    (reg_x2_square * reg_x1_y) -
    (reg_x1_x2 * reg_x2_y)
) / denominator

b2 = (
    (reg_x1_square * reg_x2_y) -
    (reg_x1_x2 * reg_x1_y)
) / denominator

# Step 7: Calculate b0
b0 = mean_y - (b1 * mean_X1) - (b2 * mean_X2)

print("\nManual Calculation Results:")
print("b0 =", b0)
print("b1 =", b1)
print("b2 =", b2)

print("\nManual Regression Equation:")
print(f"y = {b0:.3f} + ({b1:.3f})X1 + ({b2:.3f})X2")

# Step 8: Prediction manually
X1_new = 70
X2_new = 20

manual_prediction = b0 + (b1 * X1_new) + (b2 * X2_new)

print("\nManual Prediction:")
print("For X1 = 70 and X2 = 20")
print("Predicted y =", manual_prediction)

# Step 9: Verify using Scikit-Learn
X = df[["X1", "X2"]]
y = df["y"]

model = LinearRegression()
model.fit(X, y)

print("\nScikit-Learn Results:")
print("Intercept b0:", model.intercept_)
print("Coefficient b1 for X1:", model.coef_[0])
print("Coefficient b2 for X2:", model.coef_[1])

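# Pass a DataFrame so the column names match those used in fit
new_data = pd.DataFrame({"X1": [X1_new], "X2": [X2_new]})
sklearn_prediction = model.predict(new_data)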

print("\nScikit-Learn Prediction:")
print("Predicted y =", sklearn_prediction[0])

Summary

Multiple Linear Regression calculates a separate coefficient for each input variable. Each coefficient shows how the predicted target changes when that input increases by one unit while the other inputs are held constant. Once the coefficients are known, the final equation can be used to predict y for new values of X1 and X2.

