Multiple Linear Regression: Code

Multiple Linear Regression — House Price Prediction Project

In this tutorial, we will build a Multiple Linear Regression model to predict house prices using multiple input features.

The model will learn the relationship between:

house area,
number of bedrooms,
age of the house,
and the final house price.

Unlike Simple Linear Regression, which uses only one input feature, Multiple Linear Regression combines several features in a single equation, which often improves prediction accuracy when the extra features carry useful information.

In this tutorial, we will:

create a small dataset,
visualize the data,
train the regression model,
understand coefficients and intercept,
evaluate the model,
and predict prices for new houses.

This tutorial is beginner-friendly and helps build a strong foundation in Machine Learning regression models.

Step 1: Import Required Libraries

import pandas as pd
import matplotlib.pyplot as plt

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

In this step, we import all the required libraries for our Multiple Linear Regression project.

pandas is used to create and manage datasets.
matplotlib.pyplot is used for plotting graphs and visualizing relationships between variables.
LinearRegression is the machine learning algorithm used to train the regression model.
The metrics functions help evaluate the model performance.

Importing libraries is the first step in most machine learning projects.

Step 2: Create the Dataset

data = {
    'Area': [1000, 1200, 1500, 1800, 2000, 2200, 2500, 2800],
    'Bedrooms': [2, 2, 3, 3, 4, 4, 4, 5],
    'Age': [5, 4, 3, 2, 2, 1, 1, 1],
    'Price': [500000, 580000, 720000, 850000, 950000, 1050000, 1200000, 1350000]
}

df = pd.DataFrame(data)

df

Here we create a simple house price dataset manually.

The dataset contains:

Area → size of the house
Bedrooms → number of bedrooms
Age → age of the house
Price → final house price

Each row represents one house.

We then convert the dictionary into a pandas DataFrame so that we can easily work with the data.

Step 3: Check Dataset Information

df.info()

The info() function provides a quick overview of the dataset.

It shows:

total number of rows
column names
data types
non-null counts per column, which reveal missing values

This step helps verify whether the dataset is clean before training the model.

Step 4: Display First Rows

df.head()

The head() function displays the first five rows of the dataset.

This helps us:

understand the structure of the data
verify column names
inspect sample values

It is one of the most commonly used functions during data analysis.

Step 5: Check Missing Values

df.isnull().sum()

This step checks whether the dataset contains missing values.

Missing values can distort a model or stop training altogether: scikit-learn's LinearRegression raises an error if the input contains NaN values. It is therefore important to identify them before training.

The output shows the number of missing values in each column.
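
As a small illustration of what this check reports, the sketch below builds a hypothetical three-row frame (`demo`, which is not part of the tutorial dataset) with two deliberate gaps, counts them per column, and then drops the incomplete rows. Dropping rows is only one possible way to handle gaps and is shown here as an assumption, not as the required approach.

```python
import pandas as pd

# Hypothetical toy data with deliberately missing entries (None becomes NaN)
demo = pd.DataFrame({
    'Area': [1000, None, 1500],
    'Price': [500000, 580000, None],
})

# Count missing values per column
missing_counts = demo.isnull().sum()
print(missing_counts)

# One simple option: keep only rows where every column has a value
cleaned = demo.dropna()
print(cleaned)
```

In the tutorial dataset every column is complete, so the same check returns zero for each column.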

Step 6: Visualize Area vs Price

plt.scatter(df['Area'], df['Price'])

plt.xlabel('Area')
plt.ylabel('Price')
plt.title('Area vs Price')

plt.show()

This scatter plot visualizes the relationship between house area and house price.

From the graph, we can observe that:

larger houses generally have higher prices
there is a positive relationship between area and price

Visualization helps us better understand the data before model training.

Step 7: Visualize Bedrooms vs Price

plt.scatter(df['Bedrooms'], df['Price'])

plt.xlabel('Bedrooms')
plt.ylabel('Price')
plt.title('Bedrooms vs Price')

plt.show()

This graph shows how house prices vary with the number of bedrooms.

We can observe that houses with more bedrooms usually have higher prices.

This suggests that the number of bedrooms is a useful feature for prediction.

Step 8: Visualize Age vs Price

plt.scatter(df['Age'], df['Price'])

plt.xlabel('Age')
plt.ylabel('Price')
plt.title('Age vs Price')

plt.show()

This scatter plot shows the relationship between house age and house price.

In many cases:

newer houses have higher prices
older houses may have lower prices

This feature can influence the final prediction.
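
The three scatter plots can be complemented with numbers. The sketch below rebuilds the tutorial dataset and computes Pearson correlations with `DataFrame.corr()`. Using correlation here is an addition to the tutorial's workflow, and it only measures linear association, not cause and effect.

```python
import pandas as pd

df = pd.DataFrame({
    'Area': [1000, 1200, 1500, 1800, 2000, 2200, 2500, 2800],
    'Bedrooms': [2, 2, 3, 3, 4, 4, 4, 5],
    'Age': [5, 4, 3, 2, 2, 1, 1, 1],
    'Price': [500000, 580000, 720000, 850000, 950000, 1050000, 1200000, 1350000],
})

# Pearson correlation between every pair of columns
corr = df.corr()

# How strongly each column moves together with Price
print(corr['Price'])

# Area and Bedrooms also move together, which matters when reading coefficients later
print(corr.loc['Area', 'Bedrooms'])
```

Area and Bedrooms are positively correlated with Price and Age is negatively correlated. Area and Bedrooms are also strongly correlated with each other, so their individual effects are hard to separate in such a small dataset.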

Step 9: Select Features and Target Variable

X = df[['Area', 'Bedrooms', 'Age']]
y = df['Price']

In machine learning:

the input variables are called features
the output variable is called the target

Here:

X contains the input features:
- Area
- Bedrooms
- Age
y contains the target variable:
- Price

The model will learn how these features affect house prices.

Step 10: Create the Linear Regression Model

model = LinearRegression()

In this step, we create the Multiple Linear Regression model.

At this point:

the model is empty
no learning has happened yet

The model will learn patterns only after training.
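
To make "empty" concrete, the sketch below checks for the learned attributes before and after training. The three-point dataset is hypothetical and exists only to trigger `fit()`.

```python
from sklearn.linear_model import LinearRegression

model = LinearRegression()

# Before fit(), the learned attributes do not exist yet
before_fit = hasattr(model, 'coef_')
print(before_fit)

# Only the configuration chosen at construction time is available
params = model.get_params()
print(params)

# Tiny hypothetical dataset, used only to show that fit() creates the attributes
model.fit([[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0])
after_fit = hasattr(model, 'coef_')
print(after_fit)
```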

Step 11: Train the Model

model.fit(X, y)

The fit() function trains the model using the dataset.

During training, the model learns:

how each feature affects house price
the relationship between inputs and outputs

This is the step where the model's coefficients and intercept are actually computed.
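
For readers curious about what training computes, the sketch below solves the same least-squares problem directly with NumPy and compares the result with scikit-learn. The helper names (`A`, `solution`) are illustrative, and this is one way to reproduce the numbers rather than a description of scikit-learn's internal code.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    'Area': [1000, 1200, 1500, 1800, 2000, 2200, 2500, 2800],
    'Bedrooms': [2, 2, 3, 3, 4, 4, 4, 5],
    'Age': [5, 4, 3, 2, 2, 1, 1, 1],
    'Price': [500000, 580000, 720000, 850000, 950000, 1050000, 1200000, 1350000],
})
X = df[['Area', 'Bedrooms', 'Age']]
y = df['Price']

model = LinearRegression().fit(X, y)

# Prepend a column of ones so the first solved value plays the role of the intercept
A = np.column_stack([np.ones(len(X)), X.to_numpy(dtype=float)])

# Find the values that minimize the sum of squared prediction errors
solution, *_ = np.linalg.lstsq(A, y.to_numpy(dtype=float), rcond=None)
intercept, coefs = solution[0], solution[1:]

print(intercept, coefs)
print(model.intercept_, model.coef_)
```

Both lines should print matching values up to floating-point rounding.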

Step 12: Print Model Coefficients

print(model.coef_)

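Coefficients describe how the predicted price responds to each feature.

Each coefficient tells us:

how much the predicted price changes
when that feature increases by one unit and the other features stay fixed

A larger coefficient does not by itself mean a stronger influence, because each coefficient depends on the units of its feature. One unit of area and one extra bedroom are very different steps.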
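
The "one unit" interpretation can be checked directly. The sketch below compares two hypothetical houses (`base` and `bigger`) that differ only by one unit of area and confirms that the difference in predicted price equals the Area coefficient.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    'Area': [1000, 1200, 1500, 1800, 2000, 2200, 2500, 2800],
    'Bedrooms': [2, 2, 3, 3, 4, 4, 4, 5],
    'Age': [5, 4, 3, 2, 2, 1, 1, 1],
    'Price': [500000, 580000, 720000, 850000, 950000, 1050000, 1200000, 1350000],
})
X = df[['Area', 'Bedrooms', 'Age']]
y = df['Price']

model = LinearRegression().fit(X, y)

# Two hypothetical houses that differ only by one unit of area
base = pd.DataFrame([[2000, 3, 2]], columns=X.columns)
bigger = pd.DataFrame([[2001, 3, 2]], columns=X.columns)

difference = model.predict(bigger)[0] - model.predict(base)[0]

print(difference)      # change in predicted price for one extra unit of area
print(model.coef_[0])  # coefficient of 'Area'
```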

Step 13: Print Intercept

print(model.intercept_)

The intercept is the constant term of the regression equation.

It is the predicted value when all input features are zero. No real house has zero area, so this value is an extrapolation rather than a meaningful price.

The intercept is required to form the final regression equation.

Step 14: Display Feature Coefficients Clearly

coefficients = pd.DataFrame({
    'Feature': X.columns,
    'Coefficient': model.coef_
})

coefficients

This step creates a table containing:

feature names
their coefficient values

This makes the model easier to interpret and understand.

Step 15: Plot Feature Importance

plt.bar(coefficients['Feature'], coefficients['Coefficient'])

plt.xlabel('Features')
plt.ylabel('Coefficient Value')
plt.title('Feature Importance')

plt.show()

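This bar chart visualizes the coefficient of each feature.

The graph helps us quickly identify:

which features raise the predicted price and which lower it
how large the change is per unit of each feature

Because the features are measured in different units, the bar heights are not directly comparable as a ranking of importance.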
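
Raw coefficients depend on each feature's units. One hedged way to put the features on a common scale is to standardize them before fitting, so each coefficient describes the price change per one standard deviation of that feature. The sketch below uses scikit-learn's `StandardScaler` for this. It is an addition to the tutorial's workflow, and with strongly correlated features such as Area and Bedrooms even standardized coefficients should be read with care.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    'Area': [1000, 1200, 1500, 1800, 2000, 2200, 2500, 2800],
    'Bedrooms': [2, 2, 3, 3, 4, 4, 4, 5],
    'Age': [5, 4, 3, 2, 2, 1, 1, 1],
    'Price': [500000, 580000, 720000, 850000, 950000, 1050000, 1200000, 1350000],
})
X = df[['Area', 'Bedrooms', 'Age']]
y = df['Price']

# Model on the original units, for reference
raw_model = LinearRegression().fit(X, y)

# Rescale every feature to mean 0 and standard deviation 1
X_scaled = StandardScaler().fit_transform(X)
scaled_model = LinearRegression().fit(X_scaled, y)

# Coefficients now share the unit "price change per one standard deviation"
standardized = pd.DataFrame({
    'Feature': X.columns,
    'Standardized Coefficient': scaled_model.coef_,
})
print(standardized)
```

Standardizing changes only how the coefficients are expressed. Both models produce the same predictions for the training rows.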

Step 16: Predict House Prices

predictions = model.predict(X)

predictions

The predict() function generates predicted house prices using the trained model.

The model uses learned patterns to estimate prices from input features.

Step 17: Compare Actual and Predicted Values

comparison = pd.DataFrame({
    'Actual Price': y,
    'Predicted Price': predictions
})

comparison

This table compares:

actual house prices
predicted house prices

Comparing both values shows how accurate the model is on the data it was trained on.
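
A comparison table is easier to read with explicit error columns. The sketch below uses hypothetical actual and predicted prices (not output from the trained model) and adds a signed error and an absolute percentage error. The column names `Error` and `Abs % Error` are illustrative choices.

```python
import pandas as pd

# Hypothetical actual and predicted prices, chosen only for illustration
comparison = pd.DataFrame({
    'Actual Price': [500000, 720000, 950000],
    'Predicted Price': [505000, 712800, 950000],
})

# Signed error: positive means the model predicted too low
comparison['Error'] = comparison['Actual Price'] - comparison['Predicted Price']

# Absolute error as a percentage of the actual price
comparison['Abs % Error'] = (
    comparison['Error'].abs() / comparison['Actual Price'] * 100
).round(2)

print(comparison)
```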

Step 18: Plot Actual vs Predicted Prices

plt.scatter(y, predictions)

plt.xlabel('Actual Prices')
plt.ylabel('Predicted Prices')
plt.title('Actual vs Predicted Prices')

plt.show()

This graph compares actual prices with predicted prices.

If predictions are accurate:

points lie close to the diagonal line where predicted price equals actual price
points do not drift above or below that line in a systematic way

This visualization helps assess model quality.

Step 19: Calculate Residual Errors

residuals = y - predictions

residuals

Residuals represent prediction errors.

Formula:

Residual = Actual Value - Predicted Value

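Residuals closer to zero indicate better predictions. A positive residual means the model predicted too low, and a negative residual means it predicted too high.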

Step 20: Plot Residual Errors

plt.scatter(predictions, residuals)

plt.xlabel('Predicted Prices')
plt.ylabel('Residual Errors')
plt.title('Residual Plot')

plt.axhline(y=0)

plt.show()

The residual plot visualizes prediction errors.

A good regression model should produce:

small errors
randomly distributed errors

Visible patterns in the residuals, such as a curve or a funnel shape, suggest that a linear model is missing some structure in the data.
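
The residual plot can be backed up with two quick numbers. The sketch below refits the tutorial model and prints the mean residual and the largest absolute residual. For least squares with an intercept, the mean residual on the training data is zero by construction, so a value near zero confirms the fit ran correctly but does not by itself show that the model is good.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    'Area': [1000, 1200, 1500, 1800, 2000, 2200, 2500, 2800],
    'Bedrooms': [2, 2, 3, 3, 4, 4, 4, 5],
    'Age': [5, 4, 3, 2, 2, 1, 1, 1],
    'Price': [500000, 580000, 720000, 850000, 950000, 1050000, 1200000, 1350000],
})
X = df[['Area', 'Bedrooms', 'Age']]
y = df['Price']

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)

# Average residual: expected to be zero up to floating-point rounding
mean_residual = residuals.mean()
print(mean_residual)

# Largest miss in either direction, in price units
largest_miss = residuals.abs().max()
print(largest_miss)
```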

Step 21: Evaluate Model Performance

mae = mean_absolute_error(y, predictions)
mse = mean_squared_error(y, predictions)
r2 = r2_score(y, predictions)

print("MAE:", mae)
print("MSE:", mse)
print("R2 Score:", r2)

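These metrics evaluate model performance.

MAE measures the average absolute prediction error, in the same units as the price.
MSE averages the squared errors, so it penalizes larger errors more strongly.
R2 Score measures the proportion of the variation in price that the model explains.

Higher R2 scores usually indicate a better fit. Here the metrics are computed on the same data used for training, so they tend to be optimistic about performance on new houses.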
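
To show what these functions calculate, the sketch below computes the same metrics by hand in plain Python on four hypothetical values. It also adds RMSE, the square root of MSE, which the tutorial does not use but which brings the squared error back to the units of the target.

```python
import math

# Hypothetical values, chosen so the arithmetic is easy to follow
actual = [100, 200, 300, 400]
predicted = [110, 190, 310, 390]

errors = [a - p for a, p in zip(actual, predicted)]
n = len(actual)

mae = sum(abs(e) for e in errors) / n   # mean absolute error
mse = sum(e ** 2 for e in errors) / n   # mean squared error
rmse = math.sqrt(mse)                   # back in the units of the target

mean_actual = sum(actual) / n
ss_res = sum(e ** 2 for e in errors)                  # variation left unexplained
ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total variation around the mean
r2 = 1 - ss_res / ss_tot

print(mae, mse, rmse, r2)
```

Every prediction here is off by 10, so MAE is 10 and MSE is 100. R2 is 1 - 400/50000 = 0.992.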

Step 22: Predict Price for a New House

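# Use the training column names so scikit-learn does not warn about missing feature names
new_house = pd.DataFrame([[2100, 4, 2]], columns=['Area', 'Bedrooms', 'Age'])

predicted_price = model.predict(new_house)

print(predicted_price[0])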

In this step, we test the model using new house details.

Input values:

Area = 2100
Bedrooms = 4
Age = 2

The model predicts the expected house price based on learned patterns.

Step 23: Print Final Equation Values

print("Intercept:", model.intercept_)

for feature, coef in zip(X.columns, model.coef_):
    print(feature, coef)

This step prints the final learned values of:

intercept
coefficients

These values form the final regression equation used by the model for predictions.
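
The equation can be written out and applied by hand. The sketch below refits the tutorial model, prints the equation as text, and recomputes the prediction for the house from Step 22 using only the intercept and coefficients. The variable names `terms` and `manual_price` are illustrative.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    'Area': [1000, 1200, 1500, 1800, 2000, 2200, 2500, 2800],
    'Bedrooms': [2, 2, 3, 3, 4, 4, 4, 5],
    'Age': [5, 4, 3, 2, 2, 1, 1, 1],
    'Price': [500000, 580000, 720000, 850000, 950000, 1050000, 1200000, 1350000],
})
X = df[['Area', 'Bedrooms', 'Age']]
y = df['Price']

model = LinearRegression().fit(X, y)

# Write the fitted equation out as text
terms = ' + '.join(
    f'({coef:.2f} * {name})' for name, coef in zip(X.columns, model.coef_)
)
print(f'Price = {model.intercept_:.2f} + {terms}')

# Apply the equation by hand to the house from Step 22
area, bedrooms, age = 2100, 4, 2
manual_price = (
    model.intercept_
    + model.coef_[0] * area
    + model.coef_[1] * bedrooms
    + model.coef_[2] * age
)

# The hand calculation should agree with predict()
new_house = pd.DataFrame([[area, bedrooms, age]], columns=X.columns)
model_price = model.predict(new_house)[0]
print(manual_price, model_price)
```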

Summary

In this tutorial, we learned how to build a Multiple Linear Regression model using Python and Scikit-Learn to predict house prices based on multiple features such as area, bedrooms, and house age. We created a dataset, visualized relationships between features and price using plots, trained the regression model, understood coefficients and intercept values, generated predictions, evaluated model performance using different metrics, and finally predicted the price of a new house using custom input values. This project helps beginners understand how machine learning models learn patterns from data and make real-world predictions.
