Example 1

EDA in Python: A Practical Example Using a Small Dataset

Let us walk through Exploratory Data Analysis (EDA) in Python using a small employee dataset. This example shows how to inspect, summarize, and visualize data step by step with Pandas, Matplotlib, and Seaborn.

Step 1: Import Required Libraries

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Step 2: Create Sample Dataset

data = {
    "Name": ["John", "Alex", "Sam", "Ravi", "Anu"],
    "Age": [22, 25, 30, 35, 28],
    "Salary": [30000, 40000, 50000, 60000, 45000],
    "Experience": [1, 2, 5, 7, 3],
    "Department": ["HR", "IT", "IT", "Finance", "HR"]
}

df = pd.DataFrame(data)

print(df)

Output

  Name  Age  Salary  Experience Department
0  John   22   30000           1         HR
1  Alex   25   40000           2         IT
2   Sam   30   50000           5         IT
3  Ravi   35   60000           7    Finance
4   Anu   28   45000           3         HR

Step 3: Understanding the Dataset

View First Rows

print(df.head())

Dataset Shape

print(df.shape)

Output

(5, 5)

Dataset Information

df.info()  # info() prints its report directly and returns None, so wrapping it in print() adds a stray "None"
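df.info() lists each column's dtype. Later steps treat numeric and non-numeric columns differently (histograms, box plots, and correlation use numbers; count plots use categories), so a minimal sketch of splitting the columns by type with select_dtypes might look like this. The variable names numeric_cols and categorical_cols are illustrative, not part of the original example.

```python
import pandas as pd

# Same employee data as in Step 2
data = {
    "Name": ["John", "Alex", "Sam", "Ravi", "Anu"],
    "Age": [22, 25, 30, 35, 28],
    "Salary": [30000, 40000, 50000, 60000, 45000],
    "Experience": [1, 2, 5, 7, 3],
    "Department": ["HR", "IT", "IT", "Finance", "HR"],
}
df = pd.DataFrame(data)

# Numeric columns: candidates for histograms, box plots, and correlation
numeric_cols = df.select_dtypes(include="number").columns.tolist()

# Everything else: candidates for count plots (illustrative grouping)
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print(df.dtypes)
print("Numeric:", numeric_cols)
print("Categorical:", categorical_cols)
```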

Statistical Summary

print(df.describe())
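The numbers in describe() can be reproduced by hand, which helps when reading the table. The sketch below does this for the Salary column only. Note that the "std" row is the sample standard deviation (ddof=1), and the quartiles use pandas' default linear interpolation.

```python
import pandas as pd

df = pd.DataFrame({"Salary": [30000, 40000, 50000, 60000, 45000]})

summary = df["Salary"].describe()

# describe() reports the sample standard deviation (divides by n - 1)
sample_std = df["Salary"].std(ddof=1)
# For comparison: the population standard deviation (divides by n)
population_std = df["Salary"].std(ddof=0)

# 25%, 50%, and 75% rows of describe(), computed directly
q1, median, q3 = df["Salary"].quantile([0.25, 0.5, 0.75])

print(summary)
print("sample std:", round(sample_std, 2))
print("population std:", population_std)
print("Q1, median, Q3:", q1, median, q3)
```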

Step 4: Checking Missing Values

print(df.isnull().sum())

Output

Name          0
Age           0
Salary        0
Experience    0
Department    0
dtype: int64
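This dataset has no gaps, so every count is 0. To see what a non-zero result looks like, the sketch below uses a made-up copy of two columns with one salary deliberately removed (row 2 is hypothetical, not from the original data). Filling the gap with the column median is shown as one common option, not the only correct choice.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the Salary value in row 2 is intentionally missing
df_missing = pd.DataFrame({
    "Age": [22, 25, 30, 35, 28],
    "Salary": [30000, 40000, np.nan, 60000, 45000],
})

missing_counts = df_missing.isnull().sum()        # missing values per column
missing_pct = df_missing.isnull().mean() * 100    # percentage missing per column
print(missing_counts)
print(missing_pct)

# One option: replace the missing salary with the median of the known salaries
median_salary = df_missing["Salary"].median()
df_filled = df_missing.fillna({"Salary": median_salary})
print(df_filled)
```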

Step 5: Histogram

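A histogram shows the distribution of a numerical variable: it splits the range of values into bins and draws a bar whose height is the number of values falling into each bin.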

Example — Age Distribution

plt.hist(df["Age"])

plt.xlabel("Age")
plt.ylabel("Frequency")
plt.title("Age Distribution")

plt.show()
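With only five ages, plt.hist's default of 10 equal-width bins leaves several bars empty. Matplotlib computes histogram counts with NumPy's np.histogram, so the sketch below uses it to show the counts that plt.hist(df["Age"], bins=edges) would draw. The bin edges chosen here are an illustrative assumption.

```python
import numpy as np
import pandas as pd

ages = pd.Series([22, 25, 30, 35, 28], name="Age")

# Illustrative bins: [20, 25), [25, 30), [30, 35]; only the last bin includes its right edge
edges = [20, 25, 30, 35]
counts, bin_edges = np.histogram(ages, bins=edges)

for left, right, n in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left}-{right}: {n}")
```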

Step 6: Scatter Plot

Scatter plots are used to analyze relationships between two numerical variables.

Example — Experience vs Salary

plt.scatter(df["Experience"], df["Salary"])

plt.xlabel("Experience")
plt.ylabel("Salary")
plt.title("Experience vs Salary")

plt.show()
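The scatter plot suggests salary rises with experience. One way to put a number on that trend is a least-squares straight line; the sketch below fits one with np.polyfit. The fitted line is an addition for illustration, not part of the original plot, and with five points it should be read as a rough description rather than a reliable model.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Experience": [1, 2, 5, 7, 3],
    "Salary": [30000, 40000, 50000, 60000, 45000],
})

# Least-squares line: Salary ≈ slope * Experience + intercept
slope, intercept = np.polyfit(df["Experience"], df["Salary"], deg=1)

print(f"slope: {slope:.2f} per year of experience, intercept: {intercept:.2f}")

# Optional overlay on the scatter plot (assumes matplotlib.pyplot is imported as plt):
# xs = np.sort(df["Experience"].to_numpy())
# plt.plot(xs, slope * xs + intercept, color="red")
```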

Step 7: Box Plot

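A box plot summarizes the spread of a numerical variable: the box runs from the first quartile (Q1) to the third quartile (Q3), the line inside it marks the median, and points beyond the whiskers are drawn individually as potential outliers.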

Example — Salary Distribution

sns.boxplot(x=df["Salary"])

plt.title("Salary Box Plot")

plt.show()
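Seaborn's box plot uses a default whisker length of 1.5 times the interquartile range (IQR = Q3 − Q1). The sketch below applies that rule to the salaries to find which points, if any, would be flagged as outliers. It assumes the default linear quartile method; other quartile definitions can shift the limits slightly.

```python
import pandas as pd

salary = pd.Series([30000, 40000, 50000, 60000, 45000], name="Salary")

q1 = salary.quantile(0.25)
q3 = salary.quantile(0.75)
iqr = q3 - q1

# Limits 1.5 * IQR beyond the box; whiskers end at the most extreme data points inside these limits
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Values outside the limits are drawn as separate outlier markers
outliers = salary[(salary < lower_limit) | (salary > upper_limit)]

print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("Limits:", lower_limit, upper_limit)
print("Outliers:", outliers.tolist())
```

Here the limits are 25,000 and 65,000, so every salary lies inside them and the plot shows no outlier points.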

Step 8: Count Plot

A count plot draws one bar per category, with the bar's height equal to the number of rows in that category.

Example — Department Count

sns.countplot(x=df["Department"])

plt.title("Department Count")

plt.show()
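The bar heights in the count plot are simply the category counts, which value_counts() returns as numbers. The sketch below prints both the counts and each department's share of employees.

```python
import pandas as pd

departments = pd.Series(["HR", "IT", "IT", "Finance", "HR"], name="Department")

# Counts per category: the same values the count plot draws as bars
counts = departments.value_counts()
print(counts)

# Proportion of employees in each department
shares = departments.value_counts(normalize=True)
print(shares)
```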

Step 9: Correlation Analysis

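Correlation measures how strongly two numerical features move together. By default, df.corr() computes the Pearson coefficient, which ranges from -1 (perfect negative linear relationship) through 0 (no linear relationship) to 1 (perfect positive linear relationship).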

Example

print(df.corr(numeric_only=True))
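To see where one entry of the matrix comes from, the sketch below computes the Pearson coefficient for Experience and Salary directly from its definition and compares it with pandas' own result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Experience": [1, 2, 5, 7, 3],
    "Salary": [30000, 40000, 50000, 60000, 45000],
})

x = df["Experience"].to_numpy(dtype=float)
y = df["Salary"].to_numpy(dtype=float)

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)), with dx, dy = deviations from the mean
dx = x - x.mean()
dy = y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Same pair as in the correlation matrix
r_pandas = df["Experience"].corr(df["Salary"])

print(f"manual: {r_manual:.4f}, pandas: {r_pandas:.4f}")
```

Both values are about 0.97, a strong positive linear relationship, though with only five employees this should not be over-interpreted.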

Heatmap Visualization

sns.heatmap(df.corr(numeric_only=True),
            annot=True,
            cmap="Blues")

plt.title("Correlation Heatmap")

plt.show()

Complete Program

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Sample Dataset
data = {
    "Name": ["John", "Alex", "Sam", "Ravi", "Anu"],
    "Age": [22, 25, 30, 35, 28],
    "Salary": [30000, 40000, 50000, 60000, 45000],
    "Experience": [1, 2, 5, 7, 3],
    "Department": ["HR", "IT", "IT", "Finance", "HR"]
}

df = pd.DataFrame(data)

# Dataset Information
print(df.head())
df.info()  # prints directly; print(df.info()) would also print "None"
print(df.describe())

# Missing Values
print(df.isnull().sum())

# Histogram
plt.hist(df["Age"])
plt.title("Age Distribution")
plt.xlabel("Age")
plt.ylabel("Frequency")
plt.show()

# Scatter Plot
plt.scatter(df["Experience"], df["Salary"])
plt.title("Experience vs Salary")
plt.xlabel("Experience")
plt.ylabel("Salary")
plt.show()

# Box Plot
sns.boxplot(x=df["Salary"])
plt.title("Salary Box Plot")
plt.show()

# Count Plot
sns.countplot(x=df["Department"])
plt.title("Department Count")
plt.show()

# Correlation Heatmap
sns.heatmap(df.corr(numeric_only=True),
            annot=True,
            cmap="Blues")

plt.title("Correlation Heatmap")

plt.show()

Summary

In this example, we performed EDA using Python libraries such as Pandas, Matplotlib, and Seaborn. The dataset was analyzed using statistical summaries and visualizations including histograms, scatter plots, box plots, count plots, and heatmaps to better understand the structure and relationships within the data.
