Wednesday, December 17, 2025

📊 Descriptive Statistics Explained Simply | Mean, Median & Std Dev

Part 3: Statistics for Data Science Series

Goal: Learn how to summarize data effectively before analysis

When working with data, the first and most important question is:

👉 “What does this data look like?”

This is where Descriptive Statistics comes in.

Descriptive statistics help us summarize, understand, and interpret data using simple numerical measures. Before any machine learning, prediction, or dashboarding, descriptive stats are your foundation.



📌 What is Descriptive Statistics?

Descriptive statistics are techniques used to summarize and describe the main features of a dataset.

They help answer questions like:

  • What is the average value?
  • How spread out is the data?
  • Are there extreme values?
  • Where does most of the data lie?

👉 Unlike inferential statistics, descriptive statistics do not make predictions; they explain what already exists.


📈 Measures of Central Tendency (Finding the “Center”)

1️⃣ Mean (Average)

Definition:
The sum of all values divided by the total number of values.

Formula:
Mean = (Sum of values) / (Number of values)

📌 Real-Life Example:
Daily sales of an e-commerce store over 5 days:
₹10k, ₹12k, ₹11k, ₹13k, ₹14k

Mean = (10 + 12 + 11 + 13 + 14) / 5 = ₹12k

🟢 Best used when data has no extreme outliers
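
The calculation above can be sketched in Python with the standard library's statistics module. The variable names are illustrative, and the values are the five daily sales figures in thousands of rupees.

```python
import statistics

# Daily sales from the example above, in thousands of rupees
daily_sales_k = [10, 12, 11, 13, 14]

# Mean = sum of values / number of values
mean_by_hand = sum(daily_sales_k) / len(daily_sales_k)

# The standard library gives the same result
mean_from_library = statistics.mean(daily_sales_k)

print(mean_by_hand)  # 12.0
print(mean_from_library == mean_by_hand)  # True
```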


2๏ธโƒฃ Median (Middle Value)

Definition:
The middle value when data is sorted. With an even number of values, it is the average of the two middle values.

📌 Example:
Monthly salaries in a startup:
₹25k, ₹30k, ₹35k, ₹40k, ₹5,00,000

Median = ₹35k
(The mean is ₹126k, pulled up by the single ₹5,00,000 salary.)

🟢 Best used when data contains outliers
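
To see how one outlier affects the mean but not the median, here is a small sketch using the salaries from the example. Values are in thousands of rupees, so ₹5,00,000 is written as 500; the variable names are illustrative.

```python
import statistics

# Monthly salaries from the example above, in thousands of rupees
salaries_k = [25, 30, 35, 40, 500]

# Median: the middle value of the sorted data
median_salary = statistics.median(salaries_k)

# Mean: pulled upward by the single very large salary
mean_salary = sum(salaries_k) / len(salaries_k)

print(median_salary)  # 35
print(mean_salary)    # 126.0
```

The median stays close to what a typical employee earns, while the mean is higher than four of the five salaries.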


3๏ธโƒฃ Mode (Most Frequent Value)

Definition:
The value that appears most often.

📌 Example:
Most sold product sizes:
M, M, L, S, M, L

Mode = M

🟢 Useful for categorical data
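
A short sketch of the same example in Python. It uses statistics.mode for the most frequent value and collections.Counter to show the full tally; the variable names are illustrative.

```python
import statistics
from collections import Counter

# Product sizes from the example above
sizes_sold = ["M", "M", "L", "S", "M", "L"]

# Mode: the value that appears most often (works for text labels too)
most_common_size = statistics.mode(sizes_sold)

# Counter shows how often each size was sold
size_counts = Counter(sizes_sold)

print(most_common_size)           # M
print(size_counts.most_common())  # [('M', 3), ('L', 2), ('S', 1)]
```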


📊 Measures of Dispersion (Understanding Spread)

4️⃣ Range

Definition:
Difference between the maximum and minimum values.

📌 Example:
City temperature range:
Min = 20°C, Max = 45°C

Range = 25°C

⚠️ Highly sensitive to outliers


5๏ธโƒฃ Variance

Definition:
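Average of squared differences from the mean. For a sample (rather than a full population), the sum of squared differences is divided by n − 1 instead of n.

  • Low variance → values are close together
  • High variance → values are spread out

🟢 Widely used in finance and risk analysis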
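
As a worked example, here is a minimal sketch that computes variance step by step and checks it against the standard library. It reuses the daily sales figures (10, 12, 11, 13, 14) from the mean example; the variable names are illustrative.

```python
import statistics

values = [10, 12, 11, 13, 14]
mean_value = sum(values) / len(values)  # 12.0

# Squared difference of each value from the mean: [4.0, 0.0, 1.0, 1.0, 4.0]
squared_diffs = [(x - mean_value) ** 2 for x in values]

# Population variance divides by n; sample variance divides by n - 1
population_variance = sum(squared_diffs) / len(values)    # 2.0
sample_variance = sum(squared_diffs) / (len(values) - 1)  # 2.5

# The standard library offers both versions
assert population_variance == statistics.pvariance(values)
assert sample_variance == statistics.variance(values)
```

Note that variance is expressed in squared units (here, thousands of rupees squared), which is why standard deviation is usually easier to interpret.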


6๏ธโƒฃ Standard Deviation

Definition:
Square root of variance.

📌 Example:
A stock whose returns have a high standard deviation is more volatile.

🟢 Most widely used dispersion metric
🟢 Same unit as original data
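
A quick sketch with the standard library, reusing the daily sales figures (10, 12, 11, 13, 14) whose population variance is 2.0 and sample variance is 2.5. The variable names are illustrative.

```python
import statistics

values = [10, 12, 11, 13, 14]

# Standard deviation is the square root of the variance
population_sd = statistics.pstdev(values)  # sqrt(2.0)
sample_sd = statistics.stdev(values)       # sqrt(2.5)

print(round(population_sd, 3))  # 1.414
print(round(sample_sd, 3))      # 1.581
```

Because the result is in the same unit as the data, it can be read directly: daily sales typically differ from the ₹12k mean by roughly ₹1.4k to ₹1.6k.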


๐Ÿ“ Percentiles & Interquartile Range (IQR)

7๏ธโƒฃ Percentiles

Definition:
A percentile shows the value below which a certain percentage of data falls.

📌 Example:
90th percentile salary = ₹20 LPA (lakhs per annum)
If you earn ₹20 LPA, you earn more than about 90% of employees.
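
One simple way to check a statement like this is to count how many values fall below a given number. The helper percent_below and the salary list are hypothetical, written only for this illustration; libraries use several slightly different percentile conventions, so treat this as a sketch of the idea.

```python
def percent_below(values, x):
    """Return the percentage of values strictly below x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

# Hypothetical salaries in LPA (lakhs per annum)
salaries_lpa = [4, 5, 6, 7, 8, 9, 10, 12, 15, 20]

# 9 of the 10 salaries are below 20 LPA
print(percent_below(salaries_lpa, 20))  # 90.0
```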


8๏ธโƒฃ Interquartile Range (IQR)

Formula:
IQR = Q3 โˆ’ Q1

Why it matters:

  • Identifies outliers
  • Used in box plots
  • Robust against extreme values

📌 Real Scenario:
Detecting abnormal insurance claims or fraud transactions.
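
A common convention for IQR-based outlier detection, which is also the one box plots usually draw, is to flag values more than 1.5 × IQR below Q1 or above Q3. The sketch below applies that rule to hypothetical claim amounts; the data, the variable names, and the 1.5 multiplier are assumptions for illustration, not a rule stated in this article.

```python
import statistics

# Hypothetical insurance claim amounts
claims = [1200, 1500, 1700, 1800, 2000, 2100, 2300, 2500, 15000]

# Quartiles: n=4 splits the data into four parts and returns Q1, Q2, Q3
q1, q2, q3 = statistics.quantiles(claims, n=4, method="inclusive")
iqr = q3 - q1

# Assumed convention: anything beyond 1.5 * IQR from the quartiles is an outlier
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [c for c in claims if c < lower_fence or c > upper_fence]

print(q1, q3, iqr)  # 1700.0 2300.0 600.0
print(outliers)     # [15000]
```

The extreme claim barely moves Q1 and Q3, which is why the IQR is described as robust against extreme values.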


🧠 Summary Statistics in Python (Hands-On)

Pandas is one of the most widely used Python libraries for descriptive statistics.

🔹 Sample Dataset

import pandas as pd

data = {
    "Sales": [12000, 15000, 10000, 18000, 16000],
    "Profit": [2000, 3000, 1500, 4000, 3500]
}

df = pd.DataFrame(data)

🔹 Using Pandas .describe()

df.describe()

For numeric columns, .describe() instantly provides:

  • Count
  • Mean
  • Standard Deviation
  • Minimum & Maximum
  • 25%, 50% (Median), 75% percentiles

🟢 A common first step in real-world data analysis projects


🔹 Individual Statistics in Pandas

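df.mean()          # mean of each column
df.median()        # median of each column
df.std()           # sample standard deviation (divides by n - 1)
df.var()           # sample variance (divides by n - 1)
df.quantile(0.75)  # 75th percentile (Q3) of each column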
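
One detail worth knowing: pandas computes the sample variance and standard deviation by default (ddof=1), so its results differ slightly from the "average of squared differences" population formula. The sketch below, which rebuilds the sample dataset so it runs on its own, shows both versions and an IQR computed with .quantile(); the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Sales": [12000, 15000, 10000, 18000, 16000],
    "Profit": [2000, 3000, 1500, 4000, 3500],
})

sales = df["Sales"]

# pandas default is ddof=1 (divide by n - 1); ddof=0 divides by n
sample_var = sales.var()
population_var = sales.var(ddof=0)

# IQR = Q3 - Q1
iqr = sales.quantile(0.75) - sales.quantile(0.25)

print(sample_var)      # 10200000.0
print(population_var)  # 8160000.0
print(iqr)             # 4000.0
```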

๐Ÿข Real-World Applications

๐Ÿ“Œ Business

  • Average revenue per customer
  • Monthly sales analysis
  • Customer behavior tracking

📌 Finance

  • Stock volatility measurement
  • Risk evaluation
  • Portfolio performance

📌 Healthcare

  • Patient recovery analysis
  • Hospital stay durations
  • Disease statistics

🚀 Key Takeaways

  • Descriptive statistics summarize data
  • Mean, median, mode explain central tendency
  • Standard deviation explains variability
  • Percentiles & IQR are robust to outliers and help detect them
  • Pandas .describe() is essential for EDA

🔜 What’s Next?

Part 4: Data Visualization for Statistics

  • Histograms
  • Box plots
  • Bar charts
  • Python visualizations

Statistics isn’t hard; it’s just misunderstood. Keep learning! 🚀

โ“ Frequently Asked Questions (FAQ)

What is descriptive statistics?

Descriptive statistics is a branch of statistics that summarizes and describes the main characteristics of a dataset using measures like mean, median, mode, variance, and standard deviation.

Why is descriptive statistics important in data science?

Descriptive statistics helps data scientists understand data distribution, identify patterns, detect outliers, and prepare datasets for further analysis and machine learning models.

What is the difference between mean and median?

The mean is the average of all values, while the median is the middle value when data is sorted. Median is more reliable when the dataset contains outliers.

When should I use standard deviation?

Standard deviation is used to measure how spread out values are from the mean. It is commonly used in finance, business analytics, and risk assessment.

What is Pandas describe() used for?

The describe() function in Pandas provides a quick summary of key descriptive statistics including count, mean, standard deviation, minimum, maximum, and percentiles.

Is descriptive statistics enough for data analysis?

Descriptive statistics is the first step in data analysis. For predictions and conclusions about future data, inferential statistics and machine learning techniques are required.
