Wednesday, December 17, 2025

📊 Types of Data & Data Collection Methods in Data Science (Part 2)

Understanding data is the first and most important step in data science.

Before analysis, modeling, or machine learning, a data scientist must know what type of data they are working with and how it was collected.

In Part 2 of our Statistics for Data Science series, you’ll learn:

  • Different types of data used in data science
  • How data is collected in real-world projects
  • Sampling methods and their importance
  • Hands-on examples to classify datasets



🎯 Goal of This Post

Understand your data before analyzing it.

Incorrect data understanding leads to:

  • Wrong statistical methods
  • Poor model performance
  • Misleading insights


📌 Types of Data in Data Science

Data can be classified in multiple ways depending on its nature and usage.


🔹 Qualitative vs Quantitative Data

📘 Qualitative Data (Categorical Data)

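Qualitative data describes qualities or characteristics using labels rather than measurements. Even when the labels are coded as numbers (for example, 0 and 1), arithmetic on those codes is not meaningful.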

Examples:

  • Gender (Male/Female)
  • Product category
  • Customer feedback (Good, Bad, Average)
  • City names

📌 Used for:

  • Classification
  • Sentiment analysis
  • Grouping and segmentation


📗 Quantitative Data (Numerical Data)

Quantitative data represents numbers and measurable values.

Examples:

  • Age
  • Salary
  • Temperature
  • Number of purchases

📌 Used for:

  • Statistical calculations
  • Regression models
  • Forecasting
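The difference shows up as soon as you try to summarize a column. Here is a minimal sketch using only the Python standard library; the `feedback` and `salaries` lists are made-up illustrative values, not data from a real project.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample values, for illustration only.
feedback = ["Good", "Bad", "Average", "Good", "Good"]  # qualitative
salaries = [42000, 55000, 61000, 48000]                # quantitative

# Qualitative data: count how often each category occurs.
feedback_counts = Counter(feedback)

# Quantitative data: arithmetic summaries such as the mean are meaningful.
average_salary = mean(salaries)

print(feedback_counts.most_common(1))
print(average_salary)
```

Taking the mean of `feedback` would not make sense, which is why the data type decides the summary you compute.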


🔹 Discrete vs Continuous Data

📘 Discrete Data

Discrete data consists of countable values.

Examples:

  • Number of customers
  • Number of defects
  • Number of website visits

📌 Values are typically whole numbers (you cannot have 2.5 customers).


📗 Continuous Data

Continuous data can take any value within a range.

Examples:

  • Height
  • Weight
  • Time
  • Temperature

📌 Can have decimal values.
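A rough way to check a column programmatically is sketched below. The function name `classify_column` and its labels are hypothetical, and the rule is only a heuristic: a continuous measurement that happens to be recorded as whole numbers (say, age in years) would be labeled discrete, so treat the result as a first guess.

```python
def classify_column(values):
    """Heuristic label for a column based on the values it holds."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if not is_numeric:
        return "qualitative"
    # Whole-number values suggest counts; anything fractional suggests a measurement.
    if all(float(v).is_integer() for v in values):
        return "quantitative, discrete"
    return "quantitative, continuous"


print(classify_column(["Delhi", "Pune", "Mumbai"]))
print(classify_column([3, 0, 12]))
print(classify_column([1.75, 1.62, 1.80]))
```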


🔹 Structured vs Unstructured Data

📘 Structured Data

Structured data is organized in rows and columns.

Examples:

  • Excel files
  • SQL tables
  • CSV datasets

📌 Easy to analyze using SQL, Excel, Python, or BI tools.


📗 Unstructured Data

Unstructured data has no predefined format.

Examples:

  • Text documents
  • Emails
  • Images
  • Videos
  • Social media posts

📌 Requires advanced processing (NLP, Computer Vision).
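To make the contrast concrete, here is a small sketch with invented inputs: a two-row CSV snippet (`csv_text`) and a one-line review (`review`). The whitespace tokenizer is deliberately naive and stands in for real NLP tooling.

```python
import csv
import io
from collections import Counter

# Structured: a hypothetical CSV snippet with named columns.
csv_text = "order_id,amount\n1,250.0\n2,99.5\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
total_amount = sum(float(row["amount"]) for row in rows)

# Unstructured: free text has no columns, so we have to impose structure
# ourselves, here by splitting on whitespace and stripping punctuation.
review = "Great phone, great battery, slow delivery"
word_counts = Counter(word.strip(",.").lower() for word in review.split())

print(total_amount)
print(word_counts.most_common(1))
```

With structured data the column name already tells you what each value means; with text you first have to decide what a "value" is.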


📌 Data Collection Methods in Data Science

Understanding how data is collected helps assess data quality and bias.


🔹 Common Data Collection Techniques

1️⃣ Surveys & Questionnaires

  • Online forms
  • Feedback surveys
  • Market research

📌 Risk: Response bias


2️⃣ Observational Data

  • Website click tracking
  • User behavior logs
  • Sensor data

📌 Often real-time. Avoids self-reporting bias, but can still suffer from selection bias and tracking gaps


3️⃣ Experiments (A/B Testing)

  • Marketing experiments
  • Product feature testing

📌 Controlled: random assignment to groups makes cause-and-effect conclusions more reliable
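The "controlled" part of an A/B test comes from assigning users to variants at random. A minimal sketch of that step follows; the function name `assign_variants`, the user IDs, and the fixed seed are assumptions made for illustration, and production systems often assign by hashing the user ID instead.

```python
import random


def assign_variants(user_ids, seed=42):
    """Randomly split users into two equally sized groups, A and B."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"A": shuffled[:half], "B": shuffled[half:]}


groups = assign_variants(range(10))
print(len(groups["A"]), len(groups["B"]))
```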


4️⃣ Transactional Data

  • Sales records
  • Banking transactions
  • E-commerce logs

📌 Highly structured and reliable


5️⃣ Third-Party Data

  • Government datasets
  • APIs
  • External vendors

📌 Verify credibility and freshness


📌 Sampling Methods in Statistics

Sampling allows us to study a subset of data instead of the entire population.


🔹 Types of Sampling Methods

📘 Random Sampling

  • Every unit has an equal chance of being selected
  • Reduces bias
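In Python, simple random sampling without replacement can be sketched with the standard library's `random.sample`. The population of IDs 1 to 100 and the seed are illustrative assumptions.

```python
import random

# Hypothetical population: customer IDs 1..100.
population = list(range(1, 101))

rng = random.Random(7)  # fixed seed so the example is reproducible
sample = rng.sample(population, k=10)  # each ID is equally likely to be picked

print(sample)
```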


📘 Stratified Sampling

  • Population divided into groups (strata)
  • Sample taken from each group

📌 Used in surveys and finance
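One common variant is proportional stratified sampling, where each group contributes the same fraction of its members. The sketch below assumes that variant; `stratified_sample`, the region labels, and the 10% fraction are all hypothetical choices for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction from every stratum (at least one record each)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)  # group records by stratum
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))  # random sample within the stratum
    return sample


# Hypothetical population: 80 customers in the north, 20 in the south.
records = [("north", i) for i in range(80)] + [("south", i) for i in range(20)]
picked = stratified_sample(records, key=lambda r: r[0], fraction=0.1)
print(len(picked))
```

Here the sample keeps the 80/20 split of the population, which a small simple random sample would not guarantee.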


📘 Systematic Sampling

  • Every nth observation selected

📌 Simple and efficient
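A minimal sketch of systematic sampling, assuming the data sits in a list and using a random starting point within the first interval (a common convention, not something this post prescribes). The function name `systematic_sample` is hypothetical.

```python
import random


def systematic_sample(items, step, seed=0):
    """Pick every `step`-th item, starting at a random offset below `step`."""
    start = random.Random(seed).randrange(step)
    return items[start::step]


observations = list(range(100))  # hypothetical ordered observations
picked = systematic_sample(observations, step=10)
print(picked)
```

Be careful when the data has a repeating pattern with the same period as `step`, because the sample would then see only one phase of that pattern.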


📘 Convenience Sampling

  • Uses whatever data is easiest to obtain

📌 Risk: High bias
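A toy illustration of that bias: suppose order values happen to be stored smallest first, and we take only the first few rows because they are the easiest to grab. The numbers are invented for the example.

```python
from statistics import mean

# Hypothetical order values, stored in ascending order: 10, 20, ..., 100.
order_values = list(range(10, 110, 10))

population_mean = mean(order_values)       # uses all ten orders
convenience_mean = mean(order_values[:3])  # uses only the first three rows

print(population_mean, convenience_mean)
```

The convenience sample badly underestimates the true average because the rows that were easy to reach are not representative of the rest.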


📌 Why Sampling Matters in Data Science

  • Saves time and cost
  • Makes large datasets manageable
  • Enables faster experimentation
  • Supports inferential statistics


🧪 Hands-On: Classify Sample Datasets

Let’s classify real-world datasets.

Dataset              | Qualitative / Quantitative | Discrete / Continuous | Structured / Unstructured
Customer Gender      | Qualitative                | N/A (categorical)     | Structured
Monthly Salary       | Quantitative               | Continuous            | Structured
Product Reviews      | Qualitative                | N/A                   | Unstructured
Number of Orders     | Quantitative               | Discrete              | Structured
Website Session Time | Quantitative               | Continuous            | Structured

🧠 Key Takeaways

✔ Always identify data type before analysis
✔ Choose statistical methods based on data nature
✔ Understand data collection to avoid bias
✔ Sampling impacts accuracy and conclusions


🔗 What’s Next in This Series?

👉 Part 3: Descriptive Statistics – Mean, Median, Mode & Variability
