Types of Data & Data Collection Methods in Data Science (Part 2)
Understanding data is the first and most important step in data science.
Before analysis, modeling, or machine learning, a data scientist must know what type of data they are working with and how it was collected.
In Part 2 of our Statistics for Data Science series, you'll learn:
- Different types of data used in data science
- How data is collected in real-world projects
- Sampling methods and their importance
- Hands-on examples to classify datasets
🎯 Goal of This Post
Understand your data before analyzing it.
Misunderstanding your data leads to:
- Wrong statistical methods
- Poor model performance
- Misleading insights
Types of Data in Data Science
Data can be classified in multiple ways depending on its nature and usage.
🔹 Qualitative vs Quantitative Data
Qualitative Data (Categorical Data)
Qualitative data describes qualities or characteristics and is non-numeric.
Examples:
- Gender (Male/Female)
- Product category
- Customer feedback (Good, Bad, Average)
- City names
Used for:
- Classification
- Sentiment analysis
- Grouping and segmentation
Quantitative Data (Numerical Data)
Quantitative data represents numbers and measurable values.
Examples:
- Age
- Salary
- Temperature
- Number of purchases
Used for:
- Statistical calculations
- Regression models
- Forecasting
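The practical difference shows up in which operations make sense. Below is a minimal sketch using only the Python standard library and made-up feedback labels and salaries: for qualitative data we count categories, while for quantitative data we can compute averages. The variable names and values are hypothetical.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical toy data, made up for illustration only
feedback = ["Good", "Bad", "Average", "Good", "Good"]   # qualitative (categorical)
salaries = [42000, 55000, 61000, 48000, 75000]          # quantitative (numerical)

# Qualitative data: count each category and find the most common one (the mode)
feedback_counts = Counter(feedback)
most_common_label, most_common_count = feedback_counts.most_common(1)[0]

# Quantitative data: arithmetic such as the mean and median is meaningful
average_salary = mean(salaries)
median_salary = median(salaries)

print(most_common_label, most_common_count)  # Good 3
print(average_salary, median_salary)
```

Averaging labels such as "Good" and "Bad" has no meaning, which is why the data type decides which statistical method applies.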
🔹 Discrete vs Continuous Data
Discrete Data
Discrete data consists of countable values.
Examples:
- Number of customers
- Number of defects
- Number of website visits
Values are whole-number counts (0, 1, 2, ...); you cannot have 2.5 customers.
Continuous Data
Continuous data can take any value within a range.
Examples:
- Height
- Weight
- Time
- Temperature
Values can include fractions (e.g., 172.4 cm), limited only by the precision of the measurement.
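A quick way to spot the difference in code is to check whether every recorded value is a whole number. The helper below, `looks_discrete`, is a hypothetical heuristic written for illustration, not a standard function. It only inspects how values were recorded, so domain knowledge should have the final say.

```python
def looks_discrete(values):
    """Heuristic: treat a numeric column as discrete if every value is a whole number.

    This only checks how the data was recorded. A continuous quantity such as
    time rounded to whole seconds will still look discrete, so domain knowledge wins.
    """
    return all(float(v).is_integer() for v in values)


website_visits = [120, 98, 143, 87]          # counts -> discrete
heights_cm = [172.4, 165.0, 180.2, 158.7]    # measurements -> continuous

print(looks_discrete(website_visits))  # True
print(looks_discrete(heights_cm))      # False
```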
🔹 Structured vs Unstructured Data
Structured Data
Structured data is organized in rows and columns.
Examples:
- Excel files
- SQL tables
- CSV datasets
Easy to analyze using SQL, Excel, Python, or BI tools.
Unstructured Data
Unstructured data has no predefined format.
Examples:
- Text documents
- Emails
- Images
- Videos
- Social media posts
Usually requires specialized processing before analysis, such as NLP for text and computer vision for images and video.
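To see why structure matters, the sketch below uses the Python standard library and made-up data. It parses a small CSV into named columns with `csv.DictReader`, then handles a free-text review by imposing structure itself with a very basic regex tokenizer. Real NLP pipelines use far more sophisticated tokenization; this is only a minimal illustration.

```python
import csv
import io
import re
from collections import Counter

# Structured data: a small CSV (made-up values) parses straight into rows and named columns
csv_text = """customer_id,city,orders
1,Delhi,3
2,Mumbai,5
3,Delhi,2
"""
rows = list(csv.DictReader(io.StringIO(csv_text)))
total_orders = sum(int(row["orders"]) for row in rows)

# Unstructured data: free text has no columns, so we must impose structure ourselves.
# Here a very basic tokenizer (lowercase + regex) turns a review into word counts.
review = "Great product, fast delivery. Great price too!"
words = re.findall(r"[a-z']+", review.lower())
word_counts = Counter(words)

print(rows[0]["city"], total_orders)  # Delhi 10
print(word_counts.most_common(1))     # [('great', 2)]
```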
Data Collection Methods in Data Science
Understanding how data is collected helps assess data quality and bias.
🔹 Common Data Collection Techniques
1️⃣ Surveys & Questionnaires
- Online forms
- Feedback surveys
- Market research
Risk: response bias (people may answer inaccurately or give the answer they think is expected)
2️⃣ Observational Data
- Website click tracking
- User behavior logs
- Sensor data
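Often collected in real time and reflects actual behavior rather than self-reported answers, but it is not automatically unbiased (e.g., it only covers users and devices that are tracked)
3️⃣ Experiments (A/B Testing)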
- Marketing experiments
- Product feature testing
Controlled and randomized, so differences between groups can be attributed to the change being tested
4️⃣ Transactional Data
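The core of an A/B test is random assignment. The sketch below assumes a simple 50/50 split with a fixed seed: it shuffles hypothetical user IDs into groups A and B and computes each group's conversion rate. `assign_variant` and `conversion_rate` are illustrative names, and a real experiment would also need a significance test before drawing conclusions.

```python
import random


def assign_variant(user_ids, seed=42):
    """Randomly split users into control (A) and treatment (B) groups of equal size."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"A": shuffled[:half], "B": shuffled[half:]}


def conversion_rate(group, converted):
    """Share of users in `group` who appear in the `converted` set."""
    return sum(1 for user in group if user in converted) / len(group)


groups = assign_variant(range(1, 11))
converted = {2, 3, 7, 9}  # hypothetical users who made a purchase
for name, members in groups.items():
    print(name, sorted(members), conversion_rate(members, converted))
```

Because assignment is random, users in A and B differ only by chance, so a large gap in conversion rates points to the feature being tested rather than to who happened to see it.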
- Sales records
- Banking transactions
- E-commerce logs
Highly structured and reliable
5️⃣ Third-Party Data
- Government datasets
- APIs
- External vendors
Verify credibility and freshness
Sampling Methods in Statistics
Sampling allows us to study a subset of data instead of the entire population.
🔹 Types of Sampling Methods
Random Sampling
- Every unit in the population has an equal chance of being selected
- Reduces selection bias
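Simple random sampling maps directly onto Python's `random.sample`, which draws without replacement so that every position in the list is equally likely to be chosen. The population below is synthetic order values generated for illustration, not real data.

```python
import random
from statistics import mean

# Hypothetical population: order values for 1,000 customers (synthetic, generated here)
rng = random.Random(0)
population = [rng.randint(10, 500) for _ in range(1000)]

# Simple random sampling without replacement: every customer is equally likely to be picked
sample = rng.sample(population, k=100)

print(round(mean(population), 1), round(mean(sample), 1))
```

With a reasonably large random sample, the sample mean is typically close to the population mean, which is what makes statistical inference possible.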
Stratified Sampling
- Population is divided into groups (strata), e.g., by region or income level
- A random sample is drawn from each group, usually in proportion to its size
Used in surveys and finance
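A minimal sketch of proportional stratified sampling, assuming records are dictionaries and the stratum is a single field such as region. `stratified_sample` is a hypothetical helper, and the 80/20 customer split is made up for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(records, strata_key, fraction, seed=0):
    """Proportional stratified sampling: draw the same fraction (0 < fraction <= 1) from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[record[strata_key]].append(record)  # group records by stratum
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one record per stratum
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical customers: 80 from "North", 20 from "South"
customers = [{"id": i, "region": "North" if i < 80 else "South"} for i in range(100)]
sample = stratified_sample(customers, "region", fraction=0.1)

north = sum(c["region"] == "North" for c in sample)
south = sum(c["region"] == "South" for c in sample)
print(north, south)  # 8 2
```

With a 10% fraction, the sample keeps the population's 80/20 split, which a small simple random sample cannot guarantee.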
Systematic Sampling
- Every k-th observation is selected, starting from a random point in the list
Simple and efficient
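Systematic sampling picks every k-th item after a random start. The helper below is an illustrative sketch, where `k` is the sampling interval (for N items and a desired sample size n, k is roughly N / n). The visit numbers are hypothetical.

```python
import random


def systematic_sample(items, k, seed=None):
    """Select every k-th item, starting from a random offset among the first k items."""
    if k < 1:
        raise ValueError("k must be at least 1")
    start = random.Random(seed).randrange(k)  # random start between 0 and k-1
    return items[start::k]


visits = list(range(1, 101))  # e.g. 100 website visits in arrival order
picked = systematic_sample(visits, k=10, seed=1)
print(len(picked), picked[:3])
```

One caution: if the data has a repeating pattern whose period matches k (for example, a log that cycles every 10 entries), a systematic sample can be badly biased.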
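Convenience Sampling
- Uses whatever data is easiest to access (e.g., the first rows of a table or people who happen to be nearby)
Risk: high bias, because the sample may not represent the population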
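To see why convenience sampling is risky, the sketch below builds a synthetic satisfaction survey in which the earliest users (the first rows, and so the easiest to grab) are much happier than everyone else. The numbers are generated for illustration, not real survey data.

```python
import random
from statistics import mean

rng = random.Random(7)
# Synthetic survey sorted by signup date: the first 200 early adopters score 8-10,
# while the remaining 800 later users score 3-7.
scores = [rng.randint(8, 10) for _ in range(200)] + [rng.randint(3, 7) for _ in range(800)]

convenience = scores[:100]             # "whoever is easiest to reach": the first 100 rows
random_pick = rng.sample(scores, 100)  # simple random sample for comparison

print(round(mean(scores), 2), round(mean(convenience), 2), round(mean(random_pick), 2))
```

The first 100 rows average at least 8, while the full population averages well below 8, so the convenience sample overstates satisfaction. The random sample typically lands much closer to the true average.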
Why Sampling Matters in Data Science
- Saves time and cost
- Makes large datasets manageable
- Enables faster experimentation
- Supports inferential statistics
🧪 Hands-On: Classify Sample Datasets
Let's classify some common real-world datasets.
| Dataset | Qualitative / Quantitative | Discrete / Continuous | Structured / Unstructured |
|---|---|---|---|
| Customer Gender | Qualitative | N/A | Structured |
| Monthly Salary | Quantitative | Continuous | Structured |
| Product Reviews | Qualitative | N/A | Unstructured |
| Number of Orders | Quantitative | Discrete | Structured |
| Website Session Time | Quantitative | Continuous | Structured |
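The classification above can be roughed out in code. The `classify` helper below is a hypothetical first-pass heuristic, not a standard library function: it calls a column quantitative if every value is a number, discrete if every number is whole, and relies on you to say whether the source is structured. It labels categorical columns N/A on the discrete/continuous axis, since that distinction describes numeric data. The sample values are made up.

```python
def classify(values, structured=True):
    """Rough first-pass labels for a column: (qualitative/quantitative, discrete/continuous, layout).

    Heuristic only: numbers that are really codes (zip codes, IDs) and
    continuous quantities rounded to whole units will be mislabeled.
    """
    is_numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)
    kind = "Quantitative" if is_numeric else "Qualitative"
    if is_numeric:
        scale = "Discrete" if all(float(v).is_integer() for v in values) else "Continuous"
    else:
        scale = "N/A"  # discrete/continuous is defined for numeric data only
    layout = "Structured" if structured else "Unstructured"
    return kind, scale, layout


# Hypothetical sample values for the datasets in the table
datasets = {
    "Customer Gender": (["Female", "Male", "Female"], True),
    "Monthly Salary": ([52000.50, 61250.75, 48900.00], True),
    "Product Reviews": (["Loved it!", "Arrived late, but works fine."], False),
    "Number of Orders": ([3, 1, 4], True),
    "Website Session Time": ([312.5, 45.2, 128.9], True),
}
for name, (values, structured) in datasets.items():
    print(name, classify(values, structured))
```

Heuristics like this break on numeric codes and on measurements rounded to whole units (a salary column recorded only in whole rupees would come out as "Discrete"), so treat the output as a starting point for human judgment.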
🧠 Key Takeaways
- Always identify the data type before analysis.
- Choose statistical methods based on the nature of the data.
- Understand how the data was collected to spot and avoid bias.
- The sampling method affects both the accuracy of estimates and the validity of conclusions.
What's Next in This Series?
Part 3: Descriptive Statistics (Mean, Median, Mode & Variability)


If you have any questions, please leave a comment or write to us at datahark12@gmail.com.