Tuesday, August 5, 2025

πŸ”„ Mastering IF-THEN-ELSE and DO Loops in SAS – Complete Guide with Examples

πŸ”° Introduction

Conditional logic and looping are core concepts in any programming language. In SAS, IF-THEN-ELSE statements and DO loops provide powerful tools for controlling the flow of your Data Step code. This guide explains how to use these tools effectively with real-world examples.


If else in SAS

🧩 1. IF-THEN-ELSE in SAS

The IF-THEN-ELSE statement allows you to execute specific code based on conditions.

βœ… Syntax:

IF condition THEN action;
ELSE IF condition THEN action; ELSE action;

πŸ“Œ Example 1: Simple IF-THEN-ELSE

data class_flag;
set sashelp.class; if age < 13 then group = 'Child'; else if age < 18 then group = 'Teen'; else group = 'Adult'; run;

Explanation:
Classifies students into Child, Teen, or Adult groups based on age.


πŸ§ͺ More IF-THEN-ELSE Examples in SAS


πŸ“Œ Example 1: Assign Grades Based on Scores

data grades;
input name $ score; if score >= 90 then grade = 'A'; else if score >= 80 then grade = 'B'; else if score >= 70 then grade = 'C'; else if score >= 60 then grade = 'D'; else grade = 'F'; datalines; John 85 Sara 92 Alex 67 Nina 74 Bob 58 ; run;

πŸ“Œ Example 2: Handle Missing Values in Conditions

data test_missing;
input id age; if age = . then status = 'Missing'; else if age < 18 then status = 'Minor'; else status = 'Adult'; datalines; 1 25 2 . 3 17 ; run;

πŸ“Œ Example 3: Create Flags for Categorical Variables

data product_flag;
input product $ category $; if category = 'Electronics' then flag = 1; else flag = 0; datalines; Laptop Electronics Shoes Apparel Phone Electronics Watch Accessories ; run;

πŸ“Œ Example 4: Nested IF-THEN-ELSE Logic

data nested_logic;
input city $ temp; if city = 'Delhi' then do; if temp > 40 then warning = 'Heatwave'; else warning = 'Normal'; end; else warning = 'Check city'; datalines; Delhi 45 Delhi 30 Mumbai 32 ; run;

πŸ“Œ Example 5: Use with Multiple Variables

data risk_check;
input age income; if age < 25 and income < 30000 then risk = 'High'; else if age >= 25 and income < 30000 then risk = 'Medium'; else risk = 'Low'; datalines; 22 25000 28 28000 35 60000 ; run;

πŸ“Œ Example 6: Case-Insensitive Character Comparison

data department;
input empname $ dept $; if upcase(dept) = 'HR' then team = 'Human Resources'; else team = 'Other'; datalines; John HR Alex finance Sara hr ; run;

πŸ“Œ Example 7: Assign Labels to Numeric Ranges

data salary_bracket;
input empid salary; if salary < 30000 then bracket = 'Low'; else if 30000 <= salary < 60000 then bracket = 'Medium'; else bracket = 'High'; datalines; 101 25000 102 32000 103 60000 104 58000 ; run;

πŸ“Œ Example 8: IF Without ELSE (Slower)

Data no_else;
set sashelp.class; if age < 12 then group = 'Preteen'; if age >= 12 then group = 'Teen'; run;

πŸ’‘ Best Practices:

  • Use IF-THEN/ELSE instead of multiple IF statements for better performance.
  • When checking for missing values, use IF var = ..


πŸ”„ 2. DO Loops in SAS

DO loops are used to repeat code a specified number of times or while a condition is true.


βœ… 2.1 DO Loop Syntax

do index = start to end;
/* repeated statements */ end;

πŸ“Œ Example 2: Basic DO Loop

data loop_example;
do i = 1 to 5; square = i**2; output; end; run;

Explanation:
Generates a dataset with numbers 1 to 5 and their squares.


βœ… 2.2 DO WHILE Loop

data do_while;
x = 1; do while (x < 5); square = x**2; output; x + 1; end; run;

βœ… 2.3 DO UNTIL Loop

data do_until;
x = 1; do until (x > 5); cube = x**3; output; x + 1; end; run;

🧠 Combining IF and DO Loops

data even_numbers;
do i = 1 to 10; if mod(i, 2) = 0 then output; end; run;

Explanation:
Generates only even numbers between 1 and 10 using both DO loop and IF condition.


⚠️ Common Mistakes to Avoid

MistakeFix
Missing OUTPUT in DO loopAlways include output; if needed
Forgetting semicolonsEnd each SAS statement with ;
Using multiple IF instead of IF-THEN-ELSECan slow performance

πŸ”š Conclusion

Mastering IF-THEN-ELSE and DO loops in SAS empowers you to create dynamic, flexible, and readable data processing routines. Whether you're classifying data, creating new variables, or iterating through logic, these tools are fundamental to writing efficient SAS programs.

Labels: , , , , , , , ,

Friday, July 4, 2025

Data Cleaning in SAS: Tips and Techniques

Introduction

Data cleaning is a critical step in the data preparation process. Raw data is often messy, incomplete, or inconsistent, and cleaning it ensures accurate and reliable analysis. SAS offers a wide range of tools and functions that make data cleaning efficient and effective.

In this post, you’ll learn essential data cleaning techniques in SAS, including handling missing values, removing duplicates, formatting variables, and more.


1. Identifying and Handling Missing Values

Missing data can lead to biased results. SAS represents missing numeric values as . and character missing values as a blank space ('').

Check for missing values:

PROC MEANS DATA=your_dataset N NMISS;
RUN;

Replace missing values:

DATA cleaned_data;
SET your_dataset; IF Age = . THEN Age = 0; IF Name = '' THEN Name = 'Unknown'; RUN;

2. Removing Duplicate Records

Duplicate data can affect the accuracy of your results.

Identify duplicates:

PROC SORT DATA=your_dataset OUT=check_dups NODUPKEY;
BY ID; RUN;

Remove complete duplicates:

PROC SORT DATA=your_dataset NODUPRECS OUT=cleaned_data;
RUN;

3. Standardizing Text Variables

Standardizing case and removing unwanted spaces improves consistency.

Use SAS functions:

DATA standardized;
SET your_dataset; Name_clean = STRIP(UPCASE(Name)); RUN;

  • STRIP() removes leading and trailing spaces
  • UPCASE(), LOWCASE() or PROPCASE() standardize text case


4. Filtering Out Invalid Values

Sometimes variables contain invalid or out-of-range data.

Example: Remove ages less than 0 or more than 120

DATA cleaned_data;
SET your_dataset; IF 0 <= Age <= 120; RUN;

5. Converting Data Types (INPUT and PUT)

Mismatch between numeric and character types can cause issues.

Convert character to numeric:

Age_num = INPUT(Age_char, 8.);

Convert numeric to character:

Age_char = PUT(Age_num, 8.);

6. Replacing Values with IF or ARRAY Logic

You can recode or transform values using conditional logic.

Example:

DATA recoded;
SET your_dataset; IF Gender = 'M' THEN Gender = 'Male'; ELSE IF Gender = 'F' THEN Gender = 'Female'; RUN;

7. Handling Outliers

Use PROC UNIVARIATE to detect outliers in numeric variables.

PROC UNIVARIATE DATA=your_dataset;
VAR Salary; RUN;

Then, you can treat outliers by capping, removing, or flagging them.


8. Validating Cleaned Data

Use PROC FREQ or PROC MEANS to validate the cleaned dataset.

PROC FREQ DATA=cleaned_data; TABLES Gender Age_Group; RUN; PROC MEANS DATA=cleaned_data; VAR Salary Age; RUN;

Best Practices for Data Cleaning in SAS

  • Always keep a backup of your raw data
  • Document every cleaning step
  • Use descriptive variable names (e.g., Name_clean, Age_flag)
  • Combine steps using macros for reusable workflows


Conclusion

Clean data is the foundation of trustworthy analysis. Using SAS, you can efficiently handle missing values, fix data quality issues, and standardize your datasets. By mastering these data cleaning techniques, you’re well on your way to becoming a proficient SAS programmer and data analyst.


Tags: 

Labels: , , , , , , , , ,