Friday, July 11, 2025

How to Create Data Groups Using PROC FORMAT in SAS: Real-World Examples

Problem/Requirement -

Lets Suppose, We have Data in Numbers which could be range from 100 to 10000
now we want to create Groups of Each 100 Values, like below - 

Groups Example in SAS



Lets See with the Example | The Power of the FORMAT Procedure - 


/*This code will Generate some Numbers between 100 and 10000 to Get the Practice Dataset */

Data A;
Num=100;
Do while(Num<10000);
output;
Num=Num+54;
end;
Run;
Proc Print Data=A;
Run;

Creating Groups in SAS


Now we want to Group it in as per given in Requirement 

/* This Code will Genrate FMTName (Format Name), Start(Starting Number of Group), End(Ending Number of Group), Label(Group name for Numbers between Start and End Values)
*/
Data B;
Retain FMTName "Group";
Start=100;
i=1;
Do until(End>=10000);
End=Start+99;
Label=compbl("Group"||i);
Output;
Start=End+1;
i=i+1;
end;
drop i;
run;

Proc Print Data=B;
Run;

Grouping Data in SAS

Now if we need to create a User Defined Format -

/* This wil create a format for Above created B Dataset with Format Name "Group", PROC FORMAT with the CNTLIN= option is to build the formats from the imported control data set */

Proc Format CNTLIN=B;
Run; 

/*Lets see how it will print A Dataset if we apply Newly create Group format to Variable Num */

Proc Print Data=A;
Format Num Group.;
run; 

USE of Proc Format in SAS
/* If we want Actual Number and the Group name both the things then we can do it like this */

Data C;
Set A;
GroupName=Put(Num,Group.);
run;
/*Here we created a new Variable Group name and Copy value of Num with Group format, For this we used PUT Function */
Proc Print Data=C;
Run;
USE of Proc Format in SAS


I Hope you like this Article, please comment if you have any doubt or question.
Thank you

Labels: , , , , , , , , , , ,

Friday, July 4, 2025

Data Cleaning in SAS: Tips and Techniques

Introduction

Data cleaning is a critical step in the data preparation process. Raw data is often messy, incomplete, or inconsistent, and cleaning it ensures accurate and reliable analysis. SAS offers a wide range of tools and functions that make data cleaning efficient and effective.

In this post, you’ll learn essential data cleaning techniques in SAS, including handling missing values, removing duplicates, formatting variables, and more.


1. Identifying and Handling Missing Values

Missing data can lead to biased results. SAS represents missing numeric values as . and character missing values as a blank space ('').

Check for missing values:

PROC MEANS DATA=your_dataset N NMISS;
RUN;

Replace missing values:

DATA cleaned_data;
SET your_dataset; IF Age = . THEN Age = 0; IF Name = '' THEN Name = 'Unknown'; RUN;

2. Removing Duplicate Records

Duplicate data can affect the accuracy of your results.

Identify duplicates:

PROC SORT DATA=your_dataset OUT=check_dups NODUPKEY;
BY ID; RUN;

Remove complete duplicates:

PROC SORT DATA=your_dataset NODUPRECS OUT=cleaned_data;
RUN;

3. Standardizing Text Variables

Standardizing case and removing unwanted spaces improves consistency.

Use SAS functions:

DATA standardized;
SET your_dataset; Name_clean = STRIP(UPCASE(Name)); RUN;

  • STRIP() removes leading and trailing spaces
  • UPCASE(), LOWCASE() or PROPCASE() standardize text case


4. Filtering Out Invalid Values

Sometimes variables contain invalid or out-of-range data.

Example: Remove ages less than 0 or more than 120

DATA cleaned_data;
SET your_dataset; IF 0 <= Age <= 120; RUN;

5. Converting Data Types (INPUT and PUT)

Mismatch between numeric and character types can cause issues.

Convert character to numeric:

Age_num = INPUT(Age_char, 8.);

Convert numeric to character:

Age_char = PUT(Age_num, 8.);

6. Replacing Values with IF or ARRAY Logic

You can recode or transform values using conditional logic.

Example:

DATA recoded;
SET your_dataset; IF Gender = 'M' THEN Gender = 'Male'; ELSE IF Gender = 'F' THEN Gender = 'Female'; RUN;

7. Handling Outliers

Use PROC UNIVARIATE to detect outliers in numeric variables.

PROC UNIVARIATE DATA=your_dataset;
VAR Salary; RUN;

Then, you can treat outliers by capping, removing, or flagging them.


8. Validating Cleaned Data

Use PROC FREQ or PROC MEANS to validate the cleaned dataset.

PROC FREQ DATA=cleaned_data; TABLES Gender Age_Group; RUN; PROC MEANS DATA=cleaned_data; VAR Salary Age; RUN;

Best Practices for Data Cleaning in SAS

  • Always keep a backup of your raw data
  • Document every cleaning step
  • Use descriptive variable names (e.g., Name_clean, Age_flag)
  • Combine steps using macros for reusable workflows


Conclusion

Clean data is the foundation of trustworthy analysis. Using SAS, you can efficiently handle missing values, fix data quality issues, and standardize your datasets. By mastering these data cleaning techniques, you’re well on your way to becoming a proficient SAS programmer and data analyst.


Tags: 

Labels: , , , , , , , , ,

Monday, January 11, 2021

📊How to Read Data in Fixed Columns Using Formatted Input Method | Real time SAS Problems

📌 How to Read Data in Fixed Columns Using SAS

Working with fixed-width (or fixed-column) text files is a common task in SAS programming, especially when importing data from legacy systems or mainframes. Unlike delimited files (like CSV), fixed-column files store data in predefined character positions. In this guide, you'll learn how to read such data files effectively using SAS.

📂 What is a Fixed-Width File?

A fixed-width file is a plain-text file where each data field occupies a specific number of characters. There are no delimiters like commas or tabs; instead, the position and length of each field are predefined.

Example:

101John 24Male
102Alice 28Female

  • ID: 1–3
  • Name: 4–11
  • Age: 12–13
  • Gender: 14–19


📥 Reading Fixed Column Data Using INFILE and INPUT

SAS uses the combination of INFILE and INPUT statements to read fixed-column data.

✅ Syntax:

DATA dataset_name;
INFILE 'file-path.txt'; INPUT variable1 $ start1-end1 variable2 $ start2-end2 ...; RUN;

  • $ indicates a character variable
  • start-end indicates the column positions


🧪 Example: Reading Employee Data

Let’s say you have a file named employees.txt with the following structure:

101John 24Male
102Alice 28Female 103Steve 30Male

To read this in SAS:

DATA employees;
INFILE 'C:\SASData\employees.txt'; INPUT ID 1-3 Name $ 4-11 Age 12-13 Gender $ 14-19; RUN;

🔍 Explanation:

  • ID is numeric (columns 1 to 3)
  • Name is character (columns 4 to 11)
  • Age is numeric (columns 12 to 13)
  • Gender is character (columns 14 to 19)


📎 Tips for Handling Fixed-Width Files

  • Ensure each line in the text file has a uniform width.
  • Use $ in the INPUT statement to define character fields.
  • Always check column positions carefully—especially if fields are padded with spaces.
  • Use LRECL (logical record length) in INFILE if the line length is long:


INFILE 'C:\SASData\employees.txt' LRECL=100;

🔁 Alternative: Using COLUMN POINTERS (Formatted Input Method)

You can also use column pointers (@) to specify exact starting positions for each field.

DATA employees;
INFILE 'C:\SASData\employees.txt'; INPUT @1 ID 3. @4 Name $ 8. @12 Age 2. @14 Gender $ 6.; RUN;

  • @n tells SAS where to begin reading.
  • The numeric formats like 3. or 8. specify field lengths.

More on Reads data from fixed columns

  • Reads character and numeric data including standard and nonstandard numerical values, such as numbers with dollar signs and commas 
  • Reads dates in different formats, not like column input which can only read dates as character values
  • It is the most common and powerful of all the input methods
  • Any time you have nonstandard data in fixed columns, you should consider using formatted input to read the file.

Let's see More Example  -

Suppose we have this raw data stored in any flat file -

1234567890123456789012 ---- Columns indicator, not part of the data

Tim  M14510/21/1978

Sara  13009/20/1964

Mike M18011/23/1965

LauraF13011/06/1980

Sean M16704/07/2000


The variables information is listed as below. 

There are 4 variables in the data - 

Name: occupies total five columns, column 1-5

Gender: occupies total one column, column 6

Weight: occupies three columns, column 7-9

DOB: occupies ten columns, column 10 - 19


data Form_input;
infile "/folders/myfolders/DATA_column.txt";
input 
@1 Name $ 5.
@6 Gender   $  1.
@7 Weight 3.
@10 DOB     mmddyy10.;
Run;

proc print data=sdata_column;
run;

Reading Raw Data
Explanation - 

1. We Used @ to specify the starting position of every variable 

2. Also used SAS Informats

Informats are built-in instructions that tell SAS how to read a data value. The choice of which informat to use is determined by the data.

For standard numeric and character values, two basic informats are w.d and $w.

The w.d Informat: reads standard numeric values. The w tells SAS how many columns to read. The optional d tells SAS that there is a decimal point in the value, the decimal point counts as one of the columns to be read. For example, the variable Weight containing 3 digits values (such as 145 lbs) has an informat of 3. ; Another example, value 2.1 takes up 3 column spaces, 2 digits + 1 decimal point, so it has an informat of 3.1

 

For date: the MMDDYY10. informat tells SAS that the date you are reading is in the mm/dd/yyyy form, 10 means this form takes up 10 columns including 8 digits and 2 slash signs. For example, the variable DOB has values like 10/21/1978. SAS reads the date and converts the value into a SAS date. SAS stores dates as numeric values equal to the number of days from January 1, 1960. So, if you read the value 01/01/1960 with the MMDDYY10. informat, SAS stores a date value of 0. The date 01/02/1960 read with the same informat would result in a date value of 1, and so forth. 

 

That’s why if you look at the output of above program than DOB displays as some numbers instead of dates, TO Display DOB as Date Value we need to use the Format.

Example - 

proc print data=Form_input;
format  DOB  mmddyy10.;
run;

Reading raw data by datahark


📊 Output Dataset

After reading the data, you can view the dataset using:

PROC PRINT DATA=employees;
RUN;

✅ Summary

TaskSAS Statement
Specify file locationINFILE
Read fixed columnsINPUT with positions
Read character dataAdd $ after variable
Use column positionsUse ranges like 1-3 or pointers like @1
View datasetPROC PRINT

📌 Conclusion

Reading fixed-column data in SAS is straightforward once you understand the layout of the file. By using INFILE, INPUT, and positional referencing, you can efficiently convert raw text files into structured SAS datasets. This skill is especially important for data analysts working with legacy systems or industry-standard text data formats.


Search Tags:

SAS Fixed Column Input
Read Fixed Width Files in SAS
SAS INFILE Statement
SAS INPUT Statement
Column Pointer in SAS
SAS Text File Import
SAS Programming Basics
Importing Data in SAS
Legacy Data Processing SAS
SAS Tutorial for Beginners

Labels: , , , , , , , ,