Monday, January 11, 2021

📊How to Read Data in Fixed Columns Using Formatted Input Method | Real time SAS Problems

📌 How to Read Data in Fixed Columns Using SAS

Working with fixed-width (or fixed-column) text files is a common task in SAS programming, especially when importing data from legacy systems or mainframes. Unlike delimited files (like CSV), fixed-column files store data in predefined character positions. In this guide, you'll learn how to read such data files effectively using SAS.

📂 What is a Fixed-Width File?

A fixed-width file is a plain-text file where each data field occupies a specific number of characters. There are no delimiters like commas or tabs; instead, the position and length of each field are predefined.

Example:

101John 24Male
102Alice 28Female

  • ID: 1–3
  • Name: 4–11
  • Age: 12–13
  • Gender: 14–19


📥 Reading Fixed Column Data Using INFILE and INPUT

SAS uses the combination of INFILE and INPUT statements to read fixed-column data.

✅ Syntax:

DATA dataset_name;
INFILE 'file-path.txt'; INPUT variable1 $ start1-end1 variable2 $ start2-end2 ...; RUN;

  • $ indicates a character variable
  • start-end indicates the column positions


🧪 Example: Reading Employee Data

Let’s say you have a file named employees.txt with the following structure:

101John 24Male
102Alice 28Female 103Steve 30Male

To read this in SAS:

DATA employees;
INFILE 'C:\SASData\employees.txt'; INPUT ID 1-3 Name $ 4-11 Age 12-13 Gender $ 14-19; RUN;

🔍 Explanation:

  • ID is numeric (columns 1 to 3)
  • Name is character (columns 4 to 11)
  • Age is numeric (columns 12 to 13)
  • Gender is character (columns 14 to 19)


📎 Tips for Handling Fixed-Width Files

  • Ensure each line in the text file has a uniform width.
  • Use $ in the INPUT statement to define character fields.
  • Always check column positions carefully—especially if fields are padded with spaces.
  • Use LRECL (logical record length) in INFILE if the line length is long:


INFILE 'C:\SASData\employees.txt' LRECL=100;

🔁 Alternative: Using COLUMN POINTERS (Formatted Input Method)

You can also use column pointers (@) to specify exact starting positions for each field.

DATA employees;
INFILE 'C:\SASData\employees.txt'; INPUT @1 ID 3. @4 Name $ 8. @12 Age 2. @14 Gender $ 6.; RUN;

  • @n tells SAS where to begin reading.
  • The numeric formats like 3. or 8. specify field lengths.

More on Reads data from fixed columns

  • Reads character and numeric data including standard and nonstandard numerical values, such as numbers with dollar signs and commas 
  • Reads dates in different formats, not like column input which can only read dates as character values
  • It is the most common and powerful of all the input methods
  • Any time you have nonstandard data in fixed columns, you should consider using formatted input to read the file.

Let's see More Example  -

Suppose we have this raw data stored in any flat file -

1234567890123456789012 ---- Columns indicator, not part of the data

Tim  M14510/21/1978

Sara  13009/20/1964

Mike M18011/23/1965

LauraF13011/06/1980

Sean M16704/07/2000


The variables information is listed as below. 

There are 4 variables in the data - 

Name: occupies total five columns, column 1-5

Gender: occupies total one column, column 6

Weight: occupies three columns, column 7-9

DOB: occupies ten columns, column 10 - 19


data Form_input;
infile "/folders/myfolders/DATA_column.txt";
input 
@1 Name $ 5.
@6 Gender   $  1.
@7 Weight 3.
@10 DOB     mmddyy10.;
Run;

proc print data=sdata_column;
run;

Reading Raw Data
Explanation - 

1. We Used @ to specify the starting position of every variable 

2. Also used SAS Informats

Informats are built-in instructions that tell SAS how to read a data value. The choice of which informat to use is determined by the data.

For standard numeric and character values, two basic informats are w.d and $w.

The w.d Informat: reads standard numeric values. The w tells SAS how many columns to read. The optional d tells SAS that there is a decimal point in the value, the decimal point counts as one of the columns to be read. For example, the variable Weight containing 3 digits values (such as 145 lbs) has an informat of 3. ; Another example, value 2.1 takes up 3 column spaces, 2 digits + 1 decimal point, so it has an informat of 3.1

 

For date: the MMDDYY10. informat tells SAS that the date you are reading is in the mm/dd/yyyy form, 10 means this form takes up 10 columns including 8 digits and 2 slash signs. For example, the variable DOB has values like 10/21/1978. SAS reads the date and converts the value into a SAS date. SAS stores dates as numeric values equal to the number of days from January 1, 1960. So, if you read the value 01/01/1960 with the MMDDYY10. informat, SAS stores a date value of 0. The date 01/02/1960 read with the same informat would result in a date value of 1, and so forth. 

 

That’s why if you look at the output of above program than DOB displays as some numbers instead of dates, TO Display DOB as Date Value we need to use the Format.

Example - 

proc print data=Form_input;
format  DOB  mmddyy10.;
run;

Reading raw data by datahark


📊 Output Dataset

After reading the data, you can view the dataset using:

PROC PRINT DATA=employees;
RUN;

✅ Summary

TaskSAS Statement
Specify file locationINFILE
Read fixed columnsINPUT with positions
Read character dataAdd $ after variable
Use column positionsUse ranges like 1-3 or pointers like @1
View datasetPROC PRINT

📌 Conclusion

Reading fixed-column data in SAS is straightforward once you understand the layout of the file. By using INFILE, INPUT, and positional referencing, you can efficiently convert raw text files into structured SAS datasets. This skill is especially important for data analysts working with legacy systems or industry-standard text data formats.


Search Tags:

SAS Fixed Column Input
Read Fixed Width Files in SAS
SAS INFILE Statement
SAS INPUT Statement
Column Pointer in SAS
SAS Text File Import
SAS Programming Basics
Importing Data in SAS
Legacy Data Processing SAS
SAS Tutorial for Beginners

Labels: , , , , , , , ,

Tuesday, January 5, 2021

Understanding NOOBS, OBS=, N, and N in SAS Programming

When working with datasets in SAS, it's important to control data processing and output efficiently. SAS provides several options and automatic variables that help manage how data is read, processed, and displayed. In this post, we’ll explore four commonly used features:

  • NOOBS option in PROC PRINT
  • OBS= system option
  • _N_ automatic variable
  • N option in PROC MEANS or PROC FREQ

Let’s understand each with examples.

Use of NOOBS  

It is used for Suppress column which tells the row/observation number in Proc print.

Example - See both outputs with NOOBS and without NOOBS

Proc Print Data=SASHELP.CARS;
WHERE MAKE='Audi';
RUN;

Proc Print Data=SASHELP.CARS NOOBS;
WHERE MAKE='Audi';
RUN;

Use of Noobs

OBS= System Option

The OBS= option limits the number of observations that SAS reads from a dataset.

Syntax:
options obs=5;

Example:
options obs=5;
proc print data=sashelp.class;
run;

 Use of OBS  and N options - 

OBS option give the Actual row number from the original dataset, for example in below code output, row number starts from 8 because First row of Audi car in source dataset is available at 8th position


Proc PRINT data=SASHELP.cars N OBS;
where Make='Audi';
run;

Use of OBS and N - Datahark

N Option gives the Total count of observation in dataset.

Use of _N_ 

As I already explained in  my previous post for PDV - there are two automatic variables are being created in datastep these are - _N_ and _ERROR_

_N_ counts the Number of times Datastep begin to execute.

For Example - if we want to print the top 10 rows of any dataset - 

data test;
set SASHELP.CARS;
N=_N_;
if _n_<=10;
run;

Use of _N_ = Datahark

Summary Table

FeatureUsagePurpose
NOOBSIn PROC PRINTHide observation numbers in output
OBS=Global system optionLimit records read from datasets
_N_In DATA stepsCount iteration, control row logic
NIn summary proceduresDisplay count of non-missing values

Final Thoughts

These options and variables are small but powerful tools in a SAS programmer’s toolkit. Whether you're generating cleaner reports with NOOBS, testing code with OBS=, controlling logic with _N_, or summarizing data using N, understanding their usage can significantly enhance your efficiency in SAS.

Labels: , , , , , , , ,

APPEND DATASETS USING SET | MULTIPLE SET | PROC APPEND | PROC SQL UNION

Here we are going to Learn how to Combine/Stack/Append Datasets in SAS, See below Picture to understand- What is Append


Append Datasets In SAS Datahark

SAS has multiple Methods to append datasets, these includes - 

  • Using Set in Datastep
  • Proc Append
  • Proc SQL
  • Merge

Example 1 - Using Set in Datastep 

Data Male;
Input Name $ Age;
Datalines;
Mark 45
Peter 60
Jack 20
;
run;

Data Female;
Input Name $ Age;
Datalines;
Lisa 45
Jane 60
Julie 20
Ruma 42
;
run;

Data Combined;
set Male Female;
run;

Append with Set

Lets See what happen if we write multiple SET statements - 

Data Combined;
set Male;
set Female;
run;

Multiple Set Statements

Here First it copied Male Dataset into Combined (All Three rows) and then Override all three rows of combined dataset with Female Dataset's First Three rows

Example 2 - Using PROC APPEND

Data Male;
Input Name $ Age;
Datalines;
Mark 45
Peter 60
Jack 20
;
run;

Data Female;
Input Name $ Age;
Datalines;
Lisa 45
Jane 60
Julie 20
Ruma 42
;
run;

Proc Append Base= Male Data=Female;
run;

Proc Print Data=Male;
run;

Proc Print Data=Female;

Data Male;
Input Name $ Age;
Datalines;
Mark 45
Peter 60
Jack 20
;
run; 

Data Female;
Input Name $ Age;
Datalines;
Lisa 45
Jane 60
Julie 20
Ruma 42
;
run;

Proc Append Base= Male Data=Female;
run;
Proc Print Data=Male;
run;
Proc Print Data=Female;
run; 

PROC Append

The above proc appended rows of Female Dataset into Male Dataset. It is faster than SET statement as here the second dataset is just appended to the first one and the SAS do not have to process the first dataset in the PDV, Essentially what happens with set or merge is that you bring both the datasets into PDV and then setting is done, the append does not do that it just add the second dataset to base one with out bringing the base dataset to PDV.

Few points for Proc Append - 

  • If PROC APPEND cannot find the BASE= data set, SAS creates it
  • If a DATA= data set is not specified, SAS uses the most recently created SAS data set
  • If appending data sets with different variables or attributes, use the FORCE option. This option must be used to prevent a syntax error.

Example 3 - Using PROC SQL - UNION

Data Male;
Input Name $ Age;
Datalines;
Mark 45
Peter 60
Jack 20
;
run;

Data Female;
Input Name $ Age;
Datalines;
Lisa 45
Jane 60
Julie 20
Ruma 42
;
run;

PROC SQL;
CREATE TABLE COMB_MF AS 
SELECT * FROM MALE
UNION CORR
SELECT * FROM FEMALE;
QUIT;

It will give same output as Example 1 

Here UNION Keyword is used to combined dataset and CORR keyword used to tell SAS that Append datasets based on the Column Name, If we dont use this Keyword PROC SQL will append based on the column sequence.

Labels: , , , , , , ,