Saturday, July 12, 2025

✅ All About PROC SUMMARY in SAS: A Comprehensive Guide with Practical Examples

📘 Introduction

PROC SUMMARY in SAS computes summary statistics for numeric variables. It shares its statistics, statements, and options with PROC MEANS. The main differences are that PROC SUMMARY prints nothing unless you request it and, without a VAR statement, reports only observation counts. That makes it a natural fit for building summary datasets quietly inside a program.

In this guide, you'll learn:

  • Complete syntax of PROC SUMMARY
  • All available options and statements
  • Grouped examples
  • How it differs from PROC MEANS
  • Output datasets and tips


🛠 Syntax of PROC SUMMARY

PROC SUMMARY <options>;
  VAR variable(s);
  CLASS variable(s);
  BY variable(s);
  OUTPUT OUT=dataset <stat-options>;
RUN;

⚙️ Common Options in PROC SUMMARY

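Option | Description
DATA= | Specifies the input dataset
N | Count of non-missing values
MEAN | Mean (average) value
STD | Standard deviation
MIN | Minimum value
MAX | Maximum value
SUM | Total sum
MAXDEC= | Maximum number of decimal places in printed output (does not affect the output dataset)
NWAY | Outputs only the rows for the full combination of all CLASS variables (the highest _TYPE_ value)
CHARTYPE | Stores _TYPE_ in the output dataset as a character string of 0s and 1s instead of a number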
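To make the NWAY behavior concrete, here is a minimal sketch that runs the same summary with and without NWAY. The dataset names all_types and nway_only and the variable name mean_height are illustrative choices, not part of the original post.

```sas
/* Without NWAY: one row per _TYPE_ level */
proc summary data=sashelp.class;
  class sex age;
  var height;
  output out=all_types mean=mean_height;
run;
/* _TYPE_ = 0: overall, 1: by age, 2: by sex, 3: by sex and age */

/* With NWAY: only the rows for the sex-by-age combinations */
proc summary data=sashelp.class nway;
  class sex age;
  var height;
  output out=nway_only mean=mean_height;
run;

proc print data=all_types; run;
proc print data=nway_only; run;
```

Comparing the two printed datasets shows that NWAY keeps only the _TYPE_ = 3 rows in this case.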

📄 Statements in PROC SUMMARY

Statement | Purpose
VAR | Specifies numeric variables to analyze
CLASS | Groups summary statistics by categorical variables
BY | BY-group processing; data must be sorted
OUTPUT | Writes summary statistics to a dataset

✅ PROC SUMMARY vs PROC MEANS

Feature | PROC MEANS | PROC SUMMARY
Displays output | Yes (default) | No (unless PRINT)
Without a VAR statement | Analyzes all numeric variables | Reports only the observation count
Use in production | For display/review | For data pipelines
Output dataset | Optional | Common
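The comparison can be checked directly. The sketch below builds the same output dataset with both procedures and compares them; the dataset names means_out and summary_out are hypothetical, and the AUTONAME option is used here only to generate variable names automatically.

```sas
/* PROC MEANS needs NOPRINT to stay silent */
proc means data=sashelp.class noprint;
  class sex;
  var height weight;
  output out=means_out mean= / autoname;
run;

/* PROC SUMMARY is silent by default */
proc summary data=sashelp.class;
  class sex;
  var height weight;
  output out=summary_out mean= / autoname;
run;

/* The two datasets should match */
proc compare base=means_out compare=summary_out;
run;
```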

🧪 Examples of PROC SUMMARY

Example 1: Basic Summary Statistics

proc summary data=sashelp.class print;
run;

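Note: Use PRINT to display output. Because this step has no VAR statement, PROC SUMMARY reports only the number of observations (PROC MEANS would analyze every numeric variable instead).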


Example 2: Specifying Variables

proc summary data=sashelp.class print;
  var height weight;
run;

Example 3: Using CLASS Statement

proc summary data=sashelp.class print;
  class sex;
  var height weight;
run;

Example 4: Creating Output Dataset

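proc summary data=sashelp.class;
  class sex;
  var height weight;
  /* Request the statistics in OUTPUT; AUTONAME builds names such as Height_Mean. */
  /* MAXDEC= is omitted because it only affects printed output. */
  output out=summary_stats n= mean= / autoname;
run;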

Example 5: Custom Output Variable Names

proc summary data=sashelp.class;
  class sex;
  var weight;
  output out=summary_data mean=mean_weight max=max_weight min=min_weight;
run;

Example 6: Using BY Statement (sorted data)

proc sort data=sashelp.class out=sorted;
  by sex;
run;

proc summary data=sorted print;
  by sex;
  var height;
run;

🔄 Output Options in PROC SUMMARY

You can customize the summary stats by combining the following keywords in the OUTPUT statement:

Keyword | Description
N= | Assigns name to N (count)
MEAN= | Assigns name to mean
STD= | Assigns name to standard deviation
SUM= | Assigns name to sum
MIN= | Assigns name to minimum
MAX= | Assigns name to maximum

Example:

output out=myout n=n_obs mean=avg std=stdev;
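As a fuller sketch of these keywords in a complete step: when VAR lists several variables, putting the variable names in parentheses after the keyword ties each output name to a specific analysis variable. The output names below (n_obs, avg_height, avg_weight, sd_height) are illustrative.

```sas
proc summary data=sashelp.class nway;
  class sex;
  var height weight;
  output out=myout
    n(height)=n_obs                           /* count of non-missing heights */
    mean(height weight)=avg_height avg_weight /* one name per listed variable */
    std(height)=sd_height;                    /* standard deviation of height */
run;

proc print data=myout;
run;
```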

🧠 Tips for PROC SUMMARY

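  • Always use PRINT if you want to display results.
  • Use NWAY if you need only the rows for the full combination of all CLASS variables.
  • Give output variables meaningful names with the keyword=name syntax (for example, mean=avg_height).
  • Save results with OUTPUT OUT= to create summary datasets that later steps can reuse.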


📦 Summary Table

Feature | Description
Primary Use | Summary statistics
Shows Output | No (by default)
Supports Groups | Yes (CLASS or BY)
Custom Output | Yes (OUTPUT statement)
Output Dataset | Yes
Flexible Output | Highly customizable

 


📆 SAS Date Interval Functions Explained: INTNX vs INTCK with Examples

Date manipulation is a core part of data analysis in SAS. When working with time series, financial models, or scheduling, calculating intervals and moving dates is essential. That’s where SAS functions like INTNX and INTCK come in.

In this blog post, we will explore:

  • What are INTNX and INTCK functions?
  • Complete syntax and arguments
  • Important options
  • Practical real-life examples
  • Comparison and use-cases

🔍 What is INTNX in SAS?

The INTNX (Interval Next) function adds a specified number of intervals to a date or datetime and returns a new date.

📌 Syntax:

INTNX(interval, start-from, increment <, 'alignment'>)

✅ Arguments:

Parameter | Description
interval | The type of interval to add (e.g., day, week, month, year).
start-from | The base SAS date, time, or datetime value.
increment | Number of intervals to move forward (positive) or backward (negative).
'alignment' (optional) | Position of the result within the interval: 'BEGINNING', 'MIDDLE', 'END', or 'SAME' (default is 'BEGINNING').
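To see how the alignment values change the result, the following sketch applies each one to the same start date and writes the results to the log. The start date and variable names are arbitrary choices for illustration.

```sas
data _null_;
  d = '15JUL2024'd;
  b = intnx('month', d, 1, 'BEGINNING');  /* 01AUG2024 */
  m = intnx('month', d, 1, 'MIDDLE');     /* midpoint of August 2024 */
  e = intnx('month', d, 1, 'END');        /* 31AUG2024 */
  s = intnx('month', d, 1, 'SAME');       /* 15AUG2024 */
  /* 'SAME' falls back to the last day when the target month is shorter */
  f = intnx('month', '31JAN2024'd, 1, 'SAME');  /* 29FEB2024 */
  put b= date9. m= date9. e= date9. s= date9. f= date9.;
run;
```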

🧪 INTNX Examples

Example 1: Add 3 months to a date

data result;
  new_date = intnx('month', '01JAN2024'd, 3);
  format new_date date9.;
run;

Output: 01APR2024


Example 2: Move 2 years back and align to end of year

data result;
  year_end = intnx('year', '15JUL2024'd, -2, 'END');
  format year_end date9.;
run;

Output: 31DEC2022


Example 3: Use 'SAME' alignment

data result;
  same_date = intnx('month', '10JAN2024'd, 1, 'SAME');
  format same_date date9.;
run;

Output: 10FEB2024


🧠 Pro Tip:

You can use INTNX with other intervals too: 'qtr' for dates, 'hour' and 'minute' for time or datetime values, and 'dtday' or 'dtmonth' for datetime values.
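A short sketch with a datetime value shows the difference between a time-based interval and a datetime day interval. The starting datetime is an arbitrary example value.

```sas
data _null_;
  dt = '15JUL2024:10:30:00'dt;
  next_hour = intnx('hour', dt, 1);   /* start of the next hour: 15JUL2024:11:00:00 */
  next_day  = intnx('dtday', dt, 1);  /* start of the next day:  16JUL2024:00:00:00 */
  put next_hour= datetime20. next_day= datetime20.;
run;
```

Note that a plain 'day' interval expects a date value; with datetime values, use the 'dt' prefixed intervals.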


⏳ What is INTCK in SAS?

The INTCK (Interval Check) function calculates the number of intervals between two dates or datetimes.

📌 Syntax:

INTCK(interval, start, end <, 'method'>)

✅ Arguments:

Parameter | Description
interval | Type of interval (e.g., day, week, month, year).
start | Starting SAS date or datetime.
end | Ending SAS date or datetime.
'method' (optional) | 'D' (DISCRETE, the default) counts interval boundaries crossed; 'C' (CONTINUOUS) counts complete intervals measured from the start date.
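The method argument matters most when the two dates sit close to an interval boundary. As a small sketch, the dates below are one day apart but fall in different months:

```sas
data _null_;
  /* Default (discrete): one month boundary is crossed */
  disc = intck('month', '31JAN2024'd, '01FEB2024'd);       /* 1 */
  /* Continuous: no complete month has elapsed */
  cont = intck('month', '31JAN2024'd, '01FEB2024'd, 'C');  /* 0 */
  put disc= cont=;
run;
```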

🧪 INTCK Examples

Example 1: Count number of months between two dates

data result;
  month_diff = intck('month', '01JAN2023'd, '01JUL2023'd);
run;

Output: 6


Example 2: Days between two dates

data result;
  day_diff = intck('day', '15MAR2023'd, '20APR2023'd);
run;

Output: 36


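Example 3: Count number of years using 'CONTINUOUS' method

data result;
  year_cont = intck('year', '31DEC2019'd, '01JAN2021'd, 'C');
run;

Output: 1

With 'C', INTCK counts complete years measured from the start date, and only one full year (31DEC2019 to 31DEC2020) fits before 01JAN2021. The default discrete method would return 2, because two January 1 boundaries are crossed.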


🔄 INTNX vs INTCK in SAS

Feature | INTNX | INTCK
Purpose | Move a date forward/backward | Calculate number of intervals
Returns | Date or datetime value | Integer (count of intervals)
Optional Arg | Alignment ('BEGINNING', etc.) | Method ('C' or 'D')
Use Case | Scheduling future/past events | Analyzing gaps/durations

📘 Common Intervals in SAS

Interval Type | Example | Description
day | 'day' | Daily interval
week | 'week' | Weekly interval
month | 'month' | Monthly interval
qtr | 'qtr' | Quarterly interval
year | 'year' | Yearly interval
dtday | 'dtday' | Datetime day interval
hour | 'hour' | Hour interval

🛠 Real-World Use Cases

🧾 1. Loan Payment Schedules

due_date = intnx('month', start_date, 12, 'END');
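To extend this into a full schedule, a DO loop can generate one due date per month. This is a sketch under assumed values: the start date is hypothetical, and 'SAME' alignment is used here (rather than 'END') so each payment falls on the same day of the month.

```sas
data schedule;
  start_date = '15JAN2024'd;  /* hypothetical loan start date */
  do payment_no = 1 to 12;
    /* payment_no months after the start, same day of month */
    due_date = intnx('month', start_date, payment_no, 'SAME');
    output;
  end;
  format start_date due_date date9.;
run;

proc print data=schedule;
run;
```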

📊 2. Monthly Sales Comparison

month_diff = intck('month', previous_sale, current_sale);

🧑‍💼 3. Employee Tenure Calculation

years_worked = intck('year', hire_date, today(), 'C');

🎯 Tips for Using INTNX and INTCK

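  • Apply a format such as DATE9. (dates) or DATETIME20. (datetimes) so results display as readable dates instead of raw numbers.
  • Use 'SAME' alignment when you want the exact day repeated.
  • Prefer the 'C' method in INTCK when you need complete elapsed intervals, such as ages, tenure, or anniversaries; the default discrete method counts boundary crossings.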


🧾 Summary

Function | Purpose | Returns | Key Argument
INTNX | Add intervals to a date | Date | 'alignment'
INTCK | Count intervals between two dates | Integer | 'method'

Both INTNX and INTCK are indispensable tools in time-based analysis in SAS. Whether you’re calculating tenure, creating forecasts, or aligning schedules, mastering these functions will significantly enhance your date handling capabilities.


Saturday, December 26, 2020

Mastering the INPUT and INFILE Statements in SAS: A Beginner's Guide with Examples

Introduction

If you're just starting with SAS programming, one of the most essential skills you'll need is the ability to read raw data files into SAS datasets. This is where the INPUT and INFILE statements come into play. These two statements are fundamental when you're working with external data sources like .txt, .csv, or .dat files.

In this post, we'll explore:

  • What the INFILE and INPUT statements do
  • Syntax and options
  • Step-by-step examples
  • Common errors and best practices

Whether you're preparing for the Base SAS Certification or learning SAS for data analytics, this guide will help you understand how to efficiently read data into SAS.


What is the INFILE Statement in SAS?

The INFILE statement is used to tell SAS where to find your external data file. It defines the file path and other important file-reading options such as delimiters and record length.

Syntax:

INFILE 'file-path' <options>;

Example:

INFILE 'C:\Data\students.txt' DLM=',' FIRSTOBS=2;

  • DLM=',' — Specifies the delimiter as a comma.
  • FIRSTOBS=2 — Tells SAS to start reading from the second row (often used to skip headers).


What is the INPUT Statement in SAS?

The INPUT statement tells SAS how to read the data — the structure, variable names, types, and formats. It works hand-in-hand with the INFILE statement to read external files into a SAS dataset.

Syntax:

INPUT var1 $ var2 var3;

  • Use $ for character variables.
  • No $ is needed for numeric variables.

Example:

INPUT Name $ Age Height;

Example: Reading a CSV File Using INFILE and INPUT

Let's say you have a file named students.csv with the following data:

Name,Age,Score
Alice,22,88
Bob,23,91
Charlie,21,85

SAS Code:

DATA student_data;
  INFILE 'C:\Users\YourName\Documents\students.csv' DLM=',' FIRSTOBS=2;
  INPUT Name $ Age Score;
RUN;

PROC PRINT DATA=student_data;
RUN;
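To try the same step without creating a file, the rows can be supplied inline with DATALINES. This is a sketch, not the original example: the DSD option and the LENGTH statement are additions, included so that empty fields and names longer than eight characters would also be handled.

```sas
data student_data;
  /* Read the in-stream rows as CSV; skip the header row */
  infile datalines dsd firstobs=2;
  length Name $ 20;          /* allow names longer than the default 8 characters */
  input Name $ Age Score;
  datalines;
Name,Age,Score
Alice,22,88
Bob,23,91
Charlie,21,85
;
run;

proc print data=student_data;
run;
```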

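More on the INPUT and INFILE Statements

There are two ways to point INFILE at a file.

1. Give the path directly in the INFILE statement:
Data <Dataset Name>;
Infile 'Location\Filename' DLM=<delimiter> Firstobs=<Starting Record>;
Input <Variable Names and informats>;
run;

2. Define a name (fileref) with a FILENAME statement, then use it in INFILE:
FileName <Name/Alias for File> 'Location/FileName';
Data <Dataset Name>;
Infile <Name/Alias for File> DLM=<delimiter> Firstobs=<Starting Record>;
Input <Variable Names and informats>;
run;

Here FILENAME is a global statement: it sits outside the DATA step, and the name stays assigned until you clear or reassign it or the SAS session ends.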

For example, I have a text file at /home/../New Folder/New Text Document.txt containing credit card number and spend information. To import it:
1.
Data datahark;
  infile '/home/../New Folder/New Text Document.txt' Firstobs=2;
  input cc Spend;
run;

2.
Filename CC '/home/../New Folder/New Text Document.txt';
Data datahark;
  infile cc Firstobs=2;
  input cc Spend;
run;

Both versions import the file and produce the same dataset.

Important Options in INFILE

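Option | Description
DLM=',' | Sets the delimiter (comma, space, tab, etc.)
FIRSTOBS=n | Starts reading at the nth record (use 2 to skip a header row)
DSD | Treats consecutive delimiters as missing values, strips quotes from quoted values, and uses a comma as the default delimiter
MISSOVER | Sets remaining variables to missing instead of reading the next line when a record is short
TRUNCOVER | Like MISSOVER, but also keeps a partial value when a record ends before a field is fully read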
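A small in-stream sketch shows DSD and TRUNCOVER working together. The data rows are made up for illustration: the first has a quoted value containing a comma, the second has an empty field, and the third is missing its last field.

```sas
data dsd_demo;
  infile datalines dsd truncover;
  input Name :$20. Age Score;   /* :$20. reads up to 20 characters per name */
  datalines;
"Smith, Alice",22,88
Bob,,91
Charlie,21
;
run;

proc print data=dsd_demo;
run;
```

With these options, the quoted name is read as a single value without its quotes, Bob's Age is missing, and Charlie's Score is missing rather than being pulled from another line.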

Reading Space-Delimited Data

DATA test_scores;
  INFILE 'C:\Data\testdata.txt';
  INPUT StudentID $ Subject $ Score;
RUN;

This reads a space-separated file without needing DLM.


Common Mistakes to Avoid

  1. File Not Found: Always use the full path or set the correct working directory.
  2. Wrong Delimiter: Use the correct delimiter option in INFILE.
  3. Incorrect Data Types: Forgetting to use $ for character variables.
  4. Header Row Issues: Use FIRSTOBS=2 if your data has a header.


Best Practices

  • Use DSD and FIRSTOBS=2 for clean CSV handling.
  • Always validate data using PROC PRINT after loading.
  • Modularize your code: Keep data import in a separate step.