Monday, August 4, 2025

📊 PROC RANK in SAS – Rank, Percentile, and Group Your Data Easily

Introduction

In data analysis, ranking values is essential for identifying top performers, segmenting data, and calculating percentiles. PROC RANK in SAS makes this process easy by assigning ranks, percentiles, or group numbers to numeric variables.


🔧 Syntax of PROC RANK

PROC RANK DATA=input_dataset OUT=output_dataset <TIES=MEAN|LOW|HIGH|DENSE> <GROUPS=n>;
   BY group_variable;
   VAR variable_to_rank;
   RANKS rank_variable;
RUN;

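Key Options and Statements Explained:

Option or statement | Description
DATA= | Input dataset
OUT= | Output dataset containing the new rank variable
TIES= | How tied values are ranked (default is MEAN)
GROUPS= | Splits the values into n groups of roughly equal size, numbered 0 to n-1 (quartiles, deciles, and so on)
BY | Ranks within each BY group (input must be sorted by the BY variable)
VAR | Variable(s) to rank
RANKS | Name of the new variable that stores the rank; if omitted, the ranks replace the original values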

📌 Example 1: Basic Ranking

proc rank data=sashelp.class out=ranked_class;
   var height;
   ranks height_rank;
run;

Explanation:
Ranks students in sashelp.class by their height, storing the result in height_rank.


📌 Example 2: Ranking within Groups

proc sort data=sashelp.class out=sorted_class;
   by sex;
run;

proc rank data=sorted_class out=ranked_sex;
   by sex;
   var weight;
   ranks weight_rank;
run;

Explanation:
Ranks weight within each sex group.


📌 Example 3: Create Percentile or Quantile Groups

proc rank data=sashelp.class out=grouped_class groups=4;
   var age;
   ranks age_quartile;
run;

Explanation:
Divides age into 4 quartile groups (0 to 3).
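A quick way to see how the groups came out is to count the rows in each one. The sketch below reruns the same step and tabulates age_quartile with PROC FREQ. Because several students share the same age, the four groups should not be expected to be exactly equal in size.

```sas
/* Same quartile grouping as in Example 3 */
proc rank data=sashelp.class out=grouped_class groups=4;
   var age;
   ranks age_quartile;
run;

/* Count how many students fall into each group (0 to 3) */
proc freq data=grouped_class;
   tables age_quartile;
run;
```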


📌 TIES= Option in Action

proc rank data=sashelp.class out=ranked_ties ties=low;
   var height;
   ranks height_rank;
run;

TIES= Options:

  • LOW – Lowest rank for all ties
  • HIGH – Highest rank for all ties
  • MEAN – Average rank (default)
  • DENSE – Ties share a rank and the next distinct value gets the next integer, so there are no gaps
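The four settings are easiest to compare on a tiny dataset. The sketch below uses a made-up table, tie_demo, with one tie (two scores of 20) and ranks it four times, carrying each result forward so the ranks end up side by side. The dataset and variable names are illustrative only.

```sas
/* Hypothetical four-row dataset with a single tie */
data tie_demo;
   input score;
   datalines;
10
20
20
30
;
run;

/* Rank the same variable four times; only TIES= changes */
proc rank data=tie_demo out=t1 ties=mean;
   var score;
   ranks rank_mean;
run;

proc rank data=t1 out=t2 ties=low;
   var score;
   ranks rank_low;
run;

proc rank data=t2 out=t3 ties=high;
   var score;
   ranks rank_high;
run;

proc rank data=t3 out=tie_compare ties=dense;
   var score;
   ranks rank_dense;
run;

proc print data=tie_compare noobs;
run;

/* Expected ranks:
   score  rank_mean  rank_low  rank_high  rank_dense
     10      1.0        1         1          1
     20      2.5        2         3          2
     20      2.5        2         3          2
     30      4.0        4         4          3       */
```

Only DENSE changes the rank of the value after the tie, which is why it is the usual choice when the ranks need to be consecutive.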


✅ When to Use PROC RANK

  • Ranking top N values
  • Creating quantile-based bins (e.g., deciles, quartiles)
  • Calculating percentiles
  • Segmenting customers or products
  • Normalizing scorecards
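For the top-N case, one possible approach is to rank in descending order and then filter on the rank. The sketch below picks the three tallest students in sashelp.class. The DESCENDING option is not covered above but is a standard PROC RANK option; the output dataset name height_ranked is just an illustration.

```sas
/* Rank 1 = tallest; TIES=LOW gives tied students the better (lower) rank */
proc rank data=sashelp.class out=height_ranked descending ties=low;
   var height;
   ranks height_rank;
run;

/* Keep the top 3; more than three rows can appear if heights tie */
proc print data=height_ranked noobs;
   where height_rank <= 3;
   var name height height_rank;
run;
```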


🧠 Tips for Using PROC RANK

  • Always sort the dataset before using BY.
  • Use GROUPS= for percentiles or bucketing.
  • For multiple variables, list them all in one VAR statement and give the matching rank names, in the same order, in one RANKS statement.
  • Combine with PROC SQL or PROC PRINT for better reporting.
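Several variables can be ranked in a single step. In the sketch below, the names in the RANKS statement are matched to the VAR variables by position; the output dataset and rank variable names are illustrative.

```sas
/* height -> height_rank, weight -> weight_rank (matched by position) */
proc rank data=sashelp.class out=class_ranks;
   var height weight;
   ranks height_rank weight_rank;
run;

/* Sort by one of the ranks for a readable report */
proc sort data=class_ranks;
   by height_rank;
run;

proc print data=class_ranks noobs;
   var name height height_rank weight weight_rank;
run;
```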


📎 Final Thoughts

PROC RANK is a powerful yet simple procedure in SAS that enables effective data ranking and segmentation. It's especially useful in scoring, customer segmentation, and exploratory data analysis.


Saturday, July 12, 2025

✅ All About PROC SUMMARY in SAS: A Comprehensive Guide with Practical Examples

📘 Introduction

PROC SUMMARY in SAS generates summary statistics for numeric variables. It shares its syntax and statistics with PROC MEANS. The two differ mainly in their defaults: PROC SUMMARY prints nothing unless you specify PRINT, and without a VAR statement it only counts observations. That makes it a natural fit for building summary datasets quietly.

In this guide, you'll learn:

  • Complete syntax of PROC SUMMARY
  • All available options and statements
  • Grouped examples
  • How it differs from PROC MEANS
  • Output datasets and tips


🛠 Syntax of PROC SUMMARY

PROC SUMMARY <options>;
   VAR variable(s);
   CLASS variable(s);
   BY variable(s);
   OUTPUT OUT=dataset <stat-options>;
RUN;

⚙️ Common Options in PROC SUMMARY

Option or keyword | Description
DATA= | Specifies the input dataset
N | Count of non-missing values
MEAN | Mean (average) value
STD | Standard deviation
MIN | Minimum value
MAX | Maximum value
SUM | Total sum
MAXDEC= | Maximum number of decimal places in printed output
NWAY | Outputs only the rows for the full combination of all CLASS variables
CHARTYPE | Stores _TYPE_ in the output dataset as a character (binary string) variable instead of a number
📄 Statements in PROC SUMMARY

Statement | Purpose
VAR | Specifies numeric variables to analyze
CLASS | Groups summary statistics by categorical variables
BY | BY-group processing; data must be sorted
OUTPUT | Writes summary statistics to a dataset

✅ PROC SUMMARY vs PROC MEANS

Feature | PROC MEANS | PROC SUMMARY
Displays output | Yes (default) | No (unless PRINT)
No VAR statement | Analyzes all numeric variables | Only counts observations
Statistics and options | Same | Same
Typical use | Display and review | Data pipelines
Output dataset | Optional | Usual goal
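🧪 Examples of PROC SUMMARY

Example 1: Basic Summary Statistics

proc summary data=sashelp.class print;
run;

Note: Use PRINT to display output. Because this step has no VAR statement, PROC SUMMARY reports only the number of observations; add a VAR statement to get statistics.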


Example 2: Specifying Variables

proc summary data=sashelp.class print;
   var height weight;
run;

Example 3: Using CLASS Statement

proc summary data=sashelp.class print;
   class sex;
   var height weight;
run;

Example 4: Creating Output Dataset

proc summary data=sashelp.class n mean maxdec=1;
   class sex;
   var height weight;
   output out=summary_stats;
run;

Example 5: Custom Output Variable Names

proc summary data=sashelp.class;
   class sex;
   var weight;
   output out=summary_data
          mean=mean_weight
          max=max_weight
          min=min_weight;
run;

Example 6: Using BY Statement (sorted data)

proc sort data=sashelp.class out=sorted;
   by sex;
run;

proc summary data=sorted print;
   by sex;
   var height;
run;

🔄 Output Options in PROC SUMMARY

You can customize the summary stats by combining the following keywords in the OUTPUT statement:

Keyword | Description
N= | Assigns name to N (count)
MEAN= | Assigns name to mean
STD= | Assigns name to standard deviation
SUM= | Assigns name to sum
MIN= | Assigns name to minimum
MAX= | Assigns name to maximum

Example:

output out=myout n=n_obs mean=avg std=stdev;
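When there are several analysis variables, naming each statistic by hand gets tedious. One option, not shown above, is the AUTONAME option of the OUTPUT statement, which builds names from the variable and the statistic (for example Height_Mean). A minimal sketch, with an illustrative output dataset name:

```sas
/* Empty keyword lists (mean= std=) apply to every VAR variable;
   AUTONAME then generates a distinct name for each result */
proc summary data=sashelp.class nway;
   class sex;
   var height weight;
   output out=auto_named mean= std= / autoname;
run;

proc print data=auto_named noobs;
run;
```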

🧠 Tips for PROC SUMMARY

  • Always use PRINT if you want to display results.
  • Use NWAY if you need only fully classified combinations.
  • Use meaningful output variable names with =.
  • Great for creating reusable summary datasets.


📦 Summary Table

Feature | Description
Primary Use | Summary statistics
Shows Output | No (by default)
Supports Groups | Yes (CLASS or BY)
Custom Output | Yes (OUTPUT statement)
Output Dataset | Yes
Flexible Output | Highly customizable


📊 PROC MEANS in SAS – A Complete Guide with Syntax, Options & Examples

📘 Introduction

The PROC MEANS procedure in SAS is one of the most frequently used procedures for generating descriptive statistics. It helps compute means, medians, standard deviations, minimums, maximums, and more for numeric variables.

This blog post explores everything about PROC MEANS:

  • Syntax and arguments
  • Available statistics
  • Options and statements
  • Multiple grouped examples
  • Tips for better analysis


🔧 Syntax of PROC MEANS

PROC MEANS <options>;
   VAR variable(s);
   CLASS variable(s);
   BY variable(s);
   OUTPUT OUT=dataset <output-options>;
RUN;

🧾 Commonly Used Options in PROC MEANS

Option or keyword | Description
N | Count of non-missing values
MEAN | Average value
STD | Standard deviation
MIN | Minimum value
MAX | Maximum value
MEDIAN | Median value
SUM | Sum of values
MAXDEC= | Maximum number of decimal places in printed output
DATA= | Specifies input dataset
NWAY | Outputs only the rows for the full combination of all CLASS variables
CHARTYPE | Stores _TYPE_ in the output dataset as a character (binary string) variable instead of a number
Q1, Q3 | 1st and 3rd quartiles

🧠 Key Statements in PROC MEANS

Statement | Purpose
VAR | Specifies numeric variables to analyze
CLASS | Performs group-wise analysis (similar to GROUP BY)
BY | Performs BY-group processing (requires sorted data)
OUTPUT | Saves results to a new dataset

🧪 PROC MEANS Examples

✅ Example 1: Basic Summary Statistics

proc means data=sashelp.class;
run;

Output: N, Mean, Std, Min, Max for all numeric variables.


✅ Example 2: Specify Variables and Options

proc means data=sashelp.class mean std maxdec=2;
   var age height weight;
run;

Output: Mean and standard deviation for specified variables with 2 decimals.


✅ Example 3: Using CLASS Statement

proc means data=sashelp.class n mean median maxdec=1;
   class sex;
   var height weight;
run;

Output: Summary by gender.


✅ Example 4: Using BY Statement

proc sort data=sashelp.class out=sorted;
   by sex;
run;

proc means data=sorted n mean std;
   by sex;
run;

Note: BY requires pre-sorting.


✅ Example 5: Saving Output to a Dataset

proc means data=sashelp.class n mean max min;
   var height weight;
   class sex;
   output out=class_summary mean=mean_height mean_weight;
run;

Output Dataset: class_summary with the mean of height and weight for each sex, plus an overall row (_TYPE_=0). Add NWAY to the PROC MEANS statement to keep only the per-sex rows.


✅ Example 6: Percentiles and Custom Statistics

proc means data=sashelp.class n mean median q1 q3;
   var weight;
run;

📌 When to Use CLASS vs BY in PROC MEANS

Feature | CLASS | BY
Sorting | Not required | Requires sorted (or indexed) input
Output | One table with a row per group | Separate table per group
Best suited to | Quick grouped reports on unsorted data | Large data that is already sorted

🧠 Tips for Using PROC MEANS Effectively

  • Use MAXDEC= to format output.
  • CLASS is easier to use than BY for grouped summaries.
  • Combine with OUTPUT statement to reuse summary data.
  • Filter rows with a WHERE statement inside the step or a WHERE= dataset option on the input.
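Filtering does not need a separate DATA step. The sketch below shows two equivalent ways to restrict the analysis to students aged 13 or older; the cutoff is arbitrary and chosen only for illustration.

```sas
/* Option 1: WHERE statement inside the procedure */
proc means data=sashelp.class n mean maxdec=1;
   where age >= 13;
   class sex;
   var height weight;
run;

/* Option 2: WHERE= dataset option on the input dataset */
proc means data=sashelp.class(where=(age >= 13)) n mean maxdec=1;
   class sex;
   var height weight;
run;
```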

🧾 Summary Table

Feature | Description
Procedure Name | PROC MEANS
Primary Use | Descriptive statistics
Key Outputs | N, Mean, Std, Min, Max, Median, etc.
Common Options | MAXDEC=, NWAY, CHARTYPE
Supports Grouping | Yes – via CLASS and BY


🔄 Mastering PROC TRANSPOSE in SAS: Convert Rows to Columns and Vice Versa

PROC TRANSPOSE is a powerful SAS procedure used to reshape data by converting rows into columns or columns into rows. Whether you're preparing datasets for reporting or statistical analysis, PROC TRANSPOSE can simplify your task with just a few lines of code.

In this blog, you'll learn:

  • What is PROC TRANSPOSE?
  • When and why to use it
  • Syntax and options
  • Multiple real-world examples
  • Tips and tricks for efficient use


🔍 What is PROC TRANSPOSE in SAS?

PROC TRANSPOSE is used to pivot data, turning variables (columns) into observations (rows), or vice versa. It's especially useful for:

  • Summarizing repeated measures
  • Restructuring long or wide datasets
  • Preparing data for visualizations or modeling


📚 Basic Syntax of PROC TRANSPOSE

proc transpose data=input_data out=output_data <options>;
   by variable(s);    * Optional: groups data;
   id variable;       * Optional: names for new columns;
   var variable(s);   * Variables to transpose;
run;

📌 Key Options Explained

Option or statement | Description
BY | Groups data before transposing
VAR | Specifies variables to transpose
ID | Uses values of a variable as new column names
NAME= | Renames the default _NAME_ column
LABEL= | Renames the default _LABEL_ column

✅ Example 1: Transposing Without BY or ID

🔹 Input Data

data sales;
   input Quarter $ Sales;
   datalines;
Q1 100
Q2 120
Q3 140
Q4 160
;

🔹 Transpose Code

proc transpose data=sales out=sales_transposed;
   var Sales;
run;

🔹 Output

_NAME_ | COL1 | COL2 | COL3 | COL4
Sales | 100 | 120 | 140 | 160

✅ Example 2: Transposing with ID to Use Column Names

proc transpose data=sales out=sales_wide;
   id Quarter;
   var Sales;
run;

🔹 Output

_NAME_ | Q1 | Q2 | Q3 | Q4
Sales | 100 | 120 | 140 | 160

✅ Example 3: Transpose with BY Grouping

🔹 Input Data

data student_scores;
   input Student $ Subject $ Score;
   datalines;
John Math 85
John English 78
John Science 92
Anna Math 88
Anna English 91
Anna Science 84
;

🔹 Transpose Code

proc sort data=student_scores;
   by Student;
run;

proc transpose data=student_scores out=scores_wide;
   by Student;
   id Subject;
   var Score;
run;

🔹 Output

Student | _NAME_ | Math | English | Science
Anna | Score | 88 | 91 | 84
John | Score | 85 | 78 | 92

Rows follow the sorted BY order, and the new columns appear in the order their ID values are first encountered.

✅ Example 4: Transposing Multiple Variables

data patient_data;
   input ID $ Visit $ Height Weight;
   datalines;
P1 Visit1 170 65
P1 Visit2 171 66
P2 Visit1 160 60
P2 Visit2 161 61
;

🔹 Code

proc sort data=patient_data;
   by ID;
run;

* Drop _NAME_ so the merged result keeps only the measurement columns;
proc transpose data=patient_data out=trans_height(drop=_name_) prefix=Height_;
   by ID;
   id Visit;
   var Height;
run;

proc transpose data=patient_data out=trans_weight(drop=_name_) prefix=Weight_;
   by ID;
   id Visit;
   var Weight;
run;

data final_transposed;
   merge trans_height trans_weight;
   by ID;
run;

🔹 Output

ID | Height_Visit1 | Height_Visit2 | Weight_Visit1 | Weight_Visit2
P1 | 170 | 171 | 65 | 66
P2 | 160 | 161 | 60 | 61

🧠 Tips for Using PROC TRANSPOSE

  • Always SORT your data before using BY.
  • Use the PREFIX= option to create meaningful column names.
  • Combine multiple transpositions for complex reshaping.
  • Use NAME= and LABEL= to rename the default variables _NAME_ and _LABEL_.


🔎 When to Use PROC TRANSPOSE

Use Case | PROC TRANSPOSE?
Convert long to wide format | ✅ Yes
Convert wide to long format | ✅ Yes (reverse)
Reshape repeated measures | ✅ Yes
Change actual data values | ❌ No
Merge multiple reshaped tables | ✅ Yes, together with a DATA step MERGE
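The examples above all go from long to wide, so here is a minimal sketch of the reverse direction. It starts from a small, made-up wide table (scores_wide_demo) and produces one row per student and subject. The dataset names and the Subject and Score column names are assumptions chosen for illustration.

```sas
/* Hypothetical wide table: one row per student, one column per subject */
data scores_wide_demo;
   input Student $ Math English Science;
   datalines;
Anna 88 91 84
John 85 78 92
;
run;

/* NAME= renames _NAME_ to Subject; RENAME= turns COL1 into Score */
proc transpose data=scores_wide_demo
               out=scores_long(rename=(col1=Score))
               name=Subject;
   by Student;                  /* input is already sorted by Student */
   var Math English Science;    /* columns to stack into rows */
run;

proc print data=scores_long noobs;   /* 6 rows: Student, Subject, Score */
run;
```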

📈 Conclusion

PROC TRANSPOSE is an essential tool in any SAS programmer's toolkit. It simplifies reshaping data for reporting, analysis, and modeling. With a good understanding of the BY, ID, and VAR statements, you can handle most reshaping tasks.
