Tuesday, December 29, 2020

SAS LIBRARY - HOW TO CREATE A LIBRARY | HOW TO DELETE A LIBRARY

What is a Library -

It is a collection of one or more SAS files that are recognized by SAS and stored as a unit. You reference a SAS library by a logical name called a libref.

At the beginning of each SAS session, SAS automatically creates at least one library that which is Work, Work is the temporary library, As soon as session terminated work library also get destroyed.

All SAS data sets have a two-level name that consists of the library name and the data set name.

When a data set is in the Work library, then it is not required to include the library name when you reference it.

Example -

Data EX; RUN;

Data Work.EX; RUN;

Both above datasets are same pointing to EX in work library.

Creating a Library

LIBNAME - 

SAS library can be created using LIBNAME statement. The LIBNAME statement associates the name of the library, or libref, with the physical location of the library. 

Libname - Datahark

Libname-Creating library


In SAS Studio, It is even more easier to create a Library without writing any code - You can create a new SAS library by opening the New Library window from the Libraries section of the navigation pane in SAS Studio.

SAS LIBRARY 2

SAS LIBRARY


Clearing a Library - 

you can delete a library reference by using the clear statement, the 

Syntax  - 

libname libref clear;

Example - 
LIBNAME Datahark Clear;

LIBNAME Datahark Clear;
LIBNAME Datahark Clear;

Clear Library

How to get list of sas datasets in a library?

This can be done with the help of PROC CONTENTS

Example - If we need to get the list of all dataset from SASHELP library -

proc contents data=SASHELP._all_ out=Libraries memtype=data noprint;
run;

Proc SQL;
select DISTINCT MEMNAME from Libraries;
quit;


This will Print the list of all all dataset from SASHELP library in result window.
 

Labels: , , ,

Monday, December 28, 2020

SCAN Function | SAS Character Functions

 SCAN Function returns the nth word from a character string.

Syntax - 

SCAN(string/Variable, count, Delimiter, Modifier);

String/Variable  - 

This is the source Value from where we want to extract the nth Word.

Count -

Non zero positive or negative numeric value. If count is positive, SCAN function start search from left to right. If count is negative, SCAN function start search from right to left.

Delimiter - 

It's optional argument, any Character value or Special Character can be used as a delimiter, if left blank then the default delimiters are ! $ % & ( ) * + , - . / ; < ^ ~ |

Modifier

Specifies a character constant, variable, or expression in which each non-blank character modifies the action of the SCAN function. Blanks are ignored. 

Example - 

DATA A;
Title_Name='Datahark - Every Data is a Story';
First_word=scan(Title_Name,1);
Second_word=scan(Title_Name,2);
a=scan(Title_Name,2,'-');
Last_word=scan(Title_Name,-1);
Second_Last_word=scan(Title_Name,-2);
b=scan(Title_Name,-2);
run;

Proc Print Data=a; 
run;


 

See the Value of each variable in output and understand the difference - 

  • First_word=scan(Title_Name,1);  -> This Extract the First string i.e. Datahark 
  • Second_word=scan(Title_Name,2); -> This Extract the second string i.e. Every
  • a=scan(Title_Name,2,'-'); -> We have specified delimiter as '-' so it considered ' Every Data is a Story' as a second value.
  • Last_word=scan(Title_Name,-1); -> Negative count Extracted last Value: Story
  • Second_Last_word=scan(Title_Name,-2); -> Negative count Extracted second last Value: a
  • b=scan(Title_Name,-2,'-'); -> Negative count and Delimiter is '-' so it considered ' Every Data is a Story' as first word and we have specified -2 in count so gives second last value as 'Datahark '

Subscribe to DataHark by Email

Labels: , , , ,

Sunday, December 27, 2020

Creating Dataset using Proc SQL - WHY ? | HOW ?

With PROC SQL we can not only report data in Result window but we can also create New dataset for the result we get out of Select Query

WHY? - This is needed sometime because we want to reutilize the result of Select Query in some other places for analysis, so we required to store result somewhere in dataset.

HOW? - To perform this task we just need to write Create Table <Library.dataset Name> as Before any Select Statement 

Syntax - 

PROC SQL;
CREATE TABLE <Library.dataset name> AS
SELECT variable(s)
FROM Table Name
<WHERE Conditional statement
GROUP BY Variable(s)
ORDER BY variable(s)>;
QUIT; 

See Example - 

/* Creating Dataset using proc SQL */
Proc SQL;
Create Table WORK.AUDI as Select * From SASHELP.CARS
where Make = 'Audi';
Run;

Creating Dataset using PROC SQL



 

Use of INOBS option - Fix the number of rows in newly created dataset.

If you want to limit the number of observation(rows) in dataset produced by Proc SQL then INOBS option can be used - 

See Example - 

/* Create a dataset Contains top 10 Cheapest Audi */
Proc SQL INOBS=10;
Create Table Audi as Select * From SASHELP.CARS
where Make = 'Audi'
ORDER BY MSRP;
Run;

Top 10 Cheapest Audi

 

Labels: , , , , ,

Saturday, December 26, 2020

PROC CONTENTS - SUMMARIZING DATASET CONTENTS | METADATA | SAS PROCS

PROC CONTENTS In SAS is used to summarize a dataset's contents and metadata, generated summary about the contents of a dataset includes :

  • All variable names, their types, and attributes of Variables (including formats, informats, and labels)
  • Display the Number of Observation in Dataset
  • Number of Variables in Dataset
  • When the dataset was created
  • And lot more Information about any Dataset

Syntax for PROC CONTENTS

PROC CONTENTS DATA=DatasetName <option(s)>;
<Optional Statements>
Run;

Example- 

PROC Contents Data=SASHELP.cars;
 title  'The Contents of the CARS Data Set';
run;

Proc Content


Available Options for PROC CONTENTS 


DIRECTORY

prints a list of all tables in the specified library. 


DETAILS | NODETAILS

Print additional Details of All Tables, This Details Option is always used along with Directory option. 

DefaultIf DETAILS and NODETAILS are not specified, the default value option considered as NODETAILS

        Example- 

PROC Contents Data=SASHELP.cars DIRECTORY DETAILS;
 title  'The Contents of the CARS Data Set';
run;

 

OUT=table-name

Store the Output in a specified Table/Dataset, OUT= does not suppress the printed output from the statement. If you want to suppress the printed output, you must use the NOPRINT option.


        Example- 

PROC Contents Data=SASHELP.cars  out=Cars_details;
 title  'The Contents of the CARS Data Set';
run;

 

Cars Details CONTENTs PROC
It Also Prints the PROC CONTENTS output in Result window.

NOPRINT

NOPRINT OPTION use to suppresses printing the output of the CONTENTS procedure.

SHORT

prints only the list of variable names for the table, If the list of variables is more than 32,767 characters, the list is truncated and a WARNING is written to the log. To get a complete list of the variables, request an alphabetical listing of the variables.

        Example- 

PROC Contents Data=SASHELP.cars  SHORT;
 title  'The Contents of the CARS Data Set';
run;

Sort Option in PROC CONTENTS


VARNUM

prints a list of the variable names in the order of their logical position in the table. The physical position of the variable in the table is engine-dependent.

        Example- 

PROC Contents Data=SASHELP.cars  VARNUM;
 title  'The Contents of the CARS Data Set';
run;

Varnum option in PROC CONTENTS

 

ORDER=COLLATE | CASECOLLATE | IGNORECASE | VARNUM

COLLATE

prints a list of variables in alphabetical order beginning with uppercase and then lowercase names.

CASECOLLATE

prints a list of variables in alphabetical order even if they include mixed-case names and numerics.

IGNORECASE

prints a list of variables in alphabetical order ignoring the case of the letters.

VARNUM

is the same as the VARNUM option.

NoteThe ORDER= option does not affect the order of the OUT= table.

Labels: , , , , , ,

Creating Multiple Datasets from Existing Dataset - Different Situations | DYNAMIC SPLIT | USE OF CALL EXECUTE FUNCTION

Creating Multiple Datasets in one go is a Challenge often faced by SAS users, but It’s a Very easy process in SAS. 

There could be Two situations - 

1. Number of Dataset are already known-

Example - 

Student Data we want to split on basis of Gender, so here we already know that we need to create Male and Female, Two New datasets

2. Number of Dataset are not known

Example - 

SASHELP.CARS has N Number of Make in it, we wants to create all datasets for all unique Value in Make Variable, eg - Acura dataset will have all rows for Acura, Audi dataset will have all rows for Audi and so on.

Here Number of Dataset are not known it is based on Unique Value available in Make Variable.

Let's see our First Type where Number of Dataset are already know - 


 /* Created a Student Dataset with Few records */
Data Student;
Input Name $ Age Gender $;
cards;
Alex 20 M
Sady 21 F
Amanda 22 F
John 21 M
Jane 19 F
Mike 18 M
Chris 22 M
Nayasa 20 F
;
run;

/* Split Dataset based on Gender using If Then Output statement */

Data Male Female;
set Student;
if Gender='M' then output Male;
else if Gender='F' then Output Female;
run;

/*Printed all Three Datasets */

Proc Print Data=Student;Run; 
Proc Print Data= Male; Run;  
Proc Print Data= Female; Run; 

Muliple Dataset

Now Let's see our second Type where Number of Dataset are Unknown-

/* First We get the All Unique Value of Make in a New Datset "Unique_Make" */
Proc Sort Data=SAShelp.Cars Nodupkey out=Unique_make (Keep=Make);
by Make;
run;
/*Create a Name Variable in Unique_Make */

Data Unique_make;
set Unique_make;
Name=Scan(Make,1,'-');
run;

/* We created this Name becuase Make has a Value 'Mercedes-Benz', we can't use Mercedes-Benz as Dataset Name,
So here we used SCAN Function to Avoid this kind of Situation otherwise it will error our program. */

/*Now this is Final and important Code where we acually Split SASHELP.CARS into Multiple Datasets based on MAKE */

Data _NULL_;
set unique_Make;
call execute('data ' !! compress(Name) !! '; set sashelp.cars; where Make = "' !! Make !! '"; run;');
run;
Multiple Dataset in one Go


/*Similarly we can use Call Execute Funtion to Print all Datasets in one Go */

Data _NULL_;
set unique_Make;
call execute('Proc Print Data= ' !! compress(Make) !! '; run;');
run; 

Use of Call Execute Function

 

Labels: , , ,

Understanding the Program Data Vector (PDV) in SAS

Introduction

When working with SAS, understanding how data is processed behind the scenes is crucial to writing efficient and accurate programs. One of the most important internal concepts is the Program Data Vector (PDV). It plays a central role in how SAS reads and constructs datasets, especially within the DATA step. In this post, we’ll explore what PDV is, how it works, and why it matters for your SAS programming skills.


What is the Program Data Vector (PDV)?

The Program Data Vector (PDV) is a temporary memory area created by SAS when a DATA step is compiled and executed. It is used to build each observation (row) of a SAS dataset one at a time.

Think of the PDV as a holding area where variable values are stored during the execution of the DATA step, just before they are written to the dataset.



Why is PDV Important?

Understanding PDV helps you:

  • Predict the order of variable creation and execution
  • Understand how missing values are assigned
  • Debug unexpected results in DATA steps
  • Write more efficient and accurate programs


How PDV Works

PDV

Let’s break down the process of how PDV operates in a DATA step:

1. Compilation Phase

  • SAS identifies all the variables to be created.
  • It builds the structure of the PDV including the order and length of variables.
  • Input and output datasets are determined, but no data is read yet.

2. Execution Phase

  • One observation is read into the PDV at a time.
  • Statements in the DATA step are executed.
  • After execution, the observation is written to the dataset.
  • The PDV is reset for the next observation (except for variables created with retain).


Example: PDV in Action

data example;
input name $ age; age_plus_5 = age + 5; datalines; John 25 Mary 30 ; run;

What happens in the PDV?

  1. Compilation phase:
    1. Variables identified: name, age, age_plus_5
    2. PDV structure: [name][age][age_plus_5]
  2. Execution phase:

    1. First line: John 25

      1. PDV becomes: name=John, age=25, age_plus_5=30
      2. Observation is written
    2. Second line: Mary 30
      1. PDV becomes: name=Mary, age=30, age_plus_5=35
After each observation, the PDV resets (all variables initialized to missing except for retained variables).


Special Note on retain and PDV

When you use the retain statement, it prevents PDV from resetting a variable to missing for each new iteration.

data retain_example;
retain total 0; input value; total + value; datalines; 5 10 ; run;

  • Here, total is initialized once and keeps accumulating, because it is retained in the PDV.


PDV and Automatic Variables

SAS also creates automatic variables in the PDV, such as:

  • _N_: Number of iterations
  • _ERROR_: Error flag (0 or 1)

These are not written to the final dataset but can be used for debugging or logic control.


Key Points to Remember

  • The PDV is created during the DATA step.
  • It stores values of all variables during the step.
  • Observations are written one at a time after execution.
  • Variables are reset to missing after each iteration unless retain is used.
  • Understanding PDV helps you avoid logical errors and write better SAS code.


Conclusion

The Program Data Vector (PDV) is a powerful concept in SAS that acts as the engine behind the DATA step. By understanding how PDV works, you can gain deeper insight into how your data is processed and improve your ability to debug and optimize SAS programs.

Whether you're preparing for a SAS interview or trying to enhance your programming skills, mastering the PDV is a crucial step in becoming a proficient SAS programmer.



Output dataset is the Final step where data is actually loaded in system after the PDV process completes.


Labels: , , ,

Mastering the INPUT and INFILE Statements in SAS: A Beginner's Guide with Examples

Introduction

If you're just starting with SAS programming, one of the most essential skills you'll need is the ability to read raw data files into SAS datasets. This is where the INPUT and INFILE statements come into play. These two statements are fundamental when you're working with external data sources like .txt, .csv, or .dat files.

In this post, we'll explore:

  • What the INFILE and INPUT statements do
  • Syntax and options
  • Step-by-step examples
  • Common errors and best practices

Whether you're preparing for the Base SAS Certification or learning SAS for data analytics, this guide will help you understand how to efficiently read data into SAS.


What is the INFILE Statement in SAS?

The INFILE statement is used to tell SAS where to find your external data file. It defines the file path and other important file-reading options such as delimiters and record length.

Syntax:

INFILE 'file-path' <options>;

Example:

INFILE 'C:\Data\students.txt' DLM=',' FIRSTOBS=2;

  • DLM=',' — Specifies the delimiter as a comma.
  • FIRSTOBS=2 — Tells SAS to start reading from the second row (often used to skip headers).


What is the INPUT Statement in SAS?

The INPUT statement tells SAS how to read the data — the structure, variable names, types, and formats. It works hand-in-hand with the INFILE statement to read external files into a SAS dataset.

Syntax:

INPUT var1 $ var2 var3;

  • Use $ for character variables.
  • No $ is needed for numeric variables.

Example:

INPUT Name $ Age Height;

Example: Reading a CSV File Using INFILE and INPUT

Let's say you have a file named students.csv with the following data:

Name,Age,Score
Alice,22,88 Bob,23,91 Charlie,21,85

SAS Code:

DATA student_data;
INFILE 'C:\Users\YourName\Documents\students.csv' DLM=',' FIRSTOBS=2; INPUT Name $ Age Score; RUN; PROC PRINT DATA=student_data; RUN;

More on Input and Infile Statement - 

1. 
Data <Dataset Name>
Infile 'Location\Filename' DLM=<delimiter> Firstobs=<Starting Position>;
Input <Variable Name and formats>;
run;

2. 
FileName <Name/Alias for File> 'Location/FileName';
Data <Dataset Name>
Infile above defined Name/Alias for File  DLM=<delimiter> Firstobs=<Starting Position>;;
Input <Variable Name and formats>;
run;

Here FileName is a Global Statement

For Example - I have a Text file on my location - /home/../New Folder/New Text Document.txt with Credit Card Number and Spend information, to import that
1.
Data datahark;
infile  '/home/../New Folder/New Text Document.txt' Firstobs=2;
INput cc Spend;
run;

2.
Filename CC '/home/../New Folder/New Text Document.txt';
Data datahark;
infile  cc Firstobs=2;
INput cc Spend;
run;

Both of above code will import the File and Give the same output - 

Use of Infile and Input Statement in SAS

Important Options in INFILE

OptionDescription
DLM=','Sets the delimiter (comma, space, tab, etc.)
FIRSTOBS=nReads from the nth record (skip headers)
DSDHandles missing values and quotes in CSV
MISSOVERPrevents reading the next line if data is missing
TRUNCOVERAvoids errors when fewer columns exist than expected

Reading Space-Delimited Data

DATA test_scores;
INFILE 'C:\Data\testdata.txt'; INPUT StudentID $ Subject $ Score; RUN;

This reads a space-separated file without needing DLM.


Common Mistakes to Avoid

  1. File Not Found: Always use the full path or set the correct working directory.
  2. Wrong Delimiter: Use the correct delimiter option in INFILE.
  3. Incorrect Data Types: Forgetting to use $ for character variables.
  4. Header Row Issues: Use FIRSTOBS=2 if your data has a header.


Best Practices

  • Use DSD and FIRSTOBS=2 for clean CSV handling.
  • Always validate data using PROC PRINT after loading.
  • Modularize your code: Keep data import in a separate step.
Click here to Read more »

Labels: , , , , , , , , , ,

Thursday, December 24, 2020

PROC SQL - More Basic Examples

Go to Previous page  - PROC SQL FOR BEGINNERS - HOW TO USE SQL IN SAS

More Examples of Basic PROC SQL Programs

7. Getting Specific Number of Records From Dataset - Use of OUTOBS option

proc sql outobs=10; 
select *  From SASHELP.CARS;
Quit;

proc sql example 7 Datahark

This Code is displaying First 10 records from SASHELP.CARS

8. Giving Label(New Display Name) to a Variable/Column in Dataset.

proc sql; 
select Make as Company, MSRP as Price From SASHELP.CARS 
where Make ='Audi';
QUIT;
Label in PROC SQL

This Code is changing Display name of Make to Company and MSRP to Price and Display all records for Audi

9. Creating New Variable in PROC SQL

Proc sql;
select Make, MSRP, (MSRP*0.10) AS DISCOUNT From SASHELP.CARS Where 
Make='Acura';
QUIT;
Variable in PROC SQL

IT will Add New Variable in Display List named as DISCOUNT which is having 10% of MSRP.

10. If we want to Use Newly Created Variable in Same Select Statement then we need to use CALCULATED Keyword

proc sql;
select Make, MSRP, (MSRP*0.10) AS DISCOUNT, (MSRP-CALCULATED DISCOUNT) as FINAL_PRICE 
From SASHELP.CARS 
WHERE Make='BMW';
QUIT;

Calculated Keyword in PROC SQL

11. Removing Duplicates Using DISTINCT Keyword

proc sql;
select DISTINCT Make
From SASHELP.CARS ;
QUIT;

Distinct in PROC SQL

12. Multiple SQL Statements in One PROC SQL

proc sql;
select DISTINCT Cylinders From SASHELP.CARS ;
select Make,Cylinders From SASHELP.CARS where Make IN ('BMW');
QUIT;
Multiple SQL in PROC SQL

13. Creating a New Dataset/Table using PROC SQL We can create New Dataset using CREATE TABLE statement in PROC SQL -

PROC SQL;
Create Table BMW as select Make,Cylinders From SASHELP.CARS where Make IN ('BMW');QUIT;

 

  •  Creating New Dataset in SAS using PROC SQL

Labels: , , , , ,

Wednesday, December 23, 2020

PROC SQL FOR BEGINNERS - HOW TO USE SQL IN SAS

PROC SQL SAS

SAS offers support to relational databases by using SQL queries within SAS Program using Proc SQL. Procedure PROC SQL is used to evaluate and run the SQL statements and also for creating New Dataset/table out of written query, Also it can create new Variables/Columns in Dataset(s).

SQL is one of the Top 5 Programming Languages for Data Analysis, The advantage of PROC SQL over SQL is - we can use SQL along with Base SAS programming logic which can help us to solve more complex data analysis problem very easily.

Lets see how we can utilize PROC SQL to write SQL in SAS - 

Syntax for Proc SQL  - 

PROC SQL;
SELECT Columns
FROM TABLE
<WHERE Condition
GROUP BY Columns
Having Condition
Order by Columns;>
QUIT;

Click here to Read more »

Labels: , , , , ,

Tuesday, December 22, 2020

SAS ETL: A Complete Guide to Extract, Transform, and Load in SAS

SAS ETL - We sometime heard this Term "ETL" from big companies, Job posted for SAS ETL Developer, so Lets see what is SAS ETL exactly.

"ETL" Means - Extract Transform and Load - Its a type of Data Integration tool which is used to blend data from different Sources.

In the world of data analytics, ETL (Extract, Transform, Load) plays a crucial role in preparing data for analysis and reporting. In this blog post, we will explore how ETL processes are implemented in SAS (Statistical Analysis System)—a powerful tool widely used in data management, analytics, and business intelligence.

Whether you're a beginner or a data professional, understanding SAS ETL is essential for managing and transforming large volumes of data efficiently.

SAS ETL

🔍 What is ETL?

ETL stands for:

  • Extract: Retrieving raw data from multiple sources (databases, Excel files, text files, web sources, etc.)
  • Transform: Cleaning, standardizing, and reshaping data to suit your business needs.
  • Load: Importing the processed data into a target system such as a data warehouse or analytical platform.

SAS provides robust tools and procedures to handle all three stages with precision and scalability.


📥 Extracting Data Using SAS

The data extraction phase involves connecting to different data sources using SAS libraries, PROC IMPORT, or LIBNAME statements. SAS can extract data from:

  • Excel and CSV files
  • SQL databases (Oracle, MySQL, SQL Server, etc.)
  • Cloud-based storage or Hadoop
  • Web sources via APIs

Example:

libname mydb odbc dsn='SalesDB' user='admin' password='xyz123';

This command connects SAS to a SQL database and allows direct access to tables for extraction.


🔄 Transforming Data in SAS

The transformation step is where the real magic happens. You can use SAS procedures, data step programming, and functions to clean and prepare data:

  • Filtering and sorting (PROC SORT, WHERE clauses)
  • Merging datasets (MERGE, PROC SQL JOIN)
  • Creating calculated fields
  • Handling missing values
  • Aggregating data using PROC MEANS, PROC SUMMARY, or PROC SQL

Example:

data sales_clean;
set sales_raw; if sales > 0; revenue = quantity * price; run;

📦 Loading Data in SAS

The final step is loading the transformed data into a target destination for reporting or visualization. In SAS, you can:

  • Save datasets using the DATA step
  • Export to Excel/CSV using PROC EXPORT
  • Load into relational databases via LIBNAME or PROC SQL

Example:

proc export data=sales_clean
outfile="C:\Reports\Cleaned_Sales.csv" dbms=csv replace; run;

Click here to Read more »

Labels: , , , , ,

Sunday, December 20, 2020

Understanding the SUBSTR Function in SAS: A Complete Guide

When working with character data in SAS, extracting parts of strings is a common task. Whether you're cleaning raw data or generating new variables, the SUBSTR function becomes an essential tool in your SAS programming toolbox.

In this blog post, we'll break down what the SUBSTR function does, how it works, and provide real-world examples to help you master its usage.

Substr function

🔍 What is the SUBSTR Function in SAS?

The SUBSTR function in SAS is used to extract a substring from a character variable or string. You can specify the starting position and the length of the substring you want to extract.

Syntax:

SUBSTR(string, start-position <, length>)

  • string: The character string or variable.
  • start-position: The starting position (1-based index).
  • length (optional): Number of characters to extract. If omitted, the substring continues to the end of the string.

✅ Key Features of SUBSTR

  • It is case-sensitive.
  • Can be used both on the left-hand side (LHS) and right-hand side (RHS) of assignment.
  • Useful for data cleaning, transformation, and feature engineering.

🧪 Examples of SUBSTR in Action

Example 1: Extracting a Substring from a Character Variable

data example1;
name = "JohnDoe"; first_name = substr(name, 1, 4); /* Extracts 'John' */ run;

Example 2: Using SUBSTR Without Length (Extract till End)

data example2;
id = "EMP12345"; emp_code = substr(id, 4); /* Extracts '12345' */ run;

Example 3: Using SUBSTR on the Left Side to Modify a String

data example3;
phone = "9876543210"; substr(phone, 1, 3) = "999"; /* Replaces first 3 characters */ run;

⚠️ Common Pitfalls

  • Position starts at 1, not 0 like in some other programming languages.
  • If the start-position exceeds the string length, SUBSTR returns a blank.
  • If you try to modify a variable using SUBSTR on LHS, ensure the variable has enough allocated length.

📌 Use Cases in Real-world SAS Programming

  • Extracting codes from structured IDs (e.g., EMP001, PROD2023)
  • Parsing CSV or fixed-width text fields
  • Replacing characters at specific positions
  • Creating derived variables for reports and models

💡 Tips for Using SUBSTR Effectively

  • Combine SUBSTR with INDEX, SCAN, or FIND for dynamic substring extraction.
  • Always use the LENGTH statement to define the expected length of output variables.
  • For numeric values, convert using PUT() before applying SUBSTR.

🧭 Conclusion

The SUBSTR function is a versatile tool in SAS that enables efficient string manipulation. Mastering it not only simplifies your data processing tasks but also enhances your ability to handle messy or semi-structured data with ease.

Labels: , , , ,

📥 PROC IMPORT in SAS: A Complete Guide to Importing External Data Files

One of the most common tasks in data analysis is importing external files into your SAS environment. Fortunately, SAS provides a powerful and flexible procedure called PROC IMPORT that allows you to bring in data from Excel, CSV, and text files with ease.

In this blog post, we'll explore everything you need to know about PROC IMPORT in SAS, including syntax, file types, options, practical examples, and visual aids to make your learning easier.


🔍 What is PROC IMPORT?

PROC IMPORT is a SAS procedure that allows you to read data from various file types—most commonly:
  • Microsoft Excel files (.xls, .xlsx)
  • Comma-separated values files (.csv)
  • Tab-delimited or plain text files (.txt)

The IMPORT procedure will, by default, get variable names from the first line in your data file. If you do not want this, then add the GETNAMES=NO statement after the PROC IMPORT statement. PROC IMPORT will assign the variables the names VAR1, VAR2, VAR3, and so on. Also if your data file is type DLM, PROC IMPORT assumes that the delimiter is a space. If you have a different delimiter, then specify it in the DELIMITER= statement. The following shows both these statements: 

Click here to Read more »

Labels: , , , , , , , , , , ,

Monday, December 14, 2020

🔄 PROC SORT in SAS: Sorting Data and Removing Duplicates Using NODUP and NODUPKEY Options

📌 What is PROC SORT in SAS?

PROC SORT is a procedure used to sort a SAS dataset by one or more variables. Sorting is essential for data preparation, reporting, and ensuring that other procedures work as expected.

✅ Basic Syntax:

proc sort data=input_dataset out=output_dataset;
by variable1 variable2; run;

Click here to Read more »

Labels: , , , ,