DataHark - Every Data is a Story: December 2020

Tuesday, December 29, 2020

SAS LIBRARY - HOW TO CREATE A LIBRARY | HOW TO DELETE A LIBRARY

What is a Library -

It is a collection of one or more SAS files that are recognized by SAS and stored as a unit. You reference a SAS library by a logical name called a libref.

At the beginning of each SAS session, SAS automatically creates at least one library that which is Work, Work is the temporary library, As soon as session terminated work library also get destroyed.

All SAS data sets have a two-level name that consists of the library name and the data set name.

When a data set is in the Work library, then it is not required to include the library name when you reference it.

Example -

Data EX; RUN;
Data Work.EX; RUN;

Both above datasets are same pointing to EX in work library.

Creating a Library

LIBNAME -

SAS library can be created using LIBNAME statement. The LIBNAME statement associates the name of the library, or libref, with the physical location of the library.

In SAS Studio, It is even more easier to create a Library without writing any code - You can create a new SAS library by opening the New Library window from the Libraries section of the navigation pane in SAS Studio.

Clearing a Library -

you can delete a library reference by using the clear statement, the

Syntax -

libname libref clear;

Example -

LIBNAME Datahark Clear;

LIBNAME Datahark Clear;

LIBNAME Datahark Clear;

How to get list of sas datasets in a library?

This can be done with the help of PROC CONTENTS

Example - If we need to get the list of all dataset from SASHELP library -

proc contents data=SASHELP._all_ out=Libraries memtype=data noprint;
run;

Proc SQL;
select DISTINCT MEMNAME from Libraries;
quit;

This will Print the list of all all dataset from SASHELP library in result window.

Labels: About SAS, Base SAS, Datahark, SAS

Monday, December 28, 2020

SCAN Function | SAS Character Functions

SCAN Function returns the nth word from a character string.

Syntax -

SCAN(string/Variable, count, Delimiter, Modifier);

String/Variable -

This is the source Value from where we want to extract the nth Word.

Count -

Non zero positive or negative numeric value. If count is positive, SCAN function start search from left to right. If count is negative, SCAN function start search from right to left.

Delimiter -

It's optional argument, any Character value or Special Character can be used as a delimiter, if left blank then the default delimiters are ! $ % & ( ) * + , - . / ; < ^ ~ |

Modifier

Specifies a character constant, variable, or expression in which each non-blank character modifies the action of the SCAN function. Blanks are ignored.

Example -

DATA A;
Title_Name='Datahark - Every Data is a Story';
First_word=scan(Title_Name,1);
Second_word=scan(Title_Name,2);
a=scan(Title_Name,2,'-');
Last_word=scan(Title_Name,-1);
Second_Last_word=scan(Title_Name,-2);
b=scan(Title_Name,-2);
run;

Proc Print Data=a;
run;

See the Value of each variable in output and understand the difference -

First_word=scan(Title_Name,1); -> This Extract the First string i.e. Datahark
Second_word=scan(Title_Name,2); -> This Extract the second string i.e. Every
a=scan(Title_Name,2,'-'); -> We have specified delimiter as '-' so it considered ' Every Data is a Story' as a second value.
Last_word=scan(Title_Name,-1); -> Negative count Extracted last Value: Story
Second_Last_word=scan(Title_Name,-2); -> Negative count Extracted second last Value: a
b=scan(Title_Name,-2,'-'); -> Negative count and Delimiter is '-' so it considered ' Every Data is a Story' as first word and we have specified -2 in count so gives second last value as 'Datahark '

Subscribe to DataHark by Email

Labels: About SAS, Datahark, Functions in SAS, SAS, sas functions

Sunday, December 27, 2020

Creating Dataset using Proc SQL - WHY ? | HOW ?

With PROC SQL we can not only report data in Result window but we can also create New dataset for the result we get out of Select Query

WHY? - This is needed sometime because we want to reutilize the result of Select Query in some other places for analysis, so we required to store result somewhere in dataset.

HOW? - To perform this task we just need to write Create Table <Library.dataset Name> as Before any Select Statement

Syntax -

PROC SQL;

CREATE TABLE <Library.dataset name> AS

SELECT variable(s)

FROM Table Name

<WHERE Conditional statement

GROUP BY Variable(s)

ORDER BY variable(s)>;

QUIT;

See Example -

/* Creating Dataset using proc SQL */
Proc SQL;
Create Table WORK.AUDI as Select * From SASHELP.CARS
where Make = 'Audi';
Run;

Use of INOBS option - Fix the number of rows in newly created dataset.

If you want to limit the number of observation(rows) in dataset produced by Proc SQL then INOBS option can be used -

See Example -

/* Create a dataset Contains top 10 Cheapest Audi */
Proc SQL INOBS=10;
Create Table Audi as Select * From SASHELP.CARS
where Make = 'Audi'
ORDER BY MSRP;
Run;

Labels: About SAS, Datahark, PROC SQL SAS, SAS, SAS SQL, SQL

Saturday, December 26, 2020

PROC CONTENTS - SUMMARIZING DATASET CONTENTS | METADATA | SAS PROCS

PROC CONTENTS In SAS is used to summarize a dataset's contents and metadata, generated summary about the contents of a dataset includes :

All variable names, their types, and attributes of Variables (including formats, informats, and labels)
Display the Number of Observation in Dataset
Number of Variables in Dataset
When the dataset was created
And lot more Information about any Dataset

Syntax for PROC CONTENTS

PROC CONTENTS DATA=DatasetName <option(s)>;

<Optional Statements>

Run;

Example-

PROC Contents Data=SASHELP.cars;
title 'The Contents of the CARS Data Set';
run;

Available Options for PROC CONTENTS

prints a list of all tables in the specified library.

DETAILS | NODETAILS

Print additional Details of All Tables, This Details Option is always used along with Directory option.

Default

If DETAILS and NODETAILS are not specified, the default value option considered as NODETAILS

Example-

PROC Contents Data=SASHELP.cars DIRECTORY DETAILS;
title 'The Contents of the CARS Data Set';
run;

OUT=table-name

Store the Output in a specified Table/Dataset, OUT= does not suppress the printed output from the statement. If you want to suppress the printed output, you must use the NOPRINT option.

Example-

PROC Contents Data=SASHELP.cars out=Cars_details;
title 'The Contents of the CARS Data Set';
run;

It Also Prints the PROC CONTENTS output in Result window.

NOPRINT

NOPRINT OPTION use to suppresses printing the output of the CONTENTS procedure.

SHORT

prints only the list of variable names for the table, If the list of variables is more than 32,767 characters, the list is truncated and a WARNING is written to the log. To get a complete list of the variables, request an alphabetical listing of the variables.

Example-

PROC Contents Data=SASHELP.cars SHORT;
title 'The Contents of the CARS Data Set';
run;

VARNUM

prints a list of the variable names in the order of their logical position in the table. The physical position of the variable in the table is engine-dependent.

Example-

PROC Contents Data=SASHELP.cars VARNUM;
title 'The Contents of the CARS Data Set';
run;

ORDER=COLLATE | CASECOLLATE | IGNORECASE | VARNUM

COLLATE

prints a list of variables in alphabetical order beginning with uppercase and then lowercase names.

CASECOLLATE

prints a list of variables in alphabetical order even if they include mixed-case names and numerics.

IGNORECASE

prints a list of variables in alphabetical order ignoring the case of the letters.

VARNUM

is the same as the VARNUM option.

Note	The ORDER= option does not affect the order of the OUT= table.

Labels: About SAS, Advance SAS, Base SAS, Data Analytics, Datahark, Procs, SAS

Creating Multiple Datasets from Existing Dataset - Different Situations | DYNAMIC SPLIT | USE OF CALL EXECUTE FUNCTION

Creating Multiple Datasets in one go is a Challenge often faced by SAS users, but It’s a Very easy process in SAS.

There could be Two situations -

1. Number of Dataset are already known-

Example -

Student Data we want to split on basis of Gender, so here we already know that we need to create Male and Female, Two New datasets

2. Number of Dataset are not known

Example -

SASHELP.CARS has N Number of Make in it, we wants to create all datasets for all unique Value in Make Variable, eg - Acura dataset will have all rows for Acura, Audi dataset will have all rows for Audi and so on.

Here Number of Dataset are not known it is based on Unique Value available in Make Variable.

Let's see our First Type where Number of Dataset are already know -

/* Created a Student Dataset with Few records */
Data Student;
Input Name $ Age Gender $;
cards;
Alex 20 M
Sady 21 F
Amanda 22 F
John 21 M
Jane 19 F
Mike 18 M
Chris 22 M
Nayasa 20 F
;
run;

/* Split Dataset based on Gender using If Then Output statement */

Data Male Female;
set Student;
if Gender='M' then output Male;
else if Gender='F' then Output Female;
run;

/*Printed all Three Datasets */

Proc Print Data=Student;Run;
Proc Print Data= Male; Run;
Proc Print Data= Female; Run;

Now Let's see our second Type where Number of Dataset are Unknown-

/* First We get the All Unique Value of Make in a New Datset "Unique_Make" */
Proc Sort Data=SAShelp.Cars Nodupkey out=Unique_make (Keep=Make);
by Make;
run;

/*Create a Name Variable in Unique_Make */

Data Unique_make;
set Unique_make;
Name=Scan(Make,1,'-');
run;

/* We created this Name becuase Make has a Value 'Mercedes-Benz', we can't use Mercedes-Benz as Dataset Name,
So here we used SCAN Function to Avoid this kind of Situation otherwise it will error our program. */

/*Now this is Final and important Code where we acually Split SASHELP.CARS into Multiple Datasets based on MAKE */

Data _NULL_;
set unique_Make;
call execute('data ' !! compress(Name) !! '; set sashelp.cars; where Make = "' !! Make !! '"; run;');
run;

/*Similarly we can use Call Execute Funtion to Print all Datasets in one Go */

Data _NULL_;
set unique_Make;
call execute('Proc Print Data= ' !! compress(Make) !! '; run;');
run;

Labels: About SAS, Base SAS, Datahark, SAS

Understanding the Program Data Vector (PDV) in SAS

Introduction

When working with SAS, understanding how data is processed behind the scenes is crucial to writing efficient and accurate programs. One of the most important internal concepts is the Program Data Vector (PDV). It plays a central role in how SAS reads and constructs datasets, especially within the DATA step. In this post, we’ll explore what PDV is, how it works, and why it matters for your SAS programming skills.

What is the Program Data Vector (PDV)?

The Program Data Vector (PDV) is a temporary memory area created by SAS when a DATA step is compiled and executed. It is used to build each observation (row) of a SAS dataset one at a time.

Think of the PDV as a holding area where variable values are stored during the execution of the DATA step, just before they are written to the dataset.

Why is PDV Important?

Understanding PDV helps you:

Predict the order of variable creation and execution
Understand how missing values are assigned
Debug unexpected results in DATA steps
Write more efficient and accurate programs

How PDV Works

Let’s break down the process of how PDV operates in a DATA step:

1. Compilation Phase

SAS identifies all the variables to be created.
It builds the structure of the PDV including the order and length of variables.
Input and output datasets are determined, but no data is read yet.

2. Execution Phase

One observation is read into the PDV at a time.
Statements in the DATA step are executed.
After execution, the observation is written to the dataset.
The PDV is reset for the next observation (except for variables created with retain).

Example: PDV in Action

data example;
    input name $ age;
    age_plus_5 = age + 5;
    datalines;
    John 25
    Mary 30
    ;
run;

What happens in the PDV?

Compilation phase:

Variables identified: name, age, age_plus_5
PDV structure: [name][age][age_plus_5]

Execution phase:

First line: John 25

PDV becomes: name=John, age=25, age_plus_5=30
Observation is written

Second line: Mary 30

PDV becomes: name=Mary, age=30, age_plus_5=35

After each observation, the PDV resets (all variables initialized to missing except for retained variables).

Special Note on `retain` and PDV

When you use the retain statement, it prevents PDV from resetting a variable to missing for each new iteration.

data retain_example;
    retain total 0;
    input value;
    total + value;
    datalines;
    5
    10
    ;
run;

Here, total is initialized once and keeps accumulating, because it is retained in the PDV.

PDV and Automatic Variables

SAS also creates automatic variables in the PDV, such as:

_N_: Number of iterations
_ERROR_: Error flag (0 or 1)

These are not written to the final dataset but can be used for debugging or logic control.

Key Points to Remember

The PDV is created during the DATA step.
It stores values of all variables during the step.
Observations are written one at a time after execution.
Variables are reset to missing after each iteration unless retain is used.
Understanding PDV helps you avoid logical errors and write better SAS code.

Conclusion

The Program Data Vector (PDV) is a powerful concept in SAS that acts as the engine behind the DATA step. By understanding how PDV works, you can gain deeper insight into how your data is processed and improve your ability to debug and optimize SAS programs.

Whether you're preparing for a SAS interview or trying to enhance your programming skills, mastering the PDV is a crucial step in becoming a proficient SAS programmer.

Output dataset is the Final step where data is actually loaded in system after the PDV process completes.

Labels: About SAS, Base SAS, Datahark, SAS

Mastering the INPUT and INFILE Statements in SAS: A Beginner's Guide with Examples

Introduction

If you're just starting with SAS programming, one of the most essential skills you'll need is the ability to read raw data files into SAS datasets. This is where the INPUT and INFILE statements come into play. These two statements are fundamental when you're working with external data sources like .txt, .csv, or .dat files.

In this post, we'll explore:

What the INFILE and INPUT statements do
Syntax and options
Step-by-step examples
Common errors and best practices

Whether you're preparing for the Base SAS Certification or learning SAS for data analytics, this guide will help you understand how to efficiently read data into SAS.

What is the INFILE Statement in SAS?

The INFILE statement is used to tell SAS where to find your external data file. It defines the file path and other important file-reading options such as delimiters and record length.

Syntax:

INFILE 'file-path' <options>;

Example:

INFILE 'C:\Data\students.txt' DLM=',' FIRSTOBS=2;

DLM=',' — Specifies the delimiter as a comma.
FIRSTOBS=2 — Tells SAS to start reading from the second row (often used to skip headers).

What is the INPUT Statement in SAS?

The INPUT statement tells SAS how to read the data — the structure, variable names, types, and formats. It works hand-in-hand with the INFILE statement to read external files into a SAS dataset.

Syntax:

INPUT var1 $ var2 var3;

Use $ for character variables.
No $ is needed for numeric variables.

Example:

INPUT Name $ Age Height;

Example: Reading a CSV File Using INFILE and INPUT

Let's say you have a file named students.csv with the following data:

Name,Age,Score
Alice,22,88
Bob,23,91
Charlie,21,85

SAS Code:

DATA student_data;
    INFILE 'C:\Users\YourName\Documents\students.csv' DLM=',' FIRSTOBS=2;
    INPUT Name $ Age Score;
RUN;

PROC PRINT DATA=student_data;
RUN;

More on Input and Infile Statement -

Data <Dataset Name>

Infile 'Location\Filename' DLM=<delimiter> Firstobs=<Starting Position>;

Input <Variable Name and formats>;

run;

FileName <Name/Alias for File> 'Location/FileName';

Data <Dataset Name>

Infile above defined Name/Alias for File DLM=<delimiter> Firstobs=<Starting Position>;;

Input <Variable Name and formats>;

run;

Here FileName is a Global Statement

For Example - I have a Text file on my location - /home/../New Folder/New Text Document.txt with Credit Card Number and Spend information, to import that

1.
Data datahark;
infile '/home/../New Folder/New Text Document.txt' Firstobs=2;
INput cc Spend;
run;

2.
Filename CC '/home/../New Folder/New Text Document.txt';
Data datahark;
infile cc Firstobs=2;
INput cc Spend;
run;

Both of above code will import the File and Give the same output -

Use of Infile and Input Statement in SAS

Important Options in INFILE

Option	Description
`DLM=','`	Sets the delimiter (comma, space, tab, etc.)
`FIRSTOBS=n`	Reads from the nth record (skip headers)
`DSD`	Handles missing values and quotes in CSV
`MISSOVER`	Prevents reading the next line if data is missing
`TRUNCOVER`	Avoids errors when fewer columns exist than expected

Reading Space-Delimited Data

DATA test_scores;
    INFILE 'C:\Data\testdata.txt';
    INPUT StudentID $ Subject $ Score;
RUN;

This reads a space-separated file without needing DLM.

Common Mistakes to Avoid

File Not Found: Always use the full path or set the correct working directory.
Wrong Delimiter: Use the correct delimiter option in INFILE.
Incorrect Data Types: Forgetting to use $ for character variables.
Header Row Issues: Use FIRSTOBS=2 if your data has a header.

Best Practices

Use DSD and FIRSTOBS=2 for clean CSV handling.
Always validate data using PROC PRINT after loading.
Modularize your code: Keep data import in a separate step.

Click here to Read more »

Labels: About SAS, Base SAS, Datahark, External Data in SAS, INFILE Statement in SAS, INPUT Statement in SAS, Reading Data in SAS, SAS, SAS CSV Import, SAS File Handling, SAS Programming

Thursday, December 24, 2020

PROC SQL - More Basic Examples

Go to Previous page - PROC SQL FOR BEGINNERS - HOW TO USE SQL IN SAS

More Examples of Basic PROC SQL Programs

7. Getting Specific Number of Records From Dataset - Use of OUTOBS option

proc sql outobs=10;
select * From SASHELP.CARS;
Quit;

This Code is displaying First 10 records from SASHELP.CARS

8. Giving Label(New Display Name) to a Variable/Column in Dataset.

proc sql;
select Make as Company, MSRP as Price From SASHELP.CARS
where Make ='Audi';
QUIT;

This Code is changing Display name of Make to Company and MSRP to Price and Display all records for Audi

9. Creating New Variable in PROC SQL

Proc sql;
select Make, MSRP, (MSRP*0.10) AS DISCOUNT From SASHELP.CARS Where
Make='Acura';
QUIT;

IT will Add New Variable in Display List named as DISCOUNT which is having 10% of MSRP.

10. If we want to Use Newly Created Variable in Same Select Statement then we need to use CALCULATED Keyword

proc sql;
select Make, MSRP, (MSRP*0.10) AS DISCOUNT, (MSRP-CALCULATED DISCOUNT) as FINAL_PRICE
From SASHELP.CARS
WHERE Make='BMW';
QUIT;

11. Removing Duplicates Using DISTINCT Keyword

proc sql;
select DISTINCT Make
From SASHELP.CARS ;
QUIT;

12. Multiple SQL Statements in One PROC SQL

proc sql;
select DISTINCT Cylinders From SASHELP.CARS ;
select Make,Cylinders From SASHELP.CARS where Make IN ('BMW');
QUIT;

13. Creating a New Dataset/Table using PROC SQL We can create New Dataset using CREATE TABLE statement in PROC SQL -

PROC SQL;
Create Table BMW as select Make,Cylinders From SASHELP.CARS where Make IN ('BMW');QUIT;

Labels: About SAS, Datahark, PROC SQL SAS, SAS, SAS SQL, SQL

Wednesday, December 23, 2020

PROC SQL FOR BEGINNERS - HOW TO USE SQL IN SAS

PROC SQL SAS

SAS offers support to relational databases by using SQL queries within SAS Program using Proc SQL. Procedure PROC SQL is used to evaluate and run the SQL statements and also for creating New Dataset/table out of written query, Also it can create new Variables/Columns in Dataset(s).

SQL is one of the Top 5 Programming Languages for Data Analysis, The advantage of PROC SQL over SQL is - we can use SQL along with Base SAS programming logic which can help us to solve more complex data analysis problem very easily.

Lets see how we can utilize PROC SQL to write SQL in SAS -

Syntax for Proc SQL -

PROC SQL;
SELECT Columns
FROM TABLE
<WHERE Condition
GROUP BY Columns
Having Condition
Order by Columns;>
QUIT;

Click here to Read more »

Labels: About SAS, Datahark, PROC SQL SAS, SAS, SAS SQL, SQL

Tuesday, December 22, 2020

SAS ETL: A Complete Guide to Extract, Transform, and Load in SAS

SAS ETL - We sometime heard this Term "ETL" from big companies, Job posted for SAS ETL Developer, so Lets see what is SAS ETL exactly.

"ETL" Means - Extract Transform and Load - Its a type of Data Integration tool which is used to blend data from different Sources.

In the world of data analytics, ETL (Extract, Transform, Load) plays a crucial role in preparing data for analysis and reporting. In this blog post, we will explore how ETL processes are implemented in SAS (Statistical Analysis System)—a powerful tool widely used in data management, analytics, and business intelligence.

Whether you're a beginner or a data professional, understanding SAS ETL is essential for managing and transforming large volumes of data efficiently.

🔍 What is ETL?

ETL stands for:

Extract: Retrieving raw data from multiple sources (databases, Excel files, text files, web sources, etc.)
Transform: Cleaning, standardizing, and reshaping data to suit your business needs.
Load: Importing the processed data into a target system such as a data warehouse or analytical platform.

SAS provides robust tools and procedures to handle all three stages with precision and scalability.

📥 Extracting Data Using SAS

The data extraction phase involves connecting to different data sources using SAS libraries, PROC IMPORT, or LIBNAME statements. SAS can extract data from:

Excel and CSV files
SQL databases (Oracle, MySQL, SQL Server, etc.)
Cloud-based storage or Hadoop
Web sources via APIs

Example:

libname mydb odbc dsn='SalesDB' user='admin' password='xyz123';

This command connects SAS to a SQL database and allows direct access to tables for extraction.

🔄 Transforming Data in SAS

The transformation step is where the real magic happens. You can use SAS procedures, data step programming, and functions to clean and prepare data:

Filtering and sorting (PROC SORT, WHERE clauses)
Merging datasets (MERGE, PROC SQL JOIN)
Creating calculated fields
Handling missing values
Aggregating data using PROC MEANS, PROC SUMMARY, or PROC SQL

Example:

data sales_clean;
  set sales_raw;
  if sales > 0;
  revenue = quantity * price;
run;

📦 Loading Data in SAS

The final step is loading the transformed data into a target destination for reporting or visualization. In SAS, you can:

Save datasets using the DATA step
Export to Excel/CSV using PROC EXPORT
Load into relational databases via LIBNAME or PROC SQL

Example:

proc export data=sales_clean
  outfile="C:\Reports\Cleaned_Sales.csv"
  dbms=csv
  replace;
run;

Click here to Read more »

Labels: About SAS, Base SAS, Datahark, SAS, SAS ETL, sas functions

Sunday, December 20, 2020

Understanding the SUBSTR Function in SAS: A Complete Guide

When working with character data in SAS, extracting parts of strings is a common task. Whether you're cleaning raw data or generating new variables, the SUBSTR function becomes an essential tool in your SAS programming toolbox.

In this blog post, we'll break down what the SUBSTR function does, how it works, and provide real-world examples to help you master its usage.

🔍 What is the SUBSTR Function in SAS?

The SUBSTR function in SAS is used to extract a substring from a character variable or string. You can specify the starting position and the length of the substring you want to extract.

Syntax:

SUBSTR(string, start-position <, length>)

string: The character string or variable.
start-position: The starting position (1-based index).
length (optional): Number of characters to extract. If omitted, the substring continues to the end of the string.

✅ Key Features of SUBSTR

It is case-sensitive.
Can be used both on the left-hand side (LHS) and right-hand side (RHS) of assignment.
Useful for data cleaning, transformation, and feature engineering.

🧪 Examples of SUBSTR in Action

Example 1: Extracting a Substring from a Character Variable

data example1;
   name = "JohnDoe";
   first_name = substr(name, 1, 4); /* Extracts 'John' */
run;

Example 2: Using SUBSTR Without Length (Extract till End)

data example2;
   id = "EMP12345";
   emp_code = substr(id, 4); /* Extracts '12345' */
run;

Example 3: Using SUBSTR on the Left Side to Modify a String

data example3;
   phone = "9876543210";
   substr(phone, 1, 3) = "999"; /* Replaces first 3 characters */
run;

⚠️ Common Pitfalls

Position starts at 1, not 0 like in some other programming languages.
If the start-position exceeds the string length, SUBSTR returns a blank.
If you try to modify a variable using SUBSTR on LHS, ensure the variable has enough allocated length.

📌 Use Cases in Real-world SAS Programming

Extracting codes from structured IDs (e.g., EMP001, PROD2023)
Parsing CSV or fixed-width text fields
Replacing characters at specific positions
Creating derived variables for reports and models

💡 Tips for Using SUBSTR Effectively

Combine SUBSTR with INDEX, SCAN, or FIND for dynamic substring extraction.
Always use the LENGTH statement to define the expected length of output variables.
For numeric values, convert using PUT() before applying SUBSTR.

🧭 Conclusion

The SUBSTR function is a versatile tool in SAS that enables efficient string manipulation. Mastering it not only simplifies your data processing tasks but also enhances your ability to handle messy or semi-structured data with ease.

Labels: About SAS, Datahark, Functions in SAS, SAS, sas functions

📥 PROC IMPORT in SAS: A Complete Guide to Importing External Data Files

One of the most common tasks in data analysis is importing external files into your SAS environment. Fortunately, SAS provides a powerful and flexible procedure called PROC IMPORT that allows you to bring in data from Excel, CSV, and text files with ease.

In this blog post, we'll explore everything you need to know about PROC IMPORT in SAS, including syntax, file types, options, practical examples, and visual aids to make your learning easier.

🔍 What is `PROC IMPORT`?

PROC IMPORT is a SAS procedure that allows you to read data from various file types—most commonly:
Microsoft Excel files (.xls, .xlsx)
Comma-separated values files (.csv)
Tab-delimited or plain text files (.txt)

The IMPORT procedure will, by default, get variable names from the first line in your data file. If you do not want this, then add the GETNAMES=NO statement after the PROC IMPORT statement. PROC IMPORT will assign the variables the names VAR1, VAR2, VAR3, and so on. Also if your data file is type DLM, PROC IMPORT assumes that the delimiter is a space. If you have a different delimiter, then specify it in the DELIMITER= statement. The following shows both these statements:

Click here to Read more »

Labels: About SAS, Base SAS, Data Preparation, Datahark, Excel in SAS, Importing Data, IMPORTING Data in SAS, Proc Import, Procs, SAS, SAS CSV, SAS for Beginners

Monday, December 14, 2020

🔄 PROC SORT in SAS: Sorting Data and Removing Duplicates Using NODUP and NODUPKEY Options

📌 What is PROC SORT in SAS?

PROC SORT is a procedure used to sort a SAS dataset by one or more variables. Sorting is essential for data preparation, reporting, and ensuring that other procedures work as expected.

✅ Basic Syntax:

proc sort data=input_dataset out=output_dataset;
  by variable1 variable2;
run;

Click here to Read more »

Labels: About SAS, Datahark, Proc Sort, Procs, SAS

DataHark - Every Data is a Story

Tuesday, December 29, 2020

SAS LIBRARY - HOW TO CREATE A LIBRARY | HOW TO DELETE A LIBRARY

What is a Library -

Creating a Library

LIBNAME -

Clearing a Library -

How to get list of sas datasets in a library?

Monday, December 28, 2020

SCAN Function | SAS Character Functions

Syntax -

See the Value of each variable in output and understand the difference -

Sunday, December 27, 2020

Creating Dataset using Proc SQL - WHY ? | HOW ?

Syntax -

See Example -

Use of INOBS option - Fix the number of rows in newly created dataset.

See Example -

Saturday, December 26, 2020

PROC CONTENTS - SUMMARIZING DATASET CONTENTS | METADATA | SAS PROCS

Syntax for PROC CONTENTS

Example-

Available Options for PROC CONTENTS

DIRECTORY

prints a list of all tables in the specified library.

DETAILS | NODETAILS

Example-

PROC Contents Data=SASHELP.cars DIRECTORY DETAILS; title 'The Contents of the CARS Data Set';run;

OUT=table-name

Store the Output in a specified Table/Dataset, OUT= does not suppress the printed output from the statement. If you want to suppress the printed output, you must use the NOPRINT option.

Example-

PROC Contents Data=SASHELP.cars out=Cars_details; title 'The Contents of the CARS Data Set';run; It Also Prints the PROC CONTENTS output in Result window.

NOPRINT

SHORT

prints only the list of variable names for the table, If the list of variables is more than 32,767 characters, the list is truncated and a WARNING is written to the log. To get a complete list of the variables, request an alphabetical listing of the variables.

Example-

PROC Contents Data=SASHELP.cars SHORT; title 'The Contents of the CARS Data Set';run;

VARNUM

prints a list of the variable names in the order of their logical position in the table. The physical position of the variable in the table is engine-dependent.

Example-

PROC Contents Data=SASHELP.cars VARNUM; title 'The Contents of the CARS Data Set';run;

ORDER=COLLATE | CASECOLLATE | IGNORECASE | VARNUM

COLLATE

CASECOLLATE

IGNORECASE

VARNUM

Creating Multiple Datasets from Existing Dataset - Different Situations | DYNAMIC SPLIT | USE OF CALL EXECUTE FUNCTION

1. Number of Dataset are already known-

2. Number of Dataset are not known

Example -

Now Let's see our second Type where Number of Dataset are Unknown-

Understanding the Program Data Vector (PDV) in SAS

What is the Program Data Vector (PDV)?

Why is PDV Important?

How PDV Works

1. Compilation Phase

2. Execution Phase

Example: PDV in Action

What happens in the PDV?

Special Note on retain and PDV

PDV and Automatic Variables

Key Points to Remember

Conclusion

Mastering the INPUT and INFILE Statements in SAS: A Beginner's Guide with Examples

Introduction

What is the INFILE Statement in SAS?

What is the INPUT Statement in SAS?

Example: Reading a CSV File Using INFILE and INPUT

Important Options in INFILE

Reading Space-Delimited Data

Common Mistakes to Avoid

Best Practices

Thursday, December 24, 2020

PROC SQL - More Basic Examples

Go to Previous page - PROC SQL FOR BEGINNERS - HOW TO USE SQL IN SAS

More Examples of Basic PROC SQL Programs

7. Getting Specific Number of Records From Dataset - Use of OUTOBS option

This Code is displaying First 10 records from SASHELP.CARS

8. Giving Label(New Display Name) to a Variable/Column in Dataset.

This Code is changing Display name of Make to Company and MSRP to Price and Display all records for Audi

PROC Contents Data=SASHELP.cars DIRECTORY DETAILS;
title 'The Contents of the CARS Data Set';
run;

PROC Contents Data=SASHELP.cars out=Cars_details;
title 'The Contents of the CARS Data Set';
run;

It Also Prints the PROC CONTENTS output in Result window.

PROC Contents Data=SASHELP.cars SHORT;
title 'The Contents of the CARS Data Set';
run;

PROC Contents Data=SASHELP.cars VARNUM;
title 'The Contents of the CARS Data Set';
run;

Special Note on `retain` and PDV

🔍 What is `PROC IMPORT`?