Monday, March 30, 2020

📊Subsetting in SAS: Vertical and Horizontal Techniques Using IF, WHERE, KEEP, and DROP

When working with large datasets in SAS, it's often necessary to reduce the data to just what you need. This is where subsetting comes in. SAS allows you to subset data both vertically (columns) and horizontally (rows) using simple but powerful statements like IF, WHERE, KEEP, and DROP.

In this blog post, we'll explore how to perform vertical and horizontal subsetting in SAS, with practical examples to help you understand how and when to use each method.

🔍 What is Subsetting in SAS?

Subsetting means selecting a portion of your dataset. It can be of two types:

  • Horizontal Subsetting: Selecting specific rows (observations).
  • Vertical Subsetting: Selecting specific columns (variables).

By using subsetting, you can improve program efficiency and focus your analysis on relevant data.

📌 Horizontal Subsetting in SAS (Rows)

Horizontal subsetting is used when you want to filter rows based on conditions. SAS provides two main tools for this:

1. Using the IF Statement

The IF statement is placed inside a DATA step and processes after the data is read into the program.

data work.high_sales;
set work.sales; if amount > 1000; run;

Use Case: When you need to subset after reading the dataset.


2. Using the WHERE Statement

The WHERE statement can be used in both DATA steps and PROC steps, and it filters before the data is read.

data work.high_sales;
set work.sales; where amount > 1000; run;

Use Case: More efficient for large datasets, especially when reading from external sources.

🧠 Key Difference: IF vs WHERE

FeatureIF StatementWHERE Statement
LocationDATA step onlyDATA & PROC steps
EvaluationAfter reading dataBefore reading data
Expression TypeCan use any SAS expressionLimited to dataset variables

📊 Vertical Subsetting in SAS (Columns)

Vertical subsetting is used when you want to select or exclude variables (columns). This is done using the KEEP and DROP statements or options.

1. Using KEEP

The KEEP statement or option specifies which variables to retain in the output dataset.

data work.customer_subset;
set work.customers(keep=customer_id name email); run;

Or:

data work.customer_subset;
set work.customers; keep customer_id name email; run;

2. Using DROP

The DROP statement or option specifies which variables to exclude from the output dataset.

data work.customer_subset;
set work.customers(drop=address phone); run;

Or:

data work.customer_subset;
set work.customers; drop address phone; run;

🧠 Tip: Use KEEP and DROP as Options for Efficiency

Using keep= or drop= as dataset options (inside parentheses after set) can enhance performance by reading only the necessary columns.


🔄 Combining IF, WHERE, KEEP, and DROP

You can combine these statements for efficient, focused data processing:

data work.filtered_customers;
set work.customers(keep=customer_id age gender); where age >= 30; run;

Or:

data work.filtered_customers;
set work.customers; if age >= 30; drop address phone; run;

✅ Best Practices for Subsetting in SAS

  • Use WHERE for faster filtering during data input.
  • Use KEEP/DROP as dataset options to limit variables early.
  • Combine vertical and horizontal subsetting to optimize performance.
  • Avoid unnecessary variables in your output dataset to reduce file size and memory usage.


🧾 Conclusion

Subsetting data in SAS using IF, WHERE, KEEP, and DROP is a fundamental skill for any data professional. By mastering both horizontal (row) and vertical (column) subsetting techniques, you can write cleaner, faster, and more efficient SAS programs.

Whether you're working with millions of records or just prepping data for analysis, these tools give you the control to get exactly the data you need.

    Labels: , , , ,

    日 Creating Variables/Columns in a SAS Dataset

    In the world of data manipulation and statistical programming, SAS (Statistical Analysis System) is a powerhouse. One of the foundational skills when working with SAS datasets is the ability to create new variables or columns. Whether you're preparing data for analysis or creating flags and indicators, understanding how to add new variables efficiently is crucial.

    In this post, we’ll walk through several ways to create new variables in a SAS dataset using the DATA step and some handy functions.


    Why Create New Variables?

    Creating new variables allows you to:

    • Transform raw data into usable metrics (e.g., calculate age from a date of birth).
    • Create indicators or flags for filtering or analysis.
    • Aggregate or categorize continuous data.
    • Simplify complex logic into a readable format.


    Method 1: Using the DATA Step

    The most common way to create a new variable is within a DATA step. Here’s a simple example:

    data work.sales;
    set work.orders;
    total_price = unit_price * quantity;
    discount_flag = (discount > 0);
    run;

    Explanation:

    • total_price is a new numeric variable calculated by multiplying unit_price and quantity.
    • discount_flag is a new variable that will be 1 if there is a discount and 0 otherwise.


    Method 2: Conditional Logic with IF-THEN/ELSE

    Sometimes you need more control over how a variable is created:

    data work.sales;
    set work.orders;
    if quantity > 10 then bulk_order = 'Yes';
    else bulk_order = 'No';
    run;

    Note:

    • bulk_order is a character variable. In SAS, always ensure character variables are assigned character values (quoted strings), or define their length beforehand using a LENGTH statement.

    Length bulk_order $3;

    Method 3: Using SAS Functions

    SAS has a wide array of built-in functions for working with dates, strings, math, and more.

    data work.customer_info;
    set work.customers;
    age = int((today() - birth_date) / 365.25);
    full_name = catx(' ', first_name, last_name);
    run;

    Functions used:

    • today() returns the current date.
    • catx() concatenates strings with a delimiter.


    Method 4: Creating Dummy Variables or Indicators

    For modeling or filtering, you often need binary variables:

    data work.customers;
    set work.customers;
    is_senior = (age >= 65);
    run;

    This creates a variable is_senior that is 1 if age is 65 or above, and 0 otherwise.


    Best Practices

    • Name wisely: Use meaningful, concise variable names.
    • Use LENGTH for character variables: Define the length explicitly to prevent truncation or memory waste.
    • Check your data: Use PROC PRINT or PROC CONTENTS to verify your new columns.

    proc contents data=work.sales; run;
    proc print data=work.sales (obs=10); run;

    Conclusion

    Creating new variables in SAS is straightforward once you understand the DATA step and basic functions. Whether you're computing totals, creating indicators, or manipulating text, SAS provides powerful tools to transform and enrich your data.

    Mastering this foundational skill will significantly improve your data preparation and analysis capabilities in SAS.

      Labels: , , , ,

      闈 Creating Datasets in SAS: A Beginner’s Guide

      Creating datasets(Tables) is the foundational step in SAS programming journey. Whether you are working on analytics, reporting, or data visualization, a strong understanding of how to create and manage datasets in SAS is crucial. This blog post will walk you through various methods to create datasets in SAS, along with examples and best practices.


      🔍 What is a SAS Dataset?

      A SAS dataset is a table consisting of rows (observations) and columns (variables). It’s the standard format in which SAS stores and processes data.

      Each SAS dataset contains:

      • Descriptor portion: Metadata (e.g., variable names, types, labels)
      • Data portion: Actual data values stored in rows and columns


      📌 Methods to Create Datasets in SAS

      There are multiple ways to create datasets in SAS. Let’s explore the most common methods:


      1️⃣ Using the DATA Step

      The DATA step is the most common and powerful method to create datasets manually.

      🔹 Syntax:

      DATA dataset_name;
      INPUT var1 $ var2 var3; DATALINES; data_line1 data_line2 ; RUN;

      ✅ Example:

      DATA students;
      INPUT Name $ Age Marks; DATALINES; John 16 85 Sara 17 90 Ali 15 88 ; RUN;

      This code creates a dataset named students with three variables: Name, Age, and Marks.


      2️⃣ Using the SET Statement

      The SET statement is used to create a new dataset from an existing one.

      ✅ Example:

      DATA top_students;
      SET students; IF Marks > 85; RUN;

      This code filters students with marks greater than 85 and stores them in a new dataset called top_students.


      3️⃣ Importing External Files (CSV, Excel)

      SAS can import data from external files using procedures like PROC IMPORT.

      ✅ Import CSV Example:

      PROC IMPORT DATAFILE="C:\data\students.csv"
      OUT=students DBMS=CSV REPLACE; GETNAMES=YES; RUN;

      ✅ Import Excel Example:

      PROC IMPORT DATAFILE="C:\data\students.xlsx"
      OUT=students DBMS=XLSX REPLACE; SHEET="Sheet1"; GETNAMES=YES; RUN;

      4️⃣ Using INFILE and INPUT for Raw Data

      For reading raw data from external text files:

      ✅ Example:

      DATA employees;
      INFILE "C:\data\employees.txt"; INPUT ID Name $ Salary; RUN;

      5️⃣ Creating Temporary vs. Permanent Datasets

      • Temporary Dataset: Stored in WORK library, deleted after session ends
      • Permanent Dataset: Stored in a defined library and persists beyond sessions

      ✅ Temporary Dataset:

      DATA work.sales;
      INPUT Product $ Quantity; DATALINES; Laptop 5 Mouse 15 ; RUN;

      ✅ Permanent Dataset:

      LIBNAME mydata 'C:\sasfiles';
      DATA mydata.sales; INPUT Product $ Quantity; DATALINES; Laptop 5 Mouse 15 ; RUN;

      🧠 Best Practices for Dataset Creation in SAS

      • Use meaningful and consistent variable names
      • Label datasets and variables for clarity
      • Validate data using PROC PRINT, PROC CONTENTS, and PROC MEANS
      • Use formats to enhance readability (e.g., date, currency)


      🧪 Verify Your Dataset

      ✅ View Contents:

      PROC CONTENTS DATA=students;
      RUN;

      ✅ Print Records:

      PROC PRINT DATA=students;
      RUN;

      🔚 Conclusion

      Mastering dataset creation in SAS is essential for every SAS programmer. Whether you're reading external files, filtering data, or creating datasets manually, the techniques discussed here will help you manage your data efficiently. Once your datasets are ready, you can proceed with analytics, visualizations, or reporting using other powerful SAS procedures.

      Labels: , , , , ,

      🧱 Building Blocks in SAS: A Beginner's Guide to SAS Programming Structure

      If you're just getting started with SAS programming, understanding the building blocks of SAS is the key to writing powerful and efficient code.

      This guide will break down the fundamental components of SAS so you can confidently structure and run your first programs. These building blocks form the foundation of all SAS data manipulation, reporting, and analysis tasks.


      🚀 What is SAS?

      SAS (Statistical Analysis System) is a software suite used for advanced analytics, data management, and business intelligence. It's widely used in industries such as healthcare, banking, and pharmaceuticals.


      🔧 Core Building Blocks of a SAS Program

      A SAS program is typically made up of the following major components:


      1️⃣ DATA Step

      The DATA step is used to create, read, and modify SAS datasets. It’s the most fundamental block where data preparation and manipulation happen.

      ✅ Syntax:

      data output_dataset;
      set input_dataset; /* Data transformation logic */ run;

      🔍 Use Cases:

      • Reading data
      • Creating new variables
      • Filtering observations
      • Applying conditional logic


      2️⃣ PROC Step

      The PROC (Procedure) step calls SAS procedures to analyze and process data. SAS has procedures for sorting, printing, summarizing, plotting, and modeling data.

      ✅ Syntax:

      proc print data=dataset_name;
      run;

      🔍 Common PROCs:

      • PROC PRINT: Displays data
      • PROC SORT: Sorts data
      • PROC MEANS: Summarizes statistics
      • PROC FREQ: Frequency tables
      • PROC SQL: SQL-like data manipulation


      3️⃣ Statements

      Statements are commands that perform specific tasks within a DATA or PROC step.

      🔹 Examples:

      • SET: Reads data from a dataset
      • IF...THEN...ELSE: Conditional logic
      • KEEP/DROP: Include or exclude variables
      • INPUT and INFILE: Read raw external data
      • FORMAT and INFORMAT: Control data display and input

      🧠 Example:

      data age_flag;
      set sashelp.class; if age >= 13 then flag = 'Teen'; else flag = 'Child'; run;

      4️⃣ Functions

      SAS provides built-in functions for mathematical, character, date/time, and statistical operations.

      🔹 Common Functions:

      • SUM(), MEAN(): Math
      • SUBSTR(), UPCASE(): Character
      • TODAY(), INTNX(): Date/Time

      🧠 Example:

      data new;
      set old; full_name = catx(' ', first_name, last_name); run;

      5️⃣ Formats and Informats

      • Formats control how values are displayed.
      • Informats tell SAS how to read raw data.

      🧠 Example:

      data dates;
      input dob mmddyy10.; format dob date9.; datalines; 01/01/1990 ; run;

      6️⃣ Comments

      Comments help explain code. They’re ignored during execution but essential for documentation.

      📝 Syntax:

      /* This is a comment */
      * This is also a comment; /* Semicolon is required */

      🧩 Structure of a Complete SAS Program

      Here’s how a simple SAS program typically looks:

      /* Step 1: Create dataset */
      data work.sales; input name $ sales; datalines; John 200 Jane 300 ; run; /* Step 2: Print dataset */ proc print data=work.sales; run;

      📁 Optional Building Blocks

      These aren't mandatory in every program but are helpful in organizing larger SAS projects:

      🔹 LIBNAME Statement

      Defines a library (a folder or path where datasets are stored).

      libname mylib 'C:\SASData';

      🔹 Macro Language

      For dynamic, reusable code. Example:

      %let year = 2025;
      proc print data=sales&year.; run;

      🧠 Tips for Beginners

      • Always end statements with a semicolon (;)
      • Use RUN; or QUIT; to execute a block
      • Check the Log Window for errors
      • Start small, and gradually learn PROCs and functions


      ✅ Summary

      Building BlockPurpose
      DATA StepCreate and manipulate datasets
      PROC StepPerform analysis using procedures
      StatementsControl the flow and logic of your code
      FunctionsPerform calculations or transformations
      FormatsDisplay data in a readable format
      CommentsExplain code for future reference
      LIBNAME/MacrosOrganize and automate SAS projects

      📌 Final Thoughts

      Understanding the building blocks of SAS sets you up for success in analytics, reporting, and automation. Whether you're a data analyst, clinical programmer, or preparing for SAS certification, these fundamentals will stay with you throughout your SAS journey.

      Labels: , , , ,

      📘 Commonly Used Terms in SAS: A Beginner-Friendly Glossary

      Whether you're just starting with SAS programming or looking to refresh your concepts, understanding the core terminology in SAS is essential. This post covers commonly used SAS terms you'll frequently encounter while working with data in SAS

      SAS - By Datahark

      📚 Why Learn SAS Terminology?

      SAS (Statistical Analysis System) is a powerful tool used for data management, analytics, and reporting. If you want to write effective SAS code or interpret results correctly, learning the language of SAS is the first step.


      🧠 25 Common SAS Terms You Must Know

      1. Dataset

      A structured table containing rows (observations) and columns (variables). It's the primary data format in SAS, stored as .sas7bdat.


      2. Variable

      A column in a dataset representing a specific type of data (e.g., age, name, salary). Also called a field.


      3. Observation

      A single row of data in a dataset, representing one record or case (e.g., a customer).


      4. Libref

      A nickname or alias assigned to a SAS library using the LIBNAME statement. It points to a folder containing datasets.

      libname mylib 'C:\SASData';

      5. Format

      Determines how data is displayed (e.g., numeric as currency, dates in DDMMYY format).

      format salary dollar8.;

      6. Informat

      Defines how SAS reads and interprets raw data during input (e.g., reading a date string as a SAS date).

      informat birthdate mmddyy10.;

      7. Missing Value

      A value that’s not available for a variable. In SAS, missing numeric values are represented by a dot (.), and character values as a blank space.


      8. PROC

      Short for Procedure. SAS PROCs are pre-built routines that perform analysis or operations like sorting, summarizing, or modeling.

      proc print data=sashelp.class;
      run;

      9. DATA Step

      The building block of SAS programs used to read, modify, or create datasets.

      data new_data;
      set old_data; run;

      10. SAS Program

      A file containing a series of DATA and PROC steps. The standard file extension is .sas.


      11. SAS Log

      Displays messages from SAS when a program runs—includes notes, errors, and warnings. Always check the log to debug issues.


      12. SAS Output

      The result of your program (tables, reports, statistics), which can be viewed in the Output window or exported.


      13. Engine

      A part of SAS that reads or writes to a data format. Each data type (SAS dataset, Excel, etc.) uses an appropriate engine.


      14. Library

      A collection of one or more SAS files or datasets stored in a directory.


      15. View

      A virtual dataset that contains instructions for deriving data from other sources but doesn’t store actual data.


      16. Descriptor Portion

      Metadata about a dataset—includes variable names, types, lengths, and labels.


      17. Data Portion

      The actual content (values) stored in the dataset’s rows and columns.


      18. SQL (PROC SQL)

      Allows use of Structured Query Language within SAS to query and manage data just like in traditional databases.

      proc sql;
      select name, age from sashelp.class; quit;

      19. Join / Merge

      Combining datasets based on a common key. Use PROC SQL for joins or DATA step for merges.


      20. Label

      A descriptive name attached to a variable for more readable output.

      label age = 'Age of Student';

      21. Length

      Specifies the storage length of a variable, especially important for character variables.

      length city $20;

      22. Retain Statement

      Used in a DATA step to hold a value across iterations (rows).

      retain total 0;

      23. Keep / Drop

      Used to include or exclude variables during data processing.

      data new;
      set old(keep=name age); run;

      24. IF-THEN Statement

      Used to conditionally process data in a DATA step.

      if age >= 18 then status = 'Adult';

      25. SAS Function

      Built-in operations that return a value (e.g., sum(), mean(), substr(), today()).

      total = sum(a, b);

      Labels: , , ,

      Saturday, March 28, 2020

      About SAS – Everything You Need to Know About SAS Programming

      Keywords: SAS programming, what is SAS, SAS software, SAS analytics, SAS language, SAS for data analysis, SAS tools, SAS career, Base SAS, Advanced SAS, SAS tutorial, benefits of SAS
      About SAS

      🔍 What is SAS?

      SAS (Statistical Analysis System) is a powerful statistical software suite developed by SAS Institute for advanced analytics, data management, predictive analytics, business intelligence, and data visualization. Widely used in healthcare, finance, insurance, pharmaceuticals, and education, SAS is known for its robust performance, user-friendly syntax, and reliable statistical functions.


      📈 Why is SAS Important in Data Analytics?

      SAS is a cornerstone in the world of data analytics and business intelligence. It is trusted by Fortune 500 companies, government organizations, and research institutions for its accuracy, scalability, and versatility. Whether you are doing clinical trial analysis, customer segmentation, or financial risk modeling, SAS provides tools that ensure data integrity and actionable insights.


      💻 Key Features of SAS Programming

      1. Data Handling Capabilities
        SAS can handle large volumes of data efficiently, making it ideal for big data projects and complex statistical analysis.

      2. Data Cleaning and Transformation
        Using DATA steps, users can clean, format, merge, and prepare datasets easily.

      3. Analytical Procedures
        SAS offers numerous built-in procedures like PROC MEANS, PROC FREQ, PROC REG, and PROC SQL for statistical and analytical modeling.

      4. Advanced Analytics & Predictive Modeling
        With tools like SAS Enterprise Miner and SAS Visual Analytics, users can perform machine learning, forecasting, and decision trees.

      5. Cross-Platform Compatibility
        Run SAS programs on Windows, UNIX, Linux, and even cloud-based platforms.

      6. Integration with Other Languages
        SAS can integrate with Python, R, SQL, Excel, and other data sources, making it a flexible tool for analysts and data scientists.


      🧠 Who Uses SAS?

      SAS is the tool of choice for:

      • Clinical SAS Programmers in pharmaceutical companies
      • Data Analysts in banking and insurance
      • Government Statisticians for policy modeling
      • Researchers and Academicians in statistical research
      • Business Intelligence Professionals for KPI tracking and dashboarding


      🎓 SAS Certifications and Career Opportunities

      More information on certification in detail

      SAS offers industry-recognized certifications including:

      • Base SAS Certification
      • Advanced SAS Certification
      • SAS Certified Data Scientist
      • Clinical Trials Programmer Certification

      These certifications can boost your career in fields like:

      • Clinical Research
      • Financial Services
      • Marketing Analytics
      • Fraud Detection
      • Customer Analytics

      💼 Job Titles include: SAS Developer, Clinical SAS Programmer, Data Analyst, SAS Consultant, and more.


      🚀 Benefits of Learning SAS

      • High demand in job market
      • Globally recognized software suite
      • Excellent support and documentation
      • Graphical User Interface (GUI) tools available for non-programmers
      • Strong community and training resources


      📚 Getting Started with SAS

      Best ways to learn SAS:

      • Enroll in SAS programming courses
      • Explore SAS tutorials on YouTube and official SAS websites
      • Practice real-world datasets in SAS Studio (free online IDE)
      • Prepare for SAS certifications


      🔗 Conclusion: Why SAS Should Be On Your Radar

      SAS is not just a software—it's an enterprise-grade analytics platform that helps professionals turn raw data into strategic decisions. Whether you're a beginner in data analytics or a seasoned data scientist, learning SAS can add significant value to your career and analytical toolkit.


      🧭 Search Tags :

      What is SAS | SAS Programming Tutorial | Learn SAS | SAS Analytics Tools | SAS Software Benefits | SAS vs R vs Python | Data Analysis with SAS | SAS Certification Guide | Careers in SAS

      Labels: , , , , ,