
SAS Interview questions

A list of frequently asked SAS interview questions is given below:

1) What is SAS?

SAS originally stood for Statistical Analysis System and is developed by SAS Institute. It is an integrated set of software products for advanced analytics, predictive analytics, multivariate analytics, data management, and business intelligence. SAS includes a graphical point-and-click interface, so non-technical users get an easy-to-use platform, while the SAS language provides the advanced options.


2) What are the features of SAS?

SAS is an analytical platform with a wide variety of features. The following are a few of its main features:

Analytics: SAS is considered one of the leading analytics platforms for different business products and services.

Data Access & Management: SAS can read data from many sources and manage it, so it can be used much like DBMS (Database Management System) software.

Business Solutions: SAS includes solutions for performing business analysis. This analysis can help companies build the right business products.

Reporting & Graphics: SAS allows users to generate analysis reports in different formats, such as list, summary, and graphic reports.

Visualization: SAS enables users to visualize reports as graphs, ranging from simple scatter plots and bar charts to complex multi-page classification panels.


3) Why do people prefer using SAS over other data analytics tools available in the market?

There are many alternatives to SAS, but many people still prefer it because of features that set it apart from other data analytics tools in the market. People prefer using SAS for the following reasons:

Ease of Learning: SAS is straightforward to learn because its concepts are simple. It also provides PROC SQL, a procedure based on SQL, so users who already know SQL have a slight advantage when working with SAS.

Graphical Capabilities: SAS includes functional graphical capabilities, so users can start customizing plots with only a little learning.

Data Handling Capabilities: SAS is often considered stronger than other leading tools and languages (such as Python and R) in data handling. It is well suited to very large volumes of data and to parallel computation.

Advancements in Tool: SAS receives frequent updates that are designed, developed, and tested in a well-controlled environment. Python and R, on the other hand, are open to public contribution, so on this view their latest developments carry a higher chance of errors.

Job Scenario: SAS is one of the leaders in the global market regarding the availability of jobs. According to some reports, SAS holds around 70% of the data analytics market share in Indian corporate jobs.


4) List down a few main capabilities of the SAS framework.

SAS framework has the following essential capabilities:

Access: One of the main capabilities of the SAS framework is data access. Data can be read from different sources, such as raw data files, Excel files, Oracle databases, and SAS data sets.

Manage: Data management is another vital capability of the SAS framework. Data accessed from various sources can be managed by creating variables, validating data, cleaning data, creating subsets, and so on.

Analyze: Once the data is accessed and managed, it is analyzed. We can perform either simple analyses (for example, averages and frequencies) or complex analyses (for example, forecasting and regression).

Present: The results of the analysis can be presented as list reports, summary reports, or graphic reports. These reports can be printed or published online, and the results can also be written to a data file.


5) How many data types are present in SAS?

There are two data types in SAS: "Numeric" and "Character". Dates are not a separate type: SAS stores a date as a numeric value (the number of days since January 1, 1960) and provides date formats, informats, and functions to work with it.


6) List down the few main functions of SAS.

The following are the main functions of SAS:

  • Business Planning
  • Data Warehousing
  • Statistical Analysis
  • Data Management and Project Management
  • Quality Management
  • Information Retrieval
  • Operational Research and Decisional Support

7) What are the essential components of SAS programming?

SAS programming has three main components:

  • Variables
  • Dataset
  • Statements

8) What are the basic syntax rules to be followed while writing the SAS program?

To write a program in SAS, we can use the Editor window. A program consists of several statements written with the appropriate syntax. The statements are arranged in the order in which SAS should perform the desired tasks.

Some basic syntax rules to follow while writing a SAS program are listed below:

  • Each statement must end with a semicolon (;).
  • The semicolon acts as a statement terminator, so several statements can be placed on a single line as long as each one ends with a semicolon.
  • Statements in SAS are not case-sensitive. Extra spaces before a statement are ignored.
  • SAS offers two ways to write a comment:
    1. A comment statement starts with an asterisk (*) and ends with a semicolon (;).
    2. A block comment starts with a forward slash and an asterisk (/*) and ends with an asterisk and a forward slash (*/).
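The rules above can be seen together in a short sketch. The data set and variable names (`work.demo`, `x`, `y`, `z`) are made up for illustration.

```sas
* A comment statement starts with an asterisk and ends at the semicolon;

/* A block comment can span several lines
   and may contain semicolons; like this one */

data work.demo; x = 1; y = 2; run;   /* several statements on one line */

DATA work.demo2;        /* keywords are not case-sensitive */
    SET Work.Demo;
    z = X + y;          /* variable names are not case-sensitive either */
RUN;
```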

9) What is PDV? List some functions of PDV.

PDV is short for 'Program Data Vector'. It is a logical area of memory in which SAS builds a data set, one observation at a time.

Some of the main points about the PDV are listed below:

  • The PDV contains two automatic variables, "_N_" and "_ERROR_". The first counts the iterations of the DATA step that is being executed. The second indicates whether an error occurred during execution (0 for no error, 1 for an error).
  • The PDV holds one observation at a time while the data set is being built.
  • When raw data is read, SAS creates an input buffer at compile time together with the PDV. The input buffer holds each record read from the external file before its values are moved into the PDV.
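One way to look at the PDV is to write its contents to the log on every iteration. The sketch below uses made-up inline data; the second record deliberately contains a non-numeric score so that `_ERROR_` changes.

```sas
data _null_;
    input name $ score;
    put _all_;    /* writes name, score, _ERROR_ and _N_ for each iteration */
    datalines;
Ann 90
Bob x
Cal 75
;
run;
```

In the log, `_N_` increases by one per record, and `_ERROR_` is set to 1 for the record whose score cannot be read as a number.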

10) How will you describe a SAS data set?

A SAS data set is the data that a SAS program uses for analysis. It is commonly known as a SAS data table.

Data in a data set is arranged along two dimensions:

  • Rows, which are observations
  • Columns, which are variables

11) Why do we use the output statement while writing programs in SAS?

In procedures such as PROC MEANS, the OUTPUT statement is used to save summary statistics in a SAS data set. The saved information can then be used to generate customized reports as required. In a DATA step, the OUTPUT statement writes the current observation to the output data set.

Depending on the procedure, the options of the OUTPUT statement can be used to do the following:

  • Store historical data of the entire process.
  • Define the name of the output data set.
  • Select the desired stats to save in the output data set.
  • Compute and save percentiles that could not get computed automatically.
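As an illustration, the sketch below saves means and 90th percentiles from PROC MEANS into a data set. It assumes the sample table `sashelp.class` that ships with SAS; the output names (`work.class_stats`, `mean_height`, and so on) are made up.

```sas
proc means data=sashelp.class noprint;
    var height weight;
    output out=work.class_stats            /* name of the output data set */
           mean=mean_height mean_weight    /* statistics to save */
           p90=p90_height p90_weight;      /* percentiles to save */
run;

proc print data=work.class_stats;
run;
```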

12) Why do we use Stop Statement in the SAS program?

The STOP statement immediately stops the processing of the current DATA step. SAS then resumes processing with the statements that follow the end of that DATA step.
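A minimal sketch, assuming the sample table `sashelp.class`: the DATA step ends as soon as the sixth row is reached, and the PROC PRINT step after it still runs.

```sas
data work.first_five;
    set sashelp.class;
    if _n_ > 5 then stop;   /* ends this DATA step before row 6 is written */
run;

proc print data=work.first_five;   /* processing continues with the next step */
run;
```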


13) What, according to you, is the main difference between reading data from the existing data-sets and reading data from external files?

When reading data from an existing data set (for example, with a SET statement), SAS automatically retains the variables' values from one iteration of the DATA step to the next until they are overwritten by the next observation. When reading data from external files with an INPUT statement, SAS does not retain the values: the variables are reset to missing at the start of each iteration. If a value needs to be carried over, the variable must be declared in a RETAIN statement. This is the main difference between reading data from existing data sets and reading data from external files.
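A small sketch of carrying a value across records read with INPUT. The variable names and the inline data are made up for illustration.

```sas
data work.running_total;
    input amount;
    retain total 0;            /* keep total across iterations of the DATA step */
    total = total + amount;
    datalines;
10
20
30
;
run;
```

Without the RETAIN statement, `total` would be set back to missing at the start of every iteration, and the sum would be missing as well.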


14) Explain any scenario when SAS does not automatically change the character value to a numeric value.

A simple example shows when SAS does not automatically convert a character value to a numeric value. Assume a character variable named 'PayRate' whose values start with a dollar sign ($). When 'PayRate' is used in a numeric context, SAS tries to convert it automatically, but the dollar sign is not valid in a standard number, so the conversion fails and the result is a missing value.

That is why it is recommended to convert values explicitly with the INPUT and PUT functions and a suitable informat or format.
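The sketch below contrasts the two approaches. The variable names and the sample value are made up; the DOLLARw.d informat is used here as one informat that accepts dollar signs and commas.

```sas
data work.pay;
    payrate_char = '$1,250.50';
    auto_num = payrate_char * 1;                     /* automatic conversion: invalid data, missing result */
    payrate_num = input(payrate_char, dollar10.);    /* explicit conversion with an informat */
    put auto_num= payrate_num=;
run;
```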


15) Compare SAS BI with SAP BO.

The comparison between SAS BI and SAP BO is tabulated below:

Attributes | SAS BI | SAP BO
Analytics | Easy-to-use analytics platform. | Predictive analytics platform.
Reason for Deployment | Supports quick data integration with diverse sources. | Supports high-level visualization with a user-friendly interface.
Ad-hoc Analysis | Average | Excellent
Presentation | Average | Excellent
Mobile BI | Excellent | Good
Application | Connects BI and analytics to provide enterprise-grade data. | Uses a frontend suite to provide features like sorting, viewing, and analysis of BI data.

16) Explain BY-group processing.

In SAS, BY-group processing is a method of processing observations that are ordered, grouped, or indexed by the values of one or more variables. The BY statement names those variables and turns on BY-group processing.
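A common pattern is to sort the data and then use the `FIRST.` and `LAST.` variables that the BY statement creates. The sketch assumes the sample table `sashelp.class`; the output names are made up.

```sas
proc sort data=sashelp.class out=work.class_sorted;
    by sex;
run;

data work.count_by_sex;
    set work.class_sorted;
    by sex;                               /* turns on BY-group processing */
    if first.sex then n_students = 0;     /* first row of each group */
    n_students + 1;                       /* sum statement: value is retained */
    if last.sex then output;              /* one row per group */
    keep sex n_students;
run;
```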


17) How is INPUT different from INFILE in SAS?

The INPUT statement describes the variables to be read (their names, types, and how to read them), whereas the INFILE statement identifies the external file that contains the data.

The syntax of INPUT:

The syntax of INFILE:
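The two statements usually appear together, with INFILE first. The sketch below is only an illustration: the file path, the column layout, and the variable names are hypothetical.

```sas
data work.employees;
    /* INFILE: which file to read and how it is delimited */
    infile "/path/to/employees.csv" dlm=',' dsd firstobs=2;
    /* INPUT: which variables to create from each record */
    input emp_id name :$20. salary;
run;
```

Here `dlm=','` and `dsd` describe a comma-separated file, and `firstobs=2` skips a header line.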


18) What is the significant difference between using the DROP= data set option in a SET statement and in a DATA statement?

If we need to process certain variables but do not want them to appear in the new data set, we use the DROP= data set option in the DATA statement.

If we neither want to process certain variables nor want them to appear in the new data set, we use the DROP= data set option in the SET statement.
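A short sketch of both placements, assuming the sample table `sashelp.class`; the output names and the `ratio` variable are made up.

```sas
/* DROP= on the DATA statement: height and weight are read and can be
   used in calculations, but are not written to the new data set */
data work.with_ratio (drop=height weight);
    set sashelp.class;
    ratio = weight / height;
run;

/* DROP= on the SET statement: height and weight are never read,
   so they cannot be used inside the step */
data work.no_size;
    set sashelp.class (drop=height weight);
run;
```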


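19) Define the DATA step in reference to SAS.

A DATA step is a group of SAS statements that begins with a DATA statement and is used to read, create, and modify SAS data sets. The data set it produces contains the data values together with a descriptor portion (the 'data dictionary'), whose primary function is to store information about the variables along with their properties.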


20) What is meant by the term SAS Informats? Also, list the categories into which SAS Informats are placed.

A SAS Informat is a set of instructions that tells SAS how to read data into SAS variables. Informats are primarily used to read data from external files (also called text files, flat files, sequential files, or ASCII files).

SAS Informats fall into three main categories:

  • Numeric Informats: INFORMATw.d
  • Character Informats: $INFORMATw.
  • Date/Time Informats: INFORMATw.
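As a rough illustration, the sketch below reads one value with an informat from each category. The variable names and the data line are made up; the colon modifier lets each informat be used with blank-delimited data.

```sas
data work.informat_demo;
    input amount :comma10.    /* numeric informat */
          code   :$4.         /* character informat */
          start  :date9.;     /* date informat */
    format start date9.;      /* display the stored date value readably */
    datalines;
1,234.50 AB12 15JAN2024
;
run;
```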

21) How will you differentiate SAS Format and SAS Informat?

The main differences between SAS Format and SAS Informat are tabulated below:

SAS Format | SAS Informat
It instructs SAS how to display the values of variables. | It instructs SAS how to read data into variables.
Formats are mainly used to write data. | Informats are mainly used to read input data from external files.

22) What is the command used for performing sorting in SAS programs?

We can use the PROC SORT procedure to sort data in a SAS program. It can sort by any number of variables. PROC SORT works on a data set: by default it replaces the original data set with the sorted one, and when the OUT= option is specified it writes the sorted data to a new data set and keeps the original unchanged.

The syntax below shows the use of PROC SORT command in SAS:

Sorting can be done in ascending or descending order. Ascending order is the default and needs no keyword. To sort in descending order, place the DESCENDING keyword in the BY statement before each variable that should be sorted that way.

For example:
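A minimal sketch, assuming the sample table `sashelp.class` and a made-up output name. The general pattern is `PROC SORT DATA=input OUT=output; BY variables; RUN;`.

```sas
proc sort data=sashelp.class out=work.class_sorted;
    by sex descending age;   /* sex ascending (default), age descending within sex */
run;
```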


23) Differentiate NODUP and NODUPKEY options.

When it comes to removing duplicate values from a table in SAS, PROC SORT mainly has two options that are used to perform this:

  • NODUP
  • NODUPKEY

We can differentiate these two options with the help of the following table:

NODUP | NODUPKEY
It compares all the variables in the data set. | It compares only the BY variables in the data set.
It finds and removes duplicate (repeated) observations. | It removes observations whose BY-variable values duplicate those of an earlier observation.

The syntax below displays the use of the NODUP option in PROC SORT:

PROC SORT DATA=readin NODUP;
BY variable_name;
RUN;

The syntax below displays the use of the NODUPKEY option in PROC SORT:

PROC SORT DATA=readin NODUPKEY;
BY variable_name;
RUN;

24) How is PROC MEANS different from PROC SUMMARY?

PROC MEANS and PROC SUMMARY compute the same statistics. Both can produce statistics for subgroups, either with a BY statement (the input data must already be sorted by the BY variables, for example with PROC SORT) or with a CLASS statement (no sorting required).

The main difference is the default output. PROC MEANS prints its results by default. PROC SUMMARY does not print anything unless the PRINT option is specified; typically an OUTPUT statement is used to create a new data set, and PROC PRINT is then used to view the computed statistics.
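The sketch below runs both procedures on the sample table `sashelp.class`; the output data set names are made up.

```sas
proc sort data=sashelp.class out=work.class_sorted;
    by sex;
run;

proc means data=work.class_sorted mean;   /* prints a report by default */
    by sex;
    var height;
run;

proc summary data=sashelp.class;          /* prints nothing by default */
    class sex;
    var height;
    output out=work.height_by_sex mean=mean_height;
run;

proc print data=work.height_by_sex;       /* view the saved statistics */
run;
```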


25) What is the role of PROC PRINT and PROC CONTENTS in SAS?

In SAS, PROC PRINT lists the observations of a data set, which makes it a simple way to check that the data was read correctly. PROC CONTENTS, on the other hand, displays information about the data set itself, such as its variables and their attributes.
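Both procedures can be tried on the sample table `sashelp.class`:

```sas
proc print data=sashelp.class (obs=5);   /* list the first five observations */
run;

proc contents data=sashelp.class;        /* variable names, types, lengths, etc. */
run;
```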


26) Define DATA _NULL_ in the context of SAS.

DATA _NULL_ is a DATA step that does not create any data set. It is useful when we need to create macro variables. It can also be used to write output (for example, to the log or to an external file) without creating a data set.
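A short sketch of both uses, assuming the sample table `sashelp.class`; the macro variable name `n_rows` is made up.

```sas
data _null_;                          /* no data set is created */
    set sashelp.class end=last;
    if last then do;
        call symputx('n_rows', _n_);  /* create a macro variable */
        put 'Rows read: ' _n_;        /* write a message to the log */
    end;
run;

%put n_rows = &n_rows;
```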


27) What functions can we use to convert character variables into numeric variables and numeric variables into character variables?

While working with SAS, we often need to convert character variables into numeric variables and numeric variables into character variables. Two functions are mainly used for these conversions:

PUT(): This function converts numeric values into character values. The result is assigned to a new variable with a different name. The format given to PUT() must match the type of the source variable (a numeric format for a numeric source).

For example:
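A minimal sketch of PUT(); the variable names and values are made up.

```sas
data work.put_demo;
    age = 42;
    age_char  = put(age, 3.);               /* numeric to character with the w. format */
    date_char = put('15JAN2024'd, date9.);  /* date value to character text */
run;
```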

INPUT(): This function converts character values into numeric values. As with PUT(), the result is assigned to a new variable with a different name. The source value must always be a character value.

For example:
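A minimal sketch of INPUT(); the variable names and values are made up.

```sas
data work.input_demo;
    age_char = '42';
    age   = input(age_char, 8.);          /* character to numeric with the w. informat */
    start = input('15JAN2024', date9.);   /* character text to a numeric date value */
    format start date9.;
run;
```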


28) What is the role of _CHARACTER_ and _NUMERIC_?

_CHARACTER_ refers to all the character variables that are currently defined in the current DATA step or data set. It can be used wherever a variable list is accepted, for example in the VAR statement of PROC PRINT.

Similarly, _NUMERIC_ refers to all the numeric variables that are currently defined. It can be used, for example, to specify all the numeric variables in the VAR statement of PROC MEANS.
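Both special lists can be tried on the sample table `sashelp.class`. PROC PRINT is used for the character list here, since statistics are computed only for numeric variables.

```sas
proc print data=sashelp.class;
    var _character_;    /* all character variables */
run;

proc means data=sashelp.class;
    var _numeric_;      /* all numeric variables */
run;
```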


29) What are the commands used for including or excluding any particular variables in the data set?

Two statements are mainly used for including or excluding particular variables in the data set:

DROP: Variables listed in the DROP statement are excluded from the output data set.

KEEP: Variables listed in the KEEP statement are the only ones included in the output data set.

Apart from these statements, the DROP= and KEEP= data set options can be used for the same purpose.


30) Mention some character functions used for data cleaning in SAS.

Some of the main character functions used for data cleaning in SAS are given below:

TRIM(str): The aim of using this function is to remove trailing blanks from the string.

COMPRESS(char_string): The aim of using this function is to remove blanks and other desired characters from the string.

UPCASE(char_string): The aim of using this function is to convert all the characters into uppercase in the specified string.

LOWCASE(char_string): The aim of using this function is to convert all the characters into lowercase in the specified string.

COMPBL(str): The aim of using this function is to remove multiple blanks from the string and convert them into a single blank.
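The sketch below applies each function to one made-up string and writes the results to the log.

```sas
data _null_;
    raw = '  SAS    Institute  ';
    tagged    = trim(raw) || '!';   /* trailing blanks removed before concatenation */
    no_blanks = compress(raw);      /* all blanks removed: SASInstitute */
    upper     = upcase(raw);        /* all letters in uppercase */
    lower     = lowcase(raw);       /* all letters in lowercase */
    single    = compbl(raw);        /* runs of blanks reduced to one blank */
    put tagged= / no_blanks= / upper= / lower= / single=;
run;
```

TRIM is shown inside a concatenation because a value assigned to a fixed-length character variable is padded with blanks again.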


31) What is the command used for saving logs in an external file in SAS?

The PROC PRINTTO command is used for saving logs in an external file in SAS. The syntax of this command is shown below:

For Example:
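A sketch of the usual pattern: redirect the log, run the steps, then restore the default destination. The `.txt` extension on the file name is an assumption, and `sashelp.class` is just a sample table.

```sas
proc printto log="C:\Users\javaTpoint\Downloads\LOG-FILE.txt" new;
run;

/* the log of every step from here on goes to the file */
proc print data=sashelp.class;
run;

proc printto;   /* no options: send the log back to its default destination */
run;
```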

While executing this, a new text file named 'LOG-FILE' will be created in the location C:\Users\javaTpoint\Downloads\


32) Why do we use the SUBSTR function while writing programs in SAS?

The SUBSTR function is a useful SAS function for extracting a substring from a character variable. Given a start position and, optionally, a length, it returns that part of the character string.

The syntax below shows the use of the SUBSTR function in SAS:
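The general form is `SUBSTR(string, start <, length>)`. In the sketch below, the variable names and the sample value are made up.

```sas
data _null_;
    phone = '415-555-0199';
    area_code = substr(phone, 1, 3);   /* 3 characters starting at position 1 */
    rest      = substr(phone, 5);      /* from position 5 to the end */
    put area_code= rest=;              /* area_code=415 rest=555-0199 */
run;
```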


33) What methods can we use to create a Macro variable in SAS?

SAS allows users to use several different methods for creating Macro variables. However, listed below are the five most commonly used methods:

  • Using Macro parameters
  • Using the %DO statement (iterative)
  • Using the %LET statement
  • Using CALL SYMPUTX routine
  • Using INTO in PROC SQL
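The five methods above can be sketched in one short program. It assumes the sample table `sashelp.class`; all macro and macro variable names (`show`, `cutoff`, `run_date`, `n_rows`) are made up.

```sas
%let cutoff = 13;                                    /* %LET statement */

data _null_;
    call symputx('run_date', put(today(), date9.));  /* CALL SYMPUTX routine */
run;

proc sql noprint;
    select count(*) into :n_rows                     /* INTO in PROC SQL */
    from sashelp.class;
quit;

%macro show(dsn);                                    /* macro parameter: dsn */
    %do i = 1 %to 2;                                 /* iterative %DO creates i */
        %put pass &i: &dsn cutoff=&cutoff rows=&n_rows date=&run_date;
    %end;
%mend show;

%show(sashelp.class)
```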

34) What are the few most commonly used options to debug Macros in SAS?

The following are the most common options used to track the macro code along with the SAS code generated by the macros:

  • MLOGIC
  • SYMBOLGEN
  • MPRINT

The messages generated by these options can only be accessed through the SAS log.
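These are system options, so they are switched on with an OPTIONS statement. The macro `demo` below is a made-up example to produce some log messages.

```sas
options mprint mlogic symbolgen;         /* turn macro debugging on */

%macro demo(n);
    %if &n > 1 %then %put n is greater than 1;
%mend demo;

%demo(2)

options nomprint nomlogic nosymbolgen;   /* turn it off again */
```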


35) What is the main difference between SYMGET and SYMPUT?

In SAS, SYMGET retrieves the value of a macro variable into a DATA step variable while the DATA step is executing, whereas SYMPUT stores a value from the DATA step in a macro variable.
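A small sketch of the round trip, assuming the sample table `sashelp.class`; the macro variable `top_name` and the other names are made up.

```sas
data _null_;
    call symput('top_name', 'Alfred');   /* DATA step value -> macro variable */
run;

data work.check;
    set sashelp.class;
    target = symget('top_name');         /* macro variable -> DATA step variable */
    is_target = (name = target);         /* 1 when the names match, else 0 */
run;
```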


36) What is the process of working of the PROC SQL? Explain the main steps.

PROC SQL in SAS processes all the observations simultaneously rather than one at a time. The following steps are performed while PROC SQL runs:

  • SAS first scans all the SQL procedure statements and checks for syntax errors, such as missing semicolons or invalid statements.
  • The SQL optimizer scans the query inside each statement and decides how it should be executed to reduce run time and improve overall performance.
  • Any tables named in the FROM clause are loaded into the data engine so that they can be accessed quickly in memory.
  • Next, the code and calculations are executed.
  • As a result, the final table is created in memory.
  • The final table is then sent to the output table, as specified in the SQL statement.

37) What is the most common method used to count the total number of intervals between two dates in SAS?

We can use the interval function INTCK to calculate the total number of intervals between two dates in SAS.

The syntax below displays the use of the INTCK function:
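The basic form is `INTCK(interval, start, end)`. In the sketch below, the dates and variable names are made up; by default INTCK counts interval boundaries crossed between the two dates.

```sas
data _null_;
    start_dt = '15JAN2024'd;
    end_dt   = '20MAR2024'd;
    months = intck('month', start_dt, end_dt);   /* month boundaries crossed: 2 */
    days   = intck('day',   start_dt, end_dt);   /* days between the dates: 65 */
    put months= days=;
run;
```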


38) What are the methods for deleting duplicate observations in SAS?

Three methods are mainly used for deleting duplicate observations in SAS:

  • By using an SQL query in PROC SQL
  • By cleaning the data
  • By using the NODUP or NODUPKEY option in PROC SORT
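Two of these approaches are sketched below on a small made-up table; `work.readin` and its columns are only for illustration.

```sas
data work.readin;
    input id value;
    datalines;
1 10
1 10
2 20
2 25
;
run;

/* SQL: keep one copy of each fully identical row (3 rows remain) */
proc sql;
    create table work.unique_rows as
    select distinct * from work.readin;
quit;

/* PROC SORT: keep the first row for each id (2 rows remain) */
proc sort data=work.readin out=work.unique_keys nodupkey;
    by id;
run;
```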

39) What is the maximum size of the allowed data set in SAS?

The number of observations is limited only by the capacity of the system: a data set can hold as many observations as the system can store and handle.

Before SAS 9.1, a data set could contain at most 32,767 variables. From SAS 9.1 onward, the maximum number of variables in a data set depends on the system's resources.


40) What are some common mistakes that people make while writing programs in SAS?

People make some common errors when writing SAS programs, especially as beginners. The most common ones are as follows:

  • Forgetting the semicolon (;) that must end every SAS statement. This is the most common mistake in SAS programming.
  • Forgetting to check the log after the program is submitted.
  • Not using proper methods for debugging.
  • Making commenting errors, either by writing comments incorrectly or by leaving out comments where they are needed.