SAS Interview questions
A list of frequently asked SAS interview questions is given below:
1) What is SAS?
SAS, originally short for Statistical Analysis System, is developed by the SAS Institute. It is a leading integrated suite of software products for advanced analytics, predictive analytics, multivariate analysis, data management, and business intelligence. SAS includes a graphical point-and-click interface, so non-technical users get an easy-to-use platform, while the SAS language offers more advanced options.
2) What are the features of SAS?
SAS is one of the best analytical platforms, with a wide variety of features. The following are some of its main features:
Analytics: SAS is considered one of the leading analytics platforms for a wide range of business products and services.
Data Access & Management: SAS also allows users to use it as DBMS (Database Management System) software.
Business Solutions: SAS consists of a solution for performing business analysis. This business analysis can also help companies to build the right business products.
Reporting & Graphics: SAS allows users to generate analysis reports in different formats, such as list, summary, and graphic reports.
Visualization: SAS enables users to visualize reports as graphs, ranging from simple scatter plots and bar charts to complex multi-page classification panels.
3) Why do people prefer using SAS over other data analytics tools available in the market?
Many alternatives to SAS are available, but SAS remains widely used because of features that set it apart from other data analytics tools. People prefer using SAS for the following reasons:
Ease of Learning: SAS is straightforward to learn because its concepts are simple. It also provides PROC SQL, a procedure based on SQL, so users who already know SQL have an advantage when working with SAS.
Graphical Capabilities: SAS includes functional graphical capabilities, so users can start customizing plots with only a little learning.
Data Handling Capabilities: SAS is often considered stronger than other leading tools and languages (such as Python and R) in data handling. It is a common choice for dealing with vast amounts of data and is well suited to parallel computation.
Advancements in Tool: SAS receives frequent updates, which are designed, developed and tested in a well-controlled environment. On the other side, Python and R are available for contribution openly, and hence there are more chances of error in the latest developments.
Job Scenario: SAS is one of the top leaders in the global market regarding the availability of jobs. According to some reports, SAS controls around 70% of the data analytics market share in Indian corporate jobs.
4) List down a few main capabilities of the SAS framework.
SAS framework has the following essential capabilities:
Access: One of the main capabilities of the SAS framework is data accessibility. Data can be easily accessed from different sources, such as raw data files, Excel files, Oracle databases, and SAS data sets.
Manage: Data management is another vital capability of the SAS framework. That means data accessed from various sources can be easily managed. To manage data, one can perform several functions like creating variables, validating data, cleaning data, creating subset data, etc.
Analyze: Once the data is accessed and managed, it is then analyzed. We can perform either some fundamental analyses (for example - averages, frequency, etc.) or complex analyses (for example - forecasting, regression, etc.).
Present: The analyzed data can be presented as graphic reports, list reports, or overall summaries. These results can be printed, published online, or written to a data file.
5) How many data types are present in SAS?
SAS has two data types: numeric and character. Dates are stored as numeric values (the number of days since January 1, 1960), and SAS provides built-in functions, formats, and informats for working with them.
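SAS stores a date as a number counting days from January 1, 1960, and a format only changes how that number is displayed. A minimal sketch of this behavior, where the data set name date_demo and the variable names are hypothetical:

```sas
/* Hypothetical example: a SAS date is a numeric count of days since 01JAN1960 */
data date_demo;
    hire_date = '15MAR2021'd;      /* date literal, stored as a number        */
    raw_value = hire_date;         /* same number, left unformatted           */
    format hire_date date9.;       /* display format only; storage is numeric */
run;

proc print data=date_demo;
run;
```

In the printed output, hire_date appears as a readable date while raw_value shows the underlying number.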
6) List down the few main functions of SAS.
The following are the main functions of SAS:
7) What are the essential components of SAS programming?
There are mainly three components used in SAS programming, such as:
8) What are the basic syntax rules to be followed while writing the SAS program?
To write a program in SAS, we can use an Editor Window. A program consists of several statements consisting of the appropriate syntax. These statements are arranged in order for the SAS to perform desired functions.
Some basic syntax rules to be followed while writing SAS program are listed below:
9) What is PDV? List some functions of PDV.
PDV stands for 'Program Data Vector'. It is a logical area of memory in which SAS builds a data set, one observation at a time.
Some of the main functions of PDV are listed below:
10) How would you describe a SAS data set?
A SAS data set is the data used for analysis in a SAS program. It is also commonly known as a SAS data table.
There are mainly two ways used to arrange data in the data set, such as:
11) Why do we use the output statement while writing programs in SAS?
In a procedure such as PROC MEANS, the OUTPUT statement is used to save summary statistics in a SAS data set; in a DATA step, it writes the current observation to the output data set. The saved information can then be used to generate customized reports as required.
Apart from this, we can use different options in the OUTPUT statement to perform the following:
12) Why do we use Stop Statement in the SAS program?
The STOP statement immediately stops processing of the current DATA step. SAS then resumes with the statements that follow the end of that DATA step.
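As an illustration, the sketch below uses STOP to end a DATA step after five observations. It assumes the sample data set sashelp.class that ships with SAS; the output data set name first_five is hypothetical.

```sas
/* Hypothetical example: keep only the first 5 observations, then stop */
data first_five;
    set sashelp.class;
    if _n_ > 5 then stop;    /* ends this DATA step; the current observation is not written */
run;

/* SAS carries on with the next step after the stopped DATA step */
proc print data=first_five;
run;
```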
13) What, according to you, is the main difference between reading data from the existing data-sets and reading data from external files?
When reading data from an existing SAS data set, SAS automatically retains the variables' values from one iteration of the DATA step to the next. When reading from external files with the INPUT statement, SAS does not retain the values: they are reset to missing at the start of each iteration unless they are named in a RETAIN statement. This is the main difference between reading data from existing data sets and reading data from external files.
14) Explain any scenario when SAS does not automatically change the character value to a numeric value.
Consider a character variable named 'PayRate' whose values start with a dollar sign ($). SAS cannot automatically convert these values to numeric because the dollar sign is not valid in a standard numeric value. When SAS attempts the automatic conversion, the result is a missing value.
That is why it is recommended to use the INPUT and PUT functions to perform conversions explicitly.
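A minimal sketch of an explicit conversion for such a value, assuming the amount fits in ten characters. The data set name pay and the new variable names are hypothetical; the DOLLARw.d informat strips the dollar sign and commas while reading.

```sas
/* Hypothetical example: explicit conversion of a value with a dollar sign */
data pay;
    PayRate     = '$1,250.50';                   /* character value                     */
    PayRate_num = input(PayRate, dollar10.);     /* character -> numeric via informat   */
    PayRate_chr = put(PayRate_num, dollar10.2);  /* numeric -> formatted character text */
run;

proc print data=pay;
run;
```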
15) Compare SAS BI with SAP BO.
The comparison between SAS BI and SAP BO is tabulated below:
16) Explain BY-group processing.
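In SAS, BY-group processing is a method of processing observations that are ordered, grouped, or indexed by the values of one or more variables. The BY statement names those variables, and in a DATA step SAS creates the temporary variables FIRST.variable and LAST.variable to mark the start and end of each group.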
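A short sketch of BY-group processing in a DATA step: it counts the observations in each group of the sample data set sashelp.class. The output names class_sorted and sex_counts are hypothetical.

```sas
/* Sort first so that the observations are grouped by the BY variable */
proc sort data=sashelp.class out=class_sorted;
    by sex;
run;

data sex_counts;
    set class_sorted;
    by sex;                      /* makes FIRST.sex and LAST.sex available   */
    if first.sex then n = 0;     /* reset the counter at the start of a group */
    n + 1;                       /* sum statement: n is retained across rows  */
    if last.sex then output;     /* write one row per group                   */
    keep sex n;
run;
```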
17) How is INPUT different from INFILE in SAS?
The INPUT statement is used to specify the SAS programming variables, whereas the INFILE statement is used to specify an external file containing the data.
The syntax of INPUT:
The syntax of INFILE:
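A minimal sketch showing the two statements working together. It writes a small temporary file first so that the example is self-contained; the fileref emp, the sample records, and the variable names are all hypothetical.

```sas
filename emp temp;                 /* temporary external file (hypothetical) */

data _null_;                       /* write two sample records to the file   */
    file emp;
    put 'Asha,34,52000';
    put 'Ravi,41,61000';
run;

data employees;
    infile emp dlm=',' dsd;        /* INFILE: which file to read and how it is delimited */
    input name :$20. age salary;   /* INPUT: which variables to read and how             */
run;
```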
18) What is the significant difference between using the DROP= data set option in a SET statement and in a DATA statement?
If we need to process specific variables but do not want them to appear in the new data set, we can use the DROP= data set option in the DATA statement.
If we neither want to process specific variables nor want them to appear in the new data set, we can use the DROP= data set option in the SET statement.
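The difference can be sketched with the sample data set sashelp.class; the output data set names and the variable ratio are hypothetical.

```sas
/* DROP= on the DATA statement: height and weight can be used in the step,
   but they are not written to the new data set */
data ratio_out(drop=height weight);
    set sashelp.class;
    ratio = weight / height;
run;

/* DROP= on the SET statement: height and weight are never read,
   so they are not available for processing in this step */
data names_only;
    set sashelp.class(drop=height weight);
run;
```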
19) Define data step in reference to SAS.
A DATA step is a group of SAS statements that begins with a DATA statement and is used to read, create, or modify SAS data sets. The resulting SAS data set contains the data values along with a descriptor portion (the 'data dictionary'), which stores information about the variables and their properties.
20) What are SAS Informats? Also, list the categories into which SAS Informats are grouped.
A SAS informat is a set of instructions that tells SAS how to read data into SAS variables. Informats are primarily used to read data from external files (also called text files, flat files, sequential files, or ASCII files).
There are mainly three categories of SAS Informats:
21) How will you differentiate SAS Format and SAS Informat?
The main differences between SAS Format and SAS Informat are tabulated below:
22) What is the command used for performing sorting in SAS programs?
We can use the SORT procedure (PROC SORT) to sort a data set by any number of variables. By default, PROC SORT replaces the original data set with the sorted version; to keep the original unchanged, use the OUT= option to write the sorted observations to a new data set.
The syntax below shows the use of PROC SORT command in SAS:
Sorting can be done in ascending or descending order. Ascending is the default; to sort a variable in descending order, place the DESCENDING keyword before that variable's name in the BY statement.
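A minimal sketch of PROC SORT with a mixed sort order, using the sample data set sashelp.class; the output name class_by_age is hypothetical.

```sas
/* OUT= writes the sorted rows to a new data set and leaves the input untouched */
proc sort data=sashelp.class out=class_by_age;
    by descending age name;    /* age descending, then name ascending (the default) */
run;
```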
23) Differentiate NODUP and NODUPKEY options.
When it comes to removing duplicate values from a table in SAS, PROC SORT mainly has two options that are used to perform this:
We can differentiate these two options with the help of the following table:
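In brief, NODUPKEY removes observations that repeat the BY-variable values, while NODUP (an alias of NODUPRECS) removes adjacent observations that are identical on every variable. A minimal sketch, using a hypothetical data set named visits:

```sas
/* Hypothetical data: rows 1 and 2 are identical on every variable */
data visits;
    input id visit_date :date9. amount;
    format visit_date date9.;
    datalines;
1 01JAN2024 100
1 01JAN2024 100
1 05JAN2024 250
2 03JAN2024 75
;
run;

/* NODUPKEY: keeps the first observation for each value of the BY variable */
proc sort data=visits out=unique_ids nodupkey;
    by id;
run;

/* NODUPRECS: drops an observation when it matches the previous one on all variables.
   Sorting by every variable places identical rows next to each other. */
proc sort data=visits out=unique_rows noduprecs;
    by id visit_date amount;
run;
```

With this data, unique_ids keeps two observations (one per id) and unique_rows keeps three.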
24) How is PROC MEANS different from PROC SUMMARY?
PROC MEANS and PROC SUMMARY compute the same descriptive statistics, and both can produce subgroup statistics with a CLASS statement (no sorting required) or a BY statement (the input data must already be sorted by the BY variables, for example with PROC SORT). The main difference is in the default output: PROC MEANS prints its results.
PROC SUMMARY, on the other hand, prints nothing by default. To see its results, we either add the PRINT option or use the OUTPUT statement to create a new data set and then view it with PROC PRINT.
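The two procedures can be compared side by side on the sample data set sashelp.class; the output data set name class_stats is hypothetical.

```sas
/* PROC MEANS prints its results by default */
proc means data=sashelp.class mean max;
    class sex;
    var height weight;
run;

/* PROC SUMMARY prints nothing by default, so the statistics go to a data set */
proc summary data=sashelp.class nway;
    class sex;
    var height weight;
    output out=class_stats mean= max= / autoname;   /* AUTONAME builds names such as height_Mean */
run;

proc print data=class_stats;
run;
```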
25) What is the role of PROC print and PROC contents in SAS?
In SAS, PROC PRINT lists the observations in a data set, which makes it useful for checking that the data was read correctly. PROC CONTENTS, on the other hand, displays descriptive information about the data set, such as variable names, types, and lengths.
26) Define DATA _NULL_ in the context of SAS.
DATA _NULL_ is a DATA step that executes without creating an output data set. It is useful for creating macro variables and for writing output to the log or to an external file with PUT statements.
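A minimal sketch of a DATA _NULL_ step that counts the rows of the sample data set sashelp.class, writes a message to the log, and stores the count in a macro variable. The names total and n_students are hypothetical.

```sas
data _null_;                                   /* no output data set is created */
    set sashelp.class end=last;
    total + 1;                                 /* running count of observations */
    if last then do;
        call symputx('n_students', total);     /* store the count in a macro variable */
        put 'Number of students: ' total;      /* write a line to the SAS log         */
    end;
run;

%put n_students = &n_students;
```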
27) What functions can we use to convert character variables into numeric variables and numeric variables into character variables?
While working with SAS, there are several tasks when we are required to convert character variables into numeric variables and numeric variables into character variables. There are mainly two different functions used to perform these conversions:
PUT(): This function converts numeric values into character values. The result is assigned to a new variable with a different name, since a variable's type cannot be changed in place. The format supplied must match the type of the source variable (a numeric format for a numeric source).
INPUT(): This function converts character values into numeric values when a numeric informat is supplied. As with PUT(), the result is assigned to a new variable with a different name. The source must always be a character value.
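Both conversions can be sketched in one short DATA step; the data set and variable names are hypothetical.

```sas
data convert;
    num_id   = 12345;
    char_id  = put(num_id, 8.);       /* numeric -> character; result is right-aligned in 8 columns */
    char_amt = '2500';
    num_amt  = input(char_amt, 8.);   /* character -> numeric, using the numeric informat 8.        */
run;

proc contents data=convert;           /* confirms the type of each variable */
run;
```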
28) What is the role of _CHARACTER_ and _NUMERIC_?
_CHARACTER_ is a special variable list that refers to all the character variables currently defined in the DATA step or data set. Because PROC MEANS analyzes only numeric variables, _CHARACTER_ is typically used in procedures such as PROC PRINT or PROC FREQ, or in an ARRAY statement.
_NUMERIC_, on the other hand, refers to all the numeric variables currently defined in the DATA step or data set. It can be used, for example, in the VAR statement of PROC MEANS.
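A short sketch of both special lists, using the sample data set sashelp.class. PROC MEANS accepts only numeric analysis variables, so _CHARACTER_ is shown with PROC PRINT instead.

```sas
/* _NUMERIC_: summary statistics for every numeric variable */
proc means data=sashelp.class;
    var _numeric_;
run;

/* _CHARACTER_: list only the character variables */
proc print data=sashelp.class;
    var _character_;
run;
```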
29) What are the commands used for including or excluding any particular variables in the data set?
There are mainly two commands used for including or excluding any particular variables in the data set; they are:
DROP: Variables named in the DROP statement are excluded from the output data set.
KEEP: Variables named in the KEEP statement are the only ones retained in the output data set.
Apart from this, the DROP= and KEEP= data set options can serve the same purpose.
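A minimal sketch of the statement forms and the equivalent data set option, using the sample data set sashelp.class; the output data set names are hypothetical.

```sas
data class_drop;
    set sashelp.class;
    drop height weight;     /* DROP statement: exclude these variables from the output */
run;

data class_keep;
    set sashelp.class;
    keep name age;          /* KEEP statement: only these variables are written */
run;

data class_opt;
    set sashelp.class(keep=name age);   /* data set option: only these variables are read */
run;
```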
30) Mention some character functions used for data cleaning in SAS.
Some of the main character functions used for data cleaning in SAS are given below:
TRIM(str): The aim of using this function is to remove trailing blanks from the string.
COMPRESS(char_string): The aim of using this function is to remove blanks and other desired characters from the string.
UPCASE(char_string): The aim of using this function is to convert all the characters into uppercase in the specified string.
LOWCASE(char_string): The aim of using this function is to convert all the characters into lowercase in the specified string.
COMPBL(str): The aim of using this function is to remove multiple blanks from the string and convert them into a single blank.
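The functions above can be tried together on one messy value; the data set name, variable names, and sample string are hypothetical.

```sas
data clean;
    raw       = '  john    SMITH-01  ';
    tagged    = '[' || trim(raw) || ']';   /* TRIM: trailing blanks removed before concatenation */
    squeezed  = compbl(raw);               /* COMPBL: runs of blanks become a single blank       */
    no_blanks = compress(raw);             /* COMPRESS: all blanks removed                       */
    no_extras = compress(raw, '-', 'd');   /* COMPRESS: hyphens and digits removed ('d' modifier) */
    upper     = upcase(raw);               /* UPCASE: all letters in uppercase                   */
    lower     = lowcase(raw);              /* LOWCASE: all letters in lowercase                  */
run;

proc print data=clean;
run;
```

TRIM is shown inside a concatenation because a value assigned directly to a fixed-length character variable is padded with blanks again.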
31) What is the command used for saving logs in an external file in SAS?
The PROC PRINTTO command is used for saving logs in an external file in SAS. The syntax of this command is shown below:
While executing this, a new text file named 'LOG-FILE' will be created in the location C:\Users\javaTpoint\Downloads\
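A sketch of the PROC PRINTTO call described above. The folder and file name come from the text; the .txt extension and the NEW option are assumptions added for illustration.

```sas
/* Route the log to an external file; NEW replaces any existing file (assumed option) */
proc printto log='C:\Users\javaTpoint\Downloads\LOG-FILE.txt' new;
run;

/* Any step run here writes its log to the file instead of the Log window */
data _null_;
    put 'This note goes to the external log file.';
run;

/* Reset the log to its default destination */
proc printto;
run;
```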
32) Why do we use the SUBSTR function while writing programs in SAS?
SUBSTR is a SAS function used to extract a substring from a character variable. Given a start position and, optionally, a length, it returns that portion of the character string.
The syntax below shows the use of the SUBSTR function in SAS:
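A minimal sketch of SUBSTR with and without the length argument; the data set name, variable names, and sample value are hypothetical.

```sas
data sub;
    phone = '9876543210';
    area  = substr(phone, 1, 3);   /* start at position 1, take 3 characters: '987' */
    rest  = substr(phone, 4);      /* from position 4 to the end: '6543210'         */
run;
```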
33) What methods can we use to create a Macro variable in SAS?
SAS allows users to use several different methods for creating Macro variables. However, listed below are the five most commonly used methods:
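Three widely used ways to create a macro variable are sketched here as an illustration: the %LET statement, CALL SYMPUTX in a DATA step, and the INTO clause of PROC SQL. The macro variable names are hypothetical, and the example assumes the sample data set sashelp.class.

```sas
%let cutoff = 13;                                     /* 1. %LET statement */

data _null_;                                          /* 2. CALL SYMPUTX in a DATA step */
    call symputx('run_date', put(today(), date9.));
run;

proc sql noprint;                                     /* 3. INTO clause in PROC SQL */
    select count(*) into :n_teens
    from sashelp.class
    where age >= &cutoff;
quit;

%put cutoff=&cutoff run_date=&run_date n_teens=&n_teens;
```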
34) What are the few most commonly used options to debug Macros in SAS?
The following are the most common options used to track the macro code along with the SAS code generated by the macros:
The messages generated by these options are written to the SAS log.
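As an illustration, the system options MPRINT, MLOGIC, and SYMBOLGEN are commonly used for this kind of tracing. The macro show_class below is hypothetical and assumes the sample data set sashelp.class.

```sas
options mprint mlogic symbolgen;   /* MPRINT: generated SAS code; MLOGIC: macro execution flow;
                                      SYMBOLGEN: how macro variables resolve */

%macro show_class(min_age);
    proc print data=sashelp.class;
        where age >= &min_age;
    run;
%mend show_class;

%show_class(14)

options nomprint nomlogic nosymbolgen;   /* turn the tracing off again */
```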
35) What is the main difference between SYMGET and SYMPUT?
SYMGET retrieves the value of a macro variable into a DATA step during execution, whereas the CALL SYMPUT routine stores a DATA step value in a macro variable.
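A minimal sketch of the round trip between a DATA step and a macro variable; the macro variable top_name, the data set check, and the value are hypothetical.

```sas
data _null_;
    call symput('top_name', 'Alfred');   /* DATA step value -> macro variable */
run;

data check;
    length who $8;                       /* without LENGTH, SYMGET returns a 200-character value */
    who = symget('top_name');            /* macro variable -> DATA step variable */
run;

proc print data=check;
run;
```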
36) What is the process of working of the PROC SQL? Explain the main steps.
PROC SQL processes all observations of a table simultaneously rather than one observation at a time, as the DATA step does. The following steps are performed when PROC SQL runs:
37) What is the most common method used to count the total number of intervals between two dates in SAS?
We can use the interval function INTCK to calculate the total number of intervals between two dates in SAS.
The syntax below displays the use of the INTCK function:
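A short sketch of INTCK with two interval types; the data set name, variable names, and dates are hypothetical. By default INTCK counts the interval boundaries crossed between the two dates.

```sas
data intervals;
    start_dt = '01JAN2020'd;
    end_dt   = '01APR2020'd;
    months   = intck('month', start_dt, end_dt);   /* 3 month boundaries crossed */
    days     = intck('day',   start_dt, end_dt);   /* 91 days (2020 is a leap year) */
    format start_dt end_dt date9.;
run;

proc print data=intervals;
run;
```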
38) What are the methods for deleting duplicate observations in SAS?
There are mainly three methods used for deleting the duplicate observations in SAS:
39) What is the maximum size of the allowed data set in SAS?
The total number of observations depends entirely on the capacity of the system. There can be any number of observations depending on the system's ability to store and handle them.
Before SAS 9.1, a data set could contain at most 32,767 variables. From SAS 9.1 onward, the maximum number of variables in a SAS data set is limited only by the system's resources.
40) What are some common mistakes that people make while writing programs in SAS?
When writing SAS programs, people make some common errors, especially when they are beginners. The most common errors are as follows: