This SAS terminology tutorial lists the important terms that you will come across while learning SAS. Before starting, you may want to review the basic concepts of the SAS programming language.
The terms discussed here are used throughout SAS programming and are helpful in data science work. SAS is used for advanced analytics, predictive analytics, data management, business intelligence, and multivariate analysis.
Different SAS Terminologies
SAS interacts with IMS databases through an interface view engine. The interface view engine uses SAS/ACCESS descriptor files, which are created with the ACCESS procedure. There are two types of descriptor files:
An access descriptor contains information about the IMS (Information Management System) database to be used. This information includes the database name, the segment names and lengths, the key fields, the IMS field names, the database formats, and the default SAS formats. An access descriptor can also define special handling for a field, and it indicates whether a field occurs multiple times within a database segment. More generally, a SAS/ACCESS descriptor file describes to SAS software data that is held outside SAS, for example in a PC file.
An access descriptor works as a master descriptor file because it contains a complete description of the database; IMS itself does not store descriptive information about a database.
A view descriptor defines a subset of the data that is described by an access descriptor. View descriptors are used in SAS programs to read data directly from, or write data directly to, the IMS database. View descriptors can also be used to extract IMS data and place it in a SAS data file.
A data type is an attribute of every column in a table. It tells the operating environment how much physical storage to set aside for the column and what type of data (such as integer, Boolean, or string) the column holds.
SAS uses only two data types: real numbers and fixed-length character strings. Dates and times are stored internally as real numbers, and macro variables are always character values. Character values must be enclosed in quotation marks to distinguish them from other language elements such as variable names.
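The two data types can be seen side by side in a short DATA step. This is a minimal sketch; the data set name WORK.TYPES and the variable names are assumptions made for illustration only.

```sas
/* Hypothetical data set WORK.TYPES with one character and one numeric variable */
data work.types;
   length name $ 10;        /* character variable with a fixed length of 10 bytes */
   name  = 'Alice';         /* character values are enclosed in quotation marks */
   hired = '01JAN1960'd;    /* date literal, stored internally as a number */
   put name= hired=;        /* writes the raw stored values to the log */
   put hired= date9.;       /* writes the same number using a date format */
run;
```

SAS dates count days from 1 January 1960, so the first PUT statement shows the stored value of HIRED as 0, while the second shows it formatted as a date.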
A column is a vertical component of a PC file. It has a unique name and holds data of a specific type with certain attributes. A column corresponds to a variable in SAS terminology.
A column function is an operation that is computed for each value of the column named in its argument. For example, AVG(SALARY) is a column function that is calculated over the values of the SALARY column.
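A column function of this kind could be run in PROC SQL roughly as follows. The data set WORK.STAFF and its values are made up for the example.

```sas
/* Hypothetical sample data */
data work.staff;
   input name $ salary;
   datalines;
Ann 52000
Bob 48000
Cy 61000
;
run;

/* AVG() is applied across all values of the SALARY column */
proc sql;
   select avg(salary) as avg_salary
      from work.staff;
quit;
```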
In SAS software, a data value is a unit of character or numeric information that is stored in a SAS data set. A data value represents the value of one variable in one observation.
Browsing data is the process of viewing the data in a file, one or more observations at a time, without changing it.
A file is a collection of related records that are organized in a well-defined manner. Each record is treated as a unit and is controlled through SAS software. SAS files are processed and stored in a SAS data library.
Database Management System (DBMS)
DBMS is an integrated software package to create and manipulate data. The data is represented in the form of relational tables inside the database.
A format is an instruction that SAS software uses to write or display the values of a variable. Some formats are supplied by SAS software; others can be written by the user in Base SAS software with the FORMAT procedure.
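As an illustration, a user-written format might be defined and applied like this. The format name AGEGRP and the data set WORK.PEOPLE are hypothetical.

```sas
/* Define a user-written numeric format (the name AGEGRP is an assumption) */
proc format;
   value agegrp
      low -< 18 = 'Minor'
      18 - high = 'Adult';
run;

/* Hypothetical sample data */
data work.people;
   input name $ age;
   datalines;
Ann 34
Tim 12
;
run;

/* The FORMAT statement changes how AGE is displayed, not the stored value */
proc print data=work.people;
   format age agegrp.;
run;
```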
A database is an organized collection of related data. In a relational database management system, a database contains objects such as tables, views, and indexes so that the data can be accessed in a systematic manner.
SAS software has many parts, and the engine is one of them. The engine is responsible for reading data from a file and writing data to a file.
The purpose of a SAS index is to optimize WHERE-clause processing and to facilitate BY-group processing.
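One common way to create a simple index is the DATASETS procedure. The sketch below assumes a small hypothetical data set WORK.ACCOUNTS; whether SAS actually uses the index for a given WHERE clause depends on the data and is decided by SAS at run time.

```sas
/* Hypothetical sample data */
data work.accounts;
   input cust_id balance;
   datalines;
101 2500
102 300
103 1200
;
run;

/* Create a simple index on CUST_ID */
proc datasets library=work nolist;
   modify accounts;
   index create cust_id;
quit;

/* A WHERE clause on the indexed variable is a candidate for index use */
proc print data=work.accounts;
   where cust_id = 102;
run;
```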
The INFORMAT statement associates an informat with a variable. You can specify standard informats that are supplied by SAS software or user-defined informats, but user-defined informats must have been defined previously with PROC FORMAT. A single INFORMAT statement can associate the same informat with several variables, or different informats with different variables. If a variable appears in several INFORMAT statements, SAS uses the informat that was assigned last.
INFORMAT variable-1 <...variable-n> <informat>;
An INFORMAT statement also defines the length of a previously undefined character variable. As a result, the values of a character variable can be truncated in a DATA step if the INFORMAT statement appears before the SET statement.
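The following sketch shows one INFORMAT statement that assigns different informats to several variables. The data set name WORK.SALES, the variable names, and the data lines are all invented for illustration.

```sas
/* Hypothetical data set: one INFORMAT statement, three variables */
data work.sales;
   informat sale_date date9. amount comma10.2 region $8.;
   input sale_date amount region;       /* list input uses the informats above */
   format sale_date date9.;             /* display the date in readable form */
   datalines;
15JAN2024 1,250.50 East
03FEB2024 980.00 West
;
run;

proc print data=work.sales;
run;
```

Because REGION is first defined by the $8. informat, its length is set to 8 bytes, which is the behavior described above for previously undefined character variables.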
A libref is a name that is temporarily associated with a SAS data library. For example, in the name SASUSER.ACCOUNTS, SASUSER is the libref. You assign a libref with a LIBNAME statement or with operating system control language.
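Assigning and using a libref could look like the sketch below. The libref MYLIB and the directory path are placeholders, not values from this tutorial; the path must point to a directory that exists on your system.

```sas
/* Placeholder path: replace with a directory that exists on your system */
libname mylib 'C:\sasdata';

/* The two-level name MYLIB.ACCOUNTS stores the data set in that library */
data mylib.accounts;
   cust_id = 101;
   balance = 2500;
run;

/* Remove the association when it is no longer needed */
libname mylib clear;
```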
A SAS file in a SAS data library is known as a member.
The name of a SAS file in a SAS data library is called its member name.
The member type identifies the type of information that is stored in a SAS file. Member types include DATA, ACCESS, CATALOG, VIEW, and PROGRAM.
A missing value in SAS software indicates that no data is stored in the variable for the current observation. By default, SAS software represents a missing numeric value with a single period (.) and a missing character value with a blank space.
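A short sketch of both kinds of missing value follows, using the hypothetical data sets WORK.VISITS and WORK.FLAGGED.

```sas
/* Hypothetical sample data: Bob has a missing numeric AGE (a single period) */
data work.visits;
   input name $ age;
   datalines;
Ann 34
Bob .
;
run;

data work.flagged;
   set work.visits;
   length city $ 12;
   city   = ' ';              /* a blank is the missing character value */
   no_age = missing(age);     /* 1 when AGE is missing, otherwise 0 */
run;

proc print data=work.flagged;
run;
```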
An observation is a horizontal component of a SAS data file. It is a collection of data values that are associated with a single entity, such as a customer. Each observation contains one data value for each variable in the data file. An observation corresponds to a row in a PC file.
PROC SQL View
A PROC SQL view is a SAS data set of member type VIEW that is created by PROC SQL. A PROC SQL view does not contain any data. It stores only a query expression that reads data values from its underlying files when the view is used. Underlying files can include SAS data files, SAS/ACCESS views, DATA step views, and other PROC SQL views. When the view is executed, its output can be either a subset or a superset of one or more of the underlying files.
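A PROC SQL view might be created and queried as sketched here. The names WORK.ACCOUNTS and WORK.HIGH_BALANCE are assumptions for the example.

```sas
/* Hypothetical underlying data file */
data work.accounts;
   input cust_id balance;
   datalines;
101 2500
102 300
103 1200
;
run;

proc sql;
   /* The view stores only the query, not the rows */
   create view work.high_balance as
      select cust_id, balance
         from work.accounts
         where balance > 1000;

   /* The query runs against WORK.ACCOUNTS each time the view is read */
   select * from work.high_balance;
quit;
```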
A record corresponds to a SAS observation.
Relational Database Management System (RDBMS)
A relational database management system is a database management system that organizes and accesses data according to the relationships between data entities.
A row is a horizontal component of a PC file. Each row corresponds to a SAS observation.
SAS Data File
A SAS data file is a type of SAS data set that contains both data values and descriptor information. The descriptor information describes the data, for example the attributes of each variable.
There are two types of SAS data file: the native SAS data file and the interface SAS data file.
Native SAS data file
A native SAS data file stores data values and descriptor information in a file that is formatted by SAS.
Interface SAS data file
An interface SAS data file stores data in a file that is formatted by software other than SAS software. SAS engines can read data from, and write data to, files that were formatted by other software such as DB2, Oracle, Sybase, ODBC, BMDP, OSIRIS, and SPSS.
These files are treated as interface SAS data files: when an engine accesses their data values, SAS recognizes them as SAS data sets.
"The client site licensing agreement always determines the availability of engines to access different types of interface data files. To see the availability of the engines, see your system administrator."
SAS Data Library
The SAS Data Library is a collection of one or multiple SAS files that are recognized by SAS software and can be referenced and stored as a unit. Each file is an essential part of the library and considered as a member.
A SAS data library helps you organize your work. For example, if a SAS program uses more than one SAS file, you can keep all of the files in one library. Organizing files in libraries makes it easier to locate the files and to reference them in a program.
Under most operating environments, a SAS data library closely matches the level of organization that the operating environment itself uses to organize files. For example, in a directory-based operating environment, a SAS data library is a group of SAS files in the same directory. The directory may contain other files, but only the SAS files are considered part of the SAS data library.
Information about Operating Environment:
Under the CMS operating environment, a SAS data library is a collection of files that share the same file type. Under the z/OS operating environment, a SAS data library is a specially formatted z/OS data set. Such a data set can contain only SAS files.
SAS Data Set
A SAS data set is a SAS file that is stored in a SAS library and that is created and processed by SAS software. It contains data values that are arranged as a table of observations (rows) and variables (columns). A SAS data set also contains descriptor information such as the data types and lengths of the variables, as well as the engine that was used to create the data. A SAS data set is either a SAS data file or a SAS view.
A SAS data file contains both the descriptor information and the data values. Its member type is DATA.
SAS Data View
A SAS view is a type of SAS data set that retrieves data values from other files. A SAS view contains only descriptor information, such as the data types and lengths of the variables (columns), plus the additional information that is required to retrieve data values from other SAS data sets or from files stored in other software vendors' formats. The member type of a SAS view is VIEW. In most cases, you can use a SAS view as if it were a SAS data file.
There are two types of SAS views:
A native view is a SAS view that is created either with PROC SQL or with a DATA step.
An interface view is a SAS view that is created with SAS/ACCESS software. An interface view can read data from, or write data to, a database management system (DBMS) such as Oracle or DB2. Interface views are also referred to as SAS/ACCESS views. A license for SAS/ACCESS software is required to use them.
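A native view built with a DATA step could be sketched as follows. The names WORK.ACCOUNTS and WORK.BIG_ACCOUNTS are hypothetical; an interface view is not shown because it requires SAS/ACCESS software and a DBMS connection.

```sas
/* Hypothetical underlying data file */
data work.accounts;
   input cust_id balance;
   datalines;
101 2500
102 300
103 1200
;
run;

/* The VIEW= option stores the DATA step as a view instead of running it now */
data work.big_accounts / view=work.big_accounts;
   set work.accounts;
   where balance > 1000;
run;

/* The stored DATA step executes when the view is read */
proc print data=work.big_accounts;
run;
```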
Structured Query Language (SQL)
SQL is a high-level query language that is used to create and manipulate data in relational database management systems. SAS software implements SQL through the SQL procedure (PROC SQL).
A table alias is a temporary, alternative name for a table that is specified in the FROM clause. When tables are joined, table aliases can optionally be used to qualify column names.
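In a PROC SQL join, aliases might be used like this. The tables WORK.CUSTOMERS and WORK.ORDERS and the aliases C and O are invented for the example.

```sas
/* Hypothetical sample tables */
data work.customers;
   input cust_id name $;
   datalines;
101 Ann
102 Bob
;
run;

data work.orders;
   input order_id cust_id amount;
   datalines;
1 101 250
2 102 80
3 101 40
;
run;

/* C and O are table aliases; they qualify the column names in the join */
proc sql;
   select c.name, o.order_id, o.amount
      from work.customers as c
           inner join work.orders as o
           on c.cust_id = o.cust_id;
quit;
```

Both tables contain a CUST_ID column, so qualifying it with an alias removes the ambiguity.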
Table Lookup is a processing technique in which information is obtained from an auxiliary source based on the values of variables in the primary source.
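One of several ways to perform a table lookup is a match-merge in a DATA step. In this sketch WORK.SALES is the primary source and WORK.RATES is the auxiliary source; all names and values are assumptions.

```sas
/* Hypothetical auxiliary (lookup) table */
data work.rates;
   input region $ rate;
   datalines;
East 0.05
West 0.07
;
run;

/* Hypothetical primary table */
data work.sales;
   input region $ amount;
   datalines;
West 980
East 1250
;
run;

/* Match-merging requires both inputs to be sorted by the BY variable */
proc sort data=work.sales; by region; run;
proc sort data=work.rates; by region; run;

data work.sales_rated;
   merge work.sales (in=in_sales) work.rates;
   by region;
   if in_sales;                    /* keep only rows from the primary table */
   commission = amount * rate;     /* RATE was looked up by REGION */
run;
```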
A variable is a column in a SAS data set. It is a set of data values that describe a given attribute across all observations. In the ACCESS procedure, variables are created from the columns or fields of PC files.
A target variable is the variable to which the result of a function or expression is assigned.