Javatpoint Logo
Javatpoint Logo

SAS - Merge Dataset

Joining | Combining

In the last topic, we have learned multiple variables sorting in SAS, and saw that we could sort data values on the basis of multiple variables simultaneously. Now, we are going to learn how we can merge data sets in SAS Programming language. You will find many examples, for great understanding.

So let's start.......

What is the SAS Merge?

Merge in SAS is a process which combines observations from two or more SAS datasets. The base of merging is, the merging datasets must have a common variable with common data values (or observations) inside them.

Use the following steps to merge datasets:

  • Create a new dataset for merging.
  • We use the By statement to denote the name of a common variable which is used for matching prerequisites to merge.
  • Use MERGE statement before the name of the dataset.
  • The merging datasets must have at least one common variable.

For Example:

SAS Merge Dataset

As you can see above, the ID is a common variable among both datasets.

Use the following syntax to merge dataset first and second:

Output:

ID    A   B   C
101    1   2    6
102    3   4   5
103    5   6   7
104    3    7   9

In the above example, both data sets have been merged. Both datasets contain different data, and the variable ID contains the same data as it is a unique identifier (no duplicate).

Let's understand through an example:

When we merge two data sets using the merge statement, and we already know that each observation of a common variable has a match in the other dataset, then we can merge very straight-forward. Here is an example.

Often, in the same case, different data is stored in two or more separate data sets. For example, we want to store data of an employee, for that we created two datasets, one is essential, and another is additional. The essential dataset contains the very basic information of employee such as employment ID, name, mobile, email, adhar etc, and the additional dataset contains information which is rarely used.

Data essential

Dataset additional

Now, create new dataset "combined" to merge essential and additional datasets.

Now, execute this code in SAS studio.

SAS Merge Dataset

Output:

SAS Merge Dataset

When Matching is Not Perfect

What happens if the variable is common, but its data values are not common between datasets?

For Example:

SAS Merge Dataset

In data set mother, second observation (ID=201) does not match the second observation (ID =401) of data set child. A match-merge will be like this:

Result

ID   A   B   C
101   1   2     0    both data sets are contributing to this observation
201   3   4     .     only data set first contributed to this observation          
301   5   6     1    both data sets contributed to this observation

Let's understand through an example:

Here we are using the example mentioned above, but changing some of its data values . Taking two data sets; one is essential, and another is additional. Now, we have changed two data values of variable ID in the additional dataset, let's see what will be the result.

Data essential;
input employeeid name$ age mobile email$ adhar$;
datalines;
101 Vikas 25 9374747373 vikas@gmail.com 234390876344
102 Nikita 23 9086746353 Nikita@gmail.com 128743526709
103 Ginni 27 9845637238 Ginni@gmail.com 980706509823
104 Sonoo 35 9893235487 Sonoo@gmail.com 123214567876
105 Gaurav 26 9893235445 Gaurav@gmail.com 123454326789
106 Ayush 24 9893235432 Ayush@gmail.com 234565432123
107 Harshita 23 9893232345 Harshita@gmail.com 908756897645
108 Preeti 27 9993235487 Preeti@gmail.com 987605432123
109 Mayur 28 9893235432 Mayur@gmail.com 123213487654
110 Rajesh 29 9893235409 Rajesh@gmail.com 456578987623
;
run;  
data additional;
input employeeid address$ father$ mother$;
datalines;
101 bhopal narendra amrita
111 kolkata mohan anita
103 timarnee ramkumar vaishnavi
123 meeruth arjun megha
105 delhi kailash manorama
106 noida ramakant sunita
107 gorakhpur shreelal shikha
108 seehor shekher sheela
109 indore krishna tulsi
110 dewas susheel janki
;
run;
Data combined;
merge essential additional;
By employeeid;
run;
proc print data=combined;
run;

Execute this code in SAS Studio

SAS Merge Dataset

Output:

SAS Merge Dataset

In the above code you can see, variable ID is common in both data sets essential and additional, but two data values (highlighted in images, one by green and another by red) of variable ID changed in dataset "additional."

When we execute this code in SAS Studio, it will produce output according to the data values mentioned in the data set "essential" (consider the output image), but if you check it in the log window, you can see the error message.

ERROR: BY variables are not properly sorted on data set WORK.ADDITIONAL.
 employeeid=111 name=  age=. mobile=. email=. adhar=  address=kolkata 
 father=mohan mother=anita FIRST.employeeid=1 LAST.employeeid=1
 _ERROR_=1 _N_=6

IN= Variables

What to do, if you want to put only the merge output in the data set? Means only those observations in which both input datasets are contributing.

In other words, when you don't want to print observations that are not contributing then use IN=Variables.

SAS has already installed special temporary variables for you, which are called "IN = variables" so that you can do more.

Now, what you have to do:

1. Use extra variables with merge statement for both merging datasets

Syntax

2. In the data step, use "IN= variables" appropriately.

Let's understand through an example:

SAS Merge Dataset
SAS Merge Dataset

We can see in the output; data set other kept only data values of matched IDs.

  • If we want to keep only contributing observations then, give value 1 to the variable a and b.
  • If we want to keep only the observations that are not contributing then give 0 to the variable a and b.

In the above example, the "IN= variables" A and B taking values like this:

SAS Merge Dataset

If you want to keep not only matching data values, but also to track different data sets of non-matching data values, you can create three data sets in this way:

Summary

So, this is all about merge data sets. In this section we have learned, what is SAS merge Datasets, how to merge two or more datasets and what happens when there is no match in Data Sets.

Hope, you understood the topic clearly. But, if you still have any problem, please ask in our contact section.






Please Share

facebook twitter google plus pinterest

Learn Latest Tutorials


Preparation


Trending Technologies


B.Tech / MCA