When we want to concatenate our DataFrames, we can add them with each other by stacking them either vertically or side by side. Another method to combine these DataFrames is to use columns in each dataset that contain common values. The method of combining the DataFrame using common fields is called "joining". The method that we use for combining the DataFrame is a join() method. The columns that contain common values are called "join key".
The join() method is often useful when one DataFrame is a lookup table that contains additional data added into the other DataFrame. It is a convenient method that can combine the columns of two differently-indexed DataFrames into a single DataFrame.
Identifying join keys
To determine the appropriate join keys, first, we have to define required fields that are shared between the DataFrames. Both the DataFrames consist of the columns that have the same name and also contain the same data.
Inner join can be defined as the most commonly used join. Basically, its main task is to combine the two DataFrames based on a join key and returns a new DataFrame. The returned DataFrame consists of only selected rows that have matching values in both of the original DataFrame.
If we want to add some information into the DataFrame without losing any of the data, we can simply do it through a different type of join called a "left outer join" or "left join".
Like an inner join, left join also uses the join keys to combine two DataFrames, but unlike inner join, it returns all of the rows from the left DataFrame, even those rows whose join keys do not include the values in the right DataFrame.
other: It refers to the DataFrame or Series.
In this case, the index should be similar to one of the columns. If we pass a Series, the named attribute has to be set for using it as the column name in the resulting joined DataFrame.
on: It is an optional parameter that refers to array-like or str values.
It refers to a column or index level name in the caller to join on the index. Otherwise, it joins index-on-index. If multiple values are present, then the other DataFrame must have MultiIndex. It is like an Excel VLOOKUP operation that can pass an array as the join key if it is not already contained within the calling DataFrame.
how: It refers to 'left', 'right', 'outer', 'inner' values that mainly work on how to handle the operation of the two objects. The default value of how is left.
lsuffix: It refers to a string object that has the default value ''. It uses the Suffix from the left frame's overlapping columns.
rsuffix: It refers to a string value, that has the default value ''. It uses the Suffix from the right frame's overlapping columns.
sort: It consists of a boolean value that sorts the resulting DataFrame lexicographically by the join key. If we pass False value, then the order of the join key mainly depends on the join type, i.e., how.
Example: The below example shows the working of join() function.
key A B 0 K0 A0 B0 1 K1 A1 B1 2 K2 A2 B2 3 K3 A3 NaN 4 K4 A4 NaN 5 K5 A5 NaN
Example2: The below example joins the two MultiIndexes:
value xyz pq num x p 1 0 2 1 q 1 2 2 3 y p 1 4 2 5 q 1 6 2 7 z p 1 8 2 9 q 1 10 2 11