Javatpoint Logo
Javatpoint Logo

Pandas DataFrame.join()

When we want to concatenate our DataFrames, we can add them with each other by stacking them either vertically or side by side. Another method to combine these DataFrames is to use columns in each dataset that contain common values. The method of combining the DataFrame using common fields is called "joining". The method that we use for combining the DataFrame is a join() method. The columns that contain common values are called "join key".

The join() method is often useful when one DataFrame is a lookup table that contains additional data added into the other DataFrame. It is a convenient method that can combine the columns of two differently-indexed DataFrames into a single DataFrame.

Identifying join keys

To determine the appropriate join keys, first, we have to define required fields that are shared between the DataFrames. Both the DataFrames consist of the columns that have the same name and also contain the same data.

Inner joins

Inner join can be defined as the most commonly used join. Basically, its main task is to combine the two DataFrames based on a join key and returns a new DataFrame. The returned DataFrame consists of only selected rows that have matching values in both of the original DataFrame.

Left joins

If we want to add some information into the DataFrame without losing any of the data, we can simply do it through a different type of join called a "left outer join" or "left join".

Like an inner join, left join also uses the join keys to combine two DataFrames, but unlike inner join, it returns all of the rows from the left DataFrame, even those rows whose join keys do not include the values in the right DataFrame.

Syntax:

Parameters:

other: It refers to the DataFrame or Series.

In this case, the index should be similar to one of the columns. If we pass a Series, the named attribute has to be set for using it as the column name in the resulting joined DataFrame.

on: It is an optional parameter that refers to array-like or str values.

It refers to a column or index level name in the caller to join on the index. Otherwise, it joins index-on-index. If multiple values are present, then the other DataFrame must have MultiIndex. It is like an Excel VLOOKUP operation that can pass an array as the join key if it is not already contained within the calling DataFrame.

how: It refers to 'left', 'right', 'outer', 'inner' values that mainly work on how to handle the operation of the two objects. The default value of how is left.

  • left: It uses a calling frame's index or column if the parameter on is specified.
  • right: It uses the other index.
  • outer: It is used to form a union of calling frame's index or column if parameter on is specified with other's index, and also sort it lexicographically.
  • inner: It is used to form an intersection of calling frame's index or column if parameter on is specified with other's index. So, due to this, it preserves the order of the calling object.

lsuffix: It refers to a string object that has the default value ''. It uses the Suffix from the left frame's overlapping columns.

rsuffix: It refers to a string value, that has the default value ''. It uses the Suffix from the right frame's overlapping columns.

sort: It consists of a boolean value that sorts the resulting DataFrame lexicographically by the join key. If we pass False value, then the order of the join key mainly depends on the join type, i.e., how.

Example: The below example shows the working of join() function.

Output:

    key  A   B
0   K0   A0  B0
1   K1   A1  B1
2   K2   A2  B2
3   K3   A3  NaN
4   K4   A4  NaN
5   K5   A5  NaN

Example2: The below example joins the two MultiIndexes:

Output:

			value
xyz	pq	num	
x	p	1	0
2	1
q	1	2
2	3
y	p	1	4
2	5
q	1	6
2	7
z	p	1	8
2	9
q	1	10
2	11

Next TopicDataFrame.mean()





Youtube For Videos Join Our Youtube Channel: Join Now

Feedback


Help Others, Please Share

facebook twitter pinterest

Learn Latest Tutorials


Preparation


Trending Technologies


B.Tech / MCA