Reshma Haridhas

Posted on Jan 15 • Originally published at reshmaharidhas.Medium

How to Merge DataFrames in Pandas — Inner, Outer, Left & Right

#pandas #python #datascience #coding

Two datasets or dataframes can be merged together side-by-side in Pandas using the merge() method.

The merge() method takes two dataframe as inputs, and returns a dataframe object. The merging can be done based on the data under specific columns called as key.

Types of Merging:

The merging can be done as any of the four types of join operations below:

Inner join
Outer join
Left outer join
Right outer join

By default, the merge() method performs the inner join. To perform other types of joins while merging, the ‘how’ attribute should be assigned with values such as “outer”, “left”, “right”.
The column name on which the merging has to be done is assigned to the ‘on’ parameter within the merge() method. Let us see an example for understanding merging in Pandas.
Example: Consider merging the two dataframes X and Y below. X has three rows and three columns. Y has five rows and two columns.

After X and Y are merged, the resulting dataframe Z is as below with two rows only.

Code snippet:

import pandas as pd
X = pd.DataFrame({"id":[3,64,2],
                  "name":["Jack","Alex","Roy"],
                  "city":["Zurich","Auckland","Paris"]})
Y = pd.DataFrame({"name":["Alex","Mark","Jackie","Roy","Zoya"],
                  "fav":["books","tea","rain","music","technology"]})
print(pd.merge(X,Y))

Output:

How merging happened here?
The dataframes X and Y are merged without mentioning any specific type of join operation. By default, it creates inner join operation between the data under the column name that are common(same) between X and Y.
Step 1: Find the column name that is common between X and Y is ‘name’.

Step 2: Find the data under those column names that are common between X and Y. The common data under the column ‘name’ are Alex, and Roy which are marked in yellow color.

Step 3: Select only those rows from both the dataframes X and Y, and place the columns from the secondly passed dataframe in the merge() method which is Y to the right-side of X.

The column names that are marked in red are the common column names between X and Y. The column names marked in orange color are the uncommon column names from X. The column names marked in green color are the uncommon column names from Y.
Parameters:
Some of the important parameters to use within the merge() method are below.

on — The column name on which the merging operation has to be done. If merging is based on more than one column name, the column names are assigned as an array of string.
how — mentions the type of join such as outer, inner, left, right to perform.
suffixes — Defines the custom suffix to be appended to the common column names between two dataframes, when the merging is not based on them. By default, the suffix for the first mentioned dataframe is _x. By default, the suffix for the second mentioned dataframe is _y.
left_on — Defines the unique column name from the first passed dataframe X in the merge() method on which the merging is to be done.
right_on — Defines the unique column name from the second passed dataframe Y in the merge() method on which the merging is to be done.

Merging based on a common column name between dataframes:

Here is the code snippet for merging based on a common column name which is ‘name’.

Output:

The above resultant dataframe z is containing two rows based on merging using the column name ‘name’. Another thing to note is that the another column which is common in both x and y is ‘city’. It got changed to ‘city_x’ and ‘city_y’ automatically as per the default suffixes.

Merging dataframes using custom suffixes:

Here is the code snippet where the merging is done between two dataframes X and Y, and also the custom suffixes are set for the other common column names between X and Y that are based for merging.

Output:

The dataframe Z has the custom set suffixes as ‘city_left’ and ‘city_right’.

Merging based on two common column names in two dataframes:

When merging is done based on more than a common column name between two dataframes, those column names are assigned as an array of strings to the ‘on’ parameter.
Example 1: Here is the code snippet.

Output:

Here, the resultant dataframe Z is formed by finding the rows that have same data under the two common column names assigned to the ‘on’ parameter.

Example 2: Here is the code snippet of another two dataframes where merging is done based on two column names with different data.

Output:

Here the merging is done on two common column names between X and Y which are ‘name’ and ‘city’. But the data under these columns are same only for one row.

Merging based on unique column names in each dataframe:

The two dataframes can contain different column names that are not common, but is unique to each dataframe. Below, in the dataframe X, there is a column name ‘id’ which is unique to X. Similarly, the column name ‘age’ is unique to the dataframe Y itself. Let us merge these unique column names using ‘left_on’ and ‘right_on’ parameter.

Output:

Initially, the column names are selected (marked in red color). Then the common data under those two unique column names in X and Y dataframes are selected. Those rows from both the dataframes are placed side-by-side as merging.

Outer join:

Here is the code snippet for applying outer join on the common column ‘name’ between two dataframes.

Output:

Explanation: Let us see the below input dataframes for finding the common data under the column ‘name’ marked in yellow color, and the uncommon data under the column ‘name’ marked in green color in both the dataframes X and Y.

Here the rows that have common data (marked in yellow) are placed with side-by-side data from X and Y. But the uncommon data under the column ‘name’ in the dataframe X will have empty data under the columns that are present uniquely in Y such as ‘fav’ and ‘age’. Similarly, the uncommon data under the column ‘name’ in the dataframe Y will have empty data under the columns that are unique to X such as ‘id’, and ‘city’.

Left Outer join:

The left outer join takes the rows that have the same data between the two dataframes X and Y under the key column, and also all the rows with uncommon data under the key column assigned to the ‘on’ parameter from the X dataframe.

Here is the code snippet for demonstrating left outer join in merging.

The input dataframes have a common column name ‘name’ which is the key.

The common data between X and Y are marked in yellow color. The uncommon data present in dataframe X are marked in green color. All these rows are the result of merging by left outer join. But the uncommon data marked in green will have empty data under the unique column names that are from the Y dataframe, which are marked in pink.

Right Outer join:

The right outer join takes the rows that have the same data between the two dataframes X and Y under the key column, and also all the rows with uncommon data under the key column assigned to the ‘on’ parameter from the Y dataframe.
Here is the code snippet for demonstrating right outer join in merging.

The input dataframes have a common column name ‘name’ which is the key marked in red color.

The common data between X and Y are marked in yellow color. The uncommon data present in dataframe Y are marked in green color. All these rows are the result of merging by right outer join. But the uncommon data marked in green will have empty data under the unique column names that are from the Y dataframe, which are marked in pink. The output is below.

Thanks for reading!