Notice: Constant automattic\jetpack\extensions\social_previews\FEATURE_NAME already defined in /home/beaczwhx/dggulaitutorial.com/wp-content/plugins/jetpack/extensions/blocks/social-previews/social-previews.php on line 14
How To Merge Pandas DataFrames In Python - Dggul AI Tutorial

How To Merge Pandas DataFrames In Python


Notice: Undefined index: limited_lang in /home/beaczwhx/dggulaitutorial.com/wp-content/plugins/code-syntax-highlighter/inc/src/rendrer.php on line 297

Hello coders, welcome to my new tutorial how to merge pandas dataframes. In this tutorial, you will learn to merge pandas dataframes in python.

We can merge pandas dataframes in various ways. Some of them are following –

  • merging pandas dataframes on index
  • merging pandas dataframes on columns
  • merging pandas dataframes on two columns
  • merging dataframe using ‘how’ Argument

So if you are eager to know your answer of query – how do I merge dataframes, then read this tutorial till the end. Now it’s time to get start pandas merge tutorial.

How To Merge Pandas Dataframes In Python

Pandas provide merge() function which is used to merge pandas dataframes. Here is the syntax to merge pandas dataframes.

pd.merge(
    left,
    right,
    how="inner",
    on=None,
    left_on=None,
    right_on=None,
    left_index=False,
    right_index=False,
    sort=True,
    suffixes=("_x", "_y"),
    copy=True,
    indicator=False,
    validate=None,
)
  • left – A DataFrame object
  • right – Another dataframe object
  • how – One of ‘left’, ‘right’, ‘outer’, ‘inner’. Defaults to inner
  • on – Columns (names) to join on. Must be found in both the left and right DataFrame objects.
  • left_on – Columns from the left DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame.
  • right_on – Columns from the right DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame.
  • left_index – If True, use the index (row labels) from the left DataFrame as its join key(s). In case of a DataFrame with a MultiIndex (hierarchical), the number of levels must match the number of join keys from the right DataFrame.
  • right_index – Same usage as left_index for the right DataFrame.
  • sort – Sort the result DataFrame by the join keys in lexicographical order. Defaults to True, setting to False will improve performance substantially in many cases.
  • suffixes – A tuple of string suffixes to apply to overlapping columns. Defaults to ('_x', '_y').
  • copy – Always copy data (default True) from the passed DataFrame objects, even when reindexing is not necessary.
  • indicator – Add a column to the output DataFrame called _merge with information on the source of each row.
  • validate – string, default None. If specified, checks if merge is of specified type.

Merge Two DataFrames on a Key

Here we have taken an example to merge two dataframes on a key. Let’s see how to merge two dataframes pandas with same column names.

import pandas as pd

df1 = pd.DataFrame({
   'emp_id':[101,102,103,104],
   'Name': ['Alexa', 'Helen', 'Allen', 'Joy']
})

df2 = pd.DataFrame({
   'emp_id':[101,102,103,104],
   'Address': ['Newyork', 'London', 'Manchester', 'Mumbai']
})

df3=pd.merge(df1,df2,on="emp_id")
df3
  • Here we have created two dataframes df1 and df2.
  • merge() function is used to merge these dataframes. Here we have used the on parameter. The dataframes are merged on the key emp_id.

Output

how to merge pandas dataframes

Merge pandas dataframes on index

We can merge two dataframes on index by passing two parameters left_index and right_index to the merge() function. So let’s see how dataframes in pandas merge on index.

import pandas as pd
df1 = pd.DataFrame({
   'emp_id':[101,102,103,104],
   'Name': ['Alexa', 'Helen', 'Allen', 'Joy']
    
},index=list('abcd'))

df2 = pd.DataFrame({
   'emp_id':[101,102,103,104],
   'Address': ['Newyork', 'London', 'Manchester', 'Mumbai']
   
},index=list('abef'))

df3=pd.merge(df1,df2,on="emp_id",left_index=True, right_index=True)
df3
  • Here we have set value of left_index and right_index to True.

Output

how to merge pandas dataframes

  • The merge() function performs an inner join by default, so only the indexes that appear in both DataFrames are kept.

Merge pandas dataframes on two columns

To merge dataframes on two columns, we have to pass the columns to merge on as a list to the on parameter of the merge() function. The syntax is following :

df = pd.merge(left, right, on=['Col1', 'Col2'], how='inner')

Let’s take an example of pandas merge on multiple columns.

import pandas as pd
df1 = pd.DataFrame({
   'emp_id':[101,102,103,104],
   'Name': ['Alexa', 'Helen', 'Allen', 'Joy']
    
})

df2 = pd.DataFrame({
   'emp_id':[101,102,103,104],
    'Name': ['Alexa', 'Keller', 'Allen', 'Bob'],
   'Address': ['Newyork', 'London', 'Manchester', 'Mumbai']
   
})

df3=pd.merge(df1,df2,on=["emp_id","Name"],how="inner")
df3
  • Here we have passed two columns emp_id and Name to the parameter on.

Output

how to merge pandas dataframes

Merge multiple pandas dataframes

We can merge multiple dataframes pandas in following way.

import pandas as pd
df1 = pd.DataFrame({
   'emp_id':[101,102,103,104],
   'Name': ['Alexa', 'Helen', 'Allen', 'Joy']
    
})

df2 = pd.DataFrame({
   'emp_id':[101,102,103,104],
    'Salary': [10000,20000,20000,60000],
   'Address': ['Newyork', 'London', 'Manchester', 'Mumbai']
   
})
df3 = pd.DataFrame({
   'emp_id':[101,102,103,104],
    'Designation': ['HR', 'Manager', 'Trainer', 'Coach'],
   
   
})

df4=pd.merge(pd.merge(df1,df2,on="emp_id"),df3,on="emp_id")

df4
  • Here we have created three dataframes and merged them on key emp_id.

Output

how to merge pandas dataframes

How To Merge Dataframes Using “how” Argument.

The how argument to merge specifies how to determine which keys are to be included in the resulting dataframe. If a key combination does not appear in either the left or right dataframes, the values in the joined dataframe will be NAN.

Here is a summary of the how options and their SQL equivalent names:

Merge method SQL Join Name Description
left LEFT OUTER JOIN Use keys from left dataframe only
right RIGHT OUTER JOIN Use keys from right dataframe only
outer FULL OUTER JOIN Use union of keys from both dataframes
inner INNER JOIN Use intersection of keys from both dataframes
cross CROSS JOIN Create the cartesian product of rows of both dataframes

Let’s see them with examples.

Left Outer Join

When we specify left option to how argument, it use keys from left dataframe only.

import pandas as pd
df1 = pd.DataFrame({
'emp_id':[101,102,103,104],
'Name': ['Alexa', 'Helen', 'Allen', 'Joy']

})

df2 = pd.DataFrame({
'emp_id':[101,102,103,105],
'Address': ['Newyork', 'London', 'Manchester', 'Mumbai']

})

df3=pd.merge(df1,df2,on="emp_id", how="left")
df3

Left will take all the common elements between two dataframes and also the remaining elements from the left dataframe.

Output

how to merge pandas dataframes

Right Outer Join

When we specify right option to how argument, it use keys from right dataframe only.

import pandas as pd


df1 = pd.DataFrame({
'emp_id':[101,102,103,104],
'Name': ['Alexa', 'Helen', 'Allen', 'Joy']

})

df2 = pd.DataFrame({
'emp_id':[101,102,103,105],
'Address': ['Newyork', 'London', 'Manchester', 'Mumbai']

})

df3=pd.merge(df1,df2,on="emp_id", how="right")
df3

Right will take all the common elements between two dataframes and also the remaining elements from the right dataframe.

how to merge pandas dataframes

Full Outer Join

When we specify outer option to how argument, it use union of keys from both dataframes.

import pandas as pd
df1 = pd.DataFrame({
'emp_id':[101,102,103,104],
'Name': ['Alexa', 'Helen', 'Allen', 'Joy']

})

df2 = pd.DataFrame({
'emp_id':[101,102,103,105],
'Address': ['Newyork', 'London', 'Manchester', 'Mumbai']

})

df3=pd.merge(df1,df2,on="emp_id", how="outer")
df3

Output

Inner Join

When we specify outer option to how argument, it use intersection of keys from both dataframes.

import pandas as pd
df1 = pd.DataFrame({
   'emp_id':[101,102,103,104],
   'Name': ['Alexa', 'Helen', 'Allen', 'Joy']
    
})

df2 = pd.DataFrame({
   'emp_id':[101,102,103,105],
   'Address': ['Newyork', 'London', 'Manchester', 'Mumbai']
   
})

df3=pd.merge(df1,df2,on="emp_id", how="inner")
df3

Output

So guys, it was all about how to merge pandas dataframes tutorial. I hope you have understand it very well but if still if you have any doubt regarding this tutorial then feel free to ask your questions in comment box. Stay tuned with Dggul AI Tutorial for upcoming tutorials. Happy Coding !!!

 

Check Other Pandas Tutorials…

Leave a Comment