Notice: Undefined index: limited_lang in /home/beaczwhx/dggulaitutorial.com/wp-content/plugins/code-syntax-highlighter/inc/src/rendrer.php on line 297
Hello coders, welcome to my new tutorial how to merge pandas dataframes. In this tutorial, you will learn to merge pandas dataframes in python.
We can merge pandas dataframes in various ways. Some of them are following –
- merging pandas dataframes on index
- merging pandas dataframes on columns
- merging pandas dataframes on two columns
- merging dataframe using ‘how’ Argument
So if you are eager to know your answer of query – how do I merge dataframes, then read this tutorial till the end. Now it’s time to get start pandas merge tutorial.
How To Merge Pandas Dataframes In Python
Pandas provide merge() function which is used to merge pandas dataframes. Here is the syntax to merge pandas dataframes.
pd.merge( left, right, how="inner", on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=True, suffixes=("_x", "_y"), copy=True, indicator=False, validate=None, )
- left – A DataFrame object
- right – Another dataframe object
- how – One of ‘left’, ‘right’, ‘outer’, ‘inner’. Defaults to inner
- on – Columns (names) to join on. Must be found in both the left and right DataFrame objects.
- left_on – Columns from the left DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame.
- right_on – Columns from the right DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame.
- left_index – If True, use the index (row labels) from the left DataFrame as its join key(s). In case of a DataFrame with a MultiIndex (hierarchical), the number of levels must match the number of join keys from the right DataFrame.
- right_index – Same usage as left_index for the right DataFrame.
- sort – Sort the result DataFrame by the join keys in lexicographical order. Defaults to
True
, setting toFalse
will improve performance substantially in many cases. - suffixes – A tuple of string suffixes to apply to overlapping columns. Defaults to
('_x', '_y')
. - copy – Always copy data (default
True
) from the passed DataFrame objects, even when reindexing is not necessary. - indicator – Add a column to the output DataFrame called
_merge
with information on the source of each row. - validate – string, default None. If specified, checks if merge is of specified type.
Merge Two DataFrames on a Key
Here we have taken an example to merge two dataframes on a key. Let’s see how to merge two dataframes pandas with same column names.
import pandas as pd df1 = pd.DataFrame({ 'emp_id':[101,102,103,104], 'Name': ['Alexa', 'Helen', 'Allen', 'Joy'] }) df2 = pd.DataFrame({ 'emp_id':[101,102,103,104], 'Address': ['Newyork', 'London', 'Manchester', 'Mumbai'] }) df3=pd.merge(df1,df2,on="emp_id") df3
- Here we have created two dataframes df1 and df2.
- merge() function is used to merge these dataframes. Here we have used the on parameter. The dataframes are merged on the key emp_id.
Output
Merge pandas dataframes on index
We can merge two dataframes on index by passing two parameters left_index and right_index to the merge() function. So let’s see how dataframes in pandas merge on index.
import pandas as pd df1 = pd.DataFrame({ 'emp_id':[101,102,103,104], 'Name': ['Alexa', 'Helen', 'Allen', 'Joy'] },index=list('abcd')) df2 = pd.DataFrame({ 'emp_id':[101,102,103,104], 'Address': ['Newyork', 'London', 'Manchester', 'Mumbai'] },index=list('abef')) df3=pd.merge(df1,df2,on="emp_id",left_index=True, right_index=True) df3
- Here we have set value of left_index and right_index to True.
Output
- The merge() function performs an inner join by default, so only the indexes that appear in both DataFrames are kept.
Merge pandas dataframes on two columns
To merge dataframes on two columns, we have to pass the columns to merge on as a list to the on parameter of the merge() function. The syntax is following :
df = pd.merge(left, right, on=['Col1', 'Col2'], how='inner')
Let’s take an example of pandas merge on multiple columns.
import pandas as pd df1 = pd.DataFrame({ 'emp_id':[101,102,103,104], 'Name': ['Alexa', 'Helen', 'Allen', 'Joy'] }) df2 = pd.DataFrame({ 'emp_id':[101,102,103,104], 'Name': ['Alexa', 'Keller', 'Allen', 'Bob'], 'Address': ['Newyork', 'London', 'Manchester', 'Mumbai'] }) df3=pd.merge(df1,df2,on=["emp_id","Name"],how="inner") df3
- Here we have passed two columns emp_id and Name to the parameter on.
Output
Merge multiple pandas dataframes
We can merge multiple dataframes pandas in following way.
import pandas as pd df1 = pd.DataFrame({ 'emp_id':[101,102,103,104], 'Name': ['Alexa', 'Helen', 'Allen', 'Joy'] }) df2 = pd.DataFrame({ 'emp_id':[101,102,103,104], 'Salary': [10000,20000,20000,60000], 'Address': ['Newyork', 'London', 'Manchester', 'Mumbai'] }) df3 = pd.DataFrame({ 'emp_id':[101,102,103,104], 'Designation': ['HR', 'Manager', 'Trainer', 'Coach'], }) df4=pd.merge(pd.merge(df1,df2,on="emp_id"),df3,on="emp_id") df4
- Here we have created three dataframes and merged them on key emp_id.
Output
How To Merge Dataframes Using “how” Argument.
The how argument to merge specifies how to determine which keys are to be included in the resulting dataframe. If a key combination does not appear in either the left or right dataframes, the values in the joined dataframe will be NAN.
Here is a summary of the how options and their SQL equivalent names:
Merge method | SQL Join Name | Description |
---|---|---|
left |
LEFT OUTER JOIN |
Use keys from left dataframe only |
right |
RIGHT OUTER JOIN |
Use keys from right dataframe only |
outer |
FULL OUTER JOIN |
Use union of keys from both dataframes |
inner |
INNER JOIN |
Use intersection of keys from both dataframes |
cross |
CROSS JOIN |
Create the cartesian product of rows of both dataframes |
Let’s see them with examples.
Left Outer Join
When we specify left option to how argument, it use keys from left dataframe only.
import pandas as pd df1 = pd.DataFrame({ 'emp_id':[101,102,103,104], 'Name': ['Alexa', 'Helen', 'Allen', 'Joy'] }) df2 = pd.DataFrame({ 'emp_id':[101,102,103,105], 'Address': ['Newyork', 'London', 'Manchester', 'Mumbai'] }) df3=pd.merge(df1,df2,on="emp_id", how="left") df3
Left will take all the common elements between two dataframes and also the remaining elements from the left dataframe.
Output
Right Outer Join
When we specify right option to how argument, it use keys from right dataframe only.
import pandas as pd df1 = pd.DataFrame({ 'emp_id':[101,102,103,104], 'Name': ['Alexa', 'Helen', 'Allen', 'Joy'] }) df2 = pd.DataFrame({ 'emp_id':[101,102,103,105], 'Address': ['Newyork', 'London', 'Manchester', 'Mumbai'] }) df3=pd.merge(df1,df2,on="emp_id", how="right") df3
Right will take all the common elements between two dataframes and also the remaining elements from the right dataframe.
Full Outer Join
When we specify outer option to how argument, it use union of keys from both dataframes.
import pandas as pd df1 = pd.DataFrame({ 'emp_id':[101,102,103,104], 'Name': ['Alexa', 'Helen', 'Allen', 'Joy'] }) df2 = pd.DataFrame({ 'emp_id':[101,102,103,105], 'Address': ['Newyork', 'London', 'Manchester', 'Mumbai'] }) df3=pd.merge(df1,df2,on="emp_id", how="outer") df3
Output
Inner Join
When we specify outer option to how argument, it use intersection of keys from both dataframes.
import pandas as pd df1 = pd.DataFrame({ 'emp_id':[101,102,103,104], 'Name': ['Alexa', 'Helen', 'Allen', 'Joy'] }) df2 = pd.DataFrame({ 'emp_id':[101,102,103,105], 'Address': ['Newyork', 'London', 'Manchester', 'Mumbai'] }) df3=pd.merge(df1,df2,on="emp_id", how="inner") df3
Output
So guys, it was all about how to merge pandas dataframes tutorial. I hope you have understand it very well but if still if you have any doubt regarding this tutorial then feel free to ask your questions in comment box. Stay tuned with Dggul AI Tutorial for upcoming tutorials. Happy Coding !!!
Check Other Pandas Tutorials…
- How To Create Dataframe In Python Using Pandas
- How To Read And Write CSV File In Python Pandas
- How To Create Pandas Dataframe From List In Python
- How To Create Dataframe From CSV File In Python
- How To Create Python Pandas Dataframe From Numpy Array
- How To Create Pandas Series