DEV Community

Cover image for How to merge multiple Excel files using Python
Luca Liu
Luca Liu

Posted on • Updated on

How to merge multiple Excel files using Python

In today's data-driven landscape, consolidating information scattered across multiple Excel files is a common challenge. Fortunately, Python provides powerful tools to streamline this process, offering a more efficient way to merge and organize data. In this guide, we'll explore how to leverage Python's capabilities to effortlessly combine multiple Excel files into a cohesive and structured dataset.

In the world of data, things get messy when you have multiple Excel files with the same columns. If you're looking to tidy up and merge them all into one file, especially when you need to go through them one by one, you're in the right place. This guide will show you an easy way to make sense of it all and keep your data organized.

Python Solution

Merge multiple Excel files effortlessly with this Python code! πŸš€ Using pandas and os, the script navigates through files in a specified folder, combining them into a neat merged_excel.xlsx file. Just plug in your folder path, run the code, and voila – streamlined data! πŸ“ŠπŸ’»

# import packages
import pandas as pd
import os

# Define a function 'append' to merge Excel files in a specified path
def append(path):
    # Create an empty list to store individual DataFrames    
    frames = []    
    for root, dirs, files in os.walk(path):        
        for file in files:            
            file_with_path = os.path.join(root, file)          
            df = pd.read_excel(file_with_path)      
            frames.append(df)    
    df = pd.concat(frames, axis=0)    
    return df

# path:The folder path where storage all the excel files 
df = append(path) 
df.to_excel("merged_excel.xlsx")
Enter fullscreen mode Exit fullscreen mode

In this code snippet, we're using two powerful tools: pandas and os (a module for working with the operating system).

The append function is the star here. It digs through all the Excel files in a specified folder ('path') and collects them into a DataFrame, which is like a neat table for our data.

Now, for the magic moment: the last two lines! They use our append function to merge all the Excel data in the specified folder into one consolidated file called merged_excel.xlsx.


Bonus: Handling Unique Data Situations

While we've explored solutions for straightforward data merging, let's navigate a slight detour to address more nuanced scenarios.

Picture this: your Excel files carry distinct dates in their names, demanding not just consolidation but a thoughtful integration into a final summary so that you can know the date of data in the final version.

Situation 1: Date-Driven Excel Filenames: Crafting a Comprehensive Summary Excel

For example, if the file name follows a pattern like "filename_YYYY-MM-DD.xlsx", and you want to extract both the date and the filename, you can adapt the code as follows:

import pandas as pd
import os

def append(path):
    frames = []  # Create an empty list to store individual DataFrames
    for root, dirs, files in os.walk(path):
        for file in files:
            file_with_path = os.path.join(root, file)

            # Extract filename and date information from the file name
            file_info = os.path.splitext(file)[0].split('_')

            # Read each Excel file using pandas
            df = pd.read_excel(file_with_path)

            # Add filename and date columns to the DataFrame
            df["filename"] = file_info[0]  # Assuming filename is the first part
            df["date"] = pd.to_datetime(file_info[1])  # Assuming date is the second part

            # Append the DataFrame to the list
            frames.append(df)

    # Concatenate all DataFrames in the list along the rows
    df = pd.concat(frames, axis=0)

    return df

Enter fullscreen mode Exit fullscreen mode

Situation 2: Streamlining Data: Merging Multiple Excel Files into One with Distinct Sheets Named After Each File

Now, let's pivot to another scenario. Suppose your goal is a consolidated master file, but with a twist – each file contributes to a separate sheet, named after its original file.

import pandas as pd
import os

def combine(path):
    with pd.ExcelWriter("merged_excel.xlsx") as writer:
        for root, dirs, files in os.walk(path):
            for file in files:
                filename = os.path.join(root, file)
                df = pd.read_excel(filename)
                # Delete the file name suffix, sometimes it could be xlsv/.xlsx
                df.to_excel(writer, sheet_name=file.strip(
                    '.csv'))  
        return df
Enter fullscreen mode Exit fullscreen mode

Explore more

Thank you for taking the time to explore data-related insights with me. I appreciate your engagement.

πŸš€ Connect with me on LinkedIn

πŸŽƒ Connect with me on X

🌍 Connect with me on Instagram

Top comments (0)