DEV Community

Kenechukwu Anoliefo
Kenechukwu Anoliefo

Posted on

Everything You Need to Do Before Successfully Reading an Excel File with Pandas

Working with Excel files is one of the most common tasks in data analysis, and Pandas makes it incredibly easy — but only after you set things up correctly.
If you’ve ever tried running pd.read_excel() and gotten errors like:

  • “ImportError: Missing optional dependency ‘openpyxl’”
  • “FileNotFoundError”
  • “UnicodeDecodeError”
  • “xlrd does not support Excel xlsx files”
  • “Engine not recognized”

…then this post is for you.

Before you can successfully read an Excel file using Pandas, there are several requirements, checks, and configurations you must ensure are in place. This article walks you through all of them.


1. Install Pandas

Pandas does not come preinstalled with Python.
To install it:

pip install pandas
Enter fullscreen mode Exit fullscreen mode

To confirm it installed correctly:

import pandas as pd
print(pd.__version__)
Enter fullscreen mode Exit fullscreen mode

2. Install the Correct Excel Engine (Most Important Step!)

Pandas needs an external library to read Excel files.
Depending on your file type, you must install the right engine.

✔ For .xlsx files

Most modern Excel files use this format.

Install openpyxl:

pip install openpyxl
Enter fullscreen mode Exit fullscreen mode

Pandas uses openpyxl by default when reading .xlsx files.


✔ For .xls files (older Excel format)

Install xlrd 1.2.0 (newer versions don’t support .xls):

pip install xlrd==1.2.0
Enter fullscreen mode Exit fullscreen mode

✔ For .xlsm files (Excel with macros)

pip install openpyxl
Enter fullscreen mode Exit fullscreen mode

✔ For writing Excel files (.xlsx)

Install XlsxWriter:

pip install XlsxWriter
Enter fullscreen mode Exit fullscreen mode

3. Verify That the File Actually Exists

One of the most common mistakes is using a wrong file path.

✔ If the file is in the same folder as your script or notebook:

df = pd.read_excel("data.xlsx")
Enter fullscreen mode Exit fullscreen mode

✔ If it’s in a subfolder:

df = pd.read_excel("datasets/data.xlsx")
Enter fullscreen mode Exit fullscreen mode

✔ If the path contains spaces:

df = pd.read_excel(r"C:\Users\Kenechukwu\Documents\Water Quality Data.xlsx")
Enter fullscreen mode Exit fullscreen mode

Tips:

  • Print the directory: import os; os.getcwd()
  • Use absolute paths if unsure
  • Ensure filename and extension are 100% correct

4. Check That the File Is Not Open in Excel

Excel locks files while open.
This can cause:

PermissionError: [Errno 13] Permission denied
Enter fullscreen mode Exit fullscreen mode

✔ Close the Excel app
✔ Re-run your script


5. Check the File Format for Corruption

Sometimes Pandas errors happen because the Excel file is:

  • corrupted
  • incorrectly saved
  • missing headers
  • filled with merged cells

Try opening it manually in Excel first.


6. Use the Correct Sheet Name

If your Excel file has multiple sheets:

df = pd.read_excel("data.xlsx", sheet_name="Customers")
Enter fullscreen mode Exit fullscreen mode

To see all sheet names:

import pandas as pd
xls = pd.ExcelFile("data.xlsx")
print(xls.sheet_names)
Enter fullscreen mode Exit fullscreen mode

7. Handle Header Rows Properly

If your file has no header row:

df = pd.read_excel("data.xlsx", header=None)
Enter fullscreen mode Exit fullscreen mode

If header is on row 2:

df = pd.read_excel("data.xlsx", header=1)
Enter fullscreen mode Exit fullscreen mode

8. Handle Large Excel Files

Large files may crash your notebook.

✔ Use usecols to load only needed columns:

df = pd.read_excel("data.xlsx", usecols="A:D")
Enter fullscreen mode Exit fullscreen mode

✔ Or read in chunks:

df_iter = pd.read_excel("data.xlsx", chunksize=5000)
Enter fullscreen mode Exit fullscreen mode

9. Use the Correct Data Types

Sometimes numbers load as strings and cause errors.

df["Age"] = df["Age"].astype(int)
df["Date"] = pd.to_datetime(df["Date"])
Enter fullscreen mode Exit fullscreen mode

10. Handle Missing Values

Excel files often contain blanks.

df.fillna(0, inplace=True)
df.dropna()
Enter fullscreen mode Exit fullscreen mode

Putting It All Together

Once everything above is ready, your final code looks like this:

import pandas as pd

df = pd.read_excel(
    "WaterQuality.xlsx",
    engine="openpyxl",
    sheet_name="Readings"
)

print(df.head())
Enter fullscreen mode Exit fullscreen mode

Conclusion

Reading Excel files with Pandas is simple — but only if the right setup is in place. The checklist is:

✔ Install Pandas
✔ Install the right Excel engine (openpyxl, xlrd)
✔ Provide the correct file path
✔ Ensure the Excel file is closed
✔ Confirm sheet names and header rows
✔ Handle large files and data types
✔ Clean missing values as needed

Once these steps are covered, Pandas becomes a powerful tool for exploring and analyzing structured Excel data.


If you'd like, I can also write:
✅ A video script version
✅ A simplified beginner guide
✅ An advanced guide for analysts and data engineers

Top comments (0)