Kenechukwu Anoliefo

Posted on Nov 16, 2025

Everything You Need to Do Before Successfully Reading an Excel File with Pandas

#mlzoomcamp #machinelearning

Working with Excel files is one of the most common tasks in data analysis, and Pandas makes it incredibly easy — but only after you set things up correctly.
If you’ve ever tried running pd.read_excel() and gotten errors like:

“ImportError: Missing optional dependency ‘openpyxl’”
“FileNotFoundError”
“UnicodeDecodeError”
“xlrd does not support Excel xlsx files”
“Engine not recognized”

…then this post is for you.

Before you can successfully read an Excel file using Pandas, there are several requirements, checks, and configurations you must ensure are in place. This article walks you through all of them.

1. Install Pandas

Pandas does not come preinstalled with Python.
To install it:

pip install pandas

To confirm it installed correctly:

import pandas as pd
print(pd.__version__)

2. Install the Correct Excel Engine (Most Important Step!)

Pandas needs an external library to read Excel files.
Depending on your file type, you must install the right engine.

✔ For `.xlsx` files

Most modern Excel files use this format.

Install openpyxl:

pip install openpyxl

Pandas uses openpyxl by default when reading .xlsx files.

✔ For `.xls` files (older Excel format)

Install xlrd 1.2.0 (newer versions don’t support .xls):

pip install xlrd==1.2.0

✔ For `.xlsm` files (Excel with macros)

pip install openpyxl

✔ For writing Excel files (.xlsx)

Install XlsxWriter:

pip install XlsxWriter

3. Verify That the File Actually Exists

One of the most common mistakes is using a wrong file path.

✔ If the file is in the same folder as your script or notebook:

df = pd.read_excel("data.xlsx")

✔ If it’s in a subfolder:

df = pd.read_excel("datasets/data.xlsx")

✔ If the path contains spaces:

df = pd.read_excel(r"C:\Users\Kenechukwu\Documents\Water Quality Data.xlsx")

Tips:

Print the directory: import os; os.getcwd()
Use absolute paths if unsure
Ensure filename and extension are 100% correct

4. Check That the File Is Not Open in Excel

Excel locks files while open.
This can cause:

PermissionError: [Errno 13] Permission denied

✔ Close the Excel app
✔ Re-run your script

5. Check the File Format for Corruption

Sometimes Pandas errors happen because the Excel file is:

corrupted
incorrectly saved
missing headers
filled with merged cells

Try opening it manually in Excel first.

6. Use the Correct Sheet Name

If your Excel file has multiple sheets:

df = pd.read_excel("data.xlsx", sheet_name="Customers")

To see all sheet names:

import pandas as pd
xls = pd.ExcelFile("data.xlsx")
print(xls.sheet_names)

7. Handle Header Rows Properly

If your file has no header row:

df = pd.read_excel("data.xlsx", header=None)

If header is on row 2:

df = pd.read_excel("data.xlsx", header=1)

8. Handle Large Excel Files

Large files may crash your notebook.

✔ Use usecols to load only needed columns:

df = pd.read_excel("data.xlsx", usecols="A:D")

✔ Or read in chunks:

df_iter = pd.read_excel("data.xlsx", chunksize=5000)

9. Use the Correct Data Types

Sometimes numbers load as strings and cause errors.

df["Age"] = df["Age"].astype(int)
df["Date"] = pd.to_datetime(df["Date"])

10. Handle Missing Values

Excel files often contain blanks.

df.fillna(0, inplace=True)
df.dropna()

Putting It All Together

Once everything above is ready, your final code looks like this:

import pandas as pd

df = pd.read_excel(
    "WaterQuality.xlsx",
    engine="openpyxl",
    sheet_name="Readings"
)

print(df.head())

Conclusion

Reading Excel files with Pandas is simple — but only if the right setup is in place. The checklist is:

✔ Install Pandas
✔ Install the right Excel engine (openpyxl, xlrd)
✔ Provide the correct file path
✔ Ensure the Excel file is closed
✔ Confirm sheet names and header rows
✔ Handle large files and data types
✔ Clean missing values as needed

Once these steps are covered, Pandas becomes a powerful tool for exploring and analyzing structured Excel data.

If you'd like, I can also write:
✅ A video script version
✅ A simplified beginner guide
✅ An advanced guide for analysts and data engineers

DEV Community

Everything You Need to Do Before Successfully Reading an Excel File with Pandas

1. Install Pandas

2. Install the Correct Excel Engine (Most Important Step!)

✔ For `.xlsx` files

✔ For `.xls` files (older Excel format)

✔ For `.xlsm` files (Excel with macros)

✔ For writing Excel files (.xlsx)

3. Verify That the File Actually Exists

Tips:

4. Check That the File Is Not Open in Excel

5. Check the File Format for Corruption

6. Use the Correct Sheet Name

7. Handle Header Rows Properly

8. Handle Large Excel Files

9. Use the Correct Data Types

10. Handle Missing Values

Putting It All Together

Conclusion

Top comments (0)

1. Install Pandas

2. Install the Correct Excel Engine (Most Important Step!)

✔ For .xlsx files

✔ For .xls files (older Excel format)

✔ For .xlsm files (Excel with macros)

✔ For writing Excel files (.xlsx)

3. Verify That the File Actually Exists

Tips:

4. Check That the File Is Not Open in Excel

5. Check the File Format for Corruption

6. Use the Correct Sheet Name

7. Handle Header Rows Properly

8. Handle Large Excel Files

9. Use the Correct Data Types

10. Handle Missing Values

Putting It All Together

Conclusion

✔ For `.xlsx` files

✔ For `.xls` files (older Excel format)

✔ For `.xlsm` files (Excel with macros)