Working with Excel files is one of the most common tasks in data analysis, and Pandas makes it incredibly easy — but only after you set things up correctly.
If you’ve ever tried running pd.read_excel() and gotten errors like:
- “ImportError: Missing optional dependency ‘openpyxl’”
- “FileNotFoundError”
- “UnicodeDecodeError”
- “xlrd does not support Excel xlsx files”
- “Engine not recognized”
…then this post is for you.
Before you can successfully read an Excel file using Pandas, there are several requirements, checks, and configurations you must ensure are in place. This article walks you through all of them.
1. Install Pandas
Pandas does not come preinstalled with Python.
To install it:
pip install pandas
To confirm it installed correctly:
import pandas as pd
print(pd.__version__)
2. Install the Correct Excel Engine (Most Important Step!)
Pandas needs an external library to read Excel files.
Depending on your file type, you must install the right engine.
✔ For .xlsx files
Most modern Excel files use this format.
Install openpyxl:
pip install openpyxl
Pandas uses openpyxl by default when reading .xlsx files.
✔ For .xls files (older Excel format)
Install xlrd 1.2.0 (newer versions don’t support .xls):
pip install xlrd==1.2.0
✔ For .xlsm files (Excel with macros)
pip install openpyxl
✔ For writing Excel files (.xlsx)
Install XlsxWriter:
pip install XlsxWriter
3. Verify That the File Actually Exists
One of the most common mistakes is using a wrong file path.
✔ If the file is in the same folder as your script or notebook:
df = pd.read_excel("data.xlsx")
✔ If it’s in a subfolder:
df = pd.read_excel("datasets/data.xlsx")
✔ If the path contains spaces:
df = pd.read_excel(r"C:\Users\Kenechukwu\Documents\Water Quality Data.xlsx")
Tips:
- Print the directory:
import os; os.getcwd() - Use absolute paths if unsure
- Ensure filename and extension are 100% correct
4. Check That the File Is Not Open in Excel
Excel locks files while open.
This can cause:
PermissionError: [Errno 13] Permission denied
✔ Close the Excel app
✔ Re-run your script
5. Check the File Format for Corruption
Sometimes Pandas errors happen because the Excel file is:
- corrupted
- incorrectly saved
- missing headers
- filled with merged cells
Try opening it manually in Excel first.
6. Use the Correct Sheet Name
If your Excel file has multiple sheets:
df = pd.read_excel("data.xlsx", sheet_name="Customers")
To see all sheet names:
import pandas as pd
xls = pd.ExcelFile("data.xlsx")
print(xls.sheet_names)
7. Handle Header Rows Properly
If your file has no header row:
df = pd.read_excel("data.xlsx", header=None)
If header is on row 2:
df = pd.read_excel("data.xlsx", header=1)
8. Handle Large Excel Files
Large files may crash your notebook.
✔ Use usecols to load only needed columns:
df = pd.read_excel("data.xlsx", usecols="A:D")
✔ Or read in chunks:
df_iter = pd.read_excel("data.xlsx", chunksize=5000)
9. Use the Correct Data Types
Sometimes numbers load as strings and cause errors.
df["Age"] = df["Age"].astype(int)
df["Date"] = pd.to_datetime(df["Date"])
10. Handle Missing Values
Excel files often contain blanks.
df.fillna(0, inplace=True)
df.dropna()
Putting It All Together
Once everything above is ready, your final code looks like this:
import pandas as pd
df = pd.read_excel(
"WaterQuality.xlsx",
engine="openpyxl",
sheet_name="Readings"
)
print(df.head())
Conclusion
Reading Excel files with Pandas is simple — but only if the right setup is in place. The checklist is:
✔ Install Pandas
✔ Install the right Excel engine (openpyxl, xlrd)
✔ Provide the correct file path
✔ Ensure the Excel file is closed
✔ Confirm sheet names and header rows
✔ Handle large files and data types
✔ Clean missing values as needed
Once these steps are covered, Pandas becomes a powerful tool for exploring and analyzing structured Excel data.
If you'd like, I can also write:
✅ A video script version
✅ A simplified beginner guide
✅ An advanced guide for analysts and data engineers
Top comments (0)