DEV Community

Cover image for Automation - Using Python to Interact with Excel
Techelopment
Techelopment

Posted on

Automation - Using Python to Interact with Excel

Interacting with Excel files is a common need in many industries, from data analysis to reporting. Python, thanks to powerful and versatile libraries such as pandas, openpyxl and xlrd, offers a fast and efficient way to read, write and manipulate Excel files.

In this article, we will see how to use the openpyxl library to interact with Excel and what techniques we can adopt to automate data analysis and manipulation tasks.

🔗 Do you like Techelopment? Check out the site for all the details!

Introduction

The goal of this article is to show how, using Python, you can simplify and automate repetitive or complex tasks that would otherwise be time-consuming if done manually in Excel. While many of the tasks described here can be performed directly in Excel, using Python opens the door to greater flexibility and automation, making it possible to manipulate large amounts of data, create automated reports, and manage complex processes in just a few steps.

The following article lays the foundation for automating and manipulating Excel files with Python. Once you have mastered these concepts, it will be easy to expand the scripts to suit your needs, adding custom functionality or combining different tools to achieve even more powerful results.

Example excel file

The scripts that we will see in this article will refer to an excel file that contains the list of tasks of a work group. The file, called "team_tasks", is structured as shown in the image below and will contain 50 tasks:

Excel file structureon

Setup (Windows)

Let's start by installing the library that will allow us to work with the Excel file. Open the terminal (Win+r type cmd and then Enter) and run the following command:

pip install openpyxl
Enter fullscreen mode Exit fullscreen mode

At this point we are ready to open our favorite IDE and start writing our python code, so let's define the library import:

from openpyxl import load_workbook
Enter fullscreen mode Exit fullscreen mode

Reading an Excel file

In this first script we will see how to instantiate the openpyxl library object that will represent our excel file. This will allow us to open the file for reading. After that we will print the first 15 tasks to the screen:

from openpyxl import load_workbook

xlsx_file_name = "team_tasks.xlsx"

print(f"Reading xlsx file '{xlsx_file_name }'...")
wb = load_workbook(filename='C:\\Temp\\' + xlsx_file_name )
print("Read ok")

ws = wb.active

my_rows = tuple(ws.rows)

dictOfTasks = dict()
for row in my_rows[1:15]:  #ignore first row related to the header
    taskId = row[0].value
    taskInfo = {"Priority": row[1].value, "Title": row[2].value, "Description": row[3].value, "Due Date": row[4].value, "Owner":row[5].value}
    dictOfTasks[taskId] = taskInfo

for taskIdKey in dictOfTasks:
        print("Task id:", taskIdKey)
        print(f" - Title: {dictOfTasks[taskIdKey]['Title']}\n"
              f" - Description: {dictOfTasks[taskIdKey]['Description']}\n"
              f" - Due date: {dictOfTasks[taskIdKey]['Due Date']}\n"
              f" - Priority: {dictOfTasks[taskIdKey]['Priority']}\n"
              f" - Owner: {dictOfTasks[taskIdKey]['Owner']}\n")

input("\nHit Enter to exit...")
Enter fullscreen mode Exit fullscreen mode
  • load_workbook allows us to open the file "team_tasks.xlsx" for reading
  • wb.active retrieves the first sheet of the file
  • tuple(ws.rows) retrieves all the rows of the excel and creates a tuple
  • dictOfTasks we use a dictionary to manipulate the data. The dictionary structure will be organized in the following way - for each taskid the related information. The dictionary is built only for the first 15 tasks (from row 1 to row 15 - row 0 represents the header but we will not consider it in these examples):
dictOfTasks = { taskId: {
                  "Priority": "task priority", 
                  "Title": "task title", 
                  "Description": "task description",
                  "Due Date": "task due date",
                },
                ...
              }
Enter fullscreen mode Exit fullscreen mode
  • the for loop iterates through the dictionary to print information for each task:
#Output

Reading xlsx file 'team_tasks.xlsx'...
Read ok
Task id: ID1
 - Title: Lorem ipsum
 - Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
 - Due date: 04 Nov
 - Priority: Medium
 - Owner: Developer 1

Task id: ID2
 - Title: Lorem ipsum
 - Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
 - Due date: 04 Nov
 - Priority: Medium
 - Owner: Developer 1

Task id: ID3
 - Title: Lorem ipsum
 - Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
 - Due date: 04 Nov
 - Priority: High
 - Owner: Developer 2

Task id: ID4
 - Title: Lorem ipsum
 - Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
 - Due date: 04 Nov
 - Priority: High
 - Owner: Developer 2

Task id: ID5
 - Title: Lorem ipsum
 - Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
 - Due date: 04 Nov
 - Priority: Medium
 - Owner: Developer 2

Task id: ID6
 - Title: Lorem ipsum
 - Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
 - Due date: 04 Nov
 - Priority: Medium
 - Owner: PM 

Task id: ID7
 - Title: Lorem ipsum
 - Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
 - Due date: 04 Nov
 - Priority: Medium
 - Owner: PM 

Task id: ID8
 - Title: Lorem ipsum
 - Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
 - Due date: 04 Nov
 - Priority: Medium
 - Owner: PM 

Task id: ID9
 - Title: Lorem ipsum
 - Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
 - Due date: 04 Nov
 - Priority: Medium
 - Owner: PM 

Task id: ID10
 - Title: Lorem ipsum
 - Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
 - Due date: 11 Nov
 - Priority: Medium
 - Owner: Developer 1

Task id: ID11
 - Title: Lorem ipsum
 - Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
 - Due date: 12 Nov
 - Priority: Medium
 - Owner: Developer 2

Task id: ID12
 - Title: Lorem ipsum
 - Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
 - Due date: 13 Nov
 - Priority: Medium
 - Owner: Developer 2

Task id: ID13
 - Title: Lorem ipsum
 - Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
 - Due date: 14 Nov
 - Priority: Medium
 - Owner: Developer 1

Task id: ID14
 - Title: Lorem ipsum
 - Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
 - Due date: 15 Nov
 - Priority: Medium
 - Owner: Developer 3

Task id: ID15
 - Title: Lorem ipsum
 - Description: dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
 - Due date: 16 Nov
 - Priority: Medium
 - Owner: Developer 3

Hit Enter to exit...
Enter fullscreen mode Exit fullscreen mode

Retrieving tasks assigned to a team member

Now suppose we are "Developer 2", our boss sends us the file "tasks_team.xlsx" and we want to automatically extract the tasks assigned to us sorted by priority. To do this we define 2 data structures that will help us identify the tasks assigned to "Developer 2" and perform the priority sorting:

list_owner = ["Developer 2"]

#define the priority sort information
priority_order = {'Urgent': 0, 'Important': 1, 'High': 2, 'Medium': 3}
Enter fullscreen mode Exit fullscreen mode

We also add a dictionary to collect information about our tasks:

my_tasks = {} #dict of my tasks
Enter fullscreen mode Exit fullscreen mode

The final script will be this:

from openpyxl import load_workbook

list_owner = ["Developer 2"]

xlsx_file_name = "team_tasks.xlsx"

print(f"Reading xlsx file '{xlsx_file_name }'...")
wb = load_workbook(filename='C:\\Temp\\' + xlsx_file_name )
print("Read ok")

ws = wb.active

my_rows = tuple(ws.rows)

dictOfTasks = dict()
for row in my_rows[1:]:  #get all rows except first row related to the header
    taskId = row[0].value
    taskInfo = {"Priority": row[1].value, "Title": row[2].value, "Description": row[3].value, "Due Date": row[4].value, "Owner":row[5].value}
    dictOfTasks[taskId] = taskInfo

my_tasks = {} #dict of my tasks

#find all tasks of Developer 2
for taskIdKey in dictOfTasks:
    if dictOfTasks[taskIdKey]['Owner'] in list_owner:
        my_tasks[taskIdKey] = dictOfTasks[taskIdKey]

#shows the total tasks of Developer 2
print("\nMy tasks - Total:", len(my_tasks))

print("\nMy tasks by Priority:")

#define the priority sort information
priority_order = {'Urgent': 0, 'Important': 1, 'High': 2, 'Medium': 3}

#sort tasks by priority
sorted_tasks = dict(sorted(my_tasks.items(), key=lambda x: priority_order[x[1]['Priority']]))

#print the tasks info
for my_task_id in sorted_tasks:
    print(f"{my_task_id} [{sorted_tasks[my_task_id]["Priority"]}]: \n"
          f" - Title: {sorted_tasks[my_task_id]["Title"]}\n"
          f" - Due Date: {sorted_tasks[my_task_id]["Due Date"]}")

input("\nHit Enter to exit...")
Enter fullscreen mode Exit fullscreen mode

The output will show all the info to organize our to do list 😊

#Output 

Reading xlsx file 'team_tasks.xlsx'...
Read ok

My tasks by Priority:
ID30 [Important]: 
 - Title: Lorem ipsum
 - Due Date: 30 Oct
ID3 [High]: 
 - Title: Lorem ipsum
 - Due Date: 04 Nov
ID4 [High]: 
 - Title: Lorem ipsum
 - Due Date: 04 Nov
ID5 [Medium]: 
 - Title: Lorem ipsum
 - Due Date: 04 Nov
ID11 [Medium]: 
 - Title: Lorem ipsum
 - Due Date: 12 Nov
ID12 [Medium]: 
 - Title: Lorem ipsum
 - Due Date: 13 Nov
ID34 [Medium]: 
 - Title: Lorem ipsum
 - Due Date: 19 Nov

Hit Enter to exit...
Enter fullscreen mode Exit fullscreen mode

Follow me #techelopment

Official site: www.techelopment.it
Medium: @techelopment
Dev.to: Techelopment
facebook: Techelopment
instagram: @techelopment
X: techelopment
telegram: @techelopment_channel
youtube: @techelopment
whatsapp: Techelopment


References

URL

Top comments (0)