Mohamed M El-Kalioby

Posted on Apr 16, 2023

Work on CSV files with Leopards

#welcome #community #beginners

Filtering a CSV file is one of the first steps when trying to interpolate data for data science. This article will show how to use Leopards to filter a CSV file.

We will use this csv file to do our filtrations.

Install leopards

pip install leopards

Open csv file

We will use the build-in Python module csv to load CSV

import csv
rows = csv.DictReader(open('people-100.csv'), delimiter=",")

Example: Count How many of each gender is in the set

Let's see how many males/females are in the dataset

from  leopards import Count
res = Count(rows,['Sex'])
print(res)

output:

dict_values([{'Sex': 'Male', 'count': 47}, {'Sex': 'Female', 'count': 53}])

Lets run the same code again.

res = Count(rows,['Sex'])
print(res)

output:

dict_values([])

Explanation

Why did it return result in the first run but not the second run?
DictReader is an iterator so with the first run, we were on the first item, so leopards was able to iterate over it but with the second run, the iterator was already consumed so it is like an empty list. so to solve the issue either load the rows in memory like below rows=[r for r in rows] or recreate the iterator by rows = csv.DictReader(open('people-100.csv'), delimiter=",")

Example: Find people whose email is '@example.com'

from  leopards import Q
rows = csv.DictReader(open('people-100.csv'), delimiter=",")
res = Q(rows,Email__icontains='example.com')
for item in res:
    print(item)

output:

{'Index': '2', 'User Id': 'f90cD3E76f1A9b9', 'First Name': 'Phillip', 'Last Name': 'Summers', 'Sex': 'Female', 'Email': 'bethany14@example.com', 'Phone': '214.112.6044x4913', 'Date of birth': '1910-03-24', 'Job Title': 'Phytotherapist'}
{'Index': '3', 'User Id': 'DbeAb8CcdfeFC2c', 'First Name': 'Kristine', 'Last Name': 'Travis', 'Sex': 'Male', 'Email': 'bthompson@example.com', 'Phone': '277.609.7938', 'Date of birth': '1992-07-02', 'Job Title': 'Homeopath'}
{'Index': '4', 'User Id': 'A31Bee3c201ef58', 'First Name': 'Yesenia', 'Last Name': 'Martinez', 'Sex': 'Male', 'Email': 'kaitlinkaiser@example.com', 'Phone': '584.094.6111', 'Date of birth': '2017-08-03', 'Job Title': 'Market researcher'}

23 more records

Example: from the people with Email '@example.com' Return how many are 'Male'

from  leopards import Q
rows = csv.DictReader(open('people-100.csv'), delimiter=",")
res = Q(rows,Email__icontains='example.com',Sex='Male')
print(len(list(res)))

output:

The above code is eqivalent to this

from  leopards import Q
rows = csv.DictReader(open('people-100.csv'), delimiter=",")
kwargs={"Email__icontains":'example.com' ,"Sex":'Male'}
res = Q(rows,**kwargs)
print(len(list(res)))

Example: from the people with email '@example.com' Return how many are 'Male' and not born in 1992.

from  leopards import Q
rows = csv.DictReader(open('people-100.csv'), delimiter=",")
kwargs={"Email__icontains":'example.com' ,"Sex":'Male', "Date of birth__nstartswith":"1992"}
res = Q(rows,**kwargs)
print(len(list(res)))

output:

Note: Index '3' is removed.

This can be written as follows using NOT:

from  leopards import Q
rows = csv.DictReader(open('people-100.csv'), delimiter=",")
kwargs={"Email__icontains":'example.com' ,"Sex":'Male', "NOT": {"Date of birth__startswith":"1992"}}
res = Q(rows,**kwargs)
print(len(list(res)))

Example: Find the Males who emails are '@example.com' OR '@example.net'

from  leopards import Q
rows = csv.DictReader(open('people-100.csv'), delimiter=",")
kwargs = kwargs = {"Sex": "Male", "OR":[{"Email__icontains":'example.com'},{"Email__icontains":"example.net"}]}
res = Q(rows,**kwargs)
print(list(res)[:3])

output:

[{'Index': '1', 'User Id': '88F7B33d2bcf9f5', 'First Name': 'Shelby', 'Last Name': 'Terrell', 'Sex': 'Male', 'Email': 'elijah57@example.net', 'Phone': '001-084-906-7849x73518', 'Date of birth': '1945-10-26', 'Job Title': 'Games developer'}, 
{'Index': '3', 'User Id': 'DbeAb8CcdfeFC2c', 'First Name': 'Kristine', 'Last Name': 'Travis', 'Sex': 'Male', 'Email': 'bthompson@example.com', 'Phone': '277.609.7938', 'Date of birth': '1992-07-02', 'Job Title': 'Homeopath'},
{'Index': '4', 'User Id': 'A31Bee3c201ef58', 'First Name': 'Yesenia', 'Last Name': 'Martinez', 'Sex': 'Male', 'Email': 'kaitlinkaiser@example.com', 'Phone': '584.094.6111', 'Date of birth': '2017-08-03', 'Job Title': 'Market researcher'}]

Conculsion

This tutorial showed how easy it is to filter CSV and do some aggregations quickly using Leopards.

If you have any question, you can ask me.

DEV Community

Work on CSV files with Leopards

Install leopards

Open csv file

Example: Count How many of each gender is in the set

Explanation

Example: Find people whose email is '@example.com'

Example: from the people with Email '@example.com' Return how many are 'Male'

Example: from the people with email '@example.com' Return how many are 'Male' and not born in 1992.

Example: Find the Males who emails are '@example.com' OR '@example.net'

Conculsion

Top comments (0)