In the last article, you learnt how to use the Python csv module to work with csv files, including how to convert csv files into dictionaries and lists.
In this article, we are going to look at a Python library called Pandas. It mainly works with structured data, i.e. csv data, tabular data, etc. Many Python programmers reach for this library when they come across csv files because of its flexibility and usability.
How does Pandas work?
The library arranges a csv file into a table and uses the first row of the file as the headers of the columns. Each header can then be used as an attribute (or as a key) to access the items in its column.
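To make this concrete, here is a minimal sketch. The csv content is made up and held in memory with io.StringIO (an assumption of the sketch, so that it runs without a file on disk).

```python
import io
import pandas as pd

# Hypothetical csv content, kept in memory so no file is needed.
csv_text = "Names,Scores\nScott,67\nAbey,82\nMike,89\n"

df = pd.read_csv(io.StringIO(csv_text))

# The first row of the csv became the column headers.
headers = list(df.columns)
print(headers)

# A header can be used as an attribute or as a key to reach its column.
scores_by_attribute = df.Scores
scores_by_key = df["Scores"]
print(scores_by_key.to_list())
```

Both ways of accessing the column give the same result. The key form also works when a header contains spaces, which the attribute form does not.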
What you'll learn
- Installing Pandas
- Data frame and series.
- Using Pandas to convert a csv into list.
- Pandas to convert csv into dict
- Calculating mean, median, mode and other statistical functions.
Installing Pandas
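Visit the PyPI website and search for the pandas library. Click on its latest version and follow the install instructions shown there (typically running pip install pandas in a terminal). Some Python editors can also install and configure the package for you.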
Using Pandas to read csv files
Pandas is a library made specially for structured and tabular data, so it works well with csv files.
To do this:
- Import pandas in your main.py.
- Call the read_csv() function. read_csv() takes the file path of the csv file as a string and returns a dataframe.
import pandas
data = pandas.read_csv("my_csv_file.csv")
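If you want to try this without a csv file at hand, the sketch below first writes a small made-up file and then reads it back. The file name students.csv and its content are hypothetical.

```python
import os
import tempfile
import pandas as pd

with tempfile.TemporaryDirectory() as folder:
    # Write a small, made-up csv file so the example is self-contained.
    csv_path = os.path.join(folder, "students.csv")
    with open(csv_path, "w", newline="") as f:
        f.write("Names,Scores\nScott,67\nAbey,82\n")

    # read_csv() takes the file path as a string and returns a DataFrame.
    data = pd.read_csv(csv_path)

print(type(data).__name__)
print(data.shape)  # (number of rows, number of columns)
```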
DATAFRAME AND SERIES
DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used pandas object.
How to create a dataframe.
A dataframe can be built from a dictionary whose values are lists. Create that dictionary first, then pass it to pd.DataFrame().
import pandas as pd
Dict = {"Names": ["Scott", "Abey", "Boluwatife", "Thomas", "Mike"],
        "Scores": [67, 82, 98, 46, 89]
}
df = pd.DataFrame(Dict)
You can print df to see how it looks: it is laid out exactly like a table.
N.B.: each list is nested in the dictionary. The keys of the dictionary become the column headers of the table, and each value, which is a list, holds the items under its header.
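A quick way to check that the keys really became the headers is to inspect the dataframe after building it. This is only a sketch, using the same made-up names and scores.

```python
import pandas as pd

scores = {
    "Names": ["Scott", "Abey", "Boluwatife", "Thomas", "Mike"],
    "Scores": [67, 82, 98, 46, 89],
}
df = pd.DataFrame(scores)

print(df)                # prints the data laid out as a table
print(list(df.columns))  # the dictionary keys became the column headers
print(df.shape)          # (rows, columns)
```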
Series:
Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a Series is to call:
s = pd.Series(data, index=index)
How to create a Series
Dict = {"Names": "Fola",
"Score":12}
df_series = pd.Series(Dict)
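The sketch below shows both ways of building a Series: from data plus an explicit index, and from a dictionary. The names and scores are made up for illustration.

```python
import pandas as pd

# Made-up data: three scores labelled with student names.
data = [67, 82, 98]
index = ["Scott", "Abey", "Boluwatife"]
s = pd.Series(data, index=index)

# Values are reached through their labels.
print(s["Abey"])
print(list(s.index))

# Building a Series from a dictionary uses the keys as the index.
record = pd.Series({"Names": "Fola", "Score": 12})
print(record["Score"])
```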
Get columns of dataframe
Columns can be accessed by their column headers. Taking the dataframe built from Dict as an example, if you want to access the names of the students in the table:
Names = df.Names
Or
Names = df["Names"]
Get rows in a Dataframe
A simple way to access a row in a pandas dataframe is to look through a column of the table and match the value that sits in the row we need.
row1 = df[df.Names == "Scott"]
This selects every row whose Names value is Scott, together with all the other items in that row. Print row1 to see it.
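Here is a sketch of what happens in that line, step by step, using the same made-up table. Picking a row by position with .iloc is an extra shown only for comparison.

```python
import pandas as pd

df = pd.DataFrame({
    "Names": ["Scott", "Abey", "Boluwatife", "Thomas", "Mike"],
    "Scores": [67, 82, 98, 46, 89],
})

# The comparison builds a True/False value for each row (a boolean mask).
mask = df["Names"] == "Scott"

# Indexing with the mask keeps only the rows where it is True.
row = df[mask]
print(row)

# Pull a single value out of the matching row.
scott_score = row["Scores"].iloc[0]
print(scott_score)

# Rows can also be picked by position with .iloc.
first_row = df.iloc[0]
print(first_row["Names"])
```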
USING PANDAS TO CONVERT CSV INTO LIST.
Unlike the built-in Python csv module, the pandas library already has methods for creating a list from a csv file.
-----to_list()---
This method does not take a file path. Read the csv file into a dataframe with read_csv() first, then call to_list() on the column you need.
Example (assuming file.csv has a column named Names):
import pandas as pd
df = pd.read_csv("file.csv")
new_list = df["Names"].to_list()
With these lines of code, the output will be the list of items in that column of your csv file. To get every row as a list instead, use df.values.tolist(). With the csv module you would run a for loop over the rows to get what pandas gives you in a few lines.
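A runnable sketch of the csv-to-list conversion, using made-up csv content held in memory (io.StringIO stands in for a real file here):

```python
import io
import pandas as pd

# Hypothetical csv content; replace with pd.read_csv("your_file.csv") for a real file.
csv_text = "Names,Scores\nScott,67\nAbey,82\n"
df = pd.read_csv(io.StringIO(csv_text))

# One column as a flat list.
names = df["Names"].to_list()
print(names)

# Every row as a list, giving a list of lists.
rows = df.values.tolist()
print(rows)
```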
USING PANDAS TO CONVERT CSV INTO DICTIONARY.
-------to_dict()--------
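The to_dict() method is called on a dataframe and converts its items into a dictionary, using the column headers as the keys and the items in each column as the values. By default each value is itself a dictionary keyed by row index; passing orient="list" gives a plain list per column instead.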
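The sketch below compares three orientations of to_dict() on a small made-up dataframe, so you can pick the shape that suits you.

```python
import pandas as pd

df = pd.DataFrame({
    "Names": ["Scott", "Abey"],
    "Scores": [67, 82],
})

# Default orientation: {column: {row index: value}}.
nested = df.to_dict()
print(nested)

# orient="list": {column: [values]}.
as_lists = df.to_dict(orient="list")
print(as_lists)

# orient="records": one dictionary per row.
as_records = df.to_dict(orient="records")
print(as_records)
```

The "records" form is the closest to what csv.DictReader gives you in the csv module: one dictionary per row.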
USING PANDAS FOR STATISTICS
The pandas library also provides methods for common statistical tasks. They are called on a column (a Series) or on a whole dataframe.
- Mean: .mean() returns the mean value of the numbers.
- Mode: .mode() returns the number with the highest frequency (or several numbers if there is a tie).
- Median: .median() arranges the numbers in ascending order and returns the one in the middle.
- Product of numbers: .prod() returns the product of the numbers.
- Cumulative numbers: .cumsum() returns the running (cumulative) sum of the numbers; .cummin() returns the running minimum.
Other available methods are:
.max(), .min(), .quantile(), .cummax(), .var(), etc.
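Here is a short sketch of these methods on a made-up Series of scores. One score is repeated on purpose so that the mode is meaningful.

```python
import pandas as pd

# Made-up scores; 82 appears twice.
scores = pd.Series([67, 82, 98, 46, 89, 82])

mean_score = scores.mean()
median_score = scores.median()
modes = scores.mode()            # a Series, since several values can tie
running_total = scores.cumsum()  # cumulative sum, one value per row
product = pd.Series([1, 2, 3, 4]).prod()

print(mean_score, median_score)
print(modes.to_list())
print(running_total.to_list())
print(product)
print(scores.max(), scores.min())
```

Note that .mode() and .cumsum() return a Series rather than a single number, so convert them with to_list() if you need a plain list.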
To learn more about the pandas library, visit the pandas documentation:
https://pandas.pydata.org/docs/user_guide/index.html
----------------If this was helpful, kindly react and comment----------------