DEV Community

Maria Boldyreva

Class notes: Getting started with Pandas DataFrames

Here I'll write down the basics of working with Pandas DataFrames.

The DataFrame is the primary Pandas data structure. It lets us work easily with tabular data: rows and columns, both labeled.

A data frame can be constructed from a dict:

import pandas as pd
# the columns follow the dict's key order (pandas 0.23+ on Python 3.6+)
frame = pd.DataFrame({'chars': ['a'] * 3, 'numbers': range(3)})

It gives us the following output:

  chars  numbers
0     a        0
1     a        1
2     a        2
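A quick way to check what was built is to look at the column types and pull out a single column. This is a minimal sketch that rebuilds the same frame; `numbers` is just a local variable name chosen for illustration.

```python
import pandas as pd

# Same frame as above: dict keys become column names, dict values become columns
frame = pd.DataFrame({'chars': ['a'] * 3, 'numbers': range(3)})

# Each column has its own dtype; range() produces an integer column
print(frame.dtypes)

# Selecting one column by name returns a Series
numbers = frame['numbers']
print(numbers.tolist())  # [0, 1, 2]
```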

A DataFrame can also be read from a .csv file:

frame = pd.read_csv('file.csv', header=0, sep='\t')

The first argument is the file name. header gives the row number(s) containing the column names (an int or a list of ints), and sep is the field separator, a tab here. Separators longer than one character are interpreted as regular expressions, so '\s' matches a single whitespace character and '\s+' any run of whitespace.
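To try these arguments without a file on disk, `io.StringIO` can stand in for 'file.csv'. This is a sketch: the file contents below are made up to match the frame above.

```python
import io
import pandas as pd

# Hypothetical tab-separated contents standing in for 'file.csv'
tsv_text = "chars\tnumbers\na\t0\na\t1\na\t2\n"

# header=0: the first line holds the column names; sep='\t': fields are tab-separated
frame = pd.read_csv(io.StringIO(tsv_text), header=0, sep='\t')
print(frame.shape)  # (3, 2)

# Fields separated by uneven runs of spaces: r'\s+' matches any amount of whitespace
spaced_text = "chars   numbers\na  0\na     1\na 2\n"
spaced = pd.read_csv(io.StringIO(spaced_text), header=0, sep=r'\s+')
print(spaced['numbers'].tolist())  # [0, 1, 2]
```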

Column headers can be retrieved with:

frame.columns
# Index(['chars', 'numbers'], dtype='object')
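`frame.columns` is an `Index` object rather than a list. A small sketch of two common ways to use it, with the same frame as above:

```python
import pandas as pd

frame = pd.DataFrame({'chars': ['a'] * 3, 'numbers': range(3)})

# tolist() converts the Index into a plain Python list
names = frame.columns.tolist()
print(names)  # ['chars', 'numbers']

# Membership tests and len() work directly on the Index
print('chars' in frame.columns, len(frame.columns))  # True 2
```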

The shape attribute returns the frame size as a (rows, columns) tuple:

frame.shape
# (3, 2)
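Since shape is an attribute (no parentheses) holding a plain tuple, it can be unpacked directly. A minimal sketch; `n_rows` and `n_cols` are illustrative names:

```python
import pandas as pd

frame = pd.DataFrame({'chars': ['a'] * 3, 'numbers': range(3)})

# shape is a (rows, columns) tuple, so it unpacks into two variables
n_rows, n_cols = frame.shape
print(n_rows, n_cols)  # 3 2

# len() of a DataFrame counts rows, the same as shape[0]
print(len(frame) == frame.shape[0])  # True
```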

Let's add a row:

new_line = {'chars': 'b', 'numbers': 8}
# DataFrame.append was removed in pandas 2.0, so concatenate a one-row frame instead
frame = pd.concat([frame, pd.DataFrame([new_line])], ignore_index=True)
frame

It gives us the following output:

  chars  numbers
0     a        0
1     a        1
2     a        2
3     b        8

pd.concat does not modify its inputs; it returns a new DataFrame, so the result is assigned back to frame. ignore_index=True renumbers the rows 0..n-1 instead of keeping each input's own index, which would otherwise give the new row the label 0 again.
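Another common way to add a row is to assign to a new label with `.loc`, which enlarges the frame itself. This sketch assumes the default 0..n-1 index: after rows have been dropped, `len(frame)` may equal an existing label, and the assignment would overwrite that row instead of adding one.

```python
import pandas as pd

frame = pd.DataFrame({'chars': ['a'] * 3, 'numbers': range(3)})
new_line = {'chars': 'b', 'numbers': 8}

# Option 1: concatenate a one-row frame; returns a new object, frame is untouched
via_concat = pd.concat([frame, pd.DataFrame([new_line])], ignore_index=True)

# Option 2: assign to the next integer label; modifies frame itself
# (assumes the default index 0..n-1, so len(frame) is an unused label)
frame.loc[len(frame)] = [new_line[col] for col in frame.columns]
print(frame)
```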

Adding a column is simpler:

frame['bools'] = [False] * 3 + [True]
frame
  chars  numbers  bools
0     a        0  False
1     a        1  False
2     a        2  False
3     b        8   True
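A list assigned to a column needs exactly one value per row. Often the column is better derived from existing data instead. A sketch with the four-row frame from above; the `source` column and its `'notes'` value are hypothetical, chosen only to show scalar broadcasting:

```python
import pandas as pd

frame = pd.DataFrame({'chars': ['a', 'a', 'a', 'b'], 'numbers': [0, 1, 2, 8]})

# A literal list must match the number of rows (4 here)
frame['bools'] = [False] * 3 + [True]

# The same values, derived with a vectorized comparison on an existing column
derived = frame['numbers'] > 5
print(derived.tolist())  # [False, False, False, True]

# A scalar is broadcast to every row (hypothetical column name and value)
frame['source'] = 'notes'

# A list of the wrong length is rejected with a ValueError
try:
    frame['bad'] = [True, False]
except ValueError as err:
    print('length mismatch:', err)
```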

Rows and columns can be dropped:

# the first argument is a list of index labels,
# axis says where to look for them (0 for rows, 1 for columns),
# and inplace=True modifies frame itself instead of returning a new frame
frame.drop([0, 1], axis=0, inplace=True)
  chars  numbers  bools
2     a        2  False
3     b        8   True
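The remaining rows keep their original labels (2 and 3). A sketch of the keyword form of drop, which names the axis explicitly, together with renumbering via `reset_index`. It rebuilds the four-row frame so it runs on its own:

```python
import pandas as pd

frame = pd.DataFrame({
    'chars': ['a', 'a', 'a', 'b'],
    'numbers': [0, 1, 2, 8],
    'bools': [False, False, False, True],
})

# index= / columns= keywords say which axis is meant, instead of axis=0 / axis=1
without_rows = frame.drop(index=[0, 1])
without_bools = frame.drop(columns=['bools'])

# Without inplace=True, drop returns a new frame and leaves the original intact
print(frame.shape, without_rows.shape, without_bools.shape)  # (4, 3) (2, 3) (4, 2)

# The surviving rows keep labels 2 and 3; reset_index(drop=True) renumbers from 0
renumbered = without_rows.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1]
```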

The result can be saved into a .csv file:

frame.to_csv('updated.csv', sep=',', header=True, index=False)
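When no path is given, to_csv returns the CSV text as a string, which makes it easy to check what would be written and to read it back. A sketch, using a two-row frame shaped like the result above:

```python
import io
import pandas as pd

frame = pd.DataFrame({'chars': ['a', 'b'], 'numbers': [2, 8], 'bools': [False, True]})

# No path argument: the CSV is returned as a string instead of written to disk
text = frame.to_csv(sep=',', header=True, index=False)
print(text)
# chars,numbers,bools
# a,2,False
# b,8,True

# Reading it back restores the values; with index=False the row labels were not saved
restored = pd.read_csv(io.StringIO(text))
print(restored)
```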
