Class notes: Getting started with Pandas DataFrames

Maria Boldyreva

Here I'll put down the basics of working with Pandas DataFrames.

The DataFrame is the primary pandas data structure; it lets us work with data tables easily.

A data frame can be constructed from a dict:

import pandas as pd
frame = pd.DataFrame({'chars': ['a'] * 3, 'numbers': range(3)})

It gives us the following output:

chars numbers
0 a 0
1 a 1
2 a 2

A DataFrame can also be initialized from a .csv file:

frame = pd.read_csv('file.csv', header=0, sep='\t')

The first argument is the file name. The second, header, is the index of the row containing the column headers (an int, or a list of ints for multi-row headers). The third, sep, is the data separator, a tab here ('\s' matches a single whitespace character, '\s+' one or more).
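
As a small sketch of the separator options, the snippet below reads tab-separated and whitespace-separated text. The sample data is made up for illustration, and io.StringIO stands in for a real file so the example runs without anything on disk.

```python
import io
import pandas as pd

# Made-up sample data; io.StringIO stands in for a real file on disk.
tab_data = "chars\tnumbers\na\t0\na\t1\n"
space_data = "chars   numbers\na  0\na      1\n"

# Tab-separated values, with the header in the first row (index 0).
tab_frame = pd.read_csv(io.StringIO(tab_data), header=0, sep='\t')

# Columns separated by one or more whitespace characters.
# A raw string keeps the backslash in the regular expression intact.
space_frame = pd.read_csv(io.StringIO(space_data), header=0, sep=r'\s+')

print(tab_frame)
print(space_frame)
```

Both calls should produce the same two-column frame, since only the separator differs between the inputs.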

Column headers can be extracted using the following:

frame.columns
# Index(['chars', 'numbers'], dtype='object')

Another useful command returns the frame size as a (rows, columns) tuple:

frame.shape
# (3, 2)
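
A short sketch of how these two values might be used in practice, assuming the standard pandas attributes frame.columns and frame.shape. The variable names column_names, n_rows and n_cols are only illustrative.

```python
import pandas as pd

frame = pd.DataFrame({'chars': ['a'] * 3, 'numbers': range(3)})

# The column index converts to a plain Python list when needed.
column_names = list(frame.columns)

# shape is a (rows, columns) tuple, so it can be unpacked directly.
n_rows, n_cols = frame.shape

print(column_names)    # ['chars', 'numbers']
print(n_rows, n_cols)  # 3 2
```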

Let's add a row:

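new_line = {'chars': 'b', 'numbers': 8}
frame = pd.concat([frame, pd.DataFrame([new_line])], ignore_index=True)

It gives us the following output:

chars numbers
0 a 0
1 a 1
2 a 2
3 b 8

pd.concat returns a new frame instead of modifying the original, so the result is assigned back to frame. (The older frame.append method worked the same way, never accepted an inplace argument, and was removed in pandas 2.0.) Some other methods, such as drop, do accept an inplace keyword, which means the result should be written into the original variable rather than just returned.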
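
To make the difference between returning a new frame and writing into the original concrete, here is a minimal sketch. It assumes a recent pandas version, where rows are added with pd.concat; the name extended is only illustrative.

```python
import pandas as pd

frame = pd.DataFrame({'chars': ['a'] * 3, 'numbers': range(3)})
new_line = {'chars': 'b', 'numbers': 8}

# pd.concat returns a new frame; `frame` itself is left untouched.
extended = pd.concat([frame, pd.DataFrame([new_line])], ignore_index=True)

print(frame.shape)     # (3, 2)
print(extended.shape)  # (4, 2)

# With inplace=True, drop modifies `extended` directly,
# so there is no result to assign.
extended.drop([0], axis=0, inplace=True)

print(extended.shape)  # (3, 2)
```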

Adding a column is easier:

frame['bools'] = [False] * 3 + [True]

It gives us the following output:

chars numbers bools
0 a 0 False
1 a 1 False
2 a 2 False
3 b 8 True

Rows and columns can be dropped:

# the first argument is a list of index labels,
# the second is the axis (0 is for rows, 1 is for columns)
frame.drop([0, 1], axis=0, inplace=True)

It gives us the following output (the remaining rows keep their original index labels):

chars numbers bools
2 a 2 False
3 b 8 True
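
The example above only drops rows. The sketch below shows both axes side by side, plus one way to renumber the rows afterwards. The frame is rebuilt here so the snippet runs on its own, and names such as rows_dropped and renumbered are only illustrative.

```python
import pandas as pd

frame = pd.DataFrame({
    'chars': ['a', 'a', 'a', 'b'],
    'numbers': [0, 1, 2, 8],
    'bools': [False, False, False, True],
})

# Drop rows by index label; the surviving rows keep their labels (2 and 3).
rows_dropped = frame.drop([0, 1], axis=0)

# Drop a column by name with axis=1.
cols_dropped = frame.drop(['bools'], axis=1)

# reset_index(drop=True) renumbers the remaining rows from 0
# and discards the old labels.
renumbered = rows_dropped.reset_index(drop=True)

print(rows_dropped)
print(cols_dropped)
print(renumbered)
```

Without inplace=True each call returns a new frame, so the original frame still holds all four rows and three columns.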

The result can be saved into a .csv file:

frame.to_csv('updated.csv', sep=',', header=True, index=False)
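
As a quick check that the saved data can be read back, here is a round-trip sketch. To avoid touching the disk it relies on to_csv returning the CSV text when no path is given, and on io.StringIO standing in for the file. The sample frame is made up for illustration.

```python
import io
import pandas as pd

frame = pd.DataFrame({'chars': ['a', 'b'], 'numbers': [2, 8]})

# With no path given, to_csv returns the CSV text as a string.
# index=False leaves the row labels out of the output.
csv_text = frame.to_csv(sep=',', header=True, index=False)

# Read the text back as if it were a file.
restored = pd.read_csv(io.StringIO(csv_text), header=0, sep=',')

print(csv_text)
print(restored)
```

Leaving the index out keeps the file free of an extra unnamed column when it is loaded again.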

