In this post I share basic pandas knowledge and notes for data science beginners. You will also find a link to a Jupyter notebook with the code and its output.
Pandas Basics
Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
Use the following import convention:
import pandas as pd
Pandas Data Structure
Series
A one-dimensional labeled array capable of holding any data type
s = pd.Series([23, 55, -7, 2], index=['a', 'b', 'c', 'd'])
s
Output:
a 23
b 55
c -7
d 2
dtype: int64
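To see how the labels and the data fit together, here is a minimal sketch that rebuilds the same Series and looks at its index and values. The name `s_default` is made up for this example; it shows what happens when no index is passed.

```python
import pandas as pd

# Same Series as above: four integers labeled 'a' to 'd'.
s = pd.Series([23, 55, -7, 2], index=['a', 'b', 'c', 'd'])

labels = list(s.index)   # the index holds the labels
values = s.tolist()      # the data as a plain Python list

# Without an explicit index, pandas falls back to positions 0..n-1.
# `s_default` is a hypothetical name used only in this sketch.
s_default = pd.Series([23, 55, -7, 2])
default_labels = list(s_default.index)

print(labels)          # ['a', 'b', 'c', 'd']
print(values)          # [23, 55, -7, 2]
print(default_labels)  # [0, 1, 2, 3]
```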
DataFrame
A two-dimensional labeled data structure with columns of potentially different types
data = {'Country' : ['China', 'India', 'United States', 'Indonesia', 'Pakistan', 'Brazil', 'Nigeria', 'Bangladesh', 'Russia', 'Mexico'],
'Population':[1406371640, 1372574449, 331058112, 270203917, 225200000, 212656200, 211401000, 170054094, 146748590, 126014024] }
df = pd.DataFrame(data, columns=['Country', 'Population'])
df
Output:
Country Population
0 China 1406371640
1 India 1372574449
2 United States 331058112
3 Indonesia 270203917
4 Pakistan 225200000
5 Brazil 212656200
6 Nigeria 211401000
7 Bangladesh 170054094
8 Russia 146748590
9 Mexico 126014024
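A quick way to check what a DataFrame contains is to look at its shape and columns. The sketch below uses only the first three rows of the data above, and the name `df_small` is an assumption made for this example.

```python
import pandas as pd

# First three rows of the data above; `df_small` is a hypothetical name.
data = {
    'Country': ['China', 'India', 'United States'],
    'Population': [1406371640, 1372574449, 331058112],
}
df_small = pd.DataFrame(data, columns=['Country', 'Population'])

n_rows, n_cols = df_small.shape          # (rows, columns)
column_names = list(df_small.columns)
population = df_small['Population']      # a single column is a Series
total = int(population.sum())

print(n_rows, n_cols)   # 3 2
print(column_names)     # ['Country', 'Population']
print(total)            # 3110004201
```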
Selection
Also see NumPy Arrays
Getting
s['b']
Output: 55
Select rows of a DataFrame by slicing:
df[6:]
Output:
Country Population
6 Nigeria 211401000
7 Bangladesh 170054094
8 Russia 146748590
9 Mexico 126014024
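The square brackets behave a little differently depending on what goes inside them. Here is a small sketch, assuming a shortened four-row version of the DataFrame, that puts the three common cases side by side.

```python
import pandas as pd

s = pd.Series([23, 55, -7, 2], index=['a', 'b', 'c', 'd'])
# Shortened version of the DataFrame above (an assumption for brevity).
df = pd.DataFrame({
    'Country': ['China', 'India', 'United States', 'Indonesia'],
    'Population': [1406371640, 1372574449, 331058112, 270203917],
})

b_value = s['b']           # a label in [] on a Series returns one element
tail_rows = df[2:]         # a slice in [] on a DataFrame selects rows by position
countries = df['Country']  # a column name in [] on a DataFrame returns that column

print(b_value)                    # 55
print(tail_rows.index.tolist())   # [2, 3]
print(countries.tolist())         # ['China', 'India', 'United States', 'Indonesia']
```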
Selecting, Boolean Indexing & Setting
By Position
df.iloc[3, 0]
Output: 'Indonesia'
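Besides a single cell, `iloc` also accepts slices and negative positions. This is a minimal sketch on a shortened four-row DataFrame (an assumption to keep the example small).

```python
import pandas as pd

# Shortened version of the DataFrame above.
df = pd.DataFrame({
    'Country': ['China', 'India', 'United States', 'Indonesia'],
    'Population': [1406371640, 1372574449, 331058112, 270203917],
})

cell = df.iloc[3, 0]        # row position 3, column position 0
first_two = df.iloc[:2]     # first two rows; the end position is excluded
last_pop = df.iloc[-1, 1]   # negative positions count from the end

print(cell)                           # Indonesia
print(first_two['Country'].tolist())  # ['China', 'India']
print(last_pop)                       # 270203917
```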
By Label
df.loc[[6], 'Country']
Output:
6 Nigeria
Name: Country, dtype: object
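The output above is a Series because the row label was passed as a list. The sketch below, assuming a three-row excerpt that keeps the original labels 5 to 7, compares this with a single label and with a label slice.

```python
import pandas as pd

# Three rows from the DataFrame above, keeping their original labels.
df = pd.DataFrame({
    'Country': ['Brazil', 'Nigeria', 'Bangladesh'],
    'Population': [212656200, 211401000, 170054094],
}, index=[5, 6, 7])

as_series = df.loc[[6], 'Country']     # list of labels -> Series
as_scalar = df.loc[6, 'Country']       # single label -> single value
label_slice = df.loc[5:6, 'Country']   # label slices include both ends

print(as_series.tolist())     # ['Nigeria']
print(as_scalar)              # Nigeria
print(label_slice.tolist())   # ['Brazil', 'Nigeria']
```

Note the difference from `iloc`: a label slice with `loc` includes the end label, while a position slice with `iloc` excludes the end position.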
Boolean Indexing
result = df[df['Population'] > 270203917]
result
Output:
Country Population
0 China 1406371640
1 India 1372574449
2 United States 331058112
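It can help to look at the condition on its own: it is a Series of True/False values, one per row, and only the True rows are kept. Here is a minimal sketch on the first five rows; the names `mask` and `mid_sized` and the thresholds in the second filter are made up for this example.

```python
import pandas as pd

# First five rows of the DataFrame above.
df = pd.DataFrame({
    'Country': ['China', 'India', 'United States', 'Indonesia', 'Pakistan'],
    'Population': [1406371640, 1372574449, 331058112, 270203917, 225200000],
})

mask = df['Population'] > 270203917   # one True/False per row
result = df[mask]                     # keeps only the rows where mask is True

# Combine conditions with & (and) or | (or); wrap each condition in parentheses.
# The thresholds here are arbitrary values chosen for illustration.
mid_sized = df[(df['Population'] > 250_000_000) & (df['Population'] < 1_000_000_000)]

print(mask.tolist())                  # [True, True, True, False, False]
print(result['Country'].tolist())     # ['China', 'India', 'United States']
print(mid_sized['Country'].tolist())  # ['United States', 'Indonesia']
```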
Setting
s['a'] = 777
s['d'] = 999
s
Output:
a 777
b 55
c -7
d 999
dtype: int64
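Setting also works together with boolean indexing, so several elements can be changed at once. This is a small sketch that repeats the two assignments above and then, as an extra step that is not part of the original example, replaces every negative value with 0.

```python
import pandas as pd

s = pd.Series([23, 55, -7, 2], index=['a', 'b', 'c', 'd'])

# Set single elements by label, as above.
s['a'] = 777
s['d'] = 999

# Extra step for illustration: set every element that matches a condition.
s[s < 0] = 0

print(s.tolist())   # [777, 55, 0, 999]
```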
Conclusion
Pandas is a flexible and easy-to-use tool for data analysis and manipulation.