Anderson Braz

Originally published at andersonbraz.com

Data Science in Python: Pandas Introduction

This post covers basic pandas concepts and notes for data science beginners. It also includes a link to a Jupyter notebook with the code and its execution.

Pandas Basics

Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

Use the following import convention:

import pandas as pd

Pandas Data Structure

Series

A one-dimensional labeled array capable of holding any data type.

s = pd.Series([23, 55, -7, 2], index=['a', 'b', 'c', 'd'])
s

Output:
a 23
b 55
c -7
d 2
dtype: int64
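
As a minimal sketch of the same idea, a Series can also be built from a dict, in which case the keys become the index labels. The variable names here (`s_from_dict`, `labels`, `values`, `total`) are illustrative and not part of the original notebook.

```python
import pandas as pd

# Same values as the Series above, built from a dict:
# the dict keys become the index labels.
s_from_dict = pd.Series({'a': 23, 'b': 55, 'c': -7, 'd': 2})

labels = list(s_from_dict.index)   # the index labels
values = s_from_dict.tolist()      # the values as plain Python ints
total = int(s_from_dict.sum())     # sum of all values

print(labels, values, total)
```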

DataFrame

A two-dimensional labeled data structure with columns of potentially different types

data = {'Country': ['China', 'India', 'United States', 'Indonesia', 'Pakistan', 'Brazil', 'Nigeria', 'Bangladesh', 'Russia', 'Mexico'],
        'Population': [1406371640, 1372574449, 331058112, 270203917, 225200000, 212656200, 211401000, 170054094, 146748590, 126014024]}
df = pd.DataFrame(data, columns=['Country', 'Population'])
df

Output:
Country Population
0 China 1406371640
1 India 1372574449
2 United States 331058112
3 Indonesia 270203917
4 Pakistan 225200000
5 Brazil 212656200
6 Nigeria 211401000
7 Bangladesh 170054094
8 Russia 146748590
9 Mexico 126014024
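
Once a DataFrame exists, a few attributes help confirm its structure. The sketch below uses a shortened, three-row copy of the data above (named `df_small` here purely for illustration) to show the shape, the column names, and that a single column is itself a Series.

```python
import pandas as pd

# A shortened copy of the data above, for illustration only.
data = {'Country': ['China', 'India', 'United States'],
        'Population': [1406371640, 1372574449, 331058112]}
df_small = pd.DataFrame(data)

shape = df_small.shape               # (number of rows, number of columns)
columns = list(df_small.columns)     # column labels
population = df_small['Population']  # selecting one column returns a Series

print(shape, columns, type(population).__name__)
```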

Selection

Also see NumPy Arrays

Getting

s['b']

Output: 55

And, to get a subset of DataFrame rows:

df[6:]

Output:
Country Population
6 Nigeria 211401000
7 Bangladesh 170054094
8 Russia 146748590
9 Mexico 126014024
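
Slicing a DataFrame with `[]` selects rows by position, much like slicing a Python list. Here is a small sketch on a four-row copy of the data (`df_small` is an illustrative name), with `tail` as an alternative way to get the last rows.

```python
import pandas as pd

# A shortened copy of the data above, for illustration only.
df_small = pd.DataFrame({
    'Country': ['China', 'India', 'United States', 'Indonesia'],
    'Population': [1406371640, 1372574449, 331058112, 270203917],
})

first_two = df_small[:2]     # rows at positions 0 and 1
from_two = df_small[2:]      # rows from position 2 to the end
last_row = df_small.tail(1)  # the final row, via the tail method

print(first_two)
print(from_two)
print(last_row)
```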

Selecting, Boolean Indexing & Setting

By Position

df.iloc[3, 0]

Output: 'Indonesia'
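
`iloc` accepts more than a single row and column position. The following sketch, again on an illustrative four-row `df_small`, shows a whole row, a negative position, and a positional slice.

```python
import pandas as pd

# A shortened copy of the data above, for illustration only.
df_small = pd.DataFrame({
    'Country': ['China', 'India', 'United States', 'Indonesia'],
    'Population': [1406371640, 1372574449, 331058112, 270203917],
})

first_row = df_small.iloc[0]         # the whole first row, as a Series
last_country = df_small.iloc[-1, 0]  # negative positions count from the end
top_two_pop = df_small.iloc[0:2, 1]  # rows 0 and 1 of column 1 (end excluded)

print(first_row['Country'], last_country, top_two_pop.tolist())
```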

By Label

df.loc[[6], 'Country']

Output:
6 Nigeria
Name: Country, dtype: object
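
The list `[6]` in the call above is why the result is a Series rather than a single value. A short sketch of the difference, using an illustrative `df_small` with the default integer index:

```python
import pandas as pd

# A shortened copy of the data above, for illustration only.
df_small = pd.DataFrame({
    'Country': ['China', 'India', 'United States', 'Indonesia'],
    'Population': [1406371640, 1372574449, 331058112, 270203917],
})

as_scalar = df_small.loc[1, 'Country']      # a single label returns the value itself
as_series = df_small.loc[[1], 'Country']    # a list of labels returns a Series
label_slice = df_small.loc[0:1, 'Country']  # label slices include both endpoints

print(as_scalar)
print(as_series)
print(label_slice)
```

Note that, unlike positional slicing with `iloc`, a label slice with `loc` includes its end label.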

Boolean Indexing

result = df[df['Population'] > 270203917]
result

Output:
Country Population
0 China 1406371640
1 India 1372574449
2 United States 331058112
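
Conditions can be combined with `&` (and) and `|` (or), as long as each condition is wrapped in parentheses. The thresholds and the names `df_small`, `mask`, and `mid_sized` below are chosen only for illustration.

```python
import pandas as pd

# A shortened copy of the data above, for illustration only.
df_small = pd.DataFrame({
    'Country': ['China', 'India', 'United States', 'Indonesia', 'Mexico'],
    'Population': [1406371640, 1372574449, 331058112, 270203917, 126014024],
})

# Each comparison yields a boolean Series; & combines them element-wise.
mask = (df_small['Population'] > 200_000_000) & (df_small['Population'] < 1_000_000_000)
mid_sized = df_small[mask]

print(mid_sized)
```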

Setting

s['a'] = 777
s['d'] = 999
s

Output:
a 777
b 55
c -7
d 999
dtype: int64
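
Values in a DataFrame can be set in a similar way, with `loc` naming the row and column. In this sketch the value `0` is a placeholder rather than real data, and `df_small` is an illustrative name.

```python
import pandas as pd

# A shortened copy of the data above, for illustration only.
df_small = pd.DataFrame({
    'Country': ['China', 'India', 'United States'],
    'Population': [1406371640, 1372574449, 331058112],
})

# Overwrite one cell: row label 1, column 'Population'.
# The value 0 is a placeholder, not a real population figure.
df_small.loc[1, 'Population'] = 0

print(df_small)
```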

Conclusion

Pandas is a flexible, easy-to-use tool for data analysis and manipulation.

See it in practice: Code and Execution

Credits

Photo by fabio on Unsplash
