MdMusfikurRahmanSifar

Posted on Jan 9, 2023

Pandas-Basics In Short

#gratitude

Pandas is a Python library used to analyse data. It is table-oriented, like a spreadsheet in Excel, unlike NumPy, which is matrix-oriented. It allows us to analyse, manipulate and explore huge amounts of data.

For the basics, we will discuss a few topics:

Series
DataFrame
Missing data
Groupby
Merging, joining & concatenating

To start, we need to import the libraries:

import numpy as np
import pandas as pd

Series:

Syntax: pd.Series(data, index)

Here, data and index are chosen according to our needs. The data can be a list, a NumPy array or even a dictionary.

Here are some variations:

If the index is not given, a default integer index starting from 0 is used.

With a dictionary, the keys become the index and the values become the data.
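A minimal sketch of these variations, using made-up labels and values, so the default index and the dictionary behaviour can be seen directly:

```python
import numpy as np
import pandas as pd

# Hypothetical example labels and values
labels = ['a', 'b', 'c']
values = [10, 20, 30]

# No index given: pandas adds a default integer index 0, 1, 2
s_default = pd.Series(data=values)

# Our own index labels
s_labelled = pd.Series(data=values, index=labels)

# A NumPy array works the same way as a list
s_array = pd.Series(np.array(values), labels)

# Dictionary: keys become the index, values become the data
s_dict = pd.Series({'a': 10, 'b': 20, 'c': 30})

print(s_default)
print(s_labelled['b'])  # access by label
```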

A Series is a building block, but we won't use it directly very often. It is like a single column of a table: it is displayed as a labelled list rather than as a full table. What we will mostly use is the DataFrame, which gives us the table-shaped output we expect.

DataFrame:

Syntax: pd.DataFrame(data, index, columns)
The DataFrame is the fundamental topic, so we need to know some of its uses and applications:

Selection and indexing
Conditional selection
Creating new columns
Removing columns and rows
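As a sketch of the pd.DataFrame(data, index, columns) syntax, here is a small DataFrame built from a NumPy array. The name arr, the row labels A to C and the column names W to Z are hypothetical choices for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical 3x4 DataFrame holding the values 0..11
arr = pd.DataFrame(np.arange(12).reshape(3, 4),
                   index=['A', 'B', 'C'],
                   columns=['W', 'X', 'Y', 'Z'])

print(arr)
print(arr.shape)  # (rows, columns)
```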

Selection and indexing:

Selecting a row or column:

Syntax:
Column selection: arr[column] returns a Series
Row selection by label: arr.loc[row] returns a Series
Row selection by position: arr.iloc[row number (starts from 0)] returns a Series
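A short sketch of these three selections on a small hypothetical DataFrame arr (labels and values made up for illustration). Passing a list of column names returns a DataFrame instead of a Series:

```python
import numpy as np
import pandas as pd

# Hypothetical 3x4 DataFrame with values 0..11
arr = pd.DataFrame(np.arange(12).reshape(3, 4),
                   index=['A', 'B', 'C'],
                   columns=['W', 'X', 'Y', 'Z'])

col_w = arr['W']            # one column -> Series
two_cols = arr[['W', 'Z']]  # list of columns -> DataFrame
row_b = arr.loc['B']        # row by label -> Series
row_third = arr.iloc[2]     # row by position (0-based) -> row 'C'

print(col_w)
print(row_third)
```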

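Selecting a range:

By rows: slice the row labels with .loc or the row positions with .iloc.

By columns: pass a list of column names, or slice the column labels with .loc[:, start:end].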
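A sketch of range selection on a small hypothetical DataFrame arr. One detail worth noticing: a .loc slice includes its end label, while an .iloc slice excludes its end position, like a normal Python slice:

```python
import numpy as np
import pandas as pd

# Hypothetical 3x4 DataFrame with values 0..11
arr = pd.DataFrame(np.arange(12).reshape(3, 4),
                   index=['A', 'B', 'C'],
                   columns=['W', 'X', 'Y', 'Z'])

rows_by_label = arr.loc['A':'B']     # label slice: includes 'B'
rows_by_position = arr.iloc[0:2]     # position slice: rows 0 and 1 only
cols_by_list = arr[['X', 'Y']]       # chosen columns
cols_by_range = arr.loc[:, 'X':'Z']  # all rows, columns X through Z

print(rows_by_label)
print(cols_by_range)
```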

Selecting data:

By combining the previous methods, we can get a single value (or a smaller block of values) from the DataFrame: arr.loc[row, column].
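A sketch of combining row and column selection on a small hypothetical DataFrame arr, first for one cell and then for a sub-block:

```python
import numpy as np
import pandas as pd

# Hypothetical 3x4 DataFrame with values 0..11
arr = pd.DataFrame(np.arange(12).reshape(3, 4),
                   index=['A', 'B', 'C'],
                   columns=['W', 'X', 'Y', 'Z'])

value = arr.loc['B', 'Y']      # one cell by labels
value_pos = arr.iloc[1, 2]     # the same cell by positions
subset = arr.loc[['A', 'C'], ['W', 'Z']]  # rows A, C and columns W, Z

print(value, value_pos)
print(subset)
```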

Conditional Selection:

Here we apply a condition. Applying a condition on its own (e.g. arr > 0) gives a DataFrame of booleans. Passing that result back into the DataFrame (arr[arr > 0]) gives the values where the condition is True and NaN where it is False.

We can also combine conditions. In plain Python we would use 'and'/'or', but in pandas we use '&'/'|' instead, and each condition must be wrapped in parentheses.
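A sketch of conditional selection on a small hypothetical DataFrame arr. Besides the boolean mask and the NaN-filled result, it shows a common pattern: filtering whole rows by a condition on one column, and combining conditions with & and |:

```python
import numpy as np
import pandas as pd

# Hypothetical 3x4 DataFrame with values 0..11
arr = pd.DataFrame(np.arange(12).reshape(3, 4),
                   index=['A', 'B', 'C'],
                   columns=['W', 'X', 'Y', 'Z'])

mask = arr > 5            # DataFrame of True/False
masked = arr[mask]        # values > 5 kept, the rest become NaN

rows = arr[arr['W'] > 0]  # whole rows where column W is positive

# Parentheses are required around each condition
both = arr[(arr['W'] > 0) & (arr['Z'] < 10)]
either = arr[(arr['W'] == 0) | (arr['Z'] == 11)]

print(masked)
print(both)
```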

Creating columns:

Syntax: arr[column name] = data of the column
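A sketch of creating a column on a small hypothetical DataFrame arr. Here the new column (named 'new' purely for illustration) is computed from two existing columns, row by row:

```python
import numpy as np
import pandas as pd

# Hypothetical 3x4 DataFrame with values 0..11
arr = pd.DataFrame(np.arange(12).reshape(3, 4),
                   index=['A', 'B', 'C'],
                   columns=['W', 'X', 'Y', 'Z'])

# Assigning to a column name that doesn't exist yet creates it
arr['new'] = arr['W'] + arr['Y']

print(arr)
```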

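Removing rows and columns:

Syntax: arr.drop(label, axis=..., inplace=...)

axis=0 -> Row
axis=1 -> Column
inplace=True makes the change permanent; without it, .drop() returns a new DataFrame and leaves the original unchanged

By default axis=0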
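A sketch of .drop() on a small hypothetical DataFrame arr, showing the difference between a plain drop (which returns a copy) and inplace=True (which changes arr itself and returns None):

```python
import numpy as np
import pandas as pd

# Hypothetical 3x4 DataFrame plus an extra column 'new'
arr = pd.DataFrame(np.arange(12).reshape(3, 4),
                   index=['A', 'B', 'C'],
                   columns=['W', 'X', 'Y', 'Z'])
arr['new'] = arr['W'] + arr['Y']

# Without inplace: a new DataFrame is returned, arr keeps 'new'
dropped_col = arr.drop('new', axis=1)
still_there = 'new' in arr.columns

# axis defaults to 0, so this drops the row labelled 'C' (again a copy)
dropped_row = arr.drop('C')

# With inplace=True: arr itself loses the column
arr.drop('new', axis=1, inplace=True)

print(arr)
```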

Missing values:

Adding missing values:

Use np.nan in the data.

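Removing NaN:

By default, .dropna() removes every row that contains a NaN.
For columns, use .dropna(axis=1).

We can also keep only the rows or columns that have at least a certain number of non-NaN values, using the thresh parameter, e.g. .dropna(thresh=2).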
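A sketch of adding and removing missing values, using a small hypothetical DataFrame df with np.nan placed in a few cells:

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values added via np.nan
df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

no_nan_rows = df.dropna()           # only row 0 has no NaN
no_nan_cols = df.dropna(axis=1)     # only column C has no NaN
at_least_two = df.dropna(thresh=2)  # keep rows with >= 2 non-NaN values

print(no_nan_rows)
print(at_least_two)
```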

Filling missing values:

Use .fillna(value) to replace every NaN with a given value, for example 0 or the column's mean.
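A sketch of .fillna() on a small hypothetical DataFrame df: first filling every NaN with a constant, then filling one column with its own mean (pandas skips NaN when computing the mean):

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values
df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

filled = df.fillna(value=0)  # every NaN becomes 0

# Fill column A with its mean: (1 + 2) / 2 = 1.5
filled_a = df['A'].fillna(value=df['A'].mean())

print(filled)
print(filled_a)
```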

Groupby:

We can group rows that share the same value in a column and work with each group.

After .groupby(), the grouped data is stored in a GroupBy object. It doesn't print anything useful on its own; we see results only when we apply a function to it, like:

Try .min(), .max(), .describe(), .mean(), etc.
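A sketch of grouping, using hypothetical sales data per team. The Sales column is selected before aggregating; in recent pandas versions, calling .mean() on a group that still contains text columns can raise an error:

```python
import pandas as pd

# Hypothetical sales data
df = pd.DataFrame({'Team': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'Sales': [200, 120, 340, 124, 243, 350]})

by_team = df.groupby('Team')  # GroupBy object, nothing computed yet

means = by_team['Sales'].mean()
maxes = by_team['Sales'].max()
summary = by_team['Sales'].describe()  # count, mean, std, min, ...

print(means)
print(summary)
```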

Concatenating, Merging, Joining:

Concatenating:

To attach DataFrames row-wise or column-wise: pd.concat([df1, df2], axis=...). By default axis=0, which stacks rows.
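A sketch of pd.concat with two small hypothetical DataFrames. Row-wise, the rows are simply stacked; column-wise, pandas lines the rows up by index, so labels missing from one DataFrame are filled with NaN:

```python
import pandas as pd

# Hypothetical DataFrames with the same columns but different indexes
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=[0, 1])
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]}, index=[2, 3])

rows = pd.concat([df1, df2])          # axis=0: 4 rows, columns A, B
cols = pd.concat([df1, df2], axis=1)  # axis=1: aligned on index, NaN gaps

print(rows)
print(cols)
```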

Merging:

To attach DataFrames based on a common column: pd.merge(left, right, on=column)
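A sketch of pd.merge on a shared column, here a hypothetical 'key' column. The default how='inner' keeps only keys present in both DataFrames; how='outer' keeps all of them:

```python
import pandas as pd

# Hypothetical DataFrames sharing a 'key' column
left = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'A': [1, 2, 3]})
right = pd.DataFrame({'key': ['K0', 'K1', 'K3'], 'B': [4, 5, 6]})

inner = pd.merge(left, right, on='key')               # K0 and K1 only
outer = pd.merge(left, right, how='outer', on='key')  # K0, K1, K2, K3

print(inner)
print(outer)
```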

Joining:

To attach DataFrames based on a common index: left.join(right)
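A sketch of .join() on the index, using two small hypothetical DataFrames. By default it keeps every row of the left DataFrame and fills NaN where the right one has no matching index label:

```python
import pandas as pd

# Hypothetical DataFrames indexed by key labels
left = pd.DataFrame({'A': [1, 2, 3]}, index=['K0', 'K1', 'K2'])
right = pd.DataFrame({'B': [4, 5, 6]}, index=['K0', 'K2', 'K3'])

joined = left.join(right)                # how='left': K0, K1, K2
outer = left.join(right, how='outer')    # union of both indexes

print(joined)
print(outer)
```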

Summary:

These were the basics of pandas, the truly fundamental stuff. Pandas also has features for file handling, data analysis, plotting, etc.
Let's keep exploring...let's dive together😉
