Pandas is a Python library used to analyse data. It is table-oriented, like a spreadsheet in Excel, unlike NumPy, which is matrix-oriented. It lets us analyse, manipulate and explore large amounts of data.
For the basics, we will discuss a few topics-
- Series
- DataFrame
- Missing data
- Groupby
- Merging, joining & concatenating
To start we need to-
import numpy as np
import pandas as pd
Series:
Syntax: pd.Series(data, index)
Here data and index can be set according to our needs. The data can be a list, a NumPy array or even a dictionary.
Here are some examples (look up the variations):
If the index is not given, it defaults to integers starting from 0.
With a dictionary, the keys become the index and the values become the data.
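A minimal sketch of these variations, assuming made-up labels and values (the variable names are only for illustration):

```python
import numpy as np
import pandas as pd

labels = ["a", "b", "c"]   # hypothetical index labels
values = [10, 20, 30]      # hypothetical data

s_default = pd.Series(values)                         # no index given: 0, 1, 2
s_labeled = pd.Series(values, index=labels)           # list data with an explicit index
s_array = pd.Series(np.array(values), index=labels)   # NumPy array as data
s_dict = pd.Series({"a": 10, "b": 20, "c": 30})       # keys become the index

print(s_labeled)
print(s_dict["b"])   # look up a value by its label
```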
A Series is just the underlying idea, and we won't use it on its own very often. It is like a single labelled column: it does not show a full table, only a table-like listing. What we will mostly use is the DataFrame, which gives us the table output we expect.
DataFrame:
Syntax: pd.DataFrame(data, index, columns)
This is the fundamental topic, so we need to know some of its common uses:
- Selection and indexing
- Conditional selection
- Creating new column
- Removing column-row
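Before going through these topics, here is a minimal sketch of creating a DataFrame. The row labels, column names and numbers are made up for illustration:

```python
import numpy as np
import pandas as pd

data = np.arange(12).reshape(3, 4)   # 3 rows x 4 columns of sample numbers
df = pd.DataFrame(data, index=["A", "B", "C"], columns=["W", "X", "Y", "Z"])

print(df)        # prints the table with row and column labels
print(df.shape)  # (rows, columns)
```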
Selection and indexing:
Selecting a row-column:
Syntax:
Column selection: arr[column] - returns a Series
Row selection by label: arr.loc[row] - returns a Series
Row selection by position: arr.iloc[row number] (starts from 0) - returns a Series
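The three forms above could look like this on a small made-up table (df plays the role of arr here):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=["A", "B", "C"], columns=["W", "X", "Y", "Z"])

col = df["W"]               # column by name -> Series
row_by_label = df.loc["B"]  # row by its label -> Series
row_by_pos = df.iloc[1]     # row by position (second row) -> Series

print(col)
print(row_by_label)
```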
Selecting range:
By rows-
By columns-
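A minimal sketch of selecting ranges, assuming the same kind of made-up table. Note that label slices with .loc include the end label, while position slices with .iloc exclude the end position:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=["A", "B", "C"], columns=["W", "X", "Y", "Z"])

# By rows
rows_by_label = df.loc["A":"B"]   # rows A and B (end label included)
rows_by_pos = df.iloc[0:2]        # rows at positions 0 and 1 (end excluded)

# By columns
cols_list = df[["W", "Y"]]        # a list of column names -> DataFrame
cols_range = df.loc[:, "X":"Z"]   # all rows, columns X through Z

print(rows_by_label)
print(cols_range)
```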
Selecting data:
By combining the previous methods we can pick a single value or a sub-table out of the DataFrame.
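For instance, giving both a row and a column to .loc or .iloc returns the value where they meet. A small sketch with made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=["A", "B", "C"], columns=["W", "X", "Y", "Z"])

value = df.loc["B", "Y"]                   # row B, column Y -> 6
same_value = df.iloc[1, 2]                 # same cell by position
block = df.loc[["A", "C"], ["W", "Z"]]     # rows A and C, columns W and Z

print(value)
print(block)
```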
Conditional Selection:
Here we apply a condition. A condition on its own, such as arr > 0, gives a boolean result. If we pass that result back to the DataFrame, as in arr[arr > 0], we get the values where the condition is true and NaN where it is false.
We can also combine conditions. In pandas we use '&' and '|' instead of 'and' and 'or', and each condition must be wrapped in parentheses.
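A minimal sketch of conditional selection, using a made-up table with some negative numbers:

```python
import pandas as pd

df = pd.DataFrame({"W": [1, -2, 3], "X": [-4, 5, 6]}, index=["A", "B", "C"])

mask = df > 0        # DataFrame of True/False values
masked = df[mask]    # values where True, NaN where False

rows_w_pos = df[df["W"] > 0]                 # only rows where W is positive: A, C
both = df[(df["W"] > 0) & (df["X"] > 0)]     # both conditions hold: C
either = df[(df["W"] > 0) | (df["X"] > 0)]   # at least one holds: A, B, C

print(masked)
print(both)
```

The parentheses around each condition matter because & and | bind more tightly than comparison operators in Python.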
Creating columns:
Syntax: arr[column name] = data for the column
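As a sketch, a new column can come from a list of values or be computed from existing columns. The column names below are made up:

```python
import pandas as pd

df = pd.DataFrame({"W": [1, 2, 3], "X": [10, 20, 30]})

df["Y"] = [7, 8, 9]                # new column from a list (one value per row)
df["total"] = df["W"] + df["X"]    # new column computed from existing columns

print(df)
```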
Removing row-column:
Use .drop() with the row or column label:
axis=0 -> Row (the default)
axis=1 -> Column
inplace=True makes the change permanent; without it, the original DataFrame is left unchanged.
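A minimal sketch of removing rows and columns, assuming the .drop() method and a made-up table:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=["A", "B", "C"], columns=["W", "X", "Y", "Z"])

without_row = df.drop("A")           # axis=0 by default: drops row A in a copy
without_col = df.drop("W", axis=1)   # drops column W in a copy

# df itself is still unchanged at this point
df.drop("Z", axis=1, inplace=True)   # now df itself loses column Z

print(df)
```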
Missing value:
Adding missing value:
Use np.nan in the data.
Removing NaN:
By default .dropna() removes every row that contains a NaN.
For columns use .dropna(axis=1).
We can also keep rows or columns that have at least a certain number of non-NaN values, using the thresh argument.
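A small sketch of these options on a made-up table that contains np.nan:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, np.nan],
                   "B": [5, np.nan, np.nan],
                   "C": [1, 2, 3]})

rows_dropped = df.dropna()           # keeps only rows with no NaN: row 0
cols_dropped = df.dropna(axis=1)     # keeps only columns with no NaN: column C
at_least_two = df.dropna(thresh=2)   # keeps rows with at least 2 non-NaN values: rows 0 and 1

print(rows_dropped)
print(at_least_two)
```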
Filling missing values:
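One common way to fill missing values is .fillna(). A minimal sketch with made-up data, filling first with a constant and then with a column's mean:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, np.nan],
                   "B": [5, np.nan, np.nan]})

filled_const = df.fillna(0)                     # replace every NaN with 0
filled_mean = df["A"].fillna(df["A"].mean())    # replace NaN in A with the mean of A (1.5)

print(filled_const)
print(filled_mean)
```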
Groupby:
We can group rows that share the same value in a column and work with each group.
After .groupby() the rows with common values are stored together, but nothing is printed until we apply an operation to the groups. Try .min(), .max(), .describe(), .mean() etc.
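A minimal sketch of grouping, assuming a made-up table of teams and sales figures:

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["red", "red", "blue", "blue"],   # hypothetical group labels
    "sales": [200, 120, 340, 124],            # hypothetical values
})

by_team = df.groupby("team")            # a GroupBy object; nothing is computed yet

mean_sales = by_team["sales"].mean()    # red 160.0, blue 232.0
max_sales = by_team["sales"].max()      # red 200, blue 340

print(mean_sales)
print(by_team["sales"].describe())      # summary statistics per group
```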
Concatenating, Merging, Joining:
Concatenating:
To attach DataFrames column-wise or row-wise: pd.concat([], axis= )
By default axis=0 (row-wise).
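A small sketch of both directions with made-up tables:

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]}, index=[0, 1])
df2 = pd.DataFrame({"A": ["A2", "A3"], "B": ["B2", "B3"]}, index=[2, 3])
df3 = pd.DataFrame({"C": ["C0", "C1"]}, index=[0, 1])

stacked = pd.concat([df1, df2])                # axis=0: rows of df2 go under df1
side_by_side = pd.concat([df1, df3], axis=1)   # axis=1: columns of df3 go beside df1

print(stacked)
print(side_by_side)
```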
Merging:
To attach DataFrames on a common column: pd.merge()
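A minimal sketch of merging on a shared column, assuming two made-up tables with a column named key:

```python
import pandas as pd

left = pd.DataFrame({"key": ["K0", "K1", "K2"], "A": [1, 2, 3]})
right = pd.DataFrame({"key": ["K0", "K1", "K3"], "B": [10, 20, 30]})

merged = pd.merge(left, right, on="key")               # inner merge by default: K0 and K1
outer = pd.merge(left, right, on="key", how="outer")   # keeps every key, NaN where missing

print(merged)
print(outer)
```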
Joining:
To attach DataFrames on a common index: .join()
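A minimal sketch of joining on the index, assuming two made-up tables whose row labels partly overlap:

```python
import pandas as pd

left = pd.DataFrame({"A": [1, 2, 3]}, index=["K0", "K1", "K2"])
right = pd.DataFrame({"B": [10, 20, 30]}, index=["K0", "K2", "K3"])

joined = left.join(right)               # left join by default: keeps K0, K1, K2
inner = left.join(right, how="inner")   # only labels present in both: K0, K2

print(joined)   # B is NaN for K1 because right has no K1
print(inner)
```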
Summary:
These were the basics of pandas, the really fundamental parts. There are many more features related to file handling, data analysis, plotting etc.
Let's keep exploring...let's dive together😉