1. Filtering Rows by a Value in a Column
import pandas as pd
df = pd.read_csv('file_name.csv')
df.head()
For example, if a column contains two values (X and Y), we can create a DataFrame containing only the rows where that column equals one of them, here 'X':
df_x = df[df['column_title'] == 'X']
df_x.head()
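This works by indexing the original DataFrame with a boolean "mask": pandas returns every row for which the mask is True, i.e., the rows whose value in 'column_title' is 'X'. You can build the mask on its own and print it to see one True/False value per row: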
mask = df['column_title'] == 'X'
print(mask)
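To see the mask and the filter side by side, here is a minimal self-contained sketch. It builds a small made-up DataFrame in place of the CSV, so the column names `column_title` and `column` and all the values are hypothetical; it runs without `file_name.csv`:

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv('file_name.csv')
df = pd.DataFrame({
    "column_title": ["X", "Y", "X", "Y", "X"],
    "column": [10, 20, 30, 60, 50],
})

# Boolean Series: True where column_title equals 'X', False otherwise
mask = df["column_title"] == "X"
print(mask.tolist())  # [True, False, True, False, True]

# Indexing with the mask keeps only the rows where it is True
df_x = df[mask]
print(df_x)

# The filtered rows keep their original index labels
print(df_x.index.tolist())  # [0, 2, 4]
```

Because the filtered rows keep their original labels (0, 2, 4 here), use `df_x.reset_index(drop=True)` if you need a fresh 0-based index.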
2. Getting summary statistics for a numeric column with describe(), which reports the count, mean, standard deviation, minimum, 25th, 50th (median), and 75th percentiles, and maximum:
df_x['column'].describe()
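A minimal sketch of reading the output, again on hypothetical data in place of the CSV. The `groupby` line at the end is an addition not in the original steps; it shows one way to get the same statistics for both X and Y at once, which makes the groups easy to compare:

```python
import pandas as pd

# Hypothetical data standing in for the CSV
df = pd.DataFrame({
    "column_title": ["X", "Y", "X", "Y", "X"],
    "column": [10, 20, 30, 60, 50],
})
df_x = df[df["column_title"] == "X"]

# For a numeric column, describe() returns count, mean, std, min, 25%, 50%, 75%, max
stats = df_x["column"].describe()
print(stats)

# Individual statistics can be read by label
print(stats["mean"])  # 30.0
print(stats["50%"])   # 30.0 (the median)

# Addition: one row of summary statistics per value of column_title
by_group = df.groupby("column_title")["column"].describe()
print(by_group.loc["Y", "mean"])  # 40.0
```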
3. Visual Comparisons
import matplotlib.pyplot as plt
# Jupyter/IPython magic (notebooks only): display plots inline in the notebook
%matplotlib inline
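One way to compare the two groups visually is to overlay a histogram of the numeric column for X and for Y. The sketch below assumes hypothetical data and column names and is only one possible choice of plot; the shared bin edges make the two histograms line up:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: a grouping column (X/Y) and a numeric column
df = pd.DataFrame({
    "column_title": ["X", "Y", "X", "Y", "X", "Y"],
    "column": [10, 20, 30, 60, 50, 45],
})

# Shared bin edges (0, 10, ..., 70) so both histograms use the same bins
edges = list(range(0, 71, 10))

fig, ax = plt.subplots()
# One semi-transparent histogram per group, drawn on the same axes
for value in ["X", "Y"]:
    subset = df.loc[df["column_title"] == value, "column"]
    ax.hist(subset, bins=edges, alpha=0.5, label=value)

ax.set_xlabel("column")
ax.set_ylabel("count")
ax.legend(title="column_title")
plt.show()
```

With `alpha=0.5`, overlapping bars stay visible, so differences in where each group's values cluster are easy to spot.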
Note: These plots can't tell us anything definite about causation, but they can reveal correlations that offer insight into how the variables relate.