GharamElhendy

Posted on Apr 29, 2021

Analyzing Data Sets Using Pandas and Matplotlib

#python #datascience #pandas #analytics

1. Filtering Rows by a Value in a Column

import pandas as pd
df = pd.read_csv('file_name.csv')
df.head()

For example, if a column contains two values (X and Y), we can create a DataFrame that contains only the rows with one of those values:

df_x = df[df['column_title'] == 'X']
df_x.head()

This works by indexing the original DataFrame with a boolean "mask", which returns all the rows in which the mask is True, i.e., the rows in which the column's value is 'X':

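mask = df['column_title'] == 'X'  # Series of True/False, one entry per row
print(mask)
df_x = df[mask]  # same result as df[df['column_title'] == 'X']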
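A minimal, self-contained sketch of the same filtering, using a small made-up DataFrame in place of file_name.csv. The column names `column_title` and `value` and all the numbers are hypothetical:

```python
import pandas as pd

# Hypothetical data standing in for file_name.csv
df = pd.DataFrame({
    "column_title": ["X", "Y", "X", "Y", "X"],
    "value": [10, 20, 30, 40, 50],
})

# Boolean mask: True where the row's column_title is 'X'
mask = df["column_title"] == "X"
print(mask.tolist())  # [True, False, True, False, True]

# Indexing with the mask keeps only the rows where it is True
df_x = df[mask]
print(df_x)
```

The filtered DataFrame keeps the original index labels (0, 2, 4 here). The same selection can also be written as `df.query("column_title == 'X'")`.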

2. Getting Summary Statistics

For a numeric column, describe() returns the count, mean, standard deviation, minimum, 25th, 50th (median), and 75th percentiles, and maximum:

df_x['column'].describe()
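A small sketch of what describe() returns, with hypothetical numbers in place of real data. The result is a Series indexed by statistic name, so individual values can be read by label. Note that `std` is the sample standard deviation (ddof=1):

```python
import pandas as pd

# Hypothetical numeric column for the 'X' rows
df_x = pd.DataFrame({"column": [10, 20, 30, 40, 50]})

stats = df_x["column"].describe()
print(stats)

# Individual statistics can be read by their index label
print(stats["mean"], stats["50%"])
```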

3. Visual Comparisons

import matplotlib.pyplot as plt
# In Jupyter notebooks, display plots inline below each cell:
%matplotlib inline
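One possible visual comparison, sketched under assumptions: overlaid histograms of a numeric column for the X rows and the Y rows. The column names and values are hypothetical, and histograms are only one choice (box plots would work too):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: one categorical column and one numeric column
df = pd.DataFrame({
    "column_title": ["X", "Y", "X", "Y", "X", "Y", "X", "Y"],
    "value": [12, 25, 15, 30, 11, 28, 14, 27],
})

df_x = df[df["column_title"] == "X"]
df_y = df[df["column_title"] == "Y"]

fig, ax = plt.subplots(figsize=(6, 4))
# Overlay the two groups' distributions; alpha keeps both visible
ax.hist(df_x["value"], alpha=0.5, label="X")
ax.hist(df_y["value"], alpha=0.5, label="Y")
ax.set_xlabel("value")
ax.set_ylabel("count")
ax.set_title("Distribution of value for X vs. Y")
ax.legend()

# In a notebook with %matplotlib inline the figure renders automatically;
# in a plain script, call plt.show() to display it.
```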

Note: While these visualizations don't establish causation, they can reveal correlations that give us some insight into the relationships between variables.
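To put a number on a correlation seen in a plot, one option (not part of the original post) is to pair a scatter plot with pandas' `Series.corr`, which computes the Pearson correlation coefficient by default. The column names `column_a` and `column_b` and their values are hypothetical:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical numeric columns
df = pd.DataFrame({
    "column_a": [1, 2, 3, 4, 5, 6],
    "column_b": [52, 55, 61, 64, 70, 74],
})

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(df["column_a"], df["column_b"])
ax.set_xlabel("column_a")
ax.set_ylabel("column_b")

# Pearson correlation coefficient, always between -1 and 1
r = df["column_a"].corr(df["column_b"])
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 means the points lie near a rising straight line; it still says nothing about which variable, if either, causes the other.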
