DEV Community

Kaarthik Sekar

Statistics in Python for Absolute Beginners 🐍📊

Introduction

Ever looked at a dataset and thought, "What does this even mean?" That's exactly where statistics comes in.

Statistics helps us summarize, understand, and draw conclusions from data. And Python makes it ridiculously easy to do all of that with just a few lines of code.

In this blog, we'll cover three core pillars:

  • 📦 Descriptive Statistics
  • 🎲 Probability Distributions
  • 🔬 Hypothesis Testing

No prior stats knowledge needed. Let's go!


🛠️ Setup

pip install numpy pandas scipy matplotlib seaborn

📦 Part 1: Descriptive Statistics

Computing descriptive statistics is usually the first thing you do with any dataset: they describe and summarize your data.

Think of it like meeting someone new. You'd ask basic questions: How old are you? Where are you from? That's descriptive stats for data.

Key Concepts

Term | What it means
Mean | Average value
Median | Middle value when sorted
Mode | Most frequently occurring value
Std Dev | How spread out the data is around the mean
Variance | Average squared distance from the mean (std dev squared)

Code Example

import numpy as np
import pandas as pd

data = [23, 45, 67, 23, 89, 45, 23, 56, 78, 90]

print("Mean:", np.mean(data))        # 53.9
print("Median:", np.median(data))    # 50.5
print("Std Dev:", np.std(data))      # ≈ 25.13 (population formula, ddof=0)
print("Variance:", np.var(data))     # ≈ 631.49 (population formula, ddof=0)

# Mode using pandas
s = pd.Series(data)
print("Mode:", s.mode()[0])          # 23
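One detail worth knowing: np.std and np.var divide by n by default (the population formulas), while pandas divides by n - 1 (the sample formulas). The sketch below compares the two on the same data. The variable names are just illustrative, and which version you want depends on whether your data is the whole population or a sample from it.

```python
import numpy as np
import pandas as pd

data = [23, 45, 67, 23, 89, 45, 23, 56, 78, 90]

# NumPy defaults to ddof=0: divide by n (population formulas)
pop_var = np.var(data)
pop_std = np.std(data)

# ddof=1 divides by n - 1 instead (sample formulas)
sample_var = np.var(data, ddof=1)
sample_std = np.std(data, ddof=1)

# pandas uses ddof=1 by default, so it matches the sample version
pandas_std = pd.Series(data).std()

print(f"Population variance: {pop_var:.2f}, std dev: {pop_std:.2f}")
print(f"Sample variance:     {sample_var:.2f}, std dev: {sample_std:.2f}")
print(f"pandas std dev:      {pandas_std:.2f}")
```

For small datasets the gap between the two is noticeable; it shrinks as n grows.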

Visualizing It

import matplotlib.pyplot as plt
import seaborn as sns

sns.histplot(data, kde=True, color='steelblue')
plt.title("Data Distribution")
plt.xlabel("Values")
plt.show()

💡 Tip: Always visualize your data before jumping to conclusions. A histogram tells you more than a single number ever will.


🎲 Part 2: Probability Distributions

A probability distribution tells you how likely different outcomes are.

Imagine rolling a fair die: each number has an equal chance of appearing. That's a uniform distribution. But if you measure people's heights, most values cluster around the average, with fewer far from it. That's roughly a normal distribution.
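To see the uniform case in action, here is a small simulation sketch. The number of rolls and the random seed are arbitrary choices for illustration, not anything special.

```python
import numpy as np

# Seeded generator so the run is repeatable (the seed value is arbitrary)
rng = np.random.default_rng(seed=42)

# Simulate 60,000 rolls of a fair six-sided die (high is exclusive, so 7 means faces 1 to 6)
rolls = rng.integers(low=1, high=7, size=60_000)

# Count how often each face appeared and convert the counts to proportions
faces, counts = np.unique(rolls, return_counts=True)
proportions = counts / rolls.size

for face, prop in zip(faces, proportions):
    print(f"Face {face}: {prop:.3f}")   # each should land close to 1/6 ≈ 0.167
```

With this many rolls, every face should show up in about one sixth of them, which is what "equal chance" looks like in data.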

Normal Distribution (The Bell Curve)

The most famous distribution in all of statistics. Many natural measurements are approximately normal: heights, errors in measurements, and often exam scores.

from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt

# Evaluate the normal density (mean=0, std=1) on an evenly spaced grid of x values
x = np.linspace(-4, 4, 100)
y = norm.pdf(x, loc=0, scale=1)

plt.plot(x, y, color='tomato', linewidth=2)
plt.fill_between(x, y, alpha=0.2, color='tomato')
plt.title("Normal Distribution (Mean=0, Std=1)")
plt.xlabel("Value")
plt.ylabel("Probability Density")
plt.show()
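The curve shows density, not probability. To get an actual probability you need the area under the curve, which norm.cdf gives you. Here's a quick sketch of that idea. The height numbers (mean 170 cm, std dev 10 cm) are made up for illustration.

```python
from scipy.stats import norm

# Probability that a standard normal value falls within k standard deviations of the mean
within = {}
for k in (1, 2, 3):
    within[k] = norm.cdf(k) - norm.cdf(-k)
    print(f"Within {k} std dev: {within[k]:.4f}")

# Same idea with hypothetical heights: mean 170 cm, std dev 10 cm.
# sf is the survival function, i.e. 1 - cdf, the area to the right of 180.
p_taller_than_180 = norm.sf(180, loc=170, scale=10)
print(f"P(height > 180 cm): {p_taller_than_180:.4f}")
```

The first loop reproduces the familiar 68-95-99.7 rule for the bell curve.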

Binomial Distribution

Use this when you count successes across a fixed number of independent yes/no trials, like flipping a coin 10 times and counting heads.

from scipy.stats import binom

n = 10      # number of trials
p = 0.5     # probability of success

# Probability of getting exactly 6 heads
print(binom.pmf(6, n, p))   # 0.205

# Plot
import matplotlib.pyplot as plt
x = range(0, 11)
y = [binom.pmf(k, n, p) for k in x]

plt.bar(x, y, color='mediumseagreen')
plt.title("Binomial Distribution (n=10, p=0.5)")
plt.xlabel("Number of Heads")
plt.ylabel("Probability")
plt.show()
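"Exactly 6 heads" is only one kind of question. Often you want "at most" or "at least", and that's what cdf and sf are for. A small sketch using the same n and p as above:

```python
from scipy.stats import binom

n = 10      # number of trials
p = 0.5     # probability of success

# P(at most 6 heads): cdf adds up the pmf for k = 0..6
p_at_most_6 = binom.cdf(6, n, p)

# P(at least 6 heads): sf(5) is P(X > 5), which is the same as P(X >= 6)
p_at_least_6 = binom.sf(5, n, p)

# Expected number of heads and its variance (n*p and n*p*(1-p))
expected_heads = binom.mean(n, p)
variance_heads = binom.var(n, p)

print(f"P(at most 6 heads):  {p_at_most_6:.4f}")
print(f"P(at least 6 heads): {p_at_least_6:.4f}")
print(f"Expected heads: {expected_heads}, variance: {variance_heads}")
```

Note the off-by-one in sf: it is strictly "greater than", so "at least 6" needs sf(5), not sf(6).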

💡 Rule of thumb: If your outcome is continuous and roughly symmetric around an average (height, weight) → think Normal. If you're counting successes across yes/no trials (pass/fail) → think Binomial.


🔬 Part 3: Hypothesis Testing

This is where statistics gets powerful.

Hypothesis testing helps you answer questions like:

  • Is this new drug actually effective?
  • Did my website redesign improve conversions?
  • Are these two groups actually different?

The Core Idea

You start with two hypotheses:

  • H₀ (Null Hypothesis): Nothing is happening. No difference. Status quo.
  • H₁ (Alternative Hypothesis): Something IS happening. There IS a difference.

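Then you calculate a p-value: the probability of seeing data at least as extreme as yours if H₀ were true. If p < 0.05 (a common, conventional threshold), you reject H₀ and say the result is statistically significant.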
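Before reaching for t-tests, it can help to see a p-value computed by hand. The sketch below uses a made-up experiment (100 coin flips, 60 heads) and the binomial distribution from Part 2. H₀ is "the coin is fair". The numbers and variable names are hypothetical.

```python
from scipy.stats import binom

n_flips = 100         # hypothetical experiment size
observed_heads = 60   # hypothetical result
alpha = 0.05          # conventional significance threshold

# Under H0 the coin is fair, so the number of heads follows Binomial(n=100, p=0.5).
# Upper tail: probability of getting 60 or more heads from a fair coin.
upper_tail = binom.sf(observed_heads - 1, n_flips, 0.5)

# A fair coin's distribution is symmetric, so the two-sided p-value doubles the tail
p_value = 2 * upper_tail

print(f"P-value: {p_value:.4f}")
if p_value < alpha:
    print("Reject H0: the coin looks biased")
else:
    print("Fail to reject H0: not enough evidence of bias")
```

Here 60 heads feels lopsided, yet the p-value lands just above 0.05, so you would fail to reject H₀ at that threshold.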

One-Sample T-Test

"Is the average height of my sample different from the national average?"

from scipy import stats

# Sample data (heights in cm)
sample = [165, 170, 168, 172, 160, 175, 163, 169, 171, 167]

# Test against national average of 170cm
t_stat, p_value = stats.ttest_1samp(sample, popmean=170)

print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")

if p_value < 0.05:
    print("✅ Reject H₀ — significant difference found!")
else:
    print("❌ Fail to reject H₀ — no significant difference.")
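If you're curious what ttest_1samp does under the hood, here's a sketch that rebuilds the same numbers by hand using the standard one-sample t-test formula. The variable names are my own.

```python
import numpy as np
from scipy import stats

sample = [165, 170, 168, 172, 160, 175, 163, 169, 171, 167]
popmean = 170

n = len(sample)
sample_mean = np.mean(sample)
sample_std = np.std(sample, ddof=1)         # sample std dev (divides by n - 1)
standard_error = sample_std / np.sqrt(n)    # how much the sample mean varies

# t = (sample mean - hypothesized mean) / standard error
t_manual = (sample_mean - popmean) / standard_error

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# Compare against SciPy's built-in test
t_scipy, p_scipy = stats.ttest_1samp(sample, popmean=popmean)

print(f"Manual: t = {t_manual:.4f}, p = {p_manual:.4f}")
print(f"SciPy:  t = {t_scipy:.4f}, p = {p_scipy:.4f}")
```

Both lines should print the same values: the t-statistic is just "how many standard errors is my sample mean away from 170?"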

Two-Sample T-Test

"Are Group A and Group B actually different?"

from scipy import stats

group_a = [78, 82, 85, 90, 88, 76, 95, 84]
group_b = [70, 74, 68, 72, 80, 65, 77, 71]

t_stat, p_value = stats.ttest_ind(group_a, group_b)

print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")

if p_value < 0.05:
    print("✅ The two groups are significantly different!")
else:
    print("❌ No significant difference between groups.")
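By default, ttest_ind assumes both groups have the same variance. If you aren't sure that holds, a common alternative is Welch's t-test, which you get by passing equal_var=False. A minimal sketch on the same two groups, also printing the raw difference in means so you can judge how big the gap is:

```python
import numpy as np
from scipy import stats

group_a = [78, 82, 85, 90, 88, 76, 95, 84]
group_b = [70, 74, 68, 72, 80, 65, 77, 71]

# equal_var=False requests Welch's t-test, which does not assume equal variances
t_welch, p_welch = stats.ttest_ind(group_a, group_b, equal_var=False)

# The raw gap between the group means, in the data's original units
mean_diff = np.mean(group_a) - np.mean(group_b)

print(f"Welch T-statistic: {t_welch:.4f}")
print(f"Welch P-value: {p_welch:.4f}")
print(f"Difference in means: {mean_diff:.3f}")
```

The p-value tells you whether a difference is detectable; the difference in means tells you how large it is.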

💡 Remember: A small p-value doesn't mean your result is important. It only means data like yours would be unlikely if H₀ were true. Always pair stats with context!


🧠 Quick Recap

Concept | What it does | Python library
Descriptive Stats | Summarizes data | numpy, pandas
Normal Distribution | Models continuous data | scipy.stats
Binomial Distribution | Models yes/no outcomes | scipy.stats
T-Test | Compares means | scipy.stats

🚀 What's Next?

Now that you've got the basics down, here's where to go next:

  • 📈 Regression Analysis — predict one variable from another
  • 🤖 Intro to Machine Learning — use stats to build models
  • 📊 Exploratory Data Analysis (EDA) — combine all of the above on real datasets

🙌 Final Thoughts

Statistics isn't about memorizing formulas — it's about asking the right questions about your data. Python gives you the tools; curiosity gives you the direction.

If this helped you, drop a like and follow for more beginner-friendly data science content! 🔔


Tags: #Python #Statistics #DataScience #BeginnerFriendly #MachineLearning #Pandas #NumPy #SciPy
