Introduction
I’ll walk you through what a random variable really is, clarify the difference between discrete and continuous types, and show you how to simulate both in Python. By the end, you’ll have a practical understanding of how this foundational probability concept fits into real-world tasks like analytics and machine learning.
Understanding Random Variables in Python
Ever wondered how data scientists model uncertainty — like how many cars reach a traffic light per minute or how much rain falls in a day?
That’s where random variables come in.
A random variable is simply a function that assigns a number to every possible outcome of a random process. For example:
- The number rolled on a die → 1 to 6
- The height of a student → any real number (e.g., 165.4 cm)
This concept lets us apply probability and statistics to model real-world randomness — a key skill in machine learning and analytics.
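To make the "function" idea concrete, here is a minimal sketch that assumes a random process of two fair coin flips. The function name `num_heads` is illustrative and not part of any library; it plays the role of the random variable by mapping each outcome to a number.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: every possible outcome of flipping a coin twice
outcomes = list(product("HT", repeat=2))  # ('H', 'H'), ('H', 'T'), ...

def num_heads(outcome):
    """Random variable X: maps an outcome to the number of heads in it."""
    return outcome.count("H")

# All outcomes are equally likely, so P(X = k) is the share of outcomes mapped to k
counts = Counter(num_heads(o) for o in outcomes)
distribution = {k: Fraction(c, len(outcomes)) for k, c in sorted(counts.items())}

for k, p in distribution.items():
    print(f"P(X = {k}) = {p}")
```

The randomness lives in the coin flips; the random variable itself is an ordinary, deterministic function of the outcome.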
There are two main types:
- Discrete random variables: can take only specific, distinct values (e.g., number of heads in coin flips, number rolled on a die).
- Continuous random variables: can take any value within a range (e.g., height of people, time between arrivals at a bus stop).
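One way to see the difference is to count how many distinct values show up in a sample. The sketch below assumes a fair die for the discrete case and a normal distribution for heights in the continuous case; the mean of 165 cm, the spread of 7 cm, and the seed are made-up values for illustration only.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily for reproducibility
n = 1000

# Discrete: a fair die can only land on one of six values
die_rolls = rng.integers(low=1, high=7, size=n)  # high is exclusive

# Continuous: heights drawn from a normal distribution (illustrative parameters)
heights_cm = rng.normal(loc=165.0, scale=7.0, size=n)

print(f"Distinct die values:    {len(np.unique(die_rolls))} out of {n} samples")
print(f"Distinct height values: {len(np.unique(heights_cm))} out of {n} samples")
```

The die repeats the same few values over and over, while the simulated heights essentially never repeat, which is why continuous variables are described with probabilities over ranges rather than at single points.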
Real-World Python Examples: Discrete and Continuous Random Variables
1. Discrete Random Variable – Number of Cars at a Traffic Light
Scenario:
You are a data scientist for a city’s transportation department. You want to model the number of cars that arrive at a red traffic light during one minute. This is a classic use case for the Poisson distribution, which models counts of independent events that occur at a constant average rate.
import numpy as np
# Assume the average cars arriving per minute is 4
lam = 4 # lambda parameter for Poisson
num_samples = 1000 # simulate 1000 minutes
# Simulate random variable: number of cars/minute
cars_per_min = np.random.poisson(lam=lam, size=num_samples)
print(f'Sample of cars arriving in one minute: {cars_per_min[:10]}')
print(f'Average cars per minute (simulated): {np.mean(cars_per_min):.2f}')
Explanation:
- np.random.poisson(lam=lam, size=num_samples) simulates the number of cars arriving at the light in each minute, according to the Poisson distribution.
- Each value is discrete (whole cars: 2, 5, etc., never 2.5 cars).
- This helps city planners study traffic patterns and optimize light timing.
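As a sanity check on the simulation, the observed frequencies can be compared with the Poisson probability mass function, P(X = k) = λ^k e^(−λ) / k!. The sketch below is one way to do that; the helper `poisson_pmf` is written here for illustration (it is not a NumPy function), and the seeded generator is an assumption added so the output is reproducible.

```python
import math

import numpy as np

lam = 4             # average cars per minute, as in the example above
num_samples = 1000  # simulate 1000 minutes

rng = np.random.default_rng(seed=42)  # seed is an arbitrary choice
cars_per_min = rng.poisson(lam=lam, size=num_samples)

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Compare how often each count occurred with what the formula predicts
for k in range(9):
    simulated = np.mean(cars_per_min == k)
    print(f"k={k}: simulated {simulated:.3f}, theoretical {poisson_pmf(k, lam):.3f}")
```

With 1000 samples the two columns should be close but not identical; increasing `num_samples` narrows the gap.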
2. Continuous Random Variable – Daily Rainfall Amount
Scenario:
As an environmental analyst, you want to simulate the amount of rainfall (in mm) on a given day. Rainfall amount is continuous: it can be any non-negative real number in a range (e.g., 0 to 70 mm). The exponential distribution is a simple first approximation for rainfall on rainy days, where light showers are common and heavy downpours are rare. It is also commonly used to model the time between events, such as the time between rain events.
import numpy as np
# Assume the average daily rainfall is 3 mm (for rainy days)
avg_rainfall = 3 # mean rainfall in mm
num_samples = 1000 # simulate 1000 days
# Simulate random variable: daily rainfall in mm
rain_per_day = np.random.exponential(scale=avg_rainfall, size=num_samples)
print(f'Sample rainfall amounts (mm): {rain_per_day[:10]}')
print(f'Average rainfall (simulated): {np.mean(rain_per_day):.2f} mm')
Explanation:
- np.random.exponential(scale=avg_rainfall, size=num_samples) simulates the rainfall amount for each of the 1000 days, using the exponential distribution with a mean of avg_rainfall.
- Each value is continuous (can be 2.3 mm, 0.005 mm, etc.).
- This lets analysts model water resources and flood risk.
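For a continuous variable, the useful questions are about ranges, such as "how likely is more than 10 mm of rain?". For an exponential distribution with mean μ, that tail probability is P(X > t) = e^(−t/μ). The sketch below compares this formula with the simulated share of days; the 10 mm threshold, the larger sample size, and the seed are hypothetical values picked for illustration.

```python
import math

import numpy as np

avg_rainfall = 3       # mean rainfall in mm on a rainy day (same assumption as above)
threshold_mm = 10      # hypothetical "heavy rain" cutoff, chosen for illustration
num_samples = 100_000  # more samples give a steadier estimate of a rare event

rng = np.random.default_rng(seed=1)  # seed is an arbitrary choice
rain_per_day = rng.exponential(scale=avg_rainfall, size=num_samples)

# Share of simulated days above the threshold vs. the closed-form tail probability
simulated = np.mean(rain_per_day > threshold_mm)
theoretical = math.exp(-threshold_mm / avg_rainfall)

print(f"P(rain > {threshold_mm} mm): simulated {simulated:.4f}, theoretical {theoretical:.4f}")
```

Under these assumptions, roughly 3.6% of rainy days exceed 10 mm. A real flood-risk study would fit the distribution to observed data rather than assume it.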
Why It Matters
- Knowing whether a variable is discrete or continuous helps you choose an appropriate probability model (Poisson, Normal, Exponential, etc.)
- Essential for data simulation, feature engineering, and machine learning
- Forms the foundation for understanding uncertainty in real-world systems
Quick Recap
✅ Random Variable → maps outcomes to numbers
✅ Discrete → countable results (cars, dice)
✅ Continuous → measurable results (height, rainfall)
✅ Python → use numpy.random for easy simulation
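A practical note on simulation: results change on every run unless the random number generator is seeded. The sketch below uses `np.random.default_rng`, NumPy's newer generator interface, as an alternative to the `np.random.poisson` style used earlier; the seed value 123 is arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce identical samples
rng_a = np.random.default_rng(seed=123)
rng_b = np.random.default_rng(seed=123)

cars_a = rng_a.poisson(lam=4, size=5)
cars_b = rng_b.poisson(lam=4, size=5)

print(cars_a)
print(cars_b)
print("Identical:", np.array_equal(cars_a, cars_b))
```

Fixing the seed makes it possible to share a simulation and have others reproduce the same numbers.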
If you enjoyed this blog, follow me for more Python and data science tips!