DEV Community

Tanav

Statistics In Data Science

This will be a series of articles covering statistical concepts, from random variables and common distributions to likelihood estimation and hypothesis testing.
Prerequisite: basic probability.

Random Variables

The formal definition of a random variable is “a function whose domain is the sample space of an experiment and whose values are real numbers”. In other words, a random variable assigns a number to each outcome of an experiment.

Simplest possible example: a coin toss.
Sample Space = {Heads, Tails}
A random variable X can be defined as X(Heads) = 1, X(Tails) = 0.
Any numbers could be assigned, but in practice we choose an assignment that is meaningful for the problem (here, X counts the number of heads).

Roll a die:
Sample Space = {1,2,3,4,5,6}
X is defined as X(1) = x₁, X(2) = x₂, …, X(6) = x₆

What are the values of x₁, x₂, x₃, x₄, x₅, x₆?

The natural choice is xᵢ = i, so the values match the sample space and are all distinct (X is a one-to-one function). But the xᵢ need not be distinct; several outcomes can share the same value.

Example: for the same die roll, suppose we only care whether the result is even or odd.
Random variable E: E(2) = E(4) = E(6) = 1 and E(1) = E(3) = E(5) = 0
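
The three examples so far can be sketched in Python by treating each random variable as an ordinary function from outcomes to numbers. This is only an illustration; the names `X_coin`, `X_die` and the dictionaries built from them are made up for this sketch.

```python
# Sample spaces from the examples above.
coin_space = ["Heads", "Tails"]
die_space = [1, 2, 3, 4, 5, 6]

def X_coin(outcome):
    """Coin-toss random variable: Heads -> 1, Tails -> 0."""
    return 1 if outcome == "Heads" else 0

def X_die(outcome):
    """One-to-one random variable: each face maps to its own value."""
    return outcome

def E(outcome):
    """Even/odd indicator: many-to-one, several faces share a value."""
    return 1 if outcome % 2 == 0 else 0

# Tabulate each random variable over its whole sample space.
coin_values = {s: X_coin(s) for s in coin_space}
die_values = {s: X_die(s) for s in die_space}
even_values = {s: E(s) for s in die_space}

print(coin_values)
print(die_values)
print(even_values)
```

Note that `die_values` has six distinct values while `even_values` has only two, which is the difference between a one-to-one random variable and one whose values repeat.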

Random Variables and Events

If X is a random variable and x is a real number, then
(X < x) = {s ∈ S : X(s) < x} is an event, where S is the sample space. Similarly, (X > x), (X = x), (X ≤ x) and (X ≥ x) are all events.

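In the die example:
S = {1,2,3,4,5,6}
X(1) = 1, X(2) = 2, X(3) = 3, X(4) = 4, X(5) = 5 and X(6) = 6
The event (X < 4) is {1,2,3}.
The event {2,5} can also be written as (X = 2) ∪ (X = 5). Note that it is a union, not an intersection: X cannot equal 2 and 5 at the same time, so (X = 2) ∩ (X = 5) is empty.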
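
As a rough sketch of the idea that an event is just the set of outcomes whose X value meets a condition, the die example can be written with Python sets. The helper `event` is a hypothetical name introduced only for this illustration.

```python
# Sample space and random variable from the die example.
S = {1, 2, 3, 4, 5, 6}

def X(s):
    """Identity random variable: X(s) = s for every face s."""
    return s

def event(predicate):
    """Return the set of outcomes s in S whose value X(s) satisfies predicate."""
    return {s for s in S if predicate(X(s))}

less_than_4 = event(lambda v: v < 4)

# Combining events uses ordinary set operations.
two_or_five = event(lambda v: v == 2) | event(lambda v: v == 5)   # union
two_and_five = event(lambda v: v == 2) & event(lambda v: v == 5)  # intersection

print(sorted(less_than_4))
print(sorted(two_or_five))
print(two_and_five)
```

The union of (X = 2) and (X = 5) is the set {2, 5}, while their intersection is empty because a single roll cannot show both faces.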

Why use Random Variables?

  • Instead of assigning probabilities to every detailed outcome, we assign them to events defined through random variables.
  • This reduces the detail in an outcome to something simpler.
  • With limited data, often only the random variables can be studied, not the full outcomes.

Types of Random Variables:

  • Discrete Random Variable
  • Continuous Random Variable

Discrete Random Variable

If the range of a random variable is a discrete set, it is called a discrete random variable. It is usually described by its Probability Mass Function (PMF).
For a random variable X with range T, the PMF is the function
fₓ : T -> [0,1] defined as fₓ(t) = P(X = t) for t ∈ T
(written with a lowercase f to keep it distinct from the CDF Fₓ used below).
(X = t) is the event that X takes the value t.
P(X = t) is the probability of that event.
For example:
A fair coin is tossed 3 times.
Sample Space = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
X = number of heads
Outcomes grouped by the value of X:
X = 0: {TTT}
X = 1: {HTT, THT, TTH}
X = 2: {HHT, HTH, THH}
X = 3: {HHH}

X ∈ {0,1,2,3}
fₓ(0) = 1/8
fₓ(1) = 3/8
fₓ(2) = 3/8
fₓ(3) = 1/8
Each of the 8 outcomes is equally likely, so fₓ(t) is the number of outcomes with t heads divided by 8. The main thing to remember about a PMF is that every value lies between 0 and 1 and the values over the whole range always sum to 1:
0 ≤ fₓ(t) ≤ 1 for every t ∈ T
∑ fₓ(t) = 1, summing over t ∈ T
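
The three-toss example can be checked by brute force. The sketch below enumerates the sample space, counts heads in each outcome, and builds the probabilities P(X = t) as exact fractions. It assumes a fair coin, so every outcome has the same probability; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2**3 = 8 equally likely outcomes, e.g. "HHT".
outcomes = ["".join(tosses) for tosses in product("HT", repeat=3)]

# X = number of heads; count how many outcomes give each value of X.
counts = Counter(outcome.count("H") for outcome in outcomes)

# P(X = t) = (number of outcomes with t heads) / (total number of outcomes).
pmf = {t: Fraction(c, len(outcomes)) for t, c in sorted(counts.items())}

for t, p in pmf.items():
    print(f"P(X = {t}) = {p}")

# The probabilities over the whole range must add up to 1.
total = sum(pmf.values())
print("sum =", total)
```

Using `Fraction` rather than floats keeps the values exact, so the sum is exactly 1 instead of a rounded approximation.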

Continuous Random Variable

A random variable X with CDF Fₓ is said to be a continuous random variable if Fₓ is continuous at every x, that is, the CDF has no jumps or steps.
CDF = Cumulative Distribution Function, defined as Fₓ(x) = P(X ≤ x).
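
To make "no jumps or steps" concrete, here is a small sketch comparing two CDFs, using the standard definition F(x) = P(X ≤ x). The discrete one is for the number of heads in 3 fair tosses. The continuous one is for a variable uniform on [0, 1], which is an assumed example chosen for this sketch and is not discussed in this article.

```python
from fractions import Fraction

# PMF of X = number of heads in 3 fair coin tosses.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def discrete_cdf(x):
    """P(X <= x): add up the PMF over all values t <= x."""
    return sum(p for t, p in pmf.items() if t <= x)

def uniform_cdf(x):
    """P(U <= x) for U uniform on [0, 1] (assumed example of a continuous variable)."""
    return min(max(x, 0.0), 1.0)

eps = 1e-9  # a tiny step to the left of the point being examined

# The discrete CDF jumps by P(X = 1) at x = 1 ...
discrete_jump = discrete_cdf(1) - discrete_cdf(1 - eps)

# ... while the uniform CDF changes only by about eps at x = 0.5.
uniform_jump = uniform_cdf(0.5) - uniform_cdf(0.5 - eps)

print("jump of discrete CDF at 1:", discrete_jump)
print("change of uniform CDF at 0.5:", uniform_jump)
```

The jump of the discrete CDF at a point equals the probability the variable takes that value, which is why a continuous random variable has P(X = x) = 0 for every single x.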
This is a large topic, so I will cover it in detail in a separate article.

Stay tuned.
