DEV Community

sc0v0ne

Statistics with R - Measures of Central Tendency and Measures of Dispersion

mtcars

data(mtcars)
head(mtcars)

Loads and displays the first few rows of the mtcars dataset.

str(mtcars)

Displays the structure of the mtcars dataset, showing the type of each column.

summary(mtcars)

Prints the minimum, quartiles, median, mean, and maximum of each column.

Measures of Central Tendency


Mean

$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$$

Calculates the mean of a sequence of numbers.

n = c(1,2,4,5,6)

print(n)

mean_ = sum(n) / length(n)

print(mean_)
mean_cyl = sum(mtcars$cyl) / length(mtcars$cyl) 

print(mean_cyl)
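
The hand-written formula can be cross-checked against R's built-in `mean()`. A minimal sketch, reusing the same values as `n` above; the names `manual_mean` and `builtin_mean` are only illustrative:

```r
n <- c(1, 2, 4, 5, 6)

# Manual mean: sum of the values divided by how many there are
manual_mean <- sum(n) / length(n)

# Built-in equivalent
builtin_mean <- mean(n)

print(manual_mean)   # 3.6
print(builtin_mean)  # 3.6

# The same check on a real column; all.equal() tolerates floating-point noise
print(all.equal(sum(mtcars$cyl) / length(mtcars$cyl), mean(mtcars$cyl)))
```

In practice `mean()` is the usual choice; the manual version is mainly useful for seeing what the formula does.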

Median

  • If N is odd:
$$\text{Med} = x_{\left(\frac{N+1}{2}\right)}$$
  • If N is even:
$$\text{Med} = \frac{x_{\left(\frac{N}{2}\right)} + x_{\left(\frac{N}{2} + 1\right)}}{2}$$

Calculates the median of a sequence of numbers with an odd size.

data_odd <- c(7, 13, 19, 33, 67)

median_ <- median(data_odd)
print(median_)

# Manual calculation: pick the middle value (the vector must already be sorted)
n <- length(data_odd)
median_ <- data_odd[(n + 1) / 2]
print(median_)

Calculates the median of a sequence of numbers with an even size.

data_even <- c(2, 34, 76, 92)

median_ <- median(data_even)
print(median_)

# Manual calculation: average the two middle values (the vector must already be sorted)
n <- length(data_even)

median_ <- (data_even[n / 2] + data_even[n / 2 + 1]) / 2

print(median_)
median(mtcars$cyl)
median(mtcars$qsec)
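
The two index formulas can be folded into one function that handles both cases. The sketch below uses a hypothetical helper, `manual_median`, which is not part of base R; it sorts its input first, since the formulas only hold for ordered values:

```r
# Hypothetical helper (not part of base R): median via the index formulas
manual_median <- function(x) {
  x <- sort(x)          # the formulas assume ordered values
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]      # odd N: the single middle value
  } else {
    (x[n / 2] + x[n / 2 + 1]) / 2   # even N: mean of the two middle values
  }
}

print(manual_median(c(67, 7, 33, 13, 19)))  # 19
print(manual_median(c(92, 2, 76, 34)))      # 55
print(manual_median(mtcars$cyl) == median(mtcars$cyl))
```

The built-in `median()` sorts internally as well, which is why it can be called on unsorted columns such as `mtcars$qsec`.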

Mode

$$\text{Mode} = \underset{x_i}{\operatorname{argmax}} \ f(x_i)$$

Creates a frequency table for a sequence of numbers.

numbers <- c(1, 233, 233, 10101, 342, 1, 2, 1111, 1, 55)

tnumbers <- table(numbers)
print(numbers)
print(tnumbers)
mode_ <- as.numeric(names(tnumbers)[tnumbers == max(tnumbers)])
print(mode_)

Identifies the most frequent value(s) in the sequence of numbers.

library(DescTools)
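# Pass the raw vector, not the frequency table:
# Mode(tnumbers) would return the most common count instead of the most common value
mode_ <- Mode(numbers)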
print(mode_)
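
If installing DescTools is not an option, the `table()` approach above can be wrapped in a small base-R function. This is only a sketch; `find_modes` is a hypothetical name, and it returns every value tied for the highest frequency:

```r
# Hypothetical helper: return every value that attains the highest frequency
find_modes <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

print(find_modes(c(1, 233, 233, 10101, 342, 1, 2, 1111, 1, 55)))  # 1
print(find_modes(c(4, 4, 6, 6, 8)))                               # 4 6
print(find_modes(mtcars$cyl))                                     # 8
```

Because the values pass through `names()` as text, this sketch suits integers and short decimals; values with many significant digits may lose precision in the round trip.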

Measures of Dispersion

Defines a sequence of numbers.

n_arr = c(1,2,4,5,6)
print(n_arr)

Variance

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$

Calculates the variance of a sequence of numbers.

mean_ <- mean(n_arr)

print('Mean')
print(mean_)

print('Variance')
var_ <- sum((n_arr - mean_)^2) / length(n_arr)

print((n_arr - mean_))
print((n_arr - mean_)^2)
print(sum((n_arr - mean_)^2))
print(length(n_arr))
print(var_)
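
R's built-in `var()` does not compute this formula directly: it divides by N - 1 (the sample variance) rather than N. A short sketch of the difference, using the same values as `n_arr`; the names `pop_var` and `sample_var` are illustrative:

```r
n_arr <- c(1, 2, 4, 5, 6)
N <- length(n_arr)

# Population variance, as in the formula above (divide by N)
pop_var <- sum((n_arr - mean(n_arr))^2) / N

# Sample variance, which is what var() computes (divide by N - 1)
sample_var <- var(n_arr)

print(pop_var)                     # 3.44
print(sample_var)                  # 4.3
print(sample_var * (N - 1) / N)    # 3.44, rescaled back to the population form
```

Which divisor is appropriate depends on whether the data are the whole population or a sample drawn from it.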

Standard Deviation

$$\sigma = \sqrt{\sigma^2}$$

Calculates the standard deviation, which is the square root of the variance.

print('Variance')
var_ <- sum((n_arr - mean_)^2) / length(n_arr)
print((n_arr - mean_))
print((n_arr - mean_)^2)
print(sum((n_arr - mean_)^2))
print(length(n_arr))
print(var_)

print('Standard Deviation')
std_ <- sqrt(var_)
print(std_)

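Calculates the standard deviation using the sd function in R. Note that sd divides by N - 1 (the sample standard deviation), so its result, about 2.07, is larger than the population value of about 1.85 computed above.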

std_ <- sd(n_arr)
print(std_)
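
For reference, the sample standard deviation s that `sd()` reports and the population value σ from the formula above differ only by a scaling factor. A worked sketch for `n_arr`, where N = 5 and the sum of squared deviations is 17.2:

```latex
s = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \mu)^2} = \sigma \sqrt{\frac{N}{N-1}}

\sigma = \sqrt{\frac{17.2}{5}} \approx 1.8547, \qquad s = \sqrt{\frac{17.2}{4}} \approx 2.0736
```

The gap shrinks as N grows, since N / (N - 1) approaches 1.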

Range

$$\text{Range} = x_{\text{max}} - x_{\text{min}}$$

Calculates the range, which is the difference between the maximum and minimum values.

range_ <- max(n_arr) - min(n_arr)
print('Range')
print(max(n_arr))
print(min(n_arr))
print(range_)

Calculates the same value with range, which returns the minimum and maximum, and diff, which subtracts the first from the second.

range_ <- diff(range(n_arr))
print(range_)
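
The same two approaches work on a dataset column. A brief sketch on `mtcars$mpg`; the variable name `mpg_limits` is illustrative:

```r
mpg_limits <- range(mtcars$mpg)   # c(min, max)
print(mpg_limits)                 # 10.4 33.9

# Both expressions give the width of that interval
print(diff(mpg_limits))
print(max(mtcars$mpg) - min(mtcars$mpg))
```

Note that `range()` on its own returns the two endpoints, not their difference, which is why `diff()` is needed.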

Coefficient of Variation

$$\text{CV} = \frac{\sigma}{\mu}$$

Calculates the coefficient of variation, which is the ratio of the standard deviation to the mean.

mean_ <- mean(n_arr)
print('Mean')
print(mean_)

print('Variance')
var_ <- sum((n_arr - mean_)^2) / length(n_arr)
print((n_arr - mean_))
print((n_arr - mean_)^2)
print(sum((n_arr - mean_)^2))
print(length(n_arr))
print(var_)

print('Standard Deviation')
std_ <- sqrt(var_)
print(std_)

print('Coefficient of Variation')
cv <- std_ / mean_
print(cv)
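
Because the coefficient of variation has no units, it can be used to compare the spread of columns measured on different scales. A minimal sketch with a hypothetical helper, `coef_var`, that follows the population formula used above:

```r
# Hypothetical helper: population standard deviation divided by the mean
coef_var <- function(x) {
  sqrt(sum((x - mean(x))^2) / length(x)) / mean(x)
}

print(coef_var(c(1, 2, 4, 5, 6)))

# Unit-free, so columns on different scales can be compared directly
print(coef_var(mtcars$mpg))
print(coef_var(mtcars$hp))
```

The ratio is only meaningful when the mean is not close to zero and the values are measured on a scale with a true zero.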


About the author:

A little more about me...

I hold a Bachelor's degree in Information Systems, and in college I worked with a range of technologies. Along the way I took an Artificial Intelligence course, where I first encountered machine learning and Python, and learning about this area became my passion. Today I work with machine learning and deep learning, developing communication software. I also started a blog where I post about the subjects I am studying, to help other users.

I'm currently learning TensorFlow and Computer Vision

Curiosity: I love coffee
