Thales Bruno

Posted on Jul 8, 2020 • Edited on Jul 20, 2020 • Originally published at thalesbr.uno

Numerical variables

#statistics #python #datascience #beginners


Numerical variables, also known as quantitative variables, represent quantities that can be counted or measured, such as frequencies or measurements. Their values are always numbers that can be placed in a meaningful order with consistent intervals between them.

Examples of quantitative variables include:

Weight
Height
Sales
Production units
Movie ratings

Discrete and continuous

Numerical variables may be either discrete or continuous.

Discrete values result from counting, for example the number of goals a football team scores in a season. The data can only take specific whole-number values, such as 60, 65, or 72; a team cannot score 65.5 goals.

Continuous values, on the other hand, result from measuring. For instance, if we measure the weights of a football team's players in kilograms, the data can take any value within a range, such as 84.1 kg or 74.89483 kg.
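To make the distinction concrete, here is a minimal sketch with made-up numbers (the goal counts and player weights below are invented, not real data). The helper `looks_discrete` is a hypothetical heuristic, not a pandas feature: it flags a variable as discrete when every value is a whole number.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only
goals = pd.Series([60, 65, 72, 58, 70], name="goals")  # counts -> discrete
weights = pd.Series([84.1, 74.89483, 79.3, 91.25, 68.7], name="weight_kg")  # measurements -> continuous


def looks_discrete(values: pd.Series) -> bool:
    """Hypothetical heuristic: treat a variable as discrete when every
    non-missing value is a whole number."""
    values = values.dropna()
    return bool((values == values.round()).all())


for series in (goals, weights):
    kind = "discrete" if looks_discrete(series) else "continuous"
    print(f"{series.name}: dtype={series.dtype}, looks {kind}")
```

Treat this only as a first check: weights rounded to whole kilograms would look discrete here even though weight is a continuous measurement, so whether a variable is counted or measured is what really decides.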

Buckets and bins

Buckets (also called bins) are equal-width intervals into which we group numerical data, keeping it in a meaningful order so we can analyze it and draw insights. For example, we could count the movies produced in the 20th century in 10-year buckets and, as a result, see how the movie industry evolved over that century.
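A minimal sketch of 10-year buckets using pandas' `pd.cut`, assuming a handful of invented release years (not real movie data). Passing `right=False` makes each bucket include its left edge and exclude its right edge, so a 1950 release would fall in [1950, 1960).

```python
import pandas as pd

# Hypothetical release years, invented for illustration (not real data)
years = pd.Series([1903, 1927, 1939, 1942, 1955, 1957, 1968, 1977, 1984, 1994, 1999])

# Ten-year buckets covering the 20th century: [1900, 1910), [1910, 1920), ..., [1990, 2000)
edges = list(range(1900, 2010, 10))
decades = pd.cut(years, bins=edges, right=False)

# Count movies per bucket in chronological order; empty decades show a count of 0
counts = decades.value_counts().sort_index()
print(counts)
```

Keeping empty buckets (like the 1910s here) matters: dropping them would hide gaps and distort the picture of how production changed over time.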

In this article, we will explore some numerical data using the Kaggle Google Play Store Apps dataset by Lavanya Gupta, as we did in the article about categorical variables.

Using pandas, we load only the Rating column of the dataset, a typical numerical variable: users rated the apps on a scale from 1.0 to 5.0.

import pandas as pd
import plotly.express as px

# Load only the Rating column from the CSV
df = pd.read_csv("./data/googleplaystore.csv", usecols=['Rating'])

# Drop rows with missing ratings
df = df.dropna(subset=['Rating'])

# Keep only ratings on the valid 1.0-5.0 scale; this removes the
# erroneous outlier rating of 19.0 (row 10472) without hardcoding its index
df = df[df['Rating'].between(1.0, 5.0)]

# Plot a histogram of the ratings
fig = px.histogram(df, x='Rating', title='Google Play Store Apps Ratings', template="simple_white")
fig.show()

Histogram

The chart above is a histogram. It looks like the bar chart we plotted in the Categorical Variables post, but there are important differences: each bar in a histogram covers an interval (bin) of values rather than a single category, there is no space between adjacent bars, and the intervals are equally spaced, as expected for numerical values.
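The bins behind a histogram can be inspected with NumPy's `np.histogram`, which returns the count for each bin and the bin edges. In this sketch the ratings are invented for illustration. Adjacent bins share an edge (which is why the bars touch) and every bin has the same width.

```python
import numpy as np

# Hypothetical ratings on the 1.0-5.0 scale (invented for illustration)
ratings = np.array([1.0, 3.5, 4.0, 4.2, 4.3, 4.5, 4.5, 4.7, 4.8, 5.0])

# Eight equal-width bins spanning the whole rating scale.
# Each bin is [left, right) except the last, which also includes 5.0.
counts, edges = np.histogram(ratings, bins=8, range=(1.0, 5.0))

# 9 edges define 8 bins; each bin's right edge is the next bin's left edge
widths = np.diff(edges)
print("edges: ", edges)
print("widths:", widths)
print("counts:", counts)
```
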

The shape of a histogram already gives us useful information. The histogram above is left-skewed (its tail extends to the left), so we may conclude that most apps were well rated: the tallest bars are on the right side of the histogram, where the highest ratings (between 4.0 and 5.0) are.
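The direction of the skew can also be checked numerically. This sketch uses pandas' `Series.skew()` on a small invented sample: a negative value indicates a left skew, and in such distributions the tail usually pulls the mean below the median. The same call can be applied to the real Rating column.

```python
import pandas as pd

# Hypothetical ratings (invented for illustration) with a long tail to the left
ratings = pd.Series([1.0, 3.5, 4.0, 4.2, 4.3, 4.5, 4.5, 4.7, 4.8, 5.0])

# Sample skewness: negative values indicate a left (negative) skew
skewness = ratings.skew()

# The low outlier drags the mean below the median
print(f"skew={skewness:.2f}, mean={ratings.mean():.2f}, median={ratings.median():.2f}")
```
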

Histograms can also be right-skewed, symmetric, bimodal, or uniform. Perhaps we will see examples of these shapes in future posts!
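As a rough illustration of these shapes, the sketch below draws synthetic samples with NumPy (the distributions and parameters are arbitrary choices, not taken from the dataset) and computes their skewness with pandas.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)
n = 10_000

# Synthetic samples (not real data), one per histogram shape
samples = {
    "right-skewed": pd.Series(rng.exponential(scale=1.0, size=n)),
    "left-skewed": pd.Series(-rng.exponential(scale=1.0, size=n)),
    "symmetric": pd.Series(rng.normal(loc=0.0, scale=1.0, size=n)),
    "bimodal": pd.Series(np.concatenate([
        rng.normal(loc=-2.0, scale=0.5, size=n // 2),
        rng.normal(loc=2.0, scale=0.5, size=n // 2),
    ])),
    "uniform": pd.Series(rng.uniform(low=0.0, high=1.0, size=n)),
}

# Skewness separates left from right skew, but symmetric, bimodal and
# uniform samples all score close to zero
skews = {name: s.skew() for name, s in samples.items()}
for name, value in skews.items():
    print(f"{name:>12}: skew = {value:+.2f}")
```

Because the symmetric, bimodal, and uniform samples all have near-zero skewness, a single number is not enough to tell them apart; plotting each sample as a histogram shows the difference immediately.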
