Data visualization terminology

#dataviz #beginners

Dimension - a parameter or characteristic. In the context of geometry, planes, plots, etc., a dimension corresponds to an axis; in the context of RDBMS (relational database management systems) and OLAP (online analytical processing), it corresponds to a column. For example, age, gender, height, color, coordinates, time, etc.

Cardinality - the number of distinct values in a dimension (the term originally refers to the number of elements in a set). For example, gender is a dimension with low cardinality, while coordinates are a dimension with high cardinality.
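To make the two terms concrete, here is a minimal sketch that treats each key of a record as a dimension (a column, in RDBMS terms) and counts its distinct values. The records and the `cardinality` helper are hypothetical, made up for illustration. Note that in a small sample the count can never exceed the number of rows: gender stays at 3 no matter how many rows you add, while a measurement like height tends to keep growing.

```python
# Hypothetical sample records: each key is a dimension (a column in RDBMS terms).
records = [
    {"gender": "female", "age": 34, "height_cm": 165.2},
    {"gender": "male", "age": 51, "height_cm": 180.4},
    {"gender": "non-binary", "age": 27, "height_cm": 172.9},
    {"gender": "female", "age": 34, "height_cm": 158.0},
    {"gender": "male", "age": 62, "height_cm": 176.5},
]

def cardinality(rows, dimension):
    """Number of distinct values that a dimension takes in the given rows."""
    return len({row[dimension] for row in rows})

for dim in ("gender", "age", "height_cm"):
    print(dim, cardinality(records, dim))
```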

Continuous, discrete - a dimension is continuous if it can be divided infinitely into smaller pieces; strictly speaking, this applies only to idealized (Platonic) concepts. For example, space and time are considered continuous. In practice, all measurements taken by humans and all information we store in computers are discrete, because they are limited by the precision of the measurement tools. So when we deal with continuous dimensions in computers, we are actually dealing with discrete approximations (typically with high cardinality).
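A small sketch of both effects in Python. Floating-point numbers are themselves a discrete approximation of real numbers, and a measuring device with limited precision maps infinitely many "true" values onto a finite set of readings. The thermometer with 0.1 °C precision and the `measure_tenths` helper are hypothetical.

```python
from decimal import Decimal

# Floats are a discrete approximation: 0.1 cannot be stored exactly in binary.
print(Decimal(0.1))      # prints the exact value of the float closest to 0.1
print(0.1 + 0.2 == 0.3)  # False: the rounding errors do not cancel out

def measure_tenths(true_celsius):
    """Hypothetical thermometer with 0.1 °C precision; returns the reading in tenths of a degree."""
    return round(true_celsius * 10)

# 2001 "true" temperatures between 20 °C and 22 °C collapse into a few distinct readings.
true_values = [20 + i / 1000 for i in range(2001)]
readings = {measure_tenths(t) for t in true_values}
print(len(readings))  # 21 distinct readings: 20.0, 20.1, ..., 22.0
```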

Comparable, incomparable - numerical dimensions are comparable, because numbers can be compared and hence sorted. But some characteristics are incomparable: you can't say that one item is bigger (brighter, heavier, etc.) than another. For example, gender is an incomparable dimension. You can sort genders alphabetically, but that order has no meaning, and in a different language it can change.
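The language-dependence of an alphabetical order is easy to demonstrate. In this sketch the German labels are an illustrative translation (an assumption, not part of the original text): sorting the same three categories in English and in German gives different orders, while sorting numbers gives the same result everywhere.

```python
# Numbers are comparable, so sorting them is meaningful.
ages = [51, 27, 34, 62]
print(sorted(ages))  # [27, 34, 51, 62]

# Category labels can only be sorted by their spelling, which depends on the language.
# Hypothetical English-to-German label translation.
to_german = {"female": "weiblich", "male": "männlich", "non-binary": "nicht-binär"}
to_english = {de: en for en, de in to_german.items()}

english_order = sorted(to_german)
german_order = [to_english[de] for de in sorted(to_german.values())]

print(english_order)  # ['female', 'male', 'non-binary']
print(german_order)   # ['male', 'non-binary', 'female']
```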

Categorization - when you deal with a dimension whose cardinality is higher than you need, you can lower the cardinality by grouping items into categories. For example, instead of building a graph for every value of age, you can create two categories: 50 years or older, and younger than 50.
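The age example can be sketched as a function that maps every raw value to one of two labels, reducing the cardinality of the dimension to 2. The sample ages are hypothetical.

```python
from collections import Counter

def age_category(age):
    """Map a raw age (high cardinality) to one of two categories (cardinality 2)."""
    return ">= 50" if age >= 50 else "< 50"

ages = [23, 35, 49, 50, 51, 67, 18, 80, 44]  # hypothetical sample
counts = Counter(age_category(a) for a in ages)
print(counts)  # 5 people in "< 50", 4 people in ">= 50"
```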

Cartesian product - (originally from set theory) all possible combinations of values from two dimensions. For example, the geometric plane is the Cartesian product of two coordinate axes: every point is a pair (x, y) (hence a 2-dimensional, or Cartesian, coordinate system). Also, if we have two dimensions with low cardinality, we can use a Cartesian product to combine them into one dimension. For example, suppose we have data about population gender (male, female, non-binary) and age. Using categorization for age (>= 50, < 50) and a Cartesian product, we get one dimension with 3 × 2 = 6 values.
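Python's standard library provides the Cartesian product directly as `itertools.product`. This sketch combines the gender dimension with the categorized age dimension into one dimension with 6 values; the "gender, age group" label format is just one possible choice.

```python
from itertools import product

genders = ["male", "female", "non-binary"]
age_groups = [">= 50", "< 50"]

# Every (gender, age group) pair becomes one value of a new combined dimension.
combined = [f"{g}, {a}" for g, a in product(genders, age_groups)]
print(len(combined))  # 6
for value in combined:
    print(value)
```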

Coordinate system - a combination of one or more dimensions used to specify positions. For example, a Cartesian coordinate system (used for simple plots), a polar coordinate system, a geographic coordinate system (latitude and longitude), etc.
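The same point can be described in different coordinate systems. As a minimal sketch, here is the standard conversion between polar coordinates (radius r, angle θ in radians) and Cartesian coordinates (x, y); the helper names are illustrative.

```python
import math

def polar_to_cartesian(r, theta):
    """Convert polar coordinates (radius, angle in radians) to Cartesian (x, y)."""
    return r * math.cos(theta), r * math.sin(theta)

def cartesian_to_polar(x, y):
    """Convert Cartesian (x, y) to polar (radius, angle in radians)."""
    return math.hypot(x, y), math.atan2(y, x)

x, y = polar_to_cartesian(2, math.pi / 2)
print(round(x, 9), round(y, 9))  # 0.0 2.0
```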

Projection - a way to represent one coordinate system in another (a transformation). For example, there are ways to represent 3-dimensional space in 2-dimensional space, like multiview projection and axonometric projection; and there are ways to represent the Earth's surface on a map, like the Mercator projection and others. A projection can introduce loss of information (like axonometric projection, which discards depth) or distortion (like Mercator, which enlarges areas far from the equator).
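Both effects can be sketched in a few lines. The first part uses the spherical Mercator formulas x = R·λ, y = R·ln(tan(π/4 + φ/2)), assuming the Earth is a sphere of radius 6371 km (a simplification). Mercator stretches distances at latitude φ by a factor of 1/cos φ (and areas by its square). The second part uses the simplest parallel projection, a top view that drops the z coordinate; like other parallel projections, including axonometric ones, it maps a whole line of 3D points onto a single 2D point. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; treating the Earth as a sphere is an assumption

def mercator(lat_deg, lon_deg, radius=EARTH_RADIUS_KM):
    """Spherical Mercator projection: (latitude, longitude) in degrees -> (x, y) on the map."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    x = radius * lon
    y = radius * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y

def mercator_scale_factor(lat_deg):
    """How much Mercator stretches distances at a given latitude (1 at the equator)."""
    return 1 / math.cos(math.radians(lat_deg))

print(mercator_scale_factor(0))             # 1.0
print(round(mercator_scale_factor(60), 6))  # 2.0: distances at 60° appear twice as long

def drop_depth(x, y, z):
    """Simplest parallel projection onto the XY plane: the z coordinate is discarded."""
    return x, y

# Two different 3D points land on the same 2D point: depth information is lost.
print(drop_depth(1, 2, 3) == drop_depth(1, 2, 10))  # True
```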

Scale - describes how the original values map to the sizes or positions shown on a given plot. For example, a linear scale is used in maps, and a logarithmic scale is used when a quantity varies over several orders of magnitude but we are interested in the details of both small and large values.
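A scale can be sketched as a function from a data domain to an output range (for example, pixels). In this sketch the axis (values from 1 to 10,000 drawn on 400 pixels) is hypothetical. With the linear scale, the values 1, 10, 100 and 1,000 all end up squeezed into the first 40 pixels; the logarithmic scale gives each order of magnitude the same 100 pixels.

```python
import math

def linear_scale(domain, output_range):
    """Return a function mapping values in `domain` linearly onto `output_range`."""
    (d0, d1), (r0, r1) = domain, output_range
    return lambda v: r0 + (v - d0) * (r1 - r0) / (d1 - d0)

def log_scale(domain, output_range):
    """Like linear_scale, but applied to log10 of the values (domain must be > 0)."""
    (d0, d1), (r0, r1) = domain, output_range
    l0, l1 = math.log10(d0), math.log10(d1)
    return lambda v: r0 + (math.log10(v) - l0) * (r1 - r0) / (l1 - l0)

# Hypothetical axis: values from 1 to 10,000 drawn on 400 pixels.
lin_axis = linear_scale((1, 10_000), (0, 400))
log_axis = log_scale((1, 10_000), (0, 400))

for value in (1, 10, 100, 1_000, 10_000):
    print(value, round(lin_axis(value), 1), round(log_axis(value), 1))
```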

Outlier or anomaly - a person or thing that is atypical within a particular group, class, or category. This term applies to the data rather than to the visualization, but we can mark outliers separately on plots.
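There are many ways to decide what counts as atypical. One common rule of thumb (Tukey's fences, the same rule box plots typically use for their whiskers) flags values more than 1.5 interquartile ranges below the first quartile or above the third. This is a sketch of that one rule, not the only definition; it assumes Python 3.8+ for `statistics.quantiles`, and the measurements are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

response_times_ms = [10, 12, 11, 13, 12, 11, 95]  # hypothetical measurements
print(iqr_outliers(response_times_ms))  # [95]
```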

Graph - a set of points (vertices) connected by lines (edges), as a mathematical term. In data visualization, we mean the visual representation of this mathematical object. The main idea of graphs is to show relationships between elements. In this sense, graphs don't differ from other visualizations, whose main purpose is to show relationships between dimensions.
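Before a graph is drawn, it is usually stored as data. One common representation is an adjacency list that maps each vertex to the vertices it is connected to. In this sketch the people and their relationships are hypothetical, and the graph is undirected for simplicity; the degree (number of connections) of each vertex is the kind of value a visualization might encode as node size.

```python
from collections import defaultdict

# Hypothetical relationships: who knows whom (an undirected graph).
edges = [("Ann", "Bob"), ("Bob", "Cid"), ("Ann", "Cid"), ("Cid", "Dan")]

def adjacency(edge_list):
    """Build an undirected adjacency list: vertex -> set of directly connected vertices."""
    graph = defaultdict(set)
    for a, b in edge_list:
        graph[a].add(b)
        graph[b].add(a)
    return graph

graph = adjacency(edges)
degree = {vertex: len(neighbors) for vertex, neighbors in graph.items()}
print(degree)  # {'Ann': 2, 'Bob': 2, 'Cid': 3, 'Dan': 1}
```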

Photo by Michael Schiffer on Unsplash
