Long Data Vs. Wide Data

#todayilearned

So, lately I have had my hands on some raw, unclean data for a school assignment. Originally I thought messy data was just about cleaning up blank values and getting text, numbers, and strings into the right format. But as I proceeded to analyze my data in R, I found that it wasn't in a shape R could work with. There was a key concept I was missing about setting up data the right way: wide vs. long data.

What is Wide Data?

In wide data (also known as unstacked data), each attribute of a subject is stored in its own column, so each subject occupies a single row.

Person	Age	Weight
Buttercup	24	110
Bubbles	24	105
Blossom	24	107
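To follow along in R, the wide table can be typed in as a data frame. The name `df` matches the console output later in the post; storing Age and Weight as plain numbers is my assumption.

```r
# Build the wide table from the example as an R data frame.
# Each subject (Person) is one row; each attribute (Age, Weight) is its own column.
df <- data.frame(
  Person = c("Buttercup", "Bubbles", "Blossom"),
  Age    = c(24, 24, 24),
  Weight = c(110, 105, 107),
  stringsAsFactors = FALSE
)

print(df)
```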

What is Long Data?

Long data (also called narrow or stacked data) puts all of the values in one column and uses another column to record the context of each value, i.e. which variable it measures.

Person	Variable	Value
Buttercup	Age	24
Buttercup	Weight	110
Bubbles	Age	24
Bubbles	Weight	105
Blossom	Age	24
Blossom	Weight	107

Many R functions, especially for grouping, summarizing, and plotting, work more naturally with data in long form. This concept might seem weird at first, since we are used to seeing and analyzing data in wide form, but it gets easier with practice. R has an awesome package called reshape2 for converting your data from wide to long.
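Here is a small illustration of why long data suits R, using base R's `aggregate()`. The data frame name `long` is my own, and it is built by hand from the long table above. Because every value sits in one column, a single formula summarizes all measured variables at once.

```r
# Long data: one row per Person/variable pair (values from the long table above).
long <- data.frame(
  Person   = rep(c("Buttercup", "Bubbles", "Blossom"), each = 2),
  variable = rep(c("Age", "Weight"), times = 3),
  value    = c(24, 110, 24, 105, 24, 107),
  stringsAsFactors = FALSE
)

# Group the single value column by the variable it measures and take the mean.
means <- aggregate(value ~ variable, data = long, FUN = mean)
print(means)
```

One design consequence: if a new measurement (say, a hypothetical Height) were added, it would just add rows to `long`, and the same `aggregate()` call would keep working without changes.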

First, install the package and load the library.

install.packages("reshape2")
library(reshape2)

Using the wide table above, we split our variables into two groups: identifier variables and measured variables.

Identifier variable: Person
Measured variables: Age, Weight
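The same split can be computed rather than typed out. A minimal sketch, where `id_vars` and `measure_vars` are hypothetical names chosen for illustration: everything that is not an identifier is treated as a measurement. reshape2's `melt()` applies a similar default when you leave out the measured variables.

```r
# Wide table from the example.
df <- data.frame(
  Person = c("Buttercup", "Bubbles", "Blossom"),
  Age    = c(24, 24, 24),
  Weight = c(110, 105, 107),
  stringsAsFactors = FALSE
)

# Identifier variables say *who* a row is about.
id_vars <- "Person"

# Every remaining column is a measurement; setdiff() keeps the column order.
measure_vars <- setdiff(names(df), id_vars)
print(measure_vars)
```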

To transform this wide data into long data, we use the melt() function. You “melt” data so that each row is a unique id-variable combination.
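To see what "each row is a unique id-variable combination" means mechanically, here is a rough base-R approximation of melting. `melt_sketch()` is a hypothetical helper written only for illustration, not part of reshape2; the real `melt()` also handles factors, missing values, and other cases this sketch ignores.

```r
# Wide table from the example.
df <- data.frame(
  Person = c("Buttercup", "Bubbles", "Blossom"),
  Age    = c(24, 24, 24),
  Weight = c(110, 105, 107),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of reshape2): a bare-bones melt in base R.
melt_sketch <- function(data, id_var, measure_vars) {
  n <- nrow(data)
  out <- data.frame(
    # Repeat the identifier once for every measured variable...
    id = rep(data[[id_var]], times = length(measure_vars)),
    # ...label each block of n rows with the variable it came from...
    variable = rep(measure_vars, each = n),
    # ...and stack the measured columns, one after another, into one column.
    value = unlist(data[measure_vars], use.names = FALSE),
    stringsAsFactors = FALSE
  )
  names(out)[1] <- id_var  # keep the original identifier name
  out
}

long <- melt_sketch(df, "Person", c("Age", "Weight"))
print(long)

# Each row should be a unique Person/variable combination.
stopifnot(!any(duplicated(long[c("Person", "variable")])))
```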

> df
     Person Age Weight
1 Buttercup  24    110
2   Bubbles  24    105
3   Blossom  24    107

> ppg <- melt(df, id.vars = "Person", measure.vars = c("Age", "Weight"))
> ppg
     Person variable value
1 Buttercup      Age    24
2   Bubbles      Age    24
3   Blossom      Age    24
4 Buttercup   Weight   110
5   Bubbles   Weight   105
6   Blossom   Weight   107
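reshape2 can also go the other way: its `dcast()` function casts long data back to wide. A minimal sketch reusing the example data; `wide_again` is just my name for the result. The rows may come back sorted by the identifier rather than in the original order, so compare by Person rather than by row position.

```r
library(reshape2)

df <- data.frame(
  Person = c("Buttercup", "Bubbles", "Blossom"),
  Age    = c(24, 24, 24),
  Weight = c(110, 105, 107),
  stringsAsFactors = FALSE
)

# Wide -> long, as above.
ppg <- melt(df, id.vars = "Person", measure.vars = c("Age", "Weight"))

# Long -> wide: one row per Person, one column per level of `variable`,
# with the cells filled from the `value` column.
wide_again <- dcast(ppg, Person ~ variable, value.var = "value")
print(wide_again)
```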

Resources

For official documentation, see the reshape2 package from its creator, Hadley Wickham.

For more about wide vs. long data, check out The Analysis Factor.

For more about shaping messy data into tidy data, check out Hadley Wickham’s paper Tidy Data.
