What / Why
I wanted to take a high-level look at when DEV comments are created relative to the thread they're posted on.
DEV members encounter articles from a variety of sources — their home feed, on-site notifications, social media, search engine queries, etc. I was curious about what percentage of comments were on "fresh" threads, vs. the growing long-tail of articles here on the site.
How
I don't have much experience with data analysis or even SQL, but it was pretty quick/easy to prepare this high-level report. To simplify matters, I focused only on comments created in August.
Step 1: Grab the created_at date and article ID for all comments (5400 in total)
Step 2: Grab the published_at date for all articles grabbed in Step 1
Step 3: Calculate the difference (in days) between Article Date and Comment Date.
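For anyone curious what Step 3 might look like in code, here is a minimal Python sketch. It is not the query I actually ran: the sample rows, the dates, and the `article_age_in_days` helper are all made up for illustration, and it assumes "difference in days" means calendar days rather than 24-hour periods.

```python
from datetime import datetime

# Hypothetical sample data standing in for Steps 1 and 2.
# Each comment carries its created_at timestamp and its article ID.
comments = [
    {"article_id": 1, "created_at": datetime(2018, 8, 3, 14, 0)},
    {"article_id": 1, "created_at": datetime(2018, 8, 4, 8, 0)},
    {"article_id": 2, "created_at": datetime(2018, 8, 20, 12, 0)},
]

# published_at for every article referenced above, keyed by article ID.
published_at = {
    1: datetime(2018, 8, 3, 9, 0),
    2: datetime(2018, 7, 1, 10, 0),
}


def article_age_in_days(comment_created_at, article_published_at):
    """Return how many calendar days old the article was when commented on."""
    # Assumption: compare dates only, so a same-day comment counts as 0.
    return (comment_created_at.date() - article_published_at.date()).days


# Step 3: one age (in days) per comment.
ages = [
    article_age_in_days(c["created_at"], published_at[c["article_id"]])
    for c in comments
]
```

Subtracting the article date from the comment date keeps the ages non-negative, which makes them easier to group later.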
Results
As you might expect, it's heavily front-loaded with a rapid decrease as the article gets older.
I decided to clean up the data and create "buckets" for 5-30 days and 30+ days:
Highlights:
- 33% of comments are on articles/threads published that day
- 29% on an article that’s 1 day old
- 9% on an article that’s 2 days old
- 4% for 3 days old
- 3% for 4 days old
- 15% for 5-30 days old
- 7% for 30+ days old
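A rough sketch of how this kind of bucketing could be done in Python, given a list of per-comment article ages in days. The `bucket_for` and `bucket_percentages` names are hypothetical, and the code assumes that an article exactly 30 days old lands in the 5-30 bucket, which the breakdown above does not spell out.

```python
from collections import Counter


def bucket_for(age_days):
    """Map an article age in days to a bucket label."""
    if age_days < 5:
        return str(age_days)  # days 0 through 4 each get their own bucket
    if age_days <= 30:
        return "5-30"  # assumption: day 30 belongs here, not in "30+"
    return "30+"


def bucket_percentages(ages):
    """Return the share of comments per bucket, rounded to whole percents."""
    counts = Counter(bucket_for(age) for age in ages)
    total = len(ages)
    return {bucket: round(100 * n / total) for bucket, n in counts.items()}


# Example with made-up ages, not the real August data.
example = bucket_percentages([0, 0, 1, 45])
```

Because each bucket is rounded separately, the percentages will not always sum to exactly 100.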
Closing Thoughts
I think it's fairly interesting that 7% of comments are posted on articles that were published at least 30 days prior. The "long tail" of evergreen content serves as a great resource for the broader developer community, and it's great to know that the wonderful content contributed here continues to be enjoyed and discussed even beyond the initial burst of exposure.
Hope you found this interesting!
Top comments (4)
Yaaas! I've been dreaming of the day I can do data analysis on the Dev.to data :) Also, what's up with the weird bump of comments on day 16? Maybe that's the average number of days it takes before a post is tweeted out by the Twitter account.
Neat!
Yeah, that is funny. This isn’t the largest data set in the world, so it could be affected by one big outlier.
In the future I imagine we'll probably be able to open source anonymized data sets to go along with the code, so the whole community can play around like this.
This is a very good idea : )
Can't wait for that to happen.
I would really like to get into that project, but I should probably first get more familiar with the existing API, and through it the data.
@pkfrank I would love to know the technical details of how you grabbed the data in steps 1 & 2, though any resource you started with should be enough. Awesome initiative BTW. It was fun to discover 👍