DEV Community



What is the best time to post on DEV: a data-backed answer 🕰🦄🤷‍♂️

DEV is a wonderful blogging platform that emerged a few years ago. I love writing for it and reading content published there. But what I like the most, and I think what everybody likes the most, is the community that was built on the platform.

A community is known to interact a lot with the poster through different kinds of likes and comments. There is no "karma" on DEV, but one way to measure the popularity, the score, of a post is by looking at the number of interactions this post had with the community.

That means the number of comments and, of course, the number of likes, which on the platform are divided into 3 categories: Unicorn 🦄, Like ❤ and Bookmark 📕.

I recently wondered if an article posted at a certain time of the day performed better than others. And, if yes, what was the optimal time to post a blog post in order to be read by as many people as possible. I have some intuition, but I wanted to have proof and facts to work with.

Here is what I did:

Gathering the data:

I will be short here as I'll write a longer post in the future to explain in detail how to efficiently gather this type of data.

I recently noticed, looking at the DOM, that every article had a public id available.

I also knew that there is a public endpoint that allows you to fetch user information, which looks like this:


So naturally I tried to do the same with an article and ...

HTTP/1.1 200 OK

{
    "body_html": "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"\">\n<html><body>\n<p>The other day I was touching up a PR that had been approved and was about to merge and deploy it when, out of habit, I checked the clock. It was 3:45pm, which for me, was past my \"merge before\" time of 3:30pm. I decided to hold off and wait until the next morning. </p>\n\n<p>The whole process got me thinking. Does anyone else have their own personal merge or deploy policies? Is there a time before or after when you say, not today? Is there a day of the week you don't like to merge stuff. A lot of people joke about read-only Fridays, but I have to admit, I kinda follow that rule. Anything remotely high risk I wait until Monday to merge. </p>\n\n<p>What's your personal merge/deploy policy?</p>\n\n</body></html>\n",
    "canonical_url": "",
    "comments_count": 6,
    "cover_image": ",f_auto,fl_progressive,h_420,q_auto,w_1000/",
    "description": "What's your personal merge/deploy policy?",
    "id": 81371,
    "ltag_script": [],
    "ltag_style": [],
    "path": "/molly_struve/whats-your-personal-mergedeploy-policy-30mi",
    "positive_reactions_count": 13,
    "published_at": "2019-03-22T22:19:36.651Z",
    "readable_publish_date": "Mar 22",
    "slug": "whats-your-personal-mergedeploy-policy-30mi",
    "social_image": ",f_auto,fl_progressive,h_500,q_auto,w_1000/",
    "tag_list": "discuss",
    "title": "What's your personal merge/deploy policy?",
    "type_of": "article",
    "url": "",
    "user": {
        "github_username": "mstruve",
        "name": "Molly Struve",
        "profile_image": ",f_auto,fl_progressive,h_640,q_auto,w_640/",
        "profile_image_90": ",f_auto,fl_progressive,h_90,q_auto,w_90/",
        "twitter_username": "molly_struve",
        "username": "molly_struve",
        "website_url": ""
    }
}

And bingo!

All I had to do now was: 1, find out whether article ids were sequential, and 2, if so, find the most recent article's id.
Both things were easy to check. I just had to open my browser inspector a couple of times on recent articles.

What I did next was call this API 94k times using Scrapy and store the information in a clean .csv file. More on this in a future post.

To do this I used ScrapingBee, a web scraping tool I recently launched 😎.
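The crawl described above could be sketched roughly as follows. This is not the Scrapy/ScrapingBee setup used for the post: it relies only on the Python standard library, and the URL pattern, the list of fields and the function names (`fetch_article`, `write_articles`) are assumptions made for illustration, since the post does not show the endpoint.

```python
import csv
import json
import urllib.error
import urllib.request

# Assumption: the article endpoint follows this pattern (not shown in the post)
API_URL = "https://dev.to/api/articles/{id}"
# Assumption: the subset of JSON fields worth keeping in the CSV
FIELDS = ["id", "published_at", "positive_reactions_count", "comments_count"]


def fetch_article(article_id):
    """Return the article as a dict, or None when the API answers 404."""
    try:
        with urllib.request.urlopen(API_URL.format(id=article_id)) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def write_articles(ids, out_file, fetch=fetch_article):
    """Fetch each id in turn and write the selected fields as CSV rows.

    Returns the number of rows written; ids that yield None are skipped.
    """
    writer = csv.DictWriter(out_file, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    written = 0
    for article_id in ids:
        article = fetch(article_id)
        if article is None:  # no article behind this id
            continue
        writer.writerow(article)
        written += 1
    return written
```

Passing the fetch function as a parameter keeps the loop testable without network access. A real crawl of this size would also want rate limiting and retries, which are left out here.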

What do we have now?

Out of 94k API calls, almost half of them returned a 404: resource not found. I guess it means that half of the articles created are never published but I am not sure about it. I still had ~40k data points, which was more than enough to prove my point.

Each row in my csv had multiple useful pieces of information, but for what I was looking for I only needed two things: the number of likes and the date of publishing.
Fortunately, those two things were returned by the API: see positive_reactions_count (stored as positive_reaction_count in my csv) and published_at in the previous snippet.

Enriching data

To work with the data I used pandas, a well-known Python library, and one of the most famous Python packages on GitHub.

I'll show some code snippets here. If you want a more thorough tutorial, please tell me in the comments.

Loading data from csv with pandas is very easy:

import pandas as pd
df = pd.read_csv('./output.csv')

As I wanted to know the best time/day to post, I needed to transform the published_at column into 2 other columns: day_of_week ('Mon', 'Tue', ...) and hour.

Pandas makes it easy to add, transform and manipulate data. All I needed to do this was these few lines:

df['hour'] = pd.to_datetime(df['published_at']).dt.hour

days_arr = ["Mon","Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def get_day_of_week(x):
    date = pd.to_datetime(x)
    return days_arr[date.weekday()]

df['day_of_week'] = df['published_at'].apply(get_day_of_week)


All my data is now stored in a dataframe, the main data structure used by pandas, hence the name: df.
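As a quick sanity check on the two derived columns, here is the same transformation applied to a single timestamp in plain Python. It is a sketch with the standard library only, not the pandas code above, and the helper name `day_and_hour` is made up for illustration.

```python
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def day_and_hour(published_at):
    """Split an ISO 8601 UTC timestamp into (day_of_week, hour)."""
    # %z accepts the trailing "Z" (UTC) on Python 3.7+
    moment = datetime.strptime(published_at, "%Y-%m-%dT%H:%M:%S.%f%z")
    return DAYS[moment.weekday()], moment.hour


# The published_at value from the API response shown earlier
print(day_and_hour("2019-03-22T22:19:36.651Z"))  # ('Fri', 22)
```

Both values are in UTC, exactly as returned by the API.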

A little bit of data viz

I now had all the information I needed.

Here is what was in my dataframe:

| day_of_week | hour | positive_reaction_count |
|---|---|---|
| Thu | 0 | 4 |
| Mon | 1 | 34 |
| ... | ... | ... |
| Sun | 22 | 41 |
| Thu | 17 | 9 |

Each line represents one post, and I had around 38k lines.

What I naturally did next was summing positive_reaction_count by day and hour.

Here is how to do it in pandas:

aggregated_df = df.groupby(['day_of_week', 'hour'])['positive_reaction_count'].sum()


And now my df looked like this:

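| day | hour | positive_reaction_count |
|---|---|---|
| Monday | 0 | 4110 |
| | 1 | 3423 |
| | 2 | 2791 |
| ... | ... | ... |
| | 22 | 4839 |
| | 23 | 3614 |
| ... | ... | ... |
| Sunday | 0 | 110 |
| | 1 | 423 |
| | 2 | 731 |
| ... | ... | ... |
| | 22 | 4123 |
| | 23 | 2791 |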

Great. To get the data in exactly the format I need, a little more work is necessary:
basically, rotating columns around.

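# Keyword arguments: recent pandas versions no longer accept these positionally
pivoted_df = aggregated_df.reset_index().pivot(index='hour', columns='day_of_week', values='positive_reaction_count')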


Now my df looks like this:

| hour | Mon | Tue | ... | Sun |
|---|---|---|---|---|
| 0 | 4110 | 5071 | ... | 5208 |
| 1 | 3423 | 4336 | ... | 3230 |
| 2 | 2791 | 3056 | ... | 1882 |
| ... | ... | ... | ... | ... |
| 23 | 3614 | 4574 | ... | 3149 |
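To make the reshaping concrete, here is a small plain-Python sketch of what the grouping and the pivot compute together: one total per (hour, day) cell, with hours as rows and days as columns. The function name `pivot_totals` and the sample rows are invented for illustration; this is not how pandas implements it.

```python
from collections import defaultdict


def pivot_totals(rows):
    """rows: iterable of (day_of_week, hour, reactions) tuples, one per post.

    Returns {hour: {day_of_week: total reactions}}, i.e. hours as rows
    and days as columns, like the pivoted table.
    """
    table = defaultdict(lambda: defaultdict(int))
    for day, hour, reactions in rows:
        table[hour][day] += reactions
    # Convert to plain dicts so the result prints and compares cleanly
    return {hour: dict(days) for hour, days in table.items()}


# Made-up posts: two on Monday at 0h, one on Tuesday at 0h, one on Monday at 1h
posts = [("Mon", 0, 4), ("Mon", 0, 10), ("Tue", 0, 7), ("Mon", 1, 34)]
print(pivot_totals(posts))
# {0: {'Mon': 14, 'Tue': 7}, 1: {'Mon': 34}}
```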

And now, finally, I can use the seaborn package to display a nice heatmap.

import seaborn as sns
# Assumed definition: day columns put back in calendar order (pivot sorts them alphabetically)
pivoted_sorted = pivoted_df[days_arr]
sns.heatmap(pivoted_sorted, cmap="coolwarm")

And here is what I got:


I find this heatmap very simple and self-explanatory. Two regions stand out: the red one, bottom left, and the dark blue one, top right.

But first, because we are talking about times, we need to know what timezone we are talking about.
If you look carefully at "published_at": "2019-03-22T22:19:36.651Z", you will notice a Z at the end of the time string.
Well, this Z indicates that the time string represents UTC time, or time zone Zero.

So, going back to our heatmap, we notice that Monday to Wednesday afternoons (Monday to Wednesday mornings for people on the East Coast) are the most active zone on the map.
And Saturday and Sunday are two very calm days, especially from midnight to noon.
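Since every cell of the heatmap is in UTC, it can help to translate a cell into local time before drawing conclusions. The sketch below shifts a (day, hour) cell by a fixed whole-hour offset. The helper name `to_local_slot` is hypothetical, and using -5 for the US East Coast is a simplification that ignores daylight saving time (the offset is -4 in summer).

```python
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def to_local_slot(day, hour_utc, utc_offset):
    """Shift a (day, hour) heatmap cell from UTC by a whole-hour offset."""
    total = DAYS.index(day) * 24 + hour_utc + utc_offset
    total %= 7 * 24  # wrap around the week
    return DAYS[total // 24], total % 24


print(to_local_slot("Mon", 15, -5))  # ('Mon', 10)
print(to_local_slot("Mon", 2, -5))   # ('Sun', 21)
```

The second call shows why the day matters too: very early Monday in UTC is still Sunday evening on the East Coast.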

So here, at first sight, you could think that you had better post at those times to maximize your chances of getting many likes. Well, we need to step back a little.

What this heatmap shows is the time of day where we observe the most likes in total. It does not take into account the fact that more posts automatically means more likes.

So maybe (right now we can't know for sure) the red zone we see on the heatmap just means that we observe more likes on the platform only because more articles are being posted during those times.

This difference is critical, because what we are trying to know is the best time to post in order to maximize our likes, and this map can't help us.

So what we need is to make the same kind of map, but instead of counting the total number of likes during one hour for each day, we have to compute the mean number of likes per post.
We could also compute the median. I did it, and there is not much difference 🙂.
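A toy example makes the difference between the total and the per-post average visible. The numbers below are invented, and `summarize` is a hypothetical helper written with the standard library only: a busy slot with many ordinary posts gets the larger total, while a quiet slot with fewer but better-received posts gets the larger mean and median.

```python
from collections import defaultdict
from statistics import mean, median


def summarize(rows):
    """rows: (slot, reactions) pairs. Returns {slot: (total, mean, median)}."""
    by_slot = defaultdict(list)
    for slot, reactions in rows:
        by_slot[slot].append(reactions)
    return {s: (sum(v), mean(v), median(v)) for s, v in by_slot.items()}


# Made-up data: six posts with 10 likes each vs. two posts with 25 and 15 likes
rows = [("Mon 15h", 10)] * 6 + [("Sat 15h", 25), ("Sat 15h", 15)]
stats = summarize(rows)

for slot, (total, avg, med) in stats.items():
    print(slot, "total:", total, "mean:", avg, "median:", med)
```

Here Monday wins on total (60 against 40) but loses on mean and median (10 against 20), which is exactly the trap the first heatmap can fall into.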

Thanks to pandas, we only need to change one small thing in our code:

# sum -> mean
aggregated_df = df.groupby(['day_of_week', 'hour'])['positive_reaction_count'].mean()


And here is the new heatmap:

As you can see, the heatmap is very different and much more usable than the previous one.
We now observe stripe patterns. There is a wide blue one spanning Monday to Sunday, from 4 a.m. to 10 a.m. UTC.
We also observe a peak of activity during the UTC afternoon.

What we can now state from this heatmap is that articles posted during the afternoon had, on average, 10 to 20 more positive interactions than the ones posted very early in the UTC day.

I think it is all about the reader/writer ratio. What those two heatmaps show is that even though there are far fewer readers during the weekend, there are also proportionally fewer writers. This is why an article published during the weekend will have about the same number of interactions as an article published during the week.

Thank you for reading:

I hope you liked this post.

This series is far from over. I have plenty more information to show you related to this dataset.

Please tell me in the comments if you want a particular aspect of the data analyzed, and don't forget to subscribe to my newsletter, there is more to come (and you'll also get the first chapters of my next ebook for free 😎).

If you want to continue reading about some Python tips, go there, you might like it :).

If you like JS, I've published something you might like.

And if you prefer git, I got you covered.

Top comments (41)

falco467

One should check for one more fallacy: what if the same great authors always post at the same time and the author is the deciding factor for number of likes and not time?

Two methods come to mind:

  1. Divide the likes of each post by the median number of likes of the author.
  2. Just check if like count is related to userId over the whole data set.

Pierre (daolf)

Yes, you are totally right.

I think this kind of fallacy can also be caught by measuring the median number of likes across the day instead of the mean. I did it and did not find that much difference in the results.

I'll try your point 1, I hadn't thought of this one.

And I plan to write an article about "famous" posters and their stats that will deal with point 2.

Mikael Klages (itr13)

If we assume the opposite of "some people are great writers", namely "most people are bad writers", then there might be a situation where the majority write mediocre articles that aren't affected by time of day. In such a scenario using the median would be less telling.

That said, most posts here are pretty well-made, so that shouldn't be a problem. If it were the case, then the median would be pretty much even across all times of day, so if you didn't see much difference, then that disproves it.

Ben Sinclair (moopet)

If we think that "great authors" post at the same time then, even if we cater for numbers of users, we're basically assigning qualities to longitudes. One uncausal correlation from that would tell us we'd get more likes if we moved to a different country.

I think this is a whole lot like the rocket equation. The more people try to target "best times", the more they'll skew the data away from themselves.

Kelvin Wangonya (wangonya)

Nice catch there...

Steve Layton (shindakun)

Very cool! You can probably refactor it to use the "paged" articles. This will give you a set of 30 public article IDs per page so you don't have to scrape 94k IDs directly. You can just loop through the ~500 pages to grab the IDs. A cool thing about DEV is that, since it's open source, the API may not be documented but is available to view. Can't wait to see more!

Alfredo Salzillo (alfredosalzillo)

The articles API seems to return only articles with the default tags: career, discuss, productivity.

Steve Layton (shindakun)

I'd have to double check but I believe that's only during "onboarding" when a user first signs up. If not that's likely a bug in my mind.

Alfredo Salzillo (alfredosalzillo)

I've made a RunKit notebook to check whether the most recent react article was published after the most recent article and, if so, whether it is in the latest 30 articles.

The react article, published after the most recent article, isn't in the articles list.

It may be a bug in the API.

Pierre (daolf)

Thanks, did not know about this endpoint. Will use it for my next one.

Boris Jamot ✊ (biros)

Very nice meta article!

I did a similar analysis of where users are coming from:

Ben Halpern (ben)

Wow, I'm definitely going to read this when I get a chance. Skimmed it.

Initial thoughts are that it's definitely affected by when we are awake and working because of how we schedule for Twitter etc. But we're always evolving and modifying the process so this could change.

I can't wait to read through this.

Pierre (daolf)

Thank you! I hope you'll find helpful insights.

kurisutofu

Really nice article.
A question though, shouldn't it include the age of articles in the factors to check?

Not sure it has a big impact but an older article would potentially get more likes than a recent one.

Pierre (daolf)

While this concern sounds right, after thinking about it I think it is not much of a problem because:

1: By observation, I noticed that articles tend to only gather positive reactions during the first 4 days of posting
2: Even if that would not be the case, if we assume that the pattern had not changed in three years, then there are no problems giving more weight to older posts.

Those two assumptions do not seem very far-fetched.

And again, if we assume old post = more likes, we can always do the same heatmap with a median, which gives almost the same results.

Ganesh Raja (ganesh)

A valid concern.

Ali Murteza Yesil (murtezayesil)

These tables tell me a different story.

  1. Most of the readers are in the US.
  2. When I am active, it is Saturday 4am in the US 😅

Here is a serious question: Are these results persistent across weeks and across months?

Pierre (daolf)

That is an interesting question.

I thought about making a gif of multiple heatmaps across day / time / month, hoping to discover changes of habit.

I'll post about it if the results show something interesting.

Ivan Dlugos (ivan)

I know this is an old article but I'm wondering: have you got around to doing the comparison? I'm pretty sure there would be a noticeable difference due to daylight saving time shifts, if nothing else.

Mikael Klages (itr13)

Fun project, coincidentally I did a heatmap project myself yesterday too.

I've written a discord bot for a server, and it stores the amount of messages sent per hour, so I made a heatmap over when the server is active.

To have it viewable online without using images, I decided to use plotly, which works great, but can be a bit annoying to use.

Pierre (daolf)

Write a post about it :) !

Max Ong Zong Bao (steelwolf180)

Hmm.. One further analysis might be the use of tags. You can analyse the average tags on reactions per article and time of publishing.

This will be useful as well cause I based my articles on the tags that I would use which usually is the top 10-20 tags to optimise the numbers of reach and view for me.

Pierre (daolf)

Thanks for your idea, I actually wanted to do this for my next one.

Ali Sherief (zenulabidin)

I always strive to publish my blog posts on Mondays. Spoiler alert: It's mainly for those lucrative week streak badges so it gives me more headroom to delay the article by a day or two if something goes wrong.

By contrast, I don't know if this site treats Sunday as the end of the week.

Elliot Nelson (7tonshark)

Great article!

I'd imagine well-written, popular articles have long tails -- they continue accumulating likes months or years after they are written.

It seems like the time of day matters less for the long-tail likes, as opposed to the initial burst of likes... although maybe time of day could be a catalyst (if the right people are skimming the recent articles and sharing with their social media for example, then time of day could be the difference between a long tail and no tail).

Michael Tharrington (michaeltharrington)

Love this post! ❤️

Just throwing this out here, but ya might want to switch out one of your tags for #meta as this is definitely an option - ... however, all of your tags are apt descriptions.

Again, totally digging this post!

Pierre (daolf)

Thank you very much!

Great tag suggestion, just added it.

Anderson. J (andersonjoseph)

data is beautiful.

Dinesh Pandiyan (flexdinesh)

In my personal experience, posting Thursday morning - evening US time gets most traffic to your posts and increases the chance of making into 'weekly top posts'.

Two reasons mostly

  1. Before Thursday is too early and by end of the week, your post will already be at the bottom of the feed.
  2. After Friday is too late and even if your post quality is great, you wouldn't get enough traffic or reactions to reach the top posts curation that usually happens EOD or early Monday.

IMO I personally believe timing matters as much as the quality of the post to make it to 'top posts of the week'.

Pierre (daolf)

I agree that quality is the most important factor here, especially considering the amount of manual curation there is on the platform.

However, it is nice to see that likes are spread evenly during the week even though there are much more articles posted during the timeframe you mentioned.

desi

I had just been wondering about this!

skrish2017

This is all sorts of nice. Thank you.