Note: The results below may not be 100% correct. This analysis is based on a dataset of approximately 6,500 blog posts.
Hello, and welcome to this post.
I have been blogging here on dev for quite some time now. I decided to run this analysis on Medium blogs because it can help answer questions like:
- Which day is best for posting a blog?
- What is the ideal reading time?
- Which image formats should you use in a blog?
- Which publication has the most blogs?
- What is the average reading time of articles in each publication?
Here is the link to the Kaggle version of this notebook, where you can find the dataset and everything else. Feel free to fork it and use or modify the code with your own dataset to see interesting results. The dataset contains information about randomly chosen Medium articles published in 2019 from these 7 publications: Towards Data Science, UX Collective, The Startup, The Writing Cooperative, Data Driven Investor, Better Humans, and Better Marketing.
From this heatmap we can say that most of the blogs were posted on Monday, and across all months, Mondays in October stand out with around 500 published articles. If we take the two weekdays with the highest number of published articles, we get Monday and Thursday.
So the next time you publish an article, try to do it on a Monday or Thursday; hopefully it will be seen by a larger audience.
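The heatmap boils down to counting articles per weekday and month. Here is a minimal sketch of that counting step using only the Python standard library. The `publish_dates` list is made-up sample data, not the real dataset, and the variable names are my own.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical sample of publication dates (the real dataset has ~6500 rows).
publish_dates = [
    date(2019, 10, 7),   # Monday
    date(2019, 10, 14),  # Monday
    date(2019, 10, 10),  # Thursday
    date(2019, 10, 17),  # Thursday
    date(2019, 3, 2),    # Saturday
    date(2019, 3, 4),    # Monday
]

# One counter cell per (weekday, month number): these are the heatmap values.
heatmap_counts = Counter((WEEKDAYS[d.weekday()], d.month) for d in publish_dates)

# Totals per weekday, used to rank the busiest days.
weekday_counts = Counter(WEEKDAYS[d.weekday()] for d in publish_dates)
top_two_days = [day for day, _ in weekday_counts.most_common(2)]

print(top_two_days)  # ['Monday', 'Thursday']
```

With the real data you would feed in the publication date of every article and pass `heatmap_counts` to whatever plotting library you prefer.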
Now that we know the best day to post an article, let's try to understand how long the article should be.
We have plotted the number of claps/likes against the reading time of the article. The assumption is that a person will clap for an article only when they read it completely and find it useful or entertaining.
Along with claps/likes, comments are also an important signal because they help us understand the quality of the blog, i.e. whether it is well written, plagiarised, helpful, a conversation starter, etc. You can also run sentiment analysis on the comments to understand whether the feedback is good or bad. Let's plot another scatterplot for comments.
So we can say that the ideal reading time for an article is 5 to 10 minutes. If the article is too long, readers may not finish it because it takes too much of their time, and so they will not comment on it or clap for it.
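One way to check the 5 to 10 minute claim beyond eyeballing the scatterplot is to group articles into reading-time buckets and compare the typical clap count per bucket. This is only a sketch: the `articles` pairs are invented sample values, and the bucket boundaries are my assumption based on the range discussed above.

```python
from statistics import median

# Hypothetical (reading_time_minutes, claps) pairs, not the real data.
articles = [(3, 40), (5, 300), (7, 520), (9, 410), (14, 90), (22, 15)]

def bucket(minutes):
    """Assign a reading time to one of three coarse buckets."""
    if minutes < 5:
        return "under 5 min"
    if minutes <= 10:
        return "5-10 min"
    return "over 10 min"

# Collect the clap counts that fall into each bucket.
claps_by_bucket = {}
for minutes, claps in articles:
    claps_by_bucket.setdefault(bucket(minutes), []).append(claps)

# The median is less sensitive to a few viral articles than the mean.
median_claps = {name: median(values) for name, values in claps_by_bucket.items()}
best_bucket = max(median_claps, key=median_claps.get)
```

The same grouping works for comments: swap the clap count for the number of responses.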
As we can see, most of the images are of type jpeg (about 50%), followed by jpg and png. There must be some reason behind this, so let's see what it is.
JPEG - JPEG is a lossy compression method used to keep digital images as small as possible so that they load quickly when someone wants to view them. Here are some important points about it:
- The file size of the image being compressed is permanently reduced by eliminating unnecessary (redundant) information from the image.
- Image quality does suffer, though it’s often so slight the average site visitor can’t tell.
JPG - Well, when it comes to .jpeg vs .jpg, the truth is there is no difference between the two except the number of characters.
The term JPG exists because of the earlier versions of Windows operating systems. Specifically, the MS-DOS 8.3 and FAT-16 file systems had a maximum 3-letter limit for file extensions, unlike UNIX-like operating systems such as Mac or Linux, which didn’t have this limit.
I read these points in this article; you can check it out if you want to learn more.
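Since .jpg and .jpeg are the same format, it makes sense to merge them before comparing shares. Here is a small sketch of how that could be done, assuming each article row carries an image URL. The URLs below are placeholders and `image_format` is a helper name I made up for illustration.

```python
from collections import Counter
from os.path import splitext
from urllib.parse import urlparse

# Hypothetical image URLs, not taken from the dataset.
image_urls = [
    "https://example.com/img/a.jpeg",
    "https://example.com/img/b.JPG",
    "https://example.com/img/c.png",
    "https://example.com/img/d.jpeg?width=700",
    "https://example.com/img/e.gif",
]

def image_format(url):
    """Return the lower-cased file extension, treating jpg as jpeg."""
    path = urlparse(url).path            # drops any ?query part
    ext = splitext(path)[1].lower().lstrip(".")
    return "jpeg" if ext == "jpg" else ext

format_counts = Counter(image_format(url) for url in image_urls)
format_share = {fmt: n / len(image_urls) for fmt, n in format_counts.items()}
```

In this toy sample, `format_share["jpeg"]` comes out at 0.6 because three of the five URLs are jpeg or jpg.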
Now, coming to PNGs:
PNG - Portable Network Graphics (PNGs) are just as popular as JPEGs on websites. They also support millions of colors, although you’re much better off using PNGs for images that contain less color data. Otherwise, your image is going to be ‘heavier’ than the same image saved as a JPEG.
So we can conclude that:
- JPEG/JPG: This is an ideal image format for all types of photographs.
- PNG: This format is perfect for screenshots and other types of imagery where there’s not a lot of color data.
- GIF: If you want to show off animated graphics on your site, this is the best image format for you.
The Startup has the most articles, around 3000, and Better Humans has the fewest articles published on Medium, around 10.
We can also infer that most people read articles from The Startup, Towards Data Science, and Data Driven Investor, assuming these publications posted so many articles in response to demand from their audience.
(Note: we cannot be sure about this inference because the dataset we have might not be complete.)
The Startup's mission is to help readers get smarter at building their things, so we can fairly assume that this may be a hot topic for blogs.
Now that we know which publications have the most blogs, let's look at the average reading time of articles in each publication.
The Startup, which published the most articles, has an average reading time of 5.9 minutes. This again supports our finding that the ideal reading time is 5 to 10 minutes.
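Both the article count and the average reading time per publication come from the same grouping step. Here is a minimal standard-library sketch of it. The `rows` are invented sample values and will not reproduce the 5.9 minute figure above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (publication, reading_time_minutes) rows, not the real data.
rows = [
    ("The Startup", 5), ("The Startup", 7), ("The Startup", 6),
    ("Towards Data Science", 8), ("Towards Data Science", 10),
    ("Better Humans", 12),
]

# Group the reading times by publication.
reading_times = defaultdict(list)
for publication, minutes in rows:
    reading_times[publication].append(minutes)

# Per publication: how many articles, and their mean reading time.
summary = {
    publication: {
        "articles": len(times),
        "avg_reading_time": round(mean(times), 1),
    }
    for publication, times in reading_times.items()
}

# Publication with the most articles in this sample.
most_active = max(summary, key=lambda p: summary[p]["articles"])
```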
Similarly, we can look at the reading time of the article with the highest number of claps in each publication. Here are the results.
| Publication | Reading time of the most-clapped article |
|---|---|
| Better Marketing | 7 min |
| Towards Data Science | 11 min |
| UX Collective | 5 min |
| Data Driven Investor | 6 min |
| The Startup | 5 min |
| The Writing Cooperative | 5 min |
| Better Humans | 5 min |
Even here, the popular reading time is mostly between 5 and 10 minutes; only the top article from Towards Data Science is longer, at 11 minutes.
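Finding the most-clapped article per publication is a "keep the maximum per group" pass over the data. Here is one possible sketch. The `rows` below are placeholder values, so the output will not match the table above.

```python
# Hypothetical (publication, claps, reading_time_minutes) rows, not the real data.
rows = [
    ("Better Marketing", 1200, 4),
    ("Better Marketing", 5400, 7),
    ("The Startup", 800, 9),
    ("The Startup", 9100, 5),
    ("The Startup", 3000, 6),
]

# For each publication, remember the (claps, reading_time) of its top article.
top_article = {}
for publication, claps, minutes in rows:
    best = top_article.get(publication)
    if best is None or claps > best[0]:
        top_article[publication] = (claps, minutes)

# Keep only the reading time of each publication's most-clapped article.
reading_time_of_top = {
    publication: minutes for publication, (_, minutes) in top_article.items()
}
```

If two articles tie on claps, this version keeps the first one it sees; that tie-breaking rule is my choice, not something stated in the analysis.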
The things we explored here are just a starting point; you can do a lot more with this analysis, especially with better datasets. Let me know in the comments if you plan to do something similar, and let's collaborate.
Well, that's all for this post, folks.
I hope you enjoyed it and that some of these insights will be useful for your next blog.