In this post, I'm going to show you how you can use Python to visualize the relationship between Instagram variables using a scatter matrix. This will allow you to optimize your engagement and work towards beating the infamous algorithm!
We'll be working in three distinct steps with four distinct libraries:
- Selenium: Automate a web browser to get the HTML
- BeautifulSoup: Parse and scrape the HTML
- Instascrape: Scrape the posts and load the data
- Pandas: Organize and visualize the data
For the sake of brevity, I will mostly summarize the code for this part but a very similar full version can be found in this repo.
The script we use will:
- go to an Instagram page
- scroll it automatically
- gather the HTML at each scroll
- compare the HTML and find differences
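The scrolling steps above can be sketched roughly as follows. This is a minimal sketch, not the script from the repo: the function name `collect_page_sources` and its parameters are hypothetical, it assumes an already-created Selenium WebDriver is passed in, and it simply stops once the page height stops growing.

```python
import time


def collect_page_sources(driver, url, max_scrolls=10, pause=1.0):
    """Open `url`, scroll to the bottom repeatedly, and return the HTML seen after each scroll.

    `driver` is assumed to be a Selenium WebDriver (e.g. webdriver.Chrome()).
    The function name and parameters are illustrative, not from the original script.
    """
    driver.get(url)
    sources = [driver.page_source]  # HTML before any scrolling
    last_height = driver.execute_script("return document.body.scrollHeight")

    for _ in range(max_scrolls):
        # Jump to the bottom of the page so more posts are lazy-loaded.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to load new content

        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so we have reached the end
        last_height = new_height
        sources.append(driver.page_source)

    return sources
```

With Selenium installed, this would be called with a real driver and a profile URL. Instagram may require a logged-in session before it serves the full profile, so treat the stopping condition as a starting point.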
One of the handy features that
After each scroll, we use BeautifulSoup to get the unique shortcode of every post on the profile that we just scrolled. This can be used by instascrape to construct an instascrape.Post object for scraping.
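The shortcode extraction described above might look something like this. It is only a sketch under one assumption: that each post on the profile is linked with a relative href of the form `/p/<shortcode>/`. The helper name `extract_shortcodes` is made up for illustration.

```python
from bs4 import BeautifulSoup


def extract_shortcodes(html):
    """Return the unique post shortcodes found in `html`, in order of appearance.

    Assumes post links look like <a href="/p/<shortcode>/">; this is an
    assumption about the page markup, not something guaranteed by Instagram.
    """
    soup = BeautifulSoup(html, "html.parser")
    shortcodes = []
    for anchor in soup.find_all("a", href=True):
        parts = anchor["href"].strip("/").split("/")
        # Keep only links of the form /p/<shortcode>/ and skip duplicates.
        if len(parts) == 2 and parts[0] == "p" and parts[1] not in shortcodes:
            shortcodes.append(parts[1])
    return shortcodes
```

Running this on the HTML gathered at every scroll and merging the results gives the full set of shortcodes, which can then be handed to instascrape to build the Post objects.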
Assuming we have created a list of Post objects called post_objects, we are now ready to scrape the data we need. Leveraging instascrape, all we have to do to scrape each post is:
for post in post_objects:
    post.scrape()
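If you are scraping a lot of posts, it may be worth pausing between requests and not letting one failed scrape stop the whole run. The wrapper below is a hedged sketch of that idea: `scrape_all` and its `pause` parameter are hypothetical names, and the only thing it assumes about each object is that it has a `scrape()` method, as the Post objects above do.

```python
import time


def scrape_all(posts, pause=2.0):
    """Call .scrape() on each post, sleeping `pause` seconds between requests.

    Returns (scraped, failed), where `failed` holds (post, error) pairs.
    This helper is illustrative and not part of instascrape itself.
    """
    scraped, failed = [], []
    for post in posts:
        try:
            post.scrape()
            scraped.append(post)
        except Exception as error:  # keep going if a single post fails
            failed.append((post, error))
        time.sleep(pause)  # slow down to avoid hammering the site
    return scraped, failed
```

Calling `scrape_all(post_objects)` would then replace the bare loop, and the `failed` list shows which posts need another attempt.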
And that's it! Each scrape loads a ton of data points, including the number of likes, the hashtags used, the tagged users, and the upload datetime.
To get our data all neat and tidy, we're going to instantiate a pandas.DataFrame that will store our data:
import pandas as pd
dataframe = pd.DataFrame([post.to_dict() for post in post_objects])
Now that we have an expressive and powerful way of handling our data, we can create some more useful columns:
dataframe["upload_hour"] = dataframe["upload_date"].dt.hour
dataframe["upload_weekday"] = dataframe["upload_date"].dt.weekday
dataframe["amt_tagged_users"] = dataframe["tagged_users"].str.len()
dataframe["amt_hashtags"] = dataframe["hashtags"].str.len()
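To see what these derived columns look like, here is a small self-contained sketch on made-up data. The column names mirror the ones used above, but the values are invented, and the `pd.to_datetime` call is a defensive assumption in case `upload_date` does not arrive as a datetime column.

```python
import pandas as pd

# Toy stand-in for the scraped data; the values are made up for illustration.
toy = pd.DataFrame({
    "upload_date": ["2021-01-04 12:30:00", "2021-01-09 18:05:00"],
    "tagged_users": [["friend_a", "friend_b"], []],
    "hashtags": [["python"], ["data", "viz", "pandas"]],
})

# The .dt accessor only works on datetime columns, so convert first.
toy["upload_date"] = pd.to_datetime(toy["upload_date"])

toy["upload_hour"] = toy["upload_date"].dt.hour
toy["upload_weekday"] = toy["upload_date"].dt.weekday  # Monday=0 ... Sunday=6
# .str.len() also works on columns of lists, giving the length of each list.
toy["amt_tagged_users"] = toy["tagged_users"].str.len()
toy["amt_hashtags"] = toy["hashtags"].str.len()
```

Note that `.dt.weekday` numbers the days from Monday (0) to Sunday (6), which is worth keeping in mind when reading the weekday axis of the plots.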
Now, to visualize it, we use pandas.plotting.scatter_matrix, which lets us view a matrix of scatter plots showing the interactions between each pair of variables! For this example, we'll compare the
- hour of the upload
- day of the week
- number of comments
- number of likes
- number of tagged users
- number of hashtags
Using my own personal Instagram page (@chris_greening), we get:
pd.plotting.scatter_matrix(dataframe[['likes', 'comments', 'amt_tagged_users', 'upload_hour']], figsize=(8,8))
Analyzing this scatter matrix, we can now look at how different variables interact with one another and get an idea of what we can do to better boost our engagement 🙌
For example, looking at the scatter plot that compares upload_hour and likes, we see a peak sometime around noon. This suggests that, on average, the best time for me to post to my Instagram is around noon.
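One way to double-check a pattern like this numerically is to average likes per upload hour. The snippet below is a sketch on invented numbers, not my actual data, and the names `toy`, `mean_likes_by_hour`, and `best_hour` are just for illustration.

```python
import pandas as pd

# Invented example data: upload hour and like count for six posts.
toy = pd.DataFrame({
    "upload_hour": [9, 9, 12, 12, 12, 20],
    "likes": [40, 60, 120, 100, 110, 70],
})

# Average number of likes for each hour of the day that has at least one post.
mean_likes_by_hour = toy.groupby("upload_hour")["likes"].mean()

# The hour with the highest average like count.
best_hour = mean_likes_by_hour.idxmax()
```

On the real dataframe the same two lines would work with `dataframe` in place of `toy`. With only a handful of posts per hour the averages are noisy, so treat the result as a hint and not a rule.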
This is just one relationship and there are plenty more to be discovered! Let me know what other relationships you found interesting in the comments below ❤️
If you want to learn more about exploratory data analysis using instascrape, check out my other blog posts.
instascrape is always looking for more contributors, so come join us at the official repo!
Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
instascrape: powerful Instagram data scraping toolkit
Instagram has gotten increasingly strict with scraping and using this library can result in getting flagged for botting AND POSSIBLE DISABLING OF YOUR INSTAGRAM ACCOUNT. This is a research project and I am not responsible for how you use it. Independently, the library is designed to be responsible and respectful and it is up to you to decide what you do with it. I don't claim any responsibility if your Instagram account is affected by how you use this library.
What is it?
instascrape is a lightweight Python package that provides an expressive and flexible API for scraping Instagram data. It is geared towards being a high-level building block on the data scientist's toolchain and can be seamlessly integrated and extended with industry standard tools for web scraping, data science, and analysis.
Here are a few of the things that…