DEV Community


Visualizing Instagram engagement with instascrape and Python

Chris Greening ・ 3 min read

In a recent post, I introduced my open source Instagram web scraper instascrape as a lightweight means of collecting data from Instagram using Python!

For this post, I'm going to walk through an example using one of instascrape's recent additions: the ability to scrape an Instagram user's recent posts! With this data, we'll be able to visualize the trend in engagement for that user and see if their page is growing or declining 🙌.


Scraping the data

We'll be visualizing data from my Instagram page @chris_greening (shameless self promo 😉) but feel free to remove my username and replace it with your own 😬

Quick note: If you plan on following along, check out my recent post for details on pip installing or git cloning instascrape 😄

Now let's jump right in! To start, we'll import the Profile scraper and load the data from Instagram:

from instascrape import Profile 
chris = Profile('chris_greening')
chris.scrape()
recent_posts = chris.get_recent_posts()

Out of the box, instascrape does not render any JavaScript, so the only posts we get are the 12 most recent. It's built with flexibility in mind, however, and can be extended with selenium or similar tools.


Organizing the data

Now that we have the data, let's create a list of dicts that can easily be built into a pandas.DataFrame:

import pandas as pd 

posts_data = [post.to_dict() for post in recent_posts]
posts_df = pd.DataFrame(posts_data)
print(posts_df[['upload_date', 'comments', 'likes']])

which gives us

           upload_date  comments  likes
0  2020-10-16 14:39:41         8    119
1  2020-10-15 13:11:42        21    165
2  2020-10-14 12:36:21        16    150
3  2020-09-28 12:17:21         6    164
4  2020-09-27 09:27:00        14    210
5  2020-09-26 11:38:27        16    217
6  2020-09-25 10:18:28        17    227
7  2020-09-24 11:01:04        20    239
8  2020-09-17 17:49:18        15    279
9  2020-09-14 10:05:24        14    316
10 2020-09-09 10:24:17        13    244
11 2020-09-08 09:06:05        33    393

Visualizing the data

Awesome! Now we can get to visualizing our data and see how the page is doing:

import matplotlib.pyplot as plt 

plt.style.use('seaborn-v0_8-darkgrid')  # Stylistic change; use 'seaborn-darkgrid' on matplotlib < 3.6

plt.scatter(posts_df.upload_date, posts_df.likes)  # Plot the data
plt.xlabel('Upload Date')              # Write labels
plt.ylabel('Likes')
plt.title('@chris_greening Likes per Post')
plt.show()                             # Show graph 

[Figure: scatter plot of likes per post against upload date for @chris_greening]

And that's it! As we can see, my Instagram is in fact trending downwards, yayyyy!... 😅
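To put a rough number on that downward trend, here is a minimal sketch that fits a straight line through the 12 posts. It is not part of instascrape: it just copies the upload dates and like counts from the table printed earlier and runs an ordinary least-squares fit with numpy. The variable names (posts, days, slope, intercept) are my own, and a linear fit is only one simple assumption about how engagement changes over time.

```python
from datetime import datetime

import numpy as np

# (upload_date, likes) pairs copied from the table printed earlier in the post
posts = [
    ("2020-10-16 14:39:41", 119),
    ("2020-10-15 13:11:42", 165),
    ("2020-10-14 12:36:21", 150),
    ("2020-09-28 12:17:21", 164),
    ("2020-09-27 09:27:00", 210),
    ("2020-09-26 11:38:27", 217),
    ("2020-09-25 10:18:28", 227),
    ("2020-09-24 11:01:04", 239),
    ("2020-09-17 17:49:18", 279),
    ("2020-09-14 10:05:24", 316),
    ("2020-09-09 10:24:17", 244),
    ("2020-09-08 09:06:05", 393),
]

dates = [datetime.strptime(d, "%Y-%m-%d %H:%M:%S") for d, _ in posts]
likes = np.array([n for _, n in posts], dtype=float)

# Express each upload date as days elapsed since the oldest post in the sample
oldest = min(dates)
days = np.array([(d - oldest).total_seconds() / 86400 for d in dates])

# Least-squares straight line: likes ≈ slope * days + intercept
slope, intercept = np.polyfit(days, likes, deg=1)

print(f"Estimated change: {slope:.1f} likes per day")
```

A negative slope means likes are falling over the sampled period. With only 12 points, treat it as a rough indicator rather than a forecast.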

If you wanted to go further, you could use libraries such as scikit-learn and selenium to extend instascrape and fit regressors to dynamically loaded data for a more comprehensive visualization as shown below:

[Figure: engagement data with a fitted regression line]

Let me know your thoughts in the comments below or even better, check out the repo on GitHub and contribute!

chris-greening / instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically

instascrape: powerful Instagram data scraping toolkit

What is it?

instascrape is a lightweight Python package that provides expressive and flexible tools for scraping Instagram data. It is geared towards being a high-level building block on the data scientist's toolchain and can be seamlessly integrated and extended with industry standard tools for web scraping, data science, and analysis.

Key features

Here are a few of the things that instascrape does well:

  • Powerful, object-oriented scraping tools for profiles, posts, hashtags, reels, and IGTV
  • Scrapes HTML, BeautifulSoup, and JSON
  • Download content to your computer as png, jpg, mp4, and mp3
  • Dynamically retrieve HTML embed code for posts
  • Expressive and consistent API for concise and elegant code
  • Designed for seamless integration with Selenium, Pandas, and other industry standard tools for data collection and analysis
  • Lightweight; no boilerplate or configurations necessary
  • The only hard dependencies are Requests and Beautiful

Discussion

Gabriel Arcangel Bol

Thanks for this post. I've tried most of your examples, but I'm getting this error: "['upload_date'] not in index".

When I list the columns, I find the following:

Index(['csrf_token', 'viewer', 'viewer_id', 'country_code', 'language_code',
'locale', 'device_id', 'browser_push_pub_key', 'key_id', 'public_key',
'version', 'is_dev', 'rollout_hash', 'bundle_variant', 'frontend_dev',
'id', 'shortcode', 'height', 'width', 'gating_info',
'fact_check_overall_rating', 'fact_check_information',
'sensitivity_friction_info', 'media_overlay_info', 'media_preview',
'display_url', 'accessibility_caption', 'is_video', 'tracking_token',
'tagged_users', 'caption', 'caption_is_edited', 'has_ranked_comments',
'comments', 'comments_disabled', 'commenting_disabled_for_viewer',
'timestamp', 'likes', 'location', 'viewer_has_liked',
'viewer_has_saved', 'viewer_has_saved_to_collection',
'viewer_in_photo_of_you', 'viewer_can_reshare', 'video_url',
'has_audio', 'video_view_count', 'username', 'full_name'],
dtype='object')

I guess that 'timestamp' is the right one to use instead of 'upload_date'.

Please let me know if I'm missing something here.

Chris Greening Author

Hey Gabriel! First of all, thanks so much for following along!

Looks like you discovered a lil bug that I'm gonna go fix right now, thank you for bringing this to my attention!!! Since writing this post, the implementation of get_recent_posts has changed and it looks like I forgot to include the timestamp to upload_date conversion. Instagram only serves back an integer timestamp that instascrape then converts to a datetime object and embarrassingly I seem to have forgotten to write that back in after the update lol
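For anyone hitting the same error on a release without the fix, here is a minimal workaround sketch. It assumes the 'timestamp' column holds Unix epoch seconds, which is what the reply above suggests but is still an assumption, and it uses illustrative values rather than real scraped data. The conversion itself is plain pandas, not an instascrape call.

```python
import pandas as pd

# Illustrative rows mimicking two of the columns listed above; not real scraped data
posts_df = pd.DataFrame({
    "timestamp": [1602859181, 1602767502],
    "likes": [119, 165],
})

# Assumption: 'timestamp' is Unix epoch seconds, hence unit="s".
# The result is a timezone-naive datetime column expressed in UTC.
posts_df["upload_date"] = pd.to_datetime(posts_df["timestamp"], unit="s")

print(posts_df[["upload_date", "likes"]])
```

With an upload_date column in place, the plotting code from the post can run unchanged.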

Chris Greening Author

ok my friend, the bug should be fixed! I merged the fix with the repo and am pushing it to PyPI under version 1.3.3 as we speak. Thanks for the find!

Gabriel Arcangel Bol

Thank you so much for your time and fast reply. I'm doing a data analysis project using an Instagram scraping tool. I came across a RapidAPI Instagram API, but I haven't figured out yet how to get the data with the requests module. So it was great to find your API, because it's easy to use. If you don't mind, I'll let you know about my findings.

Chris Greening Author

I'd love to hear about what you come up with! I actually just opened a discussion board about an hour ago on the repo, feel free to post about your project/ask questions about instascrape on there!

villival

easy to use... perfect for data analysis

Chris Greening Author

Thanks so much!!! One of the primary inspirations for this project was easy to use data scraping <3

villival

wonderful efforts