Chris Greening

Posted on Oct 21, 2020 • Edited on May 3, 2022

Visualizing Instagram engagement with instascrape and Python

#python #datascience #opensource #hacktoberfest

In a recent post, I introduced my open source Instagram web scraper instascrape as a lightweight means of collecting data from Instagram using Python!

Previous post: "Scrape data from Instagram with instascrape and Python" (Chris Greening, Oct 20 '20)

For this post, I'm going to walk through an example using one of instascrape's recent additions: the ability to scrape an Instagram user's recent posts! With this data, we'll be able to visualize the trend in engagement for that user and see whether their page is growing or declining 🙌.

Chris Greening - Software Developer

Hey! My name's Chris Greening and I'm a software developer from the New York metro area with a diverse range of engineering experience - beam me a message and let's build something great!

christophergreening.com

Scraping the data

We'll be visualizing data from my Instagram page @chris_greening (shameless self-promo 😉), but feel free to swap in your own username 😬

Quick note: If you plan on following along, check out my recent post for details on pip installing (the package is published on PyPI as insta-scrape) or git cloning instascrape 😄

Now let's jump right in! To start, we'll import the Profile scraper and load the data from Instagram:

from instascrape import Profile

chris = Profile('chris_greening')        # Swap in any public username
chris.scrape()                           # Fetch and parse the profile page
recent_posts = chris.get_recent_posts()  # The profile's most recent posts

Out of the box, instascrape doesn't render any JavaScript, so the only posts we get are the 12 most recent. It's built with flexibility in mind, however, and can be extended with Selenium or similar tools to load more.

Organizing the data

Now that we have the data, let's create a list of dicts that can easily be built into a pandas.DataFrame:

import pandas as pd 

posts_data = [post.to_dict() for post in recent_posts]
posts_df = pd.DataFrame(posts_data)
print(posts_df[['upload_date', 'comments', 'likes']])

which gives us:

           upload_date  comments  likes
0  2020-10-16 14:39:41         8    119
1  2020-10-15 13:11:42        21    165
2  2020-10-14 12:36:21        16    150
3  2020-09-28 12:17:21         6    164
4  2020-09-27 09:27:00        14    210
5  2020-09-26 11:38:27        16    217
6  2020-09-25 10:18:28        17    227
7  2020-09-24 11:01:04        20    239
8  2020-09-17 17:49:18        15    279
9  2020-09-14 10:05:24        14    316
10 2020-09-09 10:24:17        13    244
11 2020-09-08 09:06:05        33    393

Visualizing the data

Awesome! Now we can get to visualizing our data and see how the page is doing:

import matplotlib.pyplot as plt

plt.style.use('seaborn-v0_8-darkgrid')             # Stylistic change (named 'seaborn-darkgrid' before matplotlib 3.6)

plt.scatter(posts_df.upload_date, posts_df.likes)  # Plot the data
plt.xlabel('Upload Date')                          # Write labels
plt.ylabel('Likes')
plt.title('@chris_greening Likes per Post')
plt.show()                                         # Show graph

And that's it! As we can see, my Instagram is in fact trending downwards, yayyyy!... 😅

If you want to go further, you could use libraries such as scikit-learn and Selenium to extend instascrape, loading more posts dynamically and fitting regressors to them for a more comprehensive visualization.
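As a lightweight stand-in for a scikit-learn regressor, here is a sketch that fits a straight least-squares line to likes over time with numpy.polyfit. It uses only the 12 rows printed earlier in this post (no scraping needed), and the likes_trend helper is hypothetical, not part of instascrape. A negative slope is a quick numeric check on the downward trend the scatter plot shows.

```python
import numpy as np
import pandas as pd

# The 12 rows printed earlier in this post (upload_date and likes only)
posts_df = pd.DataFrame({
    "upload_date": pd.to_datetime([
        "2020-10-16 14:39:41", "2020-10-15 13:11:42", "2020-10-14 12:36:21",
        "2020-09-28 12:17:21", "2020-09-27 09:27:00", "2020-09-26 11:38:27",
        "2020-09-25 10:18:28", "2020-09-24 11:01:04", "2020-09-17 17:49:18",
        "2020-09-14 10:05:24", "2020-09-09 10:24:17", "2020-09-08 09:06:05",
    ]),
    "likes": [119, 165, 150, 164, 210, 217, 227, 239, 279, 316, 244, 393],
})

def likes_trend(df: pd.DataFrame) -> float:
    """Return the least-squares slope of likes vs. time, in likes per day."""
    # Convert each upload time to fractional days since the earliest post
    days = (df["upload_date"] - df["upload_date"].min()).dt.total_seconds() / 86400
    # A degree-1 polyfit returns [slope, intercept]
    slope, _intercept = np.polyfit(days, df["likes"], deg=1)
    return float(slope)

slope = likes_trend(posts_df)
print(f"Trend: {slope:.2f} likes per day")
```

A straight line is a crude model for 12 points; once more posts are loaded, scikit-learn's LinearRegression or a smoother is a natural next step.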

Let me know your thoughts in the comments below or, even better, check out the repo on GitHub and contribute!

chris-greening / instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically

instascrape: powerful Instagram data scraping toolkit

Note: This module is no longer actively maintained.

DISCLAIMER:

Instagram has gotten increasingly strict with scraping and using this library can result in getting flagged for botting AND POSSIBLE DISABLING OF YOUR INSTAGRAM ACCOUNT. This is a research project and I am not responsible for how you use it. Independently, the library is designed to be responsible and respectful and it is up to you to decide what you do with it. I don't claim any responsibility if your Instagram account is affected by how you use this library.

What is it?

instascrape is a lightweight Python package that provides an expressive and flexible API for scraping Instagram data. It is geared towards being a high-level building block on the data scientist's toolchain and can be seamlessly integrated and extended with industry standard tools for web scraping, data science, and analysis.

Key features

…

Top comments (23)

Gabriel Arcangel Bol • Dec 8 '20

Thanks for this post! I've tried most of your examples, but I'm getting this error: "['upload_date'] not in index".

When I list the columns, I find the following:

Index(['csrf_token', 'viewer', 'viewer_id', 'country_code', 'language_code',
'locale', 'device_id', 'browser_push_pub_key', 'key_id', 'public_key',
'version', 'is_dev', 'rollout_hash', 'bundle_variant', 'frontend_dev',
'id', 'shortcode', 'height', 'width', 'gating_info',
'fact_check_overall_rating', 'fact_check_information',
'sensitivity_friction_info', 'media_overlay_info', 'media_preview',
'display_url', 'accessibility_caption', 'is_video', 'tracking_token',
'tagged_users', 'caption', 'caption_is_edited', 'has_ranked_comments',
'comments', 'comments_disabled', 'commenting_disabled_for_viewer',
'timestamp', 'likes', 'location', 'viewer_has_liked',
'viewer_has_saved', 'viewer_has_saved_to_collection',
'viewer_in_photo_of_you', 'viewer_can_reshare', 'video_url',
'has_audio', 'video_view_count', 'username', 'full_name'],
dtype='object')

I guess 'timestamp' is the right column to use instead of 'upload_date'.

Please let me know if I'm missing something here.

Chris Greening • Dec 8 '20

Hey Gabriel! First of all, thanks so much for following along!

Looks like you discovered a lil bug that I'm gonna go fix right now. Thank you for bringing this to my attention!!! Since writing this post, the implementation of get_recent_posts has changed, and it looks like I forgot to include the timestamp-to-upload_date conversion. Instagram only serves back an integer timestamp, which instascrape then converts to a datetime object, and embarrassingly I seem to have forgotten to write that back in after the update lol
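Until you're on a patched release, one workaround is to build the column yourself from the integer timestamp column listed above. A minimal sketch with pandas, assuming timestamp holds Unix epoch seconds; add_upload_date is a hypothetical helper, not an instascrape function, and its naive-UTC result may differ by timezone from instascrape's own conversion:

```python
import pandas as pd

def add_upload_date(posts_df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of posts_df with upload_date parsed from the integer timestamp column."""
    df = posts_df.copy()
    # unit="s" treats the integers as Unix epoch seconds; the result is naive UTC
    df["upload_date"] = pd.to_datetime(df["timestamp"], unit="s")
    return df

# Standalone example with two sample epoch-second values
example = pd.DataFrame({"timestamp": [1609352716, 1609262155]})
result = add_upload_date(example)
print(result)
```

With scraped data, posts_df = add_upload_date(posts_df) restores the column that the plotting code expects.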

Chris Greening • Dec 8 '20

Ok my friend, the bug should be fixed! I merged the fix into the repo and am pushing it to PyPI under version 1.3.3 as we speak. Thanks for the find!

Gabriel Arcangel Bol • Dec 8 '20

Thank you so much for your time and fast reply. I'm working on a project to do some data analysis with an Instagram scraping tool. I came across an Instagram API on RapidAPI, but I haven't figured out yet how to get the data with the requests module. So it was great to find your API, because it's easy to use. If you don't mind, I'll let you know about my findings.

Chris Greening • Dec 8 '20

I'd love to hear about what you come up with! I actually just opened a discussion board about an hour ago on the repo, feel free to post about your project/ask questions about instascrape on there!

Gabriel Arcangel Bol • Dec 8 '20

Nice, thanks!

Raghav Vasudeva • Jan 21 '21

I'm getting this error:

C:\Users\dell\anaconda3\lib\site-packages\instascrape\core\_static_scraper.py:134: MissingCookiesWarning: Request header does not contain cookies! It's recommended you pass at least a valid sessionid otherwise
Instagram will likely redirect you to their login page.
warnings.warn(
    upload_date  comments  likes
0    1609352716       NaN    NaN
1    1609262155       NaN    NaN
2    1609098057       NaN    NaN
3    1609010932       NaN    NaN
4    1608314370       NaN    NaN
5    1608227544       NaN    NaN
6    1607624673       NaN    NaN
7    1607115440       NaN    NaN
8    1605468666       NaN    NaN
9    1605464706       NaN    NaN
10   1605457109       NaN    NaN
11   1605199872       NaN    NaN
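The warning above asks for at least a valid sessionid cookie. A minimal sketch of assembling such request headers; build_headers is a hypothetical helper, the default user agent is a placeholder assumption, and passing the result as scrape(headers=headers) is an assumption based on the instascrape README, so check it against your installed version:

```python
def build_headers(sessionid: str, user_agent: str = "Mozilla/5.0") -> dict:
    """Build request headers carrying an Instagram sessionid cookie (hypothetical helper)."""
    if not sessionid:
        raise ValueError("sessionid must be a non-empty string")
    return {
        "user-agent": user_agent,  # placeholder user agent (assumption)
        "cookie": f"sessionid={sessionid};",  # the cookie the warning asks for
    }

headers = build_headers("YOUR_SESSION_ID_HERE")

# Assumed usage with instascrape (needs network access and a real sessionid):
#   from instascrape import Profile
#   chris = Profile('chris_greening')
#   chris.scrape(headers=headers)
```

Treat the sessionid like a password: it identifies your logged-in Instagram session.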

Chris Greening • Jan 21 '21

Good find! I recently had to reimplement some of the code that's used in this tutorial, I'll fix it right now and get back to you in a bit once the patch is pushed through, thank you for bringing this to my attention!

Chris Greening • Jan 21 '21

Alright, the patch has been uploaded, reinstall the library with pip install instascrape==2.1.1 and you should be good to go! Thanks again

Raghav Vasudeva • Jan 21 '21

*Update: reinstall with the following:
pip install insta-scrape==2.1.1

Raghav Vasudeva • Jan 21 '21

Thanks for the quick reply!

Anacleto Berencelli • Jan 21 '21

Hey, Chris!
Great scraper! But I'm getting the following error right after installing it and running:
!pip install insta-scrape
from instascrape import Profile

File "/usr/local/lib/python3.6/dist-packages/instascrape/scrapers/profile.py", line 1
from __future__ import annotations
^

SyntaxError: future feature annotations is not defined

Please let me know what I'm missing here.

Chris Greening • Jan 21 '21

Hey thanks so much for checking out the lib!

Based on your traceback, it looks like you're running Python 3.6, and from __future__ import annotations is only available in Python 3.7 and later! Hope this helps 😄
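To fail fast with a readable message instead of a SyntaxError raised from inside the package, a script can check the interpreter version before importing instascrape. A minimal sketch; check_python_version is a hypothetical helper, and the 3.7 floor comes from the traceback above rather than from instascrape's packaging metadata:

```python
import sys

MIN_PYTHON = (3, 7)  # `from __future__ import annotations` requires Python 3.7+

def check_python_version(version_info=sys.version_info) -> None:
    """Raise a clear RuntimeError if the interpreter is older than MIN_PYTHON."""
    if tuple(version_info[:2]) < MIN_PYTHON:
        found = ".".join(str(part) for part in version_info[:2])
        raise RuntimeError(
            f"instascrape needs Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+, found {found}"
        )

check_python_version()  # call this before `from instascrape import Profile`
```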

Anacleto Berencelli • Jan 21 '21

Thanks!!

Leon Avalos • Jul 19 '22

Looks great! However, I'm getting this error when trying to call profile.scrape():

json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)

Thanks!!

Prasannjeet Singh • Jul 27 '22

Facing the exact same problem. I was wondering if you were able to fix it?

villival • Dec 14 '20

easy to use... perfect for data analysis

Chris Greening • Dec 14 '20

Thanks so much!!! One of the primary inspirations for this project was easy-to-use data scraping <3

villival • Dec 14 '20

wonderful efforts

DOUELFAKAR • Jun 4 '21

Thanks a lot for this post, Chris! I run into an issue when I'm on my home internet, but not when I'm connected through my phone or elsewhere. How is that possible?
Here is the error message I get:

instascrape.exceptions.exceptions.InstagramLoginRedirectError: Instagram is redirecting you to the login page instead of the page you are trying to scrape. This could be occuring because you 1. made too many requests too quickly or 2. are not logged into Instagram on your machine.
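The error message names too many requests as one cause, and since the problem appears on only one network, a plausible (unconfirmed) explanation is that Instagram has throttled or flagged that network's IP address. If you scrape several pages in a row, spacing requests out is a cheap mitigation. A minimal sketch; polite_scrape is a hypothetical helper, and the 5-second default delay is an arbitrary assumption, not a documented Instagram limit:

```python
import time

def polite_scrape(scrapers, delay_seconds=5.0):
    """Call .scrape() on each scraper in turn, pausing between requests.

    Hypothetical helper: delay_seconds is an arbitrary default, not an Instagram limit.
    """
    for i, scraper in enumerate(scrapers):
        if i > 0:
            time.sleep(delay_seconds)  # wait before every request after the first
        scraper.scrape()
    return scrapers

# Usage with instascrape (requires network access, so left commented out):
#   from instascrape import Profile
#   profiles = polite_scrape([Profile('chris_greening'), Profile('instagram')])
```

This only helps with request rate; if Instagram wants you logged in, you will still need to send a sessionid with each request.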