DEV Community 👩‍💻👨‍💻

Chris Greening


Scraping 25,000 data points from Joe Biden's Instagram using instascrape

In this post, I'm going to discuss how I used my open-source Instagram scraper to scrape 25,000 data points from Joe Biden's Instagram page.

Combining Selenium and instascrape, I wrote a quick script that automatically scrolled Joe Biden's Instagram page and scraped the first 500 posts, yielding almost 25,000 data points to explore (49 data points per post, or 24,500 in total) 🙌.
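The scrolling script itself isn't included in the post, so here is a minimal sketch of the idea under stated assumptions: a Selenium-style driver already has the profile page open, post links contain "/p/" in their href, and `collect_post_urls` (with its parameters) is a hypothetical helper, not part of instascrape or Selenium. It keeps scrolling, records each post link only once (infinite scroll re-renders links it has already shown), and stops at the limit or after several scrolls that turn up nothing new.

```python
import time


def collect_post_urls(driver, limit=500, pause=2.0, max_idle_scrolls=5):
    """Scroll the page open in `driver` and return up to `limit` unique post URLs.

    Hypothetical helper. Assumes a Selenium-style WebDriver that already has
    the profile page loaded, and assumes post links contain "/p/" in their href.
    """
    urls = []         # post URLs in the order they first appeared
    seen = set()      # fast de-duplication across overlapping scroll batches
    idle_scrolls = 0  # consecutive scrolls that surfaced no new posts

    while idle_scrolls < max_idle_scrolls:
        found_new = False
        # "tag name" is the string value behind Selenium's By.TAG_NAME
        for anchor in driver.find_elements("tag name", "a"):
            href = anchor.get_attribute("href")
            if href and "/p/" in href and href not in seen:
                seen.add(href)
                urls.append(href)
                found_new = True
                if len(urls) >= limit:
                    return urls
        idle_scrolls = 0 if found_new else idle_scrolls + 1
        # Jump to the bottom so the page loads the next batch of posts
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)

    return urls
```

With a real browser you would create the driver (for example Selenium's `webdriver.Chrome()`), open the profile with `driver.get(...)`, and then hand the returned URLs to instascrape for scraping. A generous `pause` also keeps the request rate modest, which matters given how strict Instagram has become about scraping.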

Let's see what his likes per post look like with a little matplotlib and scikit-learn magic 😏

[Figure: likes per post]

As expected, we can see steady growth and then a massive spike upwards as election day approached.
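The fitting code behind these plots isn't shown, so here is a sketch of one way to draw a straight-line trend through likes (or comments) against post order. The post used scikit-learn; this version swaps in `numpy.polyfit` with degree 1, which computes the same ordinary least-squares line that scikit-learn's `LinearRegression` would. The function names are hypothetical, and the code assumes one row per post ordered oldest to newest (a profile page lists newest posts first, so reverse the frame if needed).

```python
import numpy as np


def linear_trend(df, column="likes"):
    """Fit an ordinary least-squares line to `column` against post order.

    Assumes `df` has one row per post, ordered oldest to newest.
    Returns (slope, intercept, fitted_values).
    """
    x = np.arange(len(df))                      # post index: 0 = oldest post
    y = df[column].to_numpy(dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)  # degree-1 fit = least-squares line
    return slope, intercept, slope * x + intercept


def plot_trend(df, column="likes"):
    """Scatter the raw values and overlay the fitted trend line."""
    import matplotlib.pyplot as plt  # imported here so linear_trend works without matplotlib

    _, _, fitted = linear_trend(df, column)
    x = np.arange(len(df))
    fig, ax = plt.subplots()
    ax.scatter(x, df[column], s=8, label=column)
    ax.plot(x, fitted, color="red", label="linear trend")
    ax.set_xlabel("post number (oldest first)")
    ax.set_ylabel(column)
    ax.legend()
    return fig
```

Calling `plot_trend(dataframe, "comments")` would draw the comments-per-post version. Keep in mind that a straight line can't capture the spike near election day; a positive slope only says that engagement grew on average over the period.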

Let's take a look at comments per post now for the heck of it:

[Figure: comments per post]

There's a ton of different things we can do now that the data is available, and it's really up to you what you do with it. Using instascrape's to_dict instance method, I can build a pandas.DataFrame from all of our data for easy analysis in a clean, expressive format. With a one-liner like the following, we can get every post where Joe Biden used a hashtag:

```python
# hashtags holds a list per post, so .str.len() gives the number of hashtags used
dataframe[dataframe.hashtags.str.len() != 0]
```

Or say we wanted every post where Joe got more than 1,000,000 likes:

```python
# Boolean mask: True for each post with more than one million likes
dataframe[dataframe["likes"] > 1000000]
```
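Putting the pieces together, here is a self-contained sketch that builds the DataFrame and combines the two filters above. It assumes each scraped post's `to_dict()` output includes "likes" and "hashtags" keys, with hashtags stored as a list; `build_dataframe` and `popular_hashtag_posts` are hypothetical helper names. Missing hashtag values are normalized to empty lists because `.str.len()` returns NaN for them, and `NaN != 0` evaluates to True, which would sneak hashtag-free posts into the hashtag filter.

```python
import pandas as pd


def build_dataframe(post_dicts):
    """One row per post from a list of dicts (assumed to come from to_dict())."""
    dataframe = pd.DataFrame(post_dicts)
    # Normalize missing hashtag values to empty lists so .str.len() returns 0, not NaN
    dataframe["hashtags"] = dataframe["hashtags"].apply(
        lambda tags: tags if isinstance(tags, list) else []
    )
    return dataframe


def popular_hashtag_posts(dataframe, min_likes=1_000_000):
    """Posts that used at least one hashtag AND got more than `min_likes` likes."""
    has_hashtag = dataframe["hashtags"].str.len() != 0  # list length per post
    is_popular = dataframe["likes"] > min_likes
    # Combine boolean masks element-wise with & (Python's `and` fails on a Series)
    return dataframe[has_hashtag & is_popular]
```

In practice the input list would come from something like `[post.to_dict() for post in posts]` over scraped instascrape `Post` objects. `dataframe.size` returns rows × columns, which is how 500 posts with 49 fields each adds up to 24,500 values.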

...so what are you waiting for? Get out there and start exploring Instagram data programmatically!

If you're interested in reading more about instascrape, check out some of my other posts:

or better yet, come to the official repo and drop it a star and contribute ❤️

chris-greening / instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically

instascrape: powerful Instagram data scraping toolkit

DISCLAIMER:

Instagram has gotten increasingly strict with scraping and using this library can result in getting flagged for botting AND POSSIBLE DISABLING OF YOUR INSTAGRAM ACCOUNT. This is a research project and I am not responsible for how you use it. Independently, the library is designed to be responsible and respectful and it is up to you to decide what you do with it. I don't claim any responsibility if your Instagram account is affected by how you use this library.


What is it?

instascrape is a lightweight Python package that provides an expressive and flexible API for scraping Instagram data. It is geared towards being a high-level building block on the data scientist's toolchain and can be seamlessly integrated and extended with industry standard tools for web scraping, data science, and analysis.

Key features

Here are a few of the things that…

