Scraping 25,000 data points from Joe Biden's Instagram using instascrape

Chris Greening ・2 min read

In this post, I'm going to discuss how I used my open source Instagram scraper to scrape 25,000 data points from Joe Biden's Instagram page.

Combining selenium and instascrape, I wrote a quick script that automatically scrolled Joe Biden's Instagram page and scraped the first 500 posts, yielding almost 25,000 data points to explore (500 posts × 49 data points per post = 24,500) 🙌.

Let's see what his likes per post look like with a little matplotlib and scikit-learn magic 😏

[Figure: likes per post]

As expected, we can see steady growth and then a massive spike upwards as election day approached.

Let's take a look at comments per post now for the heck of it:

[Figure: comments per post]

There are a ton of different things we can do now that the data is available, and it's really up to you what you do with it. Using the to_dict instance method, I can build a pandas.DataFrame from all of our data for easy analysis in a clean, expressive format. With an expression like the following, we can get every post where Joe Biden used a hashtag.

dataframe[dataframe.hashtags.str.len() != 0]

Or say we wanted every post where Joe got more than 1,000,000 likes:

dataframe[dataframe["likes"] > 1000000]
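
To make the two filters above concrete, here's a minimal, self-contained sketch. `StubPost` is a hypothetical stand-in that only mimics a `to_dict` method (it is not instascrape's class), and the records are made up. The `likes` and `hashtags` column names follow the snippets above.

```python
import pandas as pd


class StubPost:
    """Hypothetical stand-in for a scraped post; only mimics a to_dict() method."""

    def __init__(self, likes, hashtags):
        self.likes = likes
        self.hashtags = hashtags

    def to_dict(self):
        return {"likes": self.likes, "hashtags": self.hashtags}


# Made-up records for illustration, not real scraped data.
posts = [
    StubPost(likes=1_250_000, hashtags=[]),
    StubPost(likes=430_000, hashtags=["vote"]),
    StubPost(likes=2_100_000, hashtags=["election2020", "vote"]),
]

# One row per post, one column per key returned by to_dict().
dataframe = pd.DataFrame([post.to_dict() for post in posts])

# Posts with at least one hashtag (.str.len() also measures list lengths).
with_hashtags = dataframe[dataframe["hashtags"].str.len() != 0]

# Posts with more than 1,000,000 likes.
over_a_million = dataframe[dataframe["likes"] > 1_000_000]

print(with_hashtags)
print(over_a_million)
```

Both filters are plain boolean masks, so they can be combined with `&` to get, say, hashtagged posts with more than a million likes.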

...so what are you waiting for? Get out there and start exploring Instagram data programmatically!

If you're interested in reading more about instascrape, check out some of my other posts, or better yet, come to the official repo, drop it a star, and contribute ❤️

chris-greening / instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically

instascrape: powerful Instagram data scraping toolkit

What is it?

instascrape is a lightweight Python package that provides expressive and flexible tools for scraping Instagram data. It is geared towards being a high-level building block on the data scientist's toolchain and can be seamlessly integrated and extended with industry standard tools for web scraping, data science, and analysis.

Key features

Here are a few of the things that instascrape does well:

  • Powerful, object-oriented scraping tools for profiles, posts, hashtags, reels, and IGTV
  • Scrapes HTML, BeautifulSoup, and JSON
  • Download content to your computer as png, jpg, mp4, and mp3
  • Dynamically retrieve HTML embed code for posts
  • Expressive and consistent API for concise and elegant code
  • Designed for seamless integration with Selenium, Pandas, and other industry standard tools for data collection and analysis
  • Lightweight; no boilerplate or configurations necessary
  • The only hard dependencies are Requests and Beautiful Soup

Discussion

Javed Shaikh

This is amazing. Thanks for sharing 🙂

Chris Greening Author

Thanks so much Javed!! Glad you appreciated it <3