DEV Community

Chris Greening

Posted on Nov 12, 2020

Compare major tech Instagram pages with instascrape

#showdev #python #datascience #contributorswanted

In this blog post, I'm going to compare the Instagram pages of some of the largest tech companies using my open source Python library instascrape! We'll explore their engagement, follower counts, number of posts, and more. 🙌

chris-greening / instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically

instascrape: powerful Instagram data scraping toolkit

Note: This module is no longer actively maintained.

DISCLAIMER:

Instagram has gotten increasingly strict with scraping and using this library can result in getting flagged for botting AND POSSIBLE DISABLING OF YOUR INSTAGRAM ACCOUNT. This is a research project and I am not responsible for how you use it. Independently, the library is designed to be responsible and respectful and it is up to you to decide what you do with it. I don't claim any responsibility if your Instagram account is affected by how you use this library.

What is it?

instascrape is a lightweight Python package that provides an expressive and flexible API for scraping Instagram data. It is geared towards being a high-level building block on the data scientist's toolchain and can be seamlessly integrated and extended with industry standard tools for web scraping, data science, and analysis.

Key features

…

View on GitHub

The companies we'll be comparing for this exercise are Google, Apple, IBM, Facebook, Microsoft, Adobe, and Oracle.

Scraping the data

First, let's start by getting a list of their usernames:

companies = ["google", "apple", "ibm", "facebook", "microsoft", "adobe", "oracle"]

Now, scraping our data is as easy as

from instascrape import Profile 
profiles = [Profile(username) for username in companies]
for prof in profiles: 
    prof.scrape()
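Given the disclaimer above about Instagram flagging automated traffic, it can help to space the requests out and to keep going when a single profile fails. What follows is a minimal sketch, not part of instascrape: `scrape_politely` and its `delay_seconds` parameter are hypothetical names, and the only thing assumed about each profile is that it has a `scrape()` method, as `Profile` does in the snippet above.

```python
import time


def scrape_politely(profiles, delay_seconds=5.0, sleep=time.sleep):
    """Call .scrape() on each profile, pausing between requests.

    Hypothetical helper (not part of instascrape). Returns a tuple of
    (scraped, failed), where failed holds (profile, exception) pairs.
    """
    scraped, failed = [], []
    for index, prof in enumerate(profiles):
        if index > 0:
            sleep(delay_seconds)  # wait before every request except the first
        try:
            prof.scrape()
        except Exception as exc:  # broad on purpose: keep going on any failure
            failed.append((prof, exc))
        else:
            scraped.append(prof)
    return scraped, failed
```

With the list built above, `scraped, failed = scrape_politely(profiles)` would replace the plain `for` loop, and `failed` shows which pages need another look. The 5 second default is an arbitrary choice, not a documented safe limit.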

And that's it! We just scraped 364 data points from 7 profiles with just a few lines of code. Now let's use the to_dict method to get a list of dicts that can be passed into a pandas.DataFrame for expressive and powerful data analysis.

import pandas as pd 
data = [prof.to_dict() for prof in profiles]
df = pd.DataFrame(data)
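Before analysing the result, it is worth checking that the scrape actually returned values; one of the comments at the end of this post reports every field coming back as nan. Here is a minimal sketch of such a check, assuming each dict has `username` and `followers` keys (the columns used in the plots below). `is_missing` and `find_empty_records` are hypothetical helpers, not instascrape functions.

```python
import math


def is_missing(value):
    """True for None or a float NaN, the usual 'no data' markers."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def find_empty_records(records, required=("username", "followers")):
    """Return the indexes of records where any required field is missing.

    Hypothetical helper; the default field names are the columns the
    plots in this post rely on.
    """
    return [
        index
        for index, record in enumerate(records)
        if any(is_missing(record.get(field)) for field in required)
    ]
```

Calling `find_empty_records(data)` on the list built above returns an empty list when every profile has both fields, and otherwise the positions of the profiles that came back without data.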

Exploring our data

First, let's start by comparing how many followers each page has using a matplotlib bar plot:

import matplotlib.pyplot as plt
# On matplotlib 3.6 and later this style is named "seaborn-v0_8-darkgrid"
plt.style.use("seaborn-darkgrid")
plt.bar(df["username"], df["followers"])  # one bar per company

We can immediately see that Apple has the most followers and that, surprisingly, Facebook has fewer than one might expect.

Now let's see who has the most posts:

# plt and the plot style are already set up by the previous snippet
plt.bar(df["username"], df["posts"])
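Bar charts like the two above are easier to read when the bars are ordered by height. Below is a minimal sketch of that ordering step in plain Python; `sorted_bars` is a hypothetical helper, and the numbers in the example are made-up placeholders, not scraped values.

```python
def sorted_bars(records, value_key, label_key="username"):
    """Return (labels, values) ordered from largest to smallest value."""
    ordered = sorted(records, key=lambda record: record[value_key], reverse=True)
    labels = [record[label_key] for record in ordered]
    values = [record[value_key] for record in ordered]
    return labels, values


# Placeholder numbers for illustration only (not real follower counts).
example = [
    {"username": "alpha", "followers": 120},
    {"username": "beta", "followers": 950},
    {"username": "gamma", "followers": 430},
]
labels, values = sorted_bars(example, "followers")
print(labels)  # ['beta', 'gamma', 'alpha']
print(values)  # [950, 430, 120]
```

With the real data, `plt.bar(*sorted_bars(data, "followers"))` would draw the followers chart with the tallest bar first; `df.sort_values("followers", ascending=False)` is the pandas way to get the same ordering.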

Finally, we're going to examine each page's engagement as a function of time and see how the different pages are doing.

(NOTE: some of the specifics in the code are skipped so we can focus on what's important. Apple is also left out of the plot because its numbers are significantly larger than the others'.)

for prof in profiles:
    posts = prof.get_recent_posts()  # gets the 12 most recent posts
    posts_data = [post.to_dict() for post in posts]
    post_df = pd.DataFrame(posts_data)
    # one line per company: likes over time
    plt.plot(post_df.upload_date, post_df.likes, label=prof.username)
plt.legend()  # show which line belongs to which company

Some interesting things we can see right off the bat are:

Oracle barely gets any likes
Surprisingly, neither does Facebook
Adobe, Google, and Microsoft post relatively frequently
IBM hasn't posted in almost two weeks
Microsoft gets the most likes on average on their posts
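Observations like the last two can be checked with numbers instead of being read off the chart. This is a minimal sketch, assuming each post dict has a `likes` count and an `upload_date` that is a naive `datetime` (the two columns plotted above). `summarize_posts` is a hypothetical helper, and the sample posts are made-up placeholders.

```python
from datetime import datetime
from statistics import mean


def summarize_posts(posts_data, now):
    """Return the average likes and the days since the newest post."""
    newest = max(post["upload_date"] for post in posts_data)
    return {
        "average_likes": mean(post["likes"] for post in posts_data),
        "days_since_last_post": (now - newest).days,
    }


# Placeholder posts for illustration only (not scraped values).
sample = [
    {"likes": 100, "upload_date": datetime(2020, 11, 1)},
    {"likes": 300, "upload_date": datetime(2020, 11, 5)},
]
summary = summarize_posts(sample, now=datetime(2020, 11, 12))
print(summary["average_likes"])         # 200
print(summary["days_since_last_post"])  # 7
```

Inside the loop above, `summarize_posts(posts_data, datetime.now())` would give one such summary per company, which makes it easy to rank the pages by average likes or by how recently they posted.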

Conclusion

And that's pretty much it! This is just a small taste of what instascrape can accomplish. How you use it is up to you, so get out there and start exploring that data!

If you like what you read, check out some of my other posts 😄

Scraping 25,000 data points from Joe Biden's Instagram using instascrape

Chris Greening ・ Nov 5 '20

#showdev #python #datascience #contributorswanted

Downloading recent Instagram photos using instascrape and Python

Chris Greening ・ Oct 26 '20

#python #webscraping #showdev #contributorswanted

Also, check out the official repository and drop it a star ⭐ or contribute!

chris-greening / instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically


Top comments (3)

gravesli • Nov 13 '20

back you.

Victor • Jun 11 '24

Hello. Great post. I've been reading all your posts related to instascrape, but something is troubling me. For some reason the created DataFrame ends up being completely empty. That is the case with everything: followers = nan, username = nan, following = nan.

I am currently running the code in Google Colab. I made sure to (in Colab):