Compare major tech Instagram pages with instascrape

Chris Greening ・3 min read

In this blog post, I'm going to compare the Instagram pages of some of the largest tech companies using my open source Python library instascrape! We'll be exploring their respective engagement, follower counts, number of posts, etc. 🙌

chris-greening / instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically

instascrape: powerful Instagram data scraping toolkit

What is it?

instascrape is a lightweight Python package that provides expressive and flexible tools for scraping Instagram data. It is geared towards being a high-level building block on the data scientist's toolchain and can be seamlessly integrated and extended with industry standard tools for web scraping, data science, and analysis.

Key features

Here are a few of the things that instascrape does well:

  • Powerful, object-oriented scraping tools for profiles, posts, hashtags, reels, and IGTV
  • Scrapes HTML, BeautifulSoup, and JSON
  • Download content to your computer as png, jpg, mp4, and mp3
  • Dynamically retrieve HTML embed code for posts
  • Expressive and consistent API for concise and elegant code
  • Designed for seamless integration with Selenium, Pandas, and other industry standard tools for data collection and analysis
  • Lightweight; no boilerplate or configurations necessary
  • The only hard dependencies are Requests and Beautiful Soup

The companies we'll be comparing for this exercise are Google, Apple, IBM, Facebook, Microsoft, Adobe, and Oracle.

Scraping the data

First, let's start by getting a list of their usernames:

companies = ["google", "apple", "ibm", "facebook", "microsoft", "adobe", "oracle"]

Now, scraping our data is as easy as:

from instascrape import Profile 
profiles = [Profile(username) for username in companies]
for prof in profiles: 
    prof.scrape()
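
If one of the requests fails partway through, the plain loop stops at that profile. Here is a minimal sketch of a more forgiving loop. Note that `scrape_all` and `delay_seconds` are hypothetical names I'm introducing for illustration, not part of instascrape, and catching a broad `Exception` is an assumption because the exact exception types are not covered in this post.

```python
import time


def scrape_all(profiles, delay_seconds=1.0):
    """Call .scrape() on each object, pausing between requests.

    Returns (succeeded, failed), where failed holds (profile, exception) pairs.
    Hypothetical helper for illustration; it is not part of instascrape.
    """
    succeeded, failed = [], []
    for index, prof in enumerate(profiles):
        if index > 0:
            time.sleep(delay_seconds)  # short pause so requests are not sent back to back
        try:
            prof.scrape()
        except Exception as exc:  # broad on purpose: exact exception types are not assumed
            failed.append((prof, exc))
        else:
            succeeded.append(prof)
    return succeeded, failed
```

With the list built above, this could be called as `scraped, failed = scrape_all(profiles)`, and anything in `failed` can be retried later.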

And that's it! We just scraped 364 data points from 7 profiles with only a few lines of code. Now let's use the to_dict method to get a list of dicts that can be passed into a pandas.DataFrame for expressive and powerful data analysis.

import pandas as pd 
data = [prof.to_dict() for prof in profiles]
df = pd.DataFrame(data)
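
To make the list-of-dicts step concrete, here is a small sketch with made-up placeholder rows instead of real scraped data. The keys `username`, `followers`, and `posts` are the ones used later in this post; `extra_field` is a hypothetical stand-in for the many other keys a real `to_dict()` result contains.

```python
import pandas as pd

# Made-up placeholder rows standing in for the output of prof.to_dict();
# real dictionaries contain many more keys and different values.
data = [
    {"username": "alpha", "followers": 1200, "posts": 40, "extra_field": "a"},
    {"username": "beta", "followers": 300, "posts": 85, "extra_field": "b"},
    {"username": "gamma", "followers": 4500, "posts": 12, "extra_field": "c"},
]

# Each dict becomes one row, each key becomes one column
df = pd.DataFrame(data)

# Keep only the columns used in the comparison and index the rows by username
summary = df[["username", "followers", "posts"]].set_index("username")
summary = summary.sort_values("followers", ascending=False)
print(summary)
```

Trimming to a few columns early keeps the later plots and comparisons easier to read.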

Exploring our data

First, let's start by comparing how many followers each page has using a matplotlib bar plot:

import matplotlib.pyplot as plt 
plt.style.use("seaborn-darkgrid")  # on matplotlib 3.6 and later this style is named "seaborn-v0_8-darkgrid"
plt.bar(df["username"], df["followers"]) 

[Figure: bar plot of followers per username]

We can immediately see that Apple has the most followers by a clear margin and, surprisingly, Facebook doesn't have as many as one might expect.

Now let's see who has the most posts:

import matplotlib.pyplot as plt 
plt.style.use("seaborn-darkgrid")  # on matplotlib 3.6 and later this style is named "seaborn-v0_8-darkgrid"
plt.bar(df["username"], df["posts"]) 

[Figure: bar plot of posts per username]

Finally, we're going to examine each page's engagement, measured as likes per post, as a function of time and see how the different pages are doing:

(NOTE: some of the specifics in the code are skipped so we can focus on what's important. Additionally, Apple is not pictured because its numbers are significantly larger than the others.)

for prof in profiles:
    posts = prof.get_recent_posts()     # gets the 12 most recent posts
    posts_data = [post.to_dict() for post in posts]
    post_df = pd.DataFrame(posts_data)
    plt.plot(post_df.upload_date, post_df.likes, label=prof.username)
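
Some of the skipped specifics can be sketched out like this: leaving one oversized account out of the plot, labeling the axes, and adding a legend. The data below is made up, and `posts_by_user` and `skip` are hypothetical names for illustration; only the `upload_date` and `likes` keys come from the loop above.

```python
from datetime import datetime

import matplotlib.pyplot as plt
import pandas as pd

# Made-up placeholder data: username -> list of post dicts
posts_by_user = {
    "alpha": [
        {"upload_date": datetime(2020, 11, 5), "likes": 150},
        {"upload_date": datetime(2020, 11, 1), "likes": 120},
    ],
    "beta": [
        {"upload_date": datetime(2020, 11, 2), "likes": 40},
        {"upload_date": datetime(2020, 11, 8), "likes": 65},
    ],
    "huge": [
        {"upload_date": datetime(2020, 11, 3), "likes": 90000},
        {"upload_date": datetime(2020, 11, 7), "likes": 95000},
    ],
}
skip = {"huge"}  # left out so it does not flatten the other lines

fig, ax = plt.subplots()
for username, posts_data in posts_by_user.items():
    if username in skip:
        continue
    # Sort by date so the line is drawn in chronological order
    post_df = pd.DataFrame(posts_data).sort_values("upload_date")
    ax.plot(post_df["upload_date"], post_df["likes"], marker="o", label=username)

ax.set_xlabel("Upload date")
ax.set_ylabel("Likes")
ax.legend()
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
```

Calling `plt.show()` afterwards displays the figure when running this as a script.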

[Figure: line plot of likes by upload date for each username]

Some interesting things we can see right off the bat are:

  • Oracle barely gets any likes
  • Surprisingly, neither does Facebook
  • Adobe, Google, and Microsoft post relatively frequently
  • IBM hasn't posted in almost two weeks
  • Microsoft gets the most likes on average on their posts
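
Observations like "most likes on average" or "hasn't posted in almost two weeks" can also be computed rather than read off the plot. Here is a minimal sketch with made-up numbers, assuming the post dictionaries for all profiles have been combined into one DataFrame with a `username` column; the `as_of` date and the output column names are hypothetical.

```python
import pandas as pd

# Made-up placeholder rows: one row per post, tagged with the profile it came from
posts = pd.DataFrame(
    [
        {"username": "alpha", "upload_date": "2020-11-01", "likes": 100},
        {"username": "alpha", "upload_date": "2020-11-10", "likes": 200},
        {"username": "beta", "upload_date": "2020-10-20", "likes": 30},
        {"username": "beta", "upload_date": "2020-10-28", "likes": 50},
    ]
)
posts["upload_date"] = pd.to_datetime(posts["upload_date"])

as_of = pd.Timestamp("2020-11-12")  # hypothetical "today" used for the comparison

# One row per profile: average likes and the date of the latest post
stats = posts.groupby("username").agg(
    mean_likes=("likes", "mean"),
    last_post=("upload_date", "max"),
)
stats["days_since_last_post"] = (as_of - stats["last_post"]).dt.days
print(stats.sort_values("mean_likes", ascending=False))
```

Sorting by `mean_likes` or `days_since_last_post` gives a quick ranking to check against what the plot suggests.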

Conclusion

And that's pretty much it! This is just a small taste of what instascrape can accomplish, and how you use it is up to you, so get out there and start exploring that data!

If you like what you read, check out some of my other posts 😄

Also, check out the official repository and drop it a star ⭐ or contribute!