DEV Community

LouiseeLambertf

Instagram Scraper 101: How to scrape Instagram posts, comments…

Does any data on Instagram appeal to you, and do you want to extract it from the platform on a large scale? Then scraping is the only way. Read on to discover the best Instagram data scrapers on the market and how to build your own.

Instagram, the popular photo- and video-sharing social media platform owned by Facebook, is a huge source of social data. Instagram does not hold as much personal data as Facebook does. However, the wealth of other information that still has a personal touch to it is overwhelming, especially among millennials. Data of interest on Instagram includes user profiles and posts (images and videos), along with their associated comments. Social researchers and businesses are in dire need of this data to fine-tune their workflows, better understand their audiences, create better content, and carry out other research.

However, the official Instagram API only gives you access to your own Instagram data, with a good number of restrictions on API calls and data limits. If you must access publicly available data not tied to your own account, then you must work outside the confines of the official Instagram API, which means using automation tools known as Instagram scrapers. An Instagram scraper is a computer program that automates the process of extracting data from Instagram. It sends HTTP requests to the web pages of interest in order to download them, parses the required data out of each page, and saves it to a database if necessary.
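
The download, parse, and save steps described above can be sketched with nothing but the Python standard library. This is a minimal illustration on a generic static page, not an Instagram-specific scraper: the sample HTML, the example.com URL, the `pages` table, and the choice of Open Graph `<meta>` tags as the data to extract are all assumptions made for the demo.

```python
import sqlite3
import urllib.request
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects <meta property="og:..."> tags into a dict."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        if prop.startswith("og:") and "content" in attrs:
            self.fields[prop] = attrs["content"]


def fetch(url):
    """Step 1: download the page (defined for completeness, not called below)."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def parse(html):
    """Step 2: pull the fields of interest out of the HTML."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.fields


def save(connection, url, fields):
    """Step 3: store one row per extracted field."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT, field TEXT, value TEXT)"
    )
    connection.executemany(
        "INSERT INTO pages VALUES (?, ?, ?)",
        [(url, name, value) for name, value in fields.items()],
    )
    connection.commit()


# Made-up page standing in for a downloaded document.
SAMPLE_HTML = """
<html><head>
<meta property="og:title" content="Example post">
<meta property="og:description" content="A made-up caption">
</head><body></body></html>
"""

db = sqlite3.connect(":memory:")
fields = parse(SAMPLE_HTML)
save(db, "https://example.com/post/1", fields)
rows = db.execute("SELECT field, value FROM pages ORDER BY field").fetchall()
print(rows)
```

To run it against a live page, you would replace `SAMPLE_HTML` with `fetch(url)`.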

This article recommends the best Instagram scrapers on the market and also shows you how to build one yourself if you know how to code. Before that, let's take a look at an overview of scraping Instagram.


Instagram Scraping – an Overview

Instagram is very clear about the use of scrapers, crawlers, and other automation bots on its platform: its terms of use prohibit them. Despite this, people still actively scrape data from Instagram, and you can't blame them; the official Instagram API isn't helping matters. However, the fact that other people are scraping Instagram does not mean you will be able to. Instagram has one of the strictest, most effective, and most intelligent anti-bot systems in place to prevent automated access and traffic on its platform. It has been at the forefront of fighting bots in the industry, shutting down a good number of services such as the popular Mass Planner. Be that as it may, with the right setup in place, you can scrape data from Instagram at scale with a much lower risk of being detected and blocked.

The most important tool to get right is proxies. Instagram tracks IP addresses and is very good at detecting proxies, so mobile proxies are the proxies of choice. If you can't afford them, you can use residential proxies instead.
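
As a rough sketch of how proxies plug into a Selenium-driven browser, the snippet below rotates through a pool in round-robin order and passes each address to Chrome through its `--proxy-server` command-line flag. The proxy addresses are placeholders from a documentation-only IP range, and the helper names `chrome_arguments` and `new_browser` are made up for this example. Proxies that need a username and password typically require extra handling that is not shown here.

```python
from itertools import cycle

# Hypothetical proxy endpoints (documentation-only IP range); replace them
# with the mobile or residential proxies from your provider.
PROXIES = [
    "203.0.113.10:8000",
    "203.0.113.11:8000",
    "203.0.113.12:8000",
]


def chrome_arguments(proxy):
    """Build the Chrome command-line arguments for one browser session."""
    return ["--headless", f"--proxy-server=http://{proxy}"]


def new_browser(proxy):
    """Start headless Chrome routed through `proxy` (needs Selenium 4 and Chrome)."""
    # Imported lazily so the helpers above can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in chrome_arguments(proxy):
        options.add_argument(argument)
    return webdriver.Chrome(options=options)


# Round-robin rotation: each new session takes the next proxy in the pool.
proxy_pool = cycle(PROXIES)
sessions = [chrome_arguments(next(proxy_pool)) for _ in range(4)]
for arguments in sessions:
    print(arguments)
```

The demo only prints the arguments each session would use; calling `new_browser(next(proxy_pool))` would start a real browser behind the next proxy.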


How to Scrape Instagram using Python and Selenium

Unless you can reverse engineer the Instagram mobile application, your focus should be on the Instagram web application, as its requests are the ones you can easily replicate. The web application is built heavily with JavaScript to provide a near-native, responsive experience, so you have a lot of XHR (AJAX) requests to deal with.

This makes the duo of Requests and BeautifulSoup unsuitable for scraping Instagram. You need a way to render the page and execute its JavaScript, which headless browsers provide. For a Python developer, Selenium is the most popular and powerful browser automation tool for controlling browsers in headless mode.

As you already know, some data on Instagram is publicly available and can be accessed without logging in. This includes profiles, posts, hashtags, comments, and places. I advise you to focus on these and on other data that does not require a login. You know why?

Accessing Instagram with an automation tool while logged in makes it easy for the anti-bot system to sniff you out, and when that happens, you risk not only getting your IP blacklisted but also your account banned. I know you can create accounts to use for your scraping work, but you also need to be good at engineering your bot to evade the check activated on logged-in accounts and their activities.

Below is a small Instagram scraper for scraping the comments under a post. It is a simple proof-of-concept scraper built with Python and Selenium to show you how easy it is to build an Instagram scraper.

from selenium import webdriver
from selenium.webdriver.common.by import By


class InstagramScraper:

    def __init__(self, post_url):
        self.post_url = post_url
        self.comments = []
        # Run Chrome without a visible window.
        chrome_options = webdriver.ChromeOptions()
        chrome_options.add_argument("--headless")
        self.chrome = webdriver.Chrome(options=chrome_options)

    def scrape_comments(self):
        # Load the post; Chrome executes the page's JavaScript for us.
        self.chrome.get(self.post_url)
        # The class names below are Instagram's generated CSS classes from when
        # this was written; they change often, so expect to update them.
        comments = (self.chrome.find_element(By.CLASS_NAME, "XQXOT")
                    .find_elements(By.CLASS_NAME, "Mr508"))
        for comment in comments:
            d = (comment.find_element(By.CLASS_NAME, "ZyFrc")
                 .find_element(By.TAG_NAME, "li")
                 .find_element(By.CLASS_NAME, "P9YgZ")
                 .find_element(By.TAG_NAME, "div"))
            d = d.find_element(By.CLASS_NAME, "C4VMK")
            poster = d.find_element(By.TAG_NAME, "h3").text
            post = d.find_element(By.TAG_NAME, "span").text
            self.comments.append({
                "poster": poster,
                "post": post
            })

        return self.comments

    def close(self):
        # Shut down the browser process when done.
        self.chrome.quit()


post_url = "https://www.instagram.com/p/CAbDmzDnSvn/"
x = InstagramScraper(post_url)
print(x.scrape_comments())
x.close()
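
Once the scraper has returned its list of comments, you will usually want to store it somewhere. Here is a minimal sketch of exporting records to CSV and JSON with the standard library. The two sample records are made up but have the same shape as the dictionaries the scraper appends to `self.comments`; the helper names `to_csv` and `to_json` are hypothetical.

```python
import csv
import io
import json

# Made-up records in the same shape as InstagramScraper.comments.
comments = [
    {"poster": "user_one", "post": "Great shot!"},
    {"poster": "user_two", "post": "Where was this taken?"},
]


def to_csv(records):
    """Serialize the records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["poster", "post"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialize the records to pretty-printed JSON text."""
    return json.dumps(records, ensure_ascii=False, indent=2)


csv_text = to_csv(comments)
json_text = to_json(comments)
print(csv_text)
print(json_text)
```

To write files instead of strings, pass a file opened with `open("comments.csv", "w", newline="", encoding="utf-8")` to `csv.DictWriter` in place of the in-memory buffer.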

Best Instagram Scrapers

Even if you are not a coder, you can still access the data you need on Instagram by using ready-made Instagram scrapers on the market. What you should be mindful of is choosing the best tool for the job. You also need to make sure you configure the tool you choose correctly, or else you will still get detected and blocked. Below are the 5 best Instagram scrapers you can use for your Instagram data scraping tasks.


Octoparse

  • Pricing: Starts at $75 per month
  • Free Trials: 14 days of free trial with limitations
  • Data Output Format: CSV, Excel, JSON, MySQL, SQLServer
  • Supported Platforms: Cloud, Desktop

Looking for a reliable, tested, and trusted web scraper to use for your Instagram data scraping? Then Octoparse should be on your list of options. You know why? It has Instagram scraping templates, which make the whole process of scraping much easier and faster.

Octoparse, like all the other tools in this list (excluding Apify Instagram Scraper), is a visual scraping tool that requires no coding skill to use. It is available both as a cloud-based tool and as installable desktop software. It has a free trial you can use before making a monetary commitment.


Jarvee

  • Pricing: Starts at $29.95 per month
  • Free Trials: 5 days of free trials
  • Data Output Format: JSON, CSV, Excel
  • Supported Platforms: Desktop - Windows

Those who are into Instagram automation will know the capabilities of Jarvee. It remains one of the best and most powerful tools to have survived the updates meant to discourage botting. The good news is that it is also one of the best tools you can use for scraping data from Instagram.

You just have to find the best settings and make sure you know what you are doing, as Jarvee gives you full control, which can mean going overboard. Check out the official tutorial from Jarvee to learn how to set it up for scraping Instagram. Jarvee is not an Instagram-only tool; it works with other social media platforms too. It is a paid, Windows-based tool.


Apify Instagram Scraper

  • Pricing: Starts at $49 per month for 100 Actor compute units
  • Free Trials: Starter plan comes with 10 Actor compute units
  • Data Output Format: JSON
  • Supported Platforms: cloud-based – accessed via API

Apify is a platform that hosts a good number of web automation tools known as actors, and the Instagram Scraper is one such tool. The Apify Instagram Scraper can help you extract publicly available data from Instagram, such as posts on profiles, comments, places, and hashtags. The tool even supports search queries, and you can give it a list of URLs too.

One thing I like about Apify as a platform is that all of its automation tools (including Instagram Scraper) are available as an API, so it is easy to integrate them into your custom programs. You can also save scraped data to Excel or CSV files.


Webscraper.io Chrome Extension

  • Pricing: Browser extension is free
  • Free Trials: Browser extension is free
  • Data Output Format: CSV
  • Supported Platforms: Chrome extension

Webscraper.io has proven to be one of the best web scrapers available as a browser extension. With this tool, you can scrape both old and new websites, as it has been developed for the modern web.

This extension can be used for scraping Instagram, as it renders JavaScript and handles the Instagram infinite-scroll issue that you might run into. Unlike the tools above, Webscraper.io is free when used as a browser extension. However, the extension has some limitations; cloud scraping removes them but requires you to pay.


ScrapeStorm

  • Pricing: Starts at $49.99 per month
  • Free Trials: Starter plan is free – comes with limitations
  • Data Output Format: TXT, CSV, Excel, JSON, MySQL, Google Sheets, etc.
  • Supported Platforms: Desktop

ScrapeStorm is another web scraper that handles scraping publicly available data on Instagram very well. It is actually a general-purpose web scraper that can be used on any website on the Internet. It is designed to scrape websites without being detected and extracts what users can see. What sets ScrapeStorm apart from the other tools on the list is that it requires no training, as it detects data points on its own using artificial intelligence. ScrapeStorm is available for most of the popular operating systems and can also be used as a cloud-based tool. It is a paid tool with a trial option available.


Conclusion

Instagram remains one of the most difficult websites to scrape on the Internet, as it has strong mechanisms in place to prevent botting. However, experienced developers still get it scraped by evading the anti-scraping techniques Instagram has put in place. If you aren't experienced enough to develop scrapers that can handle Instagram, you can use one of the Instagram scrapers discussed above instead.
