Instagram Scraping | The Top 2021 Guide [Updated]

Braydon Buckner ・5 min read

In this guide, you will find the best ways to scrape Instagram data. You can extract Instagram emails, phone numbers, images/photos, user bio, hashtags, etc.

To get the data, you can scrape the followers of a specific Instagram account (your own or a competitor’s) or the users behind relevant hashtags.

Sit tight.

Instagram Scraper Tool - How to Build One Yourself

As you’ll see in this article, developing an Instagram scraper yourself costs both time and money, so if you want to skip the whole process and get targeted, validated data, use this Instagram email scraper.

Scraping Instagram data with the unofficial API

To access the unofficial Instagram API you need to use the mobile app, since we’re after the mobile endpoints. If we intercept the traffic going to and from Instagram’s servers, we’ll be able to collect Instagram data.

To read more about how to access the API and scrape data from Instagram, visit this Instagram scraper tool.

Instagram scraping with Python and GitHub

The above-mentioned mobile endpoints can be accessed with Python, PHP, or any other language that can log into Instagram profiles and extract data. The login itself is only a few lines of Python, although there is much more to a working scraper than that.

To log into a profile and use the unofficial Instagram API, use the code sample from this GitHub repository:

def login(self, force=False):
        """
            Authenticate this API instance.
            If already logged in (and not later logged out) does nothing (unless forced).
        :param force: if true, will attempt to log in even if already logged in.
        :return: dictionary of responses.
        """
        if not self._isloggedin or force:
            self._session = requests.Session()
            # if you need proxy make something like this:
            # self.s.proxies = {"https": "http://proxyip:proxyport"}
            full_response = self._sendrequest(
                'si/fetch_headers/?challenge_type=signup&guid=' + self.generate_uuid(False), login=True)
            data = {
                'phone_id': self.generate_uuid(True),
                '_csrftoken': full_response.cookies['csrftoken'],
                'username': self._username,
                'guid': self._uuid,
                'device_id': self._deviceid,
                'password': self._password,
                'login_attempt_count': '0'}
            try:
                full_response = self._sendrequest(
                    'accounts/login/',
                    post=self._generatesignature(json.dumps(data)),
                    login=True)
            except InstagramAPIBase._2FA_Required as exception:
                # In order to login, need to provide the second factor (i.e. SMS code or backup code).
                # Use call-back to get this string.
                if not self._two_factor_callback:
                    raise AuthenticationError("This account requires support for Two-Factor Authentication")
                two_factor_info = exception.two_factor_info
                verification_string = self._two_factor_callback(two_factor_info)
                data = {
                    'verification_code': verification_string,
                    'two_factor_identifier': two_factor_info['two_factor_identifier'],
                    '_csrftoken': full_response.cookies['csrftoken'],
                    'username': self._username,
                    'device_id': self._deviceid,
                    'password': self._password,
                }
                full_response = self._sendrequest(
                    'accounts/two_factor_login/',
                    post=self._generatesignature(json.dumps(data)),
                    login=True)
            self._isloggedin = True
            decoded_text = json.loads(full_response.text)
            self._loggedinuserid = decoded_text["logged_in_user"]["pk"]
            self._ranktoken = "%s_%s" % (self._loggedinuserid, self._uuid)
            self._csrftoken = full_response.cookies["csrftoken"]
            return decoded_text

In my experience, logging in is pretty straightforward. Use a proxy (if possible, one from the location you’re in) and fill out any captchas that come up. After the accounts have rested for a few days, you can start scraping.

P.S. Be extremely cautious with the login request, because it can trigger Instagram’s algorithm to keep an eye on you.

Instagram scraping profiles for getting the data

I started out collecting emails and phone numbers from Instagram manually, but I now own 1,500 Instagram profiles that crawl the platform simultaneously. And that’s not too many! You need a large number of accounts because the API call limit is very...limited, meaning you can only gather so much data from one account.

So, if you’re serious about building an Instagram tool online with Python you have to remember two things:

  1. Never use your personal Instagram account
  2. And the Instagram profiles you purchase need to be aged and validated with a phone number (if you don’t want to get suspended)

Another thing to remember is that Instagram has gotten pretty smart in detecting fake accounts that are purchased at scale. Mainly they study the gray market and track the source.

My suggestion is to diversify: buy accounts from multiple sources and see which ones are safe.

To keep the Instagram scraping safe use proxies

When you use proxies, Instagram sees the proxy’s IP address instead of your own. While you can scrape from one server, you still need to keep the number of accounts per IP address low, because logging in to more than 5 different IG accounts from the same address can be a huge problem.

Just as I previously mentioned the problem of the gray market and Instagram, the same holds true for proxies as well.

Instagram has the ability to detect millions of proxy providers, and you need to find the perfect one for long-term scraping.

Benefits and drawbacks of having your own Instagram scraper

It’s pretty hard creating your own tool that can extract Instagram user data, so let’s see whether building your own tool is viable or not.

Starting with the pros

  1. Having full control over the whole process
  2. You can use and reuse the email addresses and phone numbers
  3. And sell and resell the database

And the drawbacks of an in-house scraper are:

  1. No segmentation or additional targeting
  2. A lot of bot accounts scraped
  3. Tons of invalid emails, spam traps, and catch-alls come with them
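
A basic cleaning pass can take care of the most obvious part of that last drawback. The sketch below is only an illustration: `clean_emails` and `EMAIL_PATTERN` are hypothetical names, and the pattern is deliberately loose. It drops entries that are not shaped like an email address and removes duplicates. It cannot detect spam traps or catch-all domains, which need a proper validation service.

```python
import re

# Deliberately simple shape check: no whitespace, exactly one "@",
# and at least one dot in the domain part. This is not full RFC validation.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def clean_emails(raw_emails):
    """Return lower-cased, de-duplicated, email-shaped strings in first-seen order."""
    seen = set()
    cleaned = []
    for raw in raw_emails:
        # Treat None like an empty string so missing values are skipped.
        email = (raw or "").strip().lower()
        if not EMAIL_PATTERN.match(email):
            continue
        if email in seen:
            continue
        seen.add(email)
        cleaned.append(email)
    return cleaned


scraped = ["A@Example.com ", "a@example.com", "not-an-email", "", None, "b@shop.io"]
result = clean_emails(scraped)
print(result)
```

Lower-casing before the duplicate check is a design choice that treats `A@Example.com` and `a@example.com` as the same address, which is usually what you want for a mailing list.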

Skip the whole process and get an Instagram email list instantly?

There are data scraping companies that have been crawling Instagram for years. They’ve become so advanced that now they offer additional targeting options that you otherwise wouldn’t get.

Some of them are

  • Gender
  • Interest
  • Country/city
  • Age

These are advanced targeting options that you can’t get from scraping your competitors’ followers or a hashtag relevant to your brand. But they are crucial in a marketing campaign.

One provider that stands out from the bunch is Influencers Club, which offers all of these options, and the data you get is validated, i.e. safe to use. You can check out their pricing here.

Codes for Instagram email scraping (Python)

All code samples are from this GitHub repository.

After you’re logged in with an Instagram scraping account from a safe proxy you can start scraping emails.

The only mobile endpoints (API) you need are the ones below, and the one for email addresses is:

/api/v1/users/{{user_id}}/info/

Field                   Description
user.public_email       Email address
user.username           The username
user.is_private         If this is a private account
user.full_name          User’s full name
user.profile_pic_url    User’s profile photo URL
user.biography          User’s bio
user.external_url       User’s website
user.follower_count     Follower count
user.following_count    Following count
user.media_count        Number of posts

The table shows the data points you can get with this code.
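
To make those data points easier to work with, here is a minimal sketch of picking them out of an already decoded JSON payload. It makes no network calls. The `ProfileInfo` class, the `parse_profile` function, and the sample data are all hypothetical, and the sketch assumes the fields sit under a top-level `user` key, as the `user.` prefixes above suggest. The real response may be shaped differently.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class ProfileInfo:
    username: str
    full_name: str
    is_private: bool
    public_email: Optional[str]
    biography: str
    external_url: Optional[str]
    profile_pic_url: Optional[str]
    follower_count: int
    following_count: int
    media_count: int


def parse_profile(payload: Mapping[str, Any]) -> ProfileInfo:
    """Extract the listed fields from a decoded payload.

    Assumption: the fields live under a top-level "user" object.
    Missing or empty optional fields become None instead of raising.
    """
    user = payload.get("user", {})
    return ProfileInfo(
        username=user.get("username", ""),
        full_name=user.get("full_name", ""),
        is_private=bool(user.get("is_private", False)),
        public_email=user.get("public_email") or None,
        biography=user.get("biography", ""),
        external_url=user.get("external_url") or None,
        profile_pic_url=user.get("profile_pic_url") or None,
        follower_count=int(user.get("follower_count", 0)),
        following_count=int(user.get("following_count", 0)),
        media_count=int(user.get("media_count", 0)),
    )


# Hand-written sample data for illustration, not a real API response.
sample_payload = {
    "user": {
        "username": "example_shop",
        "full_name": "Example Shop",
        "is_private": False,
        "public_email": "hello@example.com",
        "biography": "Handmade goods.",
        "external_url": "https://example.com",
        "profile_pic_url": "https://example.com/pic.jpg",
        "follower_count": 1200,
        "following_count": 340,
        "media_count": 87,
    }
}

profile = parse_profile(sample_payload)
print(profile.username, profile.public_email, profile.follower_count)
```

Turning empty strings into `None` for `public_email` makes it simple to skip profiles that share no email address.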

Create Instagram Phone Number Scraper Tool

Scraping phone numbers from Instagram uses the same API call as the one for email addresses. Of course, you can only find an Instagram phone number if the user has agreed to share it publicly.

/api/v1/users/{{user_id}}/info/

According to my stats, only 10-15% of all Instagram users have their personal number publicly available, which works out to close to 200 million users.
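
As a quick sanity check on those two figures, the short calculation below works out the total user base they imply. The variable names are made up for illustration, and the inputs are just the numbers quoted above, not independent data.

```python
# Inputs taken from the paragraph above.
SHARE_LOW, SHARE_HIGH = 0.10, 0.15   # share of users with a public number
PUBLIC_NUMBERS = 200_000_000         # users with a public number

# A smaller share implies a larger total user base, and vice versa.
implied_max = PUBLIC_NUMBERS / SHARE_LOW
implied_min = PUBLIC_NUMBERS / SHARE_HIGH

print(f"{implied_min / 1e9:.2f}B to {implied_max / 1e9:.2f}B users")
```

So 200 million public numbers at a 10-15% share corresponds to a total of roughly 1.3 to 2 billion users.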

Final Thoughts on Instagram Scraping

In very few cases it’s actually ROI positive to build an Instagram scraping tool. It takes a lot of time and money to gather data, just to find out that the data is not really actionable.

I’m saying that because unvalidated Instagram data like scraped email addresses will ruin your cold email campaign with a high bounce rate.

Or if you plan on using the list in Facebook ads as custom audiences you can trigger their algorithm as you import a lot of invalid emails.

So maybe most of you should just visit one of these Instagram scraping companies that will make all of this safe and instant.

I recommend you check out Influencers Club.

Discussion (1)

Gustavo Costa

Hello @braydon_buckner_f0e01f8673a

I'm not an influencer, but I want to scrape the whole list of followers of a person whom I hate, so that I can use the old Instagram API to block all of them from visiting and commenting on my profile, because I find them undesirable and unwanted. The problem is that the person has 18.3 million followers. How do I deal with that if the limit is 50k?

I tried Influencers Club, but there is no customer support.