Manan Gouhari

How do you scrape Instagram?

Recently, there has been talk of Instagram closing down its API and leaving access only to corporate partners.
Scraping becomes even more important in that scenario, because Instagram's large user base makes it a platform full of data.
I decided to start by scraping whatever data we can find on a person's account page, which you can access at https://instagram.com/ followed by their username.

Let's take my page as an example: https://instagram.com/manan.code
[Screenshot: the profile page]
This is the main area I am interested in. What can we scrape from here, and how? Right-click on the page and click "View page source" to see the source behind it.
You'll see something like this:
[Screenshot: the page source]
At first glance this seems incomprehensible, and finding any data in it looks almost impossible; it's just a sea of link and script tags.
But the data is in there somewhere.
I did some digging and found the script tag that contains basically everything we need.
[Screenshot: the script tag containing the data]
Now that we know where the data is, let's move on to the code.
We'll use the requests module and BeautifulSoup.


Up to this point, the code has requested the Instagram page and got its source, then converted it to a BeautifulSoup object to make the script tag easy to find. With the BeautifulSoup object in hand, we use its find_all function to collect all the script tags. By a little trial and error, I discovered that the tag we need is the 5th one, so we index the list accordingly.
One more step is needed. What we have right now is a tag object, not a string, so we can't slice it to find what we need. Hence, we access the contents of the script tag.
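
The steps just described can be sketched roughly as follows. This is an illustration under assumptions, not the exact code from the post: the function names (`fetch_profile_html`, `nth_script_text`), the `html.parser` backend, and the timeout value are hypothetical choices, and Instagram's markup may have changed since this was written.

```python
import requests
from bs4 import BeautifulSoup

# URL pattern taken from the profile link shown earlier in the post.
PROFILE_URL = "https://instagram.com/{}/"


def fetch_profile_html(username: str) -> str:
    """Download the raw HTML of a public profile page (hypothetical helper)."""
    response = requests.get(PROFILE_URL.format(username), timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def nth_script_text(html: str, index: int = 4) -> str:
    """Return the text inside the script tag at `index` (4 means the 5th tag)."""
    soup = BeautifulSoup(html, "html.parser")
    script_tags = soup.find_all("script")
    # A Tag object cannot be sliced like a string, so take its first child,
    # which is the text inside <script>...</script>, and convert it to str.
    return str(script_tags[index].contents[0])
```

Calling `nth_script_text(fetch_profile_html("manan.code"))` would return the raw JavaScript text, provided the page still ships the data in its 5th script tag and is served without a login wall.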
The next step is to find the part we need.

If you remember, the JavaScript object holding all the data starts with {"config":. With a little string processing, I sliced out the whole JavaScript object and, once it was isolated, parsed it into a Python dictionary using loads from the json package in the standard library.
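
A minimal sketch of that slicing step might look like this. The helper name `extract_shared_data` is hypothetical, and cutting at the last closing brace is an assumption about how the object ends (typically followed by a semicolon), not something confirmed by the post.

```python
import json


def extract_shared_data(script_text: str) -> dict:
    """Slice the JavaScript object out of the script text and parse it."""
    start = script_text.find('{"config"')
    if start == -1:
        raise ValueError('no {"config" marker found in this script tag')
    # Assumption: the object runs up to the last closing brace in the text,
    # so anything after it (such as a trailing semicolon) is dropped.
    end = script_text.rfind("}") + 1
    return json.loads(script_text[start:end])
```

The returned dictionary plays the role of `data_json`. Raising an error when the marker is missing makes it obvious when the chosen script tag is the wrong one.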
If you print data_json, this is what you get:
[Screenshot: the printed data_json]
Looking closely, I worked out the right keys for the data we need. Here is the result.
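
The post does not list the keys, so the sketch below is only a guess at what the lookup could look like. The key names (`entry_data`, `ProfilePage`, `graphql`, `user`, `edge_followed_by`, and so on) are assumptions based on how this payload was commonly structured, the helper `summarize_profile` is hypothetical, and `SAMPLE_DATA` contains made-up placeholder values rather than real profile data.

```python
def summarize_profile(data_json: dict) -> dict:
    """Pick a few profile fields out of the parsed page data."""
    # Assumed path to the user record; verify it against your own data_json.
    user = data_json["entry_data"]["ProfilePage"][0]["graphql"]["user"]
    return {
        "username": user["username"],
        "full_name": user["full_name"],
        "biography": user["biography"],
        "followers": user["edge_followed_by"]["count"],
        "following": user["edge_follow"]["count"],
        "posts": user["edge_owner_to_timeline_media"]["count"],
    }


# Made-up stand-in for data_json, shaped like the assumed structure.
SAMPLE_DATA = {
    "entry_data": {
        "ProfilePage": [
            {
                "graphql": {
                    "user": {
                        "username": "example_user",
                        "full_name": "Example User",
                        "biography": "placeholder bio",
                        "edge_followed_by": {"count": 10},
                        "edge_follow": {"count": 20},
                        "edge_owner_to_timeline_media": {"count": 3},
                    }
                }
            }
        ]
    }
}

summary = summarize_profile(SAMPLE_DATA)
print(summary)
```

If any of these keys is missing from your own `data_json`, the lookup raises a `KeyError`, which is a quick signal that the structure differs from what is assumed here.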

[Screenshot: the final output]
And this marks the end of our journey to scraping Instagram!
Check out my video where I go over the same thing -

Top comments (8)

Chris Greening

Hello!!! I've been working on an open source library for scraping Instagram data you might find interesting at this repo. It scrapes the same JSON data you explored in your blog post with steps as easy as

from instascrape import Profile
manan = Profile.from_username('manan.code')
manan.load()

and that's it!! It scraped almost all the data points you can get from the data_json you were exploring. It has similar functionality for hashtags and posts as well, check it out :)

Manan Gouhari

Oh that is great! I'll look into it for sure. Thanks for commenting about it.

restyler

Great writeup, thank you! I've just published a simple tutorial on Instagram scraping and discovering micro-influencers via SQL.


I will appreciate your feedback and comments.
Cheers!

nityagudi18

Hi,
Thanks for sharing this video.
I see that some accounts have the data we need at script tag 4, not 5. I am trying to scrape data for a list of users. How to extract search tag index and make this work for all users?

Harsh Vardhan Goswami

Great boi

BobbyMoure

How to understand scratching? Can I scratch Instagram?)

Manan Gouhari

WUT EVEN DUDE ?
