DEV Community

The Easy Way to Scrape Instagram Using Python Scrapy & GraphQL

Ian Kerins on August 06, 2020

After e-commerce monitoring, building social media scrapers to monitor accounts and track new trends is the next most popular use case for web scra...
Vlad

It looks like Instagram doesn't work via Scraper API anymore, but it still works via webscraping.ai.

djk50

instagram.com/explore/tags//

Do you know how to get the posts in which a username is tagged? Simply replacing the former URL with this new one doesn't seem to work.

Vlad

instagram.com/explore/tags/sport/?... works for hashtags, and instagram.com/nike/tagged/?__a=1 works for usernames, but the latter requires login.

karisjochen

Do you mind sharing how you adjusted the code to use webscraping.ai instead? Thanks!

Vlad • Edited

Sure, here it is: gist.github.com/Drakula2k/035cc5bd...
I also fixed a couple of bugs there.

karisjochen

Thanks so much for sharing! After making the changes I am unfortunately still getting blocked by the robots.txt file. Is this code still working for you?

abbas53333 • Edited

It works like a charm. Thank you so much! But I have two questions:
1) How do we include the username, so that each post can be matched to its account?
2) How can we get basic profile information such as name, bio, handle, number of followers, number following, and media count?

If this works for all of that information, I might need to subscribe to Scraper API 1+ Million xD

Thank you in advance.
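
For the second question, a minimal sketch of pulling those fields out of a profile object once it has been parsed from JSON. The key names (`username`, `full_name`, `biography`, `edge_followed_by`, `edge_follow`, `edge_owner_to_timeline_media`) are assumptions about the response shape, not something this thread confirms, and Instagram may rename them at any time, so the helper returns `None` for anything missing instead of raising.

```python
def extract_profile(user):
    """Collect basic profile fields from a parsed user dict.

    All key names below are assumed, not confirmed by the article;
    missing keys yield None rather than a KeyError.
    """
    def count(edge_name):
        # Counts are assumed to be nested as {"edge_name": {"count": N}}.
        edge = user.get(edge_name) or {}
        return edge.get("count")

    return {
        "handle": user.get("username"),
        "name": user.get("full_name"),
        "bio": user.get("biography"),
        "followers": count("edge_followed_by"),
        "following": count("edge_follow"),
        "media_count": count("edge_owner_to_timeline_media"),
    }
```

Adding the `handle` value to every post item yielded by the spider would also cover the first question, since each row would then carry the account it came from.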

Comment deleted
 
abbas53333

Hey there, you need to download Python and install Scrapy, a web scraping framework for Python. I would recommend watching some videos on YouTube to learn it, and I suggest starting with this 25-episode tutorial:
youtube.com/watch?v=ve_0h4Y8nuI&li... This channel is very good. Follow it and you'll be off to a good start!
Have a good day.

Mayank Bali

Hey, this code is giving me an error:

Ignoring response <403 https://api.webscraping.ai/html?api_key=45299f85b2302dd84a9f53e5a799114e&proxy=residential&timeout=20000&url=https%3A%2F%2Fwww.instagram.com%2Fnike%2F%3Fhl%3Den>: HTTP status code is not handled or not allowed

Can anyone help me out here?

Ian Kerins

The code in the article is designed to use scraperapi.com as the proxy, but you are using webscraping.ai. The error suggests that they use a different authentication method for their API, so you need to adapt the code to work with that proxy.
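
One way to make that kind of adaptation easier is to keep all proxy-specific details in a single URL-wrapping helper. The sketch below is an illustration, not the article's code: the endpoint and the `api_key`, `proxy`, `timeout`, and `url` parameters are copied from the request in the error log above, while the function name `get_url` and the `YOUR_API_KEY` placeholder are hypothetical. Whether the provider accepts the request still depends on the key and account, which this sketch cannot check.

```python
from urllib.parse import urlencode

API_KEY = "YOUR_API_KEY"  # placeholder, replace with your own key


def get_url(url, api_key=API_KEY):
    """Wrap a target URL in a webscraping.ai request URL.

    Endpoint and parameter names mirror the request shown in the
    error log; they are not taken from the provider's documentation.
    """
    payload = {
        "api_key": api_key,
        "proxy": "residential",
        "timeout": 20000,
        "url": url,  # urlencode percent-encodes the target URL
    }
    return "https://api.webscraping.ai/html?" + urlencode(payload)
```

With the wrapping isolated like this, switching providers means editing one function while the spider's parsing code stays unchanged.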

GhostGardens

Hi there! Great post...it answers a lot of questions.

Small thing, though: the "likes" count and the comment count aren't working properly. I'm assuming it's because Instagram keeps changing its page, which makes it a moving target. Any hints on how to resolve this?

Thanks very much for your time!

jacksonbull87

The likes count isn't working for me either. It's just giving me NaN values. Any idea how to fix this?
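
Empty or NaN counts may mean that the key the spider reads is no longer present in the response. A defensive sketch, assuming each post is a dict whose counts are nested as `{"some_edge": {"count": N}}`: try several candidate keys and fall back to `None`. The key names in `LIKE_KEYS` and `COMMENT_KEYS` are guesses, so inspect a real response to confirm which ones apply.

```python
# Candidate key names are assumptions; check an actual response to confirm.
LIKE_KEYS = ("edge_liked_by", "edge_media_preview_like")
COMMENT_KEYS = ("edge_media_to_comment", "edge_media_preview_comment")


def get_count(node, candidate_keys):
    """Return the first 'count' found under any candidate key, else None."""
    for key in candidate_keys:
        edge = node.get(key)
        if isinstance(edge, dict) and "count" in edge:
            return edge["count"]
    return None
```

For a post dict named `node`, `get_count(node, LIKE_KEYS)` would replace a direct lookup on a single hard-coded key.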

vasana12

Hi. This is a very helpful article.
What does the variable "first" in the dictionary mean? I am making a hashtag-based crawler, and I am having trouble setting the value of "first". Can you explain the criteria for setting it?
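
In cursor-based GraphQL pagination, `first` conventionally sets how many items to return per request, and `after` carries the cursor where the next page starts. Assuming the Instagram query follows that convention (nothing in this thread confirms it), a hashtag crawler could build its variables as in the sketch below. The `tag_name` key and the default of 12 are assumptions for illustration.

```python
import json


def build_variables(tag_name, first=12, after=None):
    """Serialize pagination variables for one page of results.

    'first' is the requested page size; 'after' is the cursor returned
    with the previous page and is omitted for the first request.
    """
    variables = {"tag_name": tag_name, "first": first}
    if after:
        variables["after"] = after
    return json.dumps(variables, separators=(",", ":"))
```

Under that reading, a larger `first` means fewer requests per hashtag, but the server may cap or reject large values, so it is worth starting small.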

karisjochen

It appears I am getting stopped by Instagram's robots.txt file. Any ideas on how to adjust the code to circumvent this?
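
For context, projects created with `scrapy startproject` enable `ROBOTSTXT_OBEY` in `settings.py`, which makes Scrapy fetch each site's robots.txt and drop any request it disallows. A minimal settings fragment, assuming a standard Scrapy project layout:

```python
# settings.py (fragment)
# With this set to False, Scrapy no longer checks robots.txt before
# sending requests; generated projects set it to True.
ROBOTSTXT_OBEY = False
```

The same key can also go in a spider's `custom_settings` dict to change the behavior for one spider only.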

thedukeofnada

This script saved my life. Is there a way to extract the actual user comments and not just the count? For example: username, text, date, and time.
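
A rough sketch of what that extraction could look like, assuming each post dict carries its comments under `edge_media_to_comment.edges`, with `text`, `created_at` (a Unix timestamp), and `owner.username` on every comment node. These key names are assumptions rather than something the article shows, and the listing responses may contain only the count, in which case a separate request per post would be needed.

```python
from datetime import datetime, timezone


def extract_comments(media):
    """Return a list of {username, text, created_at} dicts for one post.

    The nested key names are assumed; missing pieces become None.
    """
    edges = (media.get("edge_media_to_comment") or {}).get("edges") or []
    comments = []
    for edge in edges:
        node = edge.get("node") or {}
        created = node.get("created_at")
        comments.append({
            "username": (node.get("owner") or {}).get("username"),
            "text": node.get("text"),
            # Convert the Unix timestamp to an ISO 8601 string in UTC.
            "created_at": (
                datetime.fromtimestamp(created, tz=timezone.utc).isoformat()
                if created is not None else None
            ),
        })
    return comments
```

Each returned dict covers the username, text, and a combined date and time, which can be split further if separate columns are needed.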