After e-commerce monitoring, building social media scrapers to monitor accounts and track new trends is the next most popular use case for web scra...
Looks like Instagram doesn't work via Scraper API anymore. But it still works on webscraping.ai
instagram.com/explore/tags//
Do you know how to get the posts for a tagged username? Simply replacing the former URL with this new one doesn't seem to work.
instagram.com/explore/tags/sport/?... works for hashtags, and instagram.com/nike/tagged/?__a=1 works for usernames, but the latter requires login.
Do you mind sharing how you adjusted the code to use webscraping.ai instead? Thanks!
Sure, here it is gist.github.com/Drakula2k/035cc5bd...
I also fixed a couple of bugs there
Thanks so much for sharing! After making the changes I am unfortunately still getting blocked by the robots.txt file. Is this code still working for you?
Yes, it's working. You can disable the robots.txt check by setting ROBOTSTXT_OBEY = False in your settings.py. The requests go through an API, so there is no need for the robots.txt check.
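For reference, a minimal sketch of the settings.py change described above. Only the ROBOTSTXT_OBEY line comes from this thread; the rest of your settings file stays as it is.

```python
# settings.py (Scrapy project settings)

# Skip the robots.txt check, as suggested in the comment above.
# Scrapy project templates usually set this to True.
ROBOTSTXT_OBEY = False
```

If you would rather not change the setting for the whole project, Scrapy also lets a single spider override settings through its custom_settings class attribute.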
Incredible, thank you! It worked! So is it always a good idea to set ROBOTSTXT_OBEY = False, considering we don't want to be stopped?
Yes, ROBOTSTXT_OBEY is good when you're building something like a search engine and it may request all sorts of random URLs posted on the Internet. In that case, using robots.txt is good to skip non-public pages.
But if you're requesting particularly defined URLs or using an API, robots.txt is not so useful and may block access to the API.
Thanks a lot, I learned a ton from your code... but I'm still confused by the query_hash. May I ask how you get this constant for this type of query, please?
Open Inspector in Chrome, visit Instagram and scroll through the posts, you'll see the same GraphQL queries with query_hash.
I'm not sure what the query_hash value means exactly, but it seems to be static for each type of query.
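To make this concrete, here is a rough sketch of how such a GraphQL URL could be assembled once you have copied a query_hash from the Network tab. The hash is the one quoted in this thread. The endpoint path, the build_graphql_url helper, and the variable names ("id", "first", "after") are assumptions for illustration, with "first" assumed to be the number of posts requested per page. Check them against the requests you actually see in the Inspector.

```python
import json
from urllib.parse import urlencode

# Hash quoted in this thread; other query types (and newer page versions)
# may use a different value.
QUERY_HASH = "e769aa130647d2354c40ea6a439bfc08"


def build_graphql_url(user_id, end_cursor=None, page_size=12):
    """Build a paginated GraphQL URL (hypothetical helper, not from the article)."""
    # Assumed variable names: "id" = profile id, "first" = posts per page.
    variables = {"id": user_id, "first": page_size}
    if end_cursor:
        # Assumed cursor field used to request the next page of posts.
        variables["after"] = end_cursor
    params = {"query_hash": QUERY_HASH, "variables": json.dumps(variables)}
    # Assumed endpoint path; verify it in the browser's Network tab.
    return "https://www.instagram.com/graphql/query/?" + urlencode(params)


if __name__ == "__main__":
    print(build_graphql_url("123456789"))
```

If the hash you see in your own browser differs, swap it into QUERY_HASH. The rest of the URL is built the same way.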
Ohhh, I see, it's a constant number (the same every time I scroll down the profile), but for me it's a different number, not 'e769aa130647d2354c40ea6a439bfc08'. By the way, thank you so much. I am a beginner with Scrapy. Do you suggest any book or tutorial for learning advanced Scrapy-based projects? I already bought this book.
Kai
Merry Christmas
Regards
They may have changed something, but the old value still works too, it seems.
I'm not a specialist in Scrapy, but generally, I'd read official docs (docs.scrapy.org/en/latest/) and then start doing some projects using it and learn from them.
It works like a charm. Thank you so much! But I have two questions:
1) How do we include the username, so we can identify which user each post belongs to?
2) How can we get basic information such as name, bio, handle, number of followers, number of following, and media count?
If this works for all of that information, I might need to subscribe to Scraper API 1+ million xD
Thank you in advance
Hey there, you have to download Python and install Scrapy, which is a scraping framework for Python. I would recommend watching some videos on YouTube to learn, and I suggest starting with this 25-episode tutorial:
youtube.com/watch?v=ve_0h4Y8nuI&li... This channel is very good. Follow it and you will be able to get started!
Have a good day
Hi there! Great post...it answers a lot of questions.
Small thing, though: the "likes" count & comment count isn't working properly. I'm assuming it's due to the near-constant moving target of Instagram changing their page. Any hints on how to resolve this?
Thanks very much for your time!
The likes count isn't working for me either. It's just giving me NaN values. Any idea how to fix this?
Hey, this code is giving me an error:
Ignoring response <403 https://api.webscraping.ai/html?api_key=45299f85b2302dd84a9f53e5a799114e&proxy=residential&timeout=20000&url=https%3A%2F%2Fwww.instagram.com%2Fnike%2F%3Fhl%3Den>: HTTP status code is not handled or not allowed
Can anyone help me out here?
The code in the article is designed to use scraperapi.com as the proxy, but you are using webscraping.ai. You need to adapt the code for this proxy. The error suggests that its API uses a different authentication method.
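As a starting point for adapting the code, here is a small sketch that rebuilds the request URL in the shape shown in the error log above. The endpoint and the api_key, proxy, timeout, and url parameters are taken from that log line. The build_proxy_url helper is hypothetical, and this only reproduces the URL format. It does not by itself explain or fix the 403.

```python
from urllib.parse import urlencode


def build_proxy_url(target_url, api_key, proxy="residential", timeout_ms=20000):
    """Wrap a target URL in the webscraping.ai format seen in the log above.

    Hypothetical helper: the parameter names are copied from the logged
    request and are not checked against the provider's documentation.
    """
    params = {
        "api_key": api_key,      # your own key; never post it publicly
        "proxy": proxy,
        "timeout": timeout_ms,
        "url": target_url,       # urlencode percent-encodes this value
    }
    return "https://api.webscraping.ai/html?" + urlencode(params)


if __name__ == "__main__":
    print(build_proxy_url("https://www.instagram.com/nike/?hl=en", "YOUR_API_KEY"))
```

In a Scrapy spider you would pass the returned URL to the request instead of the raw Instagram URL. Note that the key in the log above is now public, so it is worth regenerating it.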
Hi. This is a very helpful article.
What does the variable "first" in the dictionary mean? I am making a hashtag-based crawler, and I am having trouble setting the value of "first". What are the criteria for choosing its value?
It appears I am getting stopped by Instagram's robots.txt file. Any ideas on how to adjust the code to circumvent this?
Saved my life with this script. Is there a way to extract the actual user comments and not just the count? For example: username, text, date, time.