DEV Community 👩‍💻👨‍💻

Chris Greening

Scraping an Instagram location tag with instascrape

Introducing: instascrape.Location

In this post, I'll quickly introduce instascrape's newest feature: the ability to scrape an Instagram location tag!

With the release of v1.4.0, the Location scraper now provides a semantic way to gather data from an Instagram location tag.

Sample usage

from instascrape import Location 
url = 'https://www.instagram.com/explore/locations/212988663/new-york-new-york/'
new_york = Location(url)
new_york.scrape()

It's as easy as that! We've scraped the page and now have access to useful information such as

print(f"The NY location tag has {new_york.amount_of_posts:,} posts.")
>>> The NY location tag has 61,202,403 posts.

print(f"NY tag geographic coordinates: ({new_york.latitude}, {new_york.longitude})")
>>> NY tag geographic coordinates: (40.7142, -74.0064)

as well as a variety of other useful attributes!
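
As a side note, the location URL passed to Location already carries a numeric ID and a readable slug. Here is a minimal sketch of pulling those out with the standard library. The helper parse_location_url is hypothetical and not part of instascrape, and it assumes the path always looks like /explore/locations/<id>/<slug>/, as in the sample URL above.

```python
from urllib.parse import urlparse


def parse_location_url(url):
    """Split a location-tag URL into (location_id, slug).

    Hypothetical helper, not part of instascrape. Assumes the path looks
    like /explore/locations/<id>/<slug>/ as in the sample URL.
    """
    # Drop empty segments produced by leading/trailing slashes
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 3 or parts[:2] != ["explore", "locations"]:
        raise ValueError(f"not a location tag URL: {url}")
    location_id = parts[2]
    # The slug is optional in this sketch; fall back to an empty string
    slug = parts[3] if len(parts) > 3 else ""
    return location_id, slug


url = "https://www.instagram.com/explore/locations/212988663/new-york-new-york/"
location_id, slug = parse_location_url(url)
print(location_id, slug)
```

This can be handy for labelling scraped data before any request is made.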

get_recent_posts

In addition to scraping attributes of the location tag, we can also retrieve some of the tag's recent posts as instascrape.Post objects:

recent_posts = new_york.get_recent_posts()
for post in recent_posts:
    print(post.upload_date)
>>> 2020-12-10 20:27:03
2020-12-10 20:27:01
2020-12-10 20:26:59
2020-12-10 20:26:59
2020-12-10 20:26:51
2020-12-10 20:26:48
2020-12-10 20:26:46
2020-12-10 20:26:45
2020-12-10 20:26:42
2020-12-10 20:26:40
2020-12-10 20:26:39
2020-12-10 20:26:33
2020-12-10 20:26:32
2020-12-10 20:26:31
2020-12-10 20:26:25
2020-12-10 20:26:22
2020-12-10 20:26:20
2020-12-10 20:26:18
2020-12-10 20:26:15
2020-12-10 20:26:11
2020-12-10 20:26:10
2020-12-10 20:26:09
2020-12-10 20:26:09
2020-12-10 20:26:06
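
Once you have the upload dates, you can do ordinary date arithmetic on them. The sketch below is only an illustration: it hard-codes the first five timestamps from the output above and assumes upload_date is a datetime.datetime (which the printed format suggests, but is not confirmed here). The helper posting_span is a made-up name, not an instascrape function.

```python
from datetime import datetime

# Sample values copied from the output above. With the real library these
# would come from [post.upload_date for post in recent_posts], assuming
# upload_date is a datetime.datetime.
upload_dates = [
    datetime(2020, 12, 10, 20, 27, 3),
    datetime(2020, 12, 10, 20, 27, 1),
    datetime(2020, 12, 10, 20, 26, 59),
    datetime(2020, 12, 10, 20, 26, 59),
    datetime(2020, 12, 10, 20, 26, 51),
]


def posting_span(dates):
    """Return the time between the oldest and newest upload in `dates`."""
    return max(dates) - min(dates)


span = posting_span(upload_dates)
print(f"{len(upload_dates)} posts in {span.total_seconds():.0f} seconds")
```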

If you want to read more about instascrape, check out some of my other posts

Or better yet, get involved and contribute! Drop the official repo a star, get involved in discussions, and stay in the loop by watching for updates on the website! Hope to see you there 🙌

chris-greening / instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically

instascrape: powerful Instagram data scraping toolkit

DISCLAIMER:

Instagram has gotten increasingly strict with scraping and using this library can result in getting flagged for botting AND POSSIBLE DISABLING OF YOUR INSTAGRAM ACCOUNT. This is a research project and I am not responsible for how you use it. Independently, the library is designed to be responsible and respectful and it is up to you to decide what you do with it. I don't claim any responsibility if your Instagram account is affected by how you use this library.

What is it?

instascrape is a lightweight Python package that provides an expressive and flexible API for scraping Instagram data. It is geared towards being a high-level building block on the data scientist's toolchain and can be seamlessly integrated and extended with industry standard tools for web scraping, data science, and analysis.

Key features

Here are a few of the things that…

Top comments (8)

ImTheDeveloper

I've been following your posts about instascrape closely, a great little library! Just wondered if you've hit any issues yet with Instagram updating their layout or adding new blocking mechanisms against your scraper while you've been working on it? I'm thinking along the lines of how long the shelf life of a scraper is before some new breaking change comes along (ideally they want to push users onto their API).

Chris Greening (Author)

Funny that you mention it, today was the first day after about two months of the lib's existence that I had to fix something because of a change on Instagram's end lol, I kept getting hit with 429 status codes on every request I made. I kind of figured something like this was going to happen eventually because I wasn't passing any header info with the requests; I quickly added support for passing default/custom header info though and now it's back up and running like a charm

One of the driving factors in design choice since day one has been to account for a changing Instagram API as well as the tightening of restrictions that Instagram has been trending towards. I'm hoping I'll be able to roll with the punches as they come and continue to float under their radar lol. I deliberately excluded selenium and any sort of interaction with Instagram content to avoid their wrath as much as possible so we'll see how it goes 😅

Thanks for following and asking! With the library in a comfortably stable place and no major internal design changes in the near future, I'm ready to go back and fix up some of the stuff I was kind of neglecting (i.e. missing headers, fine tuning with arguments, etc.)
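
For anyone curious what "passing default/custom header info" can look like in general, here is a minimal sketch using only the standard library. It is not instascrape's actual implementation: build_request and DEFAULT_HEADERS are hypothetical names, and the header values are placeholders chosen for illustration. Nothing is sent over the network here; the request object is only constructed.

```python
from urllib.request import Request

# Placeholder header values chosen for illustration only
DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url, headers=None):
    """Build a GET request carrying default headers, overridden by `headers`."""
    # Caller-supplied headers win over the defaults on key collisions
    merged = {**DEFAULT_HEADERS, **(headers or {})}
    return Request(url, headers=merged)


req = build_request(
    "https://www.instagram.com/explore/locations/212988663/new-york-new-york/",
    headers={"Accept-Language": "de-DE"},
)
print(req.header_items())
```

Keeping the defaults in one dictionary means a caller only has to supply the headers they want to change.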

ImTheDeveloper

Nice to see it wasn't anything catastrophic 👍 I'll likely be giving this a go to monitor some Insta accounts for new posts and publish them via my Telegram bot into a chat. I've been considering running it through lumintai.io, as I do with Twitter and YouTube, which has served well to proxy from multiple locations so the traffic on a single IP doesn't stack up. I'll get a tutorial up on DEV if it works out 👍

Chris Greening (Author)

Awesome! Would love to see that tutorial, I'll keep an eye out. I wrote a script a couple of months ago that rotates free anonymous proxies but kept getting hit with 403s, probably because the IPs are blacklisted since everyone else is using them lol. I haven't done much more research into proxies since I haven't really needed them yet, but I plan for future versions of instascrape to support them; they're definitely a vital tool for any large-scale scraping.

villival

Hey, thanks for the posts. I'd like to know what the helpers module is all about.

I was trying to run a script and the system threw an error:
Helper module not found

Chris Greening (Author)

Hey there, looks like you pip installed instascrape instead of insta-scrape from PyPI. My package has the hyphen; check out the installation section of the repo or the official PyPI page for more details

Thanks so much for your interest in the library! Let me know if you have any more questions

villival

thanks for the valuable reply :)

Bilal

Great work! It helped me a lot in a project I'm doing. The only problem is that when I return the upload date, it doesn't come back in the right format; it returns a list of integers.
