❓ What the heck is instascrape?
instascrape is a lightweight library designed for scraping data from Instagram using Python! It makes no assumptions about your project and is instead designed for flexibility and productivity so you can get on your way and start exploring Instagram data easily and efficiently.
Here is a quick glimpse into a scrape that was accomplished using selenium and instascrape to gather how many likes per post a user got in 2020.
💾 How do I get it?
You can install from PyPI with ye old
$ pip3 install insta-scrape
or clone from the official repo with
$ git clone https://github.com/chris-greening/instascrape.git
The dependencies are light, mostly leveraging Requests for requesting the data and Beautiful Soup for parsing it.
chris-greening / instascrape
Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
instascrape: powerful Instagram data scraping toolkit
Note: This module is no longer actively maintained.
DISCLAIMER:
Instagram has gotten increasingly strict with scraping and using this library can result in getting flagged for botting AND POSSIBLE DISABLING OF YOUR INSTAGRAM ACCOUNT. This is a research project and I am not responsible for how you use it. Independently, the library is designed to be responsible and respectful and it is up to you to decide what you do with it. I don't claim any responsibility if your Instagram account is affected by how you use this library.
What is it?
instascrape is a lightweight Python package that provides an expressive and flexible API for scraping Instagram data. It is geared towards being a high-level building block on the data scientist's toolchain and can be seamlessly integrated and extended with industry standard tools for web scraping, data science, and analysis.
Key features
…
💻 Quickstart
Let's start by scraping some data from a totally random Instagram page that is definitely not mine 😉
from instascrape import Profile
profile = Profile('chris_greening')
profile.scrape()
And that's it! In those 3 lines, we scraped 52 data points related to @chris_greening's page. We got how many followers, how many posts, whether they have a business profile, whether they're verified, etc.
Aside from Profile, we also have the Post and Hashtag objects which work with almost the exact same syntax! With methods such as
- to_dict
- to_json
- to_csv
instascrape integrates nicely with tools like pandas and matplotlib so you can scrape, explore, and analyze your data with just a few lines of code. Integration with selenium is encouraged so you can get a powerful Instagram scraper going in no time!
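To make the export step a little more concrete, here is a minimal sketch of summarizing scraped posts once they are plain dictionaries. The record keys upload_date and likes, the helper name mean_likes_by_month, and the sample data are all assumptions made up for illustration; check the keys that to_dict actually returns on your installed version before relying on them.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def mean_likes_by_month(records, year):
    """Average the 'likes' value per calendar month for posts uploaded in `year`.

    `records` is a list of dicts. The 'upload_date' and 'likes' keys are
    assumed for this sketch, not a guaranteed instascrape schema.
    """
    buckets = defaultdict(list)
    for record in records:
        uploaded = record["upload_date"]
        if uploaded.year == year:
            buckets[uploaded.month].append(record["likes"])
    # Sort by month so the summary reads chronologically.
    return {month: mean(values) for month, values in sorted(buckets.items())}


# Stand-in records; in practice each dict might come from a scraped post's to_dict().
sample = [
    {"upload_date": datetime(2020, 1, 5), "likes": 10},
    {"upload_date": datetime(2020, 1, 20), "likes": 30},
    {"upload_date": datetime(2020, 3, 2), "likes": 7},
    {"upload_date": datetime(2019, 12, 31), "likes": 99},
]

summary = mean_likes_by_month(sample, 2020)
print(summary)
```

The same list of dictionaries can also be handed straight to pandas.DataFrame if you would rather do the grouping and plotting there.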
We've only just scraped the surface so dig into the docs 📘 or even better, check out the source and contribute! Being such a young library (started Hacktoberfest 2020), the sky is the limit and it's only going to get more powerful from here 🙌
If you like my content be sure to check out some of my other blog posts or reach out to me on my website
I built an interactive 3D photo display with JavaScript
Chris Greening ・ Jun 27 '21
Cheers!
Top comments (16)
When I enter the Profile name, the code throws this warning:
MissingCookiesWarning: Request header does not contain cookies! It's recommended you pass at least a valid sessionid otherwise Instagram will likely redirect you to their login page.
MissingCookiesWarning
Could you please help with this?
Hello! Instagram has started requiring a valid sessionid cookie when making HTTP requests. Check out this blog post for more information on getting your sessionid for scraping.
Technically it is just a warning and you can usually get away with a couple of scrapes before they have a problem, but I've found that after about a dozen scrapes they start redirecting to their login page.
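For anyone following along, here is a rough sketch of what passing a sessionid could look like. The helper names build_headers and scrape_profile are hypothetical, the user-agent string is only a placeholder, and the headers keyword on scrape is an assumption based on the project README, so double-check it against the version you have installed.

```python
def build_headers(sessionid, user_agent="Mozilla/5.0 (X11; Linux x86_64)"):
    """Return request headers carrying the sessionid cookie and a browser-like user agent."""
    return {
        "user-agent": user_agent,
        "cookie": f"sessionid={sessionid};",
    }


def scrape_profile(username, sessionid):
    """Scrape a profile while sending the sessionid cookie (hypothetical helper)."""
    # Imported lazily so build_headers works even without instascrape installed.
    from instascrape import Profile

    profile = Profile(username)
    # Assumption: scrape() accepts a `headers` keyword, as the project README suggests.
    profile.scrape(headers=build_headers(sessionid))
    return profile


# Building the headers needs no network access; paste your own sessionid here.
headers = build_headers("PASTE_YOUR_SESSIONID_HERE")
print(headers["cookie"])
```

Treat the sessionid like a password: it identifies your logged-in session, so keep it out of shared notebooks and version control.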
Thank you!
Hi Chris, how are you?
I am trying to run your tutorials but I have an error. Could you help me?
i.ibb.co/52DX7hN/Screenshot-from-2...
Hey Vagner! That's an error I'm working on literally as we speak. I think Instagram tightened their restrictions overnight or something, I woke up to this error as well. I think what's happening is every GET request is being returned a 429 HTTP status code. I'm adding more robust headers to the requests.get calls that instascrape makes and it seems to have fixed the problem, so check the official repository later today for the 1.4.0 release and then reinstall from PyPI. Thanks for the patience!
Okay, 1.4.0 is live, go try that out and let me know if that works for you!
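If you still run into 429 responses in your own requests, one common way to cope is to back off and retry. The sketch below is not what the 1.4.0 release does; get_with_backoff and its parameters are hypothetical names for illustration, and it accepts any fetch callable shaped like requests.get so it can be tried without touching the network.

```python
import time


def get_with_backoff(fetch, url, headers, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url, headers=headers), retrying with exponential backoff on HTTP 429.

    `fetch` is any callable shaped like requests.get that returns an object
    with a `status_code` attribute. Returns the last response received.
    """
    response = None
    for attempt in range(max_attempts):
        response = fetch(url, headers=headers)
        if response.status_code != 429:
            break
        # Wait 1x, 2x, 4x, ... the base delay, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return response
```

With Requests installed, you could call it as get_with_backoff(requests.get, url, headers), where headers is a dictionary containing a browser-like user-agent and your sessionid cookie.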
Hi, I am getting the following error. Could someone help me out with this?
Traceback (most recent call last):
File "C:\Users\booshnam.d.spyder-py3\instascrape.py", line 3, in <module>
chris = Profile('chris_greening')
TypeError: __init__() missing 1 required positional argument: 'data'
Getting same error.
The issue is still open - github.com/chris-greening/instascr...
Hello Chris,
I used this awesome library but my code shows an error, please help me out!!
from instascrape import Profile
profile = Profile('chris_greening')
profile.scrape()
C:\Python\Python39\python.exe "C:/Users/hp/PycharmProjects/instagram/insta reels.py"
Traceback (most recent call last):
File "C:\Users\hp\PycharmProjects\instagram\insta reels.py", line 1, in <module>
from instascrape import Profile
File "C:\Python\Python39\lib\site-packages\instascrape\__init__.py", line 4, in <module>
from helpers import extract_email
ModuleNotFoundError: No module named 'helpers'
Process finished with exit code 1
Hello Chris,
I used this awesome library to scrape comments from posts with specific hashtags, but the issue I have is that I can't get all the comments of a post, and it only gives me 12 posts!
How can I retrieve all comments and also all posts?
Thanks
I am not even able to run those 3 lines. Getting this error
ImportError: cannot import name 'Profile' from 'instascrape' (/home/rajanverma/workspace/indie_hacks/offline/instascrape.py)
Hello Aarvy, I saw your issue on GitHub, hope you found a solution!
yes. shall I remove the comment?
error helpers
I am not even able to run those 3 lines. See?
Unable to run even these 3 lines :(
Getting this error:
JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
Please help!