Scrape data from Instagram with instascrape and Python

Chris Greening
Freelance Python developer | Probably programming right now | Coding, hiking, and rollerblading
Updated on ・2 min read

❓ What the heck is instascrape?

instascrape is a lightweight library designed for scraping data from Instagram using Python! It makes no assumptions about your project and is instead designed for flexibility and productivity so you can get on your way and start exploring Instagram data easily and efficiently.

Here is a quick glimpse of a scrape that used selenium and instascrape to gather how many likes each of a user's posts received in 2020.

💾 How do I get it?

You can install from PyPI with ye old

$ pip3 install insta-scrape 

or clone from the official repo with

$ git clone https://github.com/chris-greening/instascrape.git

The dependencies are light, mostly leveraging Requests for requesting the data and Beautiful Soup for parsing it.

chris-greening / instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically

instascrape: powerful Instagram data scraping toolkit

What is it?

instascrape is a lightweight Python package that provides an expressive and flexible API for scraping Instagram data. It is geared towards being a high-level building block on the data scientist's toolchain and can be seamlessly integrated and extended with industry standard tools for web scraping, data science, and analysis.

Key features

Here are a few of the things that instascrape does well:

  • Powerful, object-oriented scraping tools for profiles, posts, hashtags, reels, and IGTV
  • Scrapes HTML, BeautifulSoup, and JSON
  • Download content to your computer as png, jpg, mp4, and mp3
  • Dynamically retrieve HTML embed code for posts
  • Expressive and consistent API for concise and elegant code
  • Designed for seamless integration with Selenium, Pandas, and other industry standard tools for data collection and analysis
  • Lightweight; no boilerplate or configurations necessary
  • The only hard dependencies are Requests and…

💻 Quickstart

Let's start by scraping some data from a totally random Instagram page that is definitely not mine 😉

from instascrape import Profile 
profile = Profile('chris_greening')
profile.scrape()

And that's it! In those 3 lines, we scraped 52 data points related to @chris_greening's page. We got how many followers, how many posts, whether they have a business profile, whether they're verified, etc.

Aside from Profile, we also have the Post and Hashtag objects, which work with almost exactly the same syntax! With methods such as

  • to_dict
  • to_json
  • to_csv

instascrape integrates nicely with tools like pandas and matplotlib so you can scrape, explore, and analyze your data with just a few lines of code. Integration with selenium is encouraged so you can get a powerful Instagram scraper going in no time!
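
To give a feel for what that exploration step might look like, here is a minimal sketch that uses only the standard library. The records are placeholder data standing in for what several `to_dict` calls could return; the keys (`shortcode`, `likes`, `comments`) and the helper names `average_likes` and `posts_to_csv` are assumptions made for illustration, not part of instascrape itself.

```python
import csv
import io

# Placeholder records standing in for what several `post.to_dict()` calls
# might return. The keys and values are made up for illustration only.
sample_posts = [
    {"shortcode": "post_a", "likes": 120, "comments": 4},
    {"shortcode": "post_b", "likes": 95, "comments": 10},
    {"shortcode": "post_c", "likes": 210, "comments": 7},
]


def average_likes(posts):
    """Return the mean of the 'likes' field across a list of post dicts."""
    return sum(post["likes"] for post in posts) / len(posts)


def posts_to_csv(posts):
    """Serialize a list of flat dicts to CSV text, one row per post."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(posts[0].keys()))
    writer.writeheader()
    writer.writerows(posts)
    return buffer.getvalue()


print(average_likes(sample_posts))
print(posts_to_csv(sample_posts))
```

If you have pandas installed, passing the same list of dicts to `pandas.DataFrame` gives you a table with one row per post, which is usually the more comfortable starting point for plotting.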

We've only just scraped the surface so dig into the docs 📘 or even better, check out the source and contribute! Being such a young library (started Hacktoberfest 2020), the sky is the limit and it's only going to get more powerful from here 🙌

Discussion (13)

Athiya Rastogi

When I enter the Profile name, the code throws this warning:

MissingCookiesWarning: Request header does not contain cookies! It's recommended you pass at least a valid sessionid otherwise Instagram will likely redirect you to their login page.
MissingCookiesWarning

Could you please help with this?

Chris Greening Author

Hello! Instagram has started requiring a valid sessionid cookie when making HTTP requests. Check out this blog post for more information on getting your sessionid for scraping

Technically it's just a warning, and you can usually get away with a couple of scrapes before they have a problem, but I've found that after about a dozen scrapes they start redirecting to their login page.
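
As a rough sketch of what passing the cookie might look like: the helper name `build_headers` and the default user agent string are hypothetical, and the commented-out `scrape(headers=...)` call is an assumption based on the library's README rather than something verified here.

```python
def build_headers(session_id, user_agent="Mozilla/5.0"):
    """Build request headers carrying an Instagram sessionid cookie.

    `build_headers` is a hypothetical helper for illustration; it is not
    part of instascrape.
    """
    if not session_id:
        raise ValueError("session_id must be a non-empty string")
    return {
        "user-agent": user_agent,
        "cookie": f"sessionid={session_id};",
    }


headers = build_headers("PASTE_YOUR_SESSIONID_HERE")

# Assumed usage, based on the library's README. It needs network access
# and a real sessionid, so it is left commented out:
# from instascrape import Profile
# profile = Profile("chris_greening")
# profile.scrape(headers=headers)
```

Treat the sessionid like a password: keep it out of version control and out of anything you share.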

Athiya Rastogi

Thank you!

Vagner Belfort • Edited

Hi Chris, how are you?

I'm trying to run your tutorials but I'm getting an error. Could you help me?

i.ibb.co/52DX7hN/Screenshot-from-2...

Chris Greening Author

Hey Vagner! That's an error I'm working on literally as we speak. I think Instagram tightened their restrictions overnight or something; I woke up to this error as well. What seems to be happening is that every GET request is getting a 429 HTTP status code back. I'm adding more robust headers to the requests.get calls that instascrape makes, and that seems to have fixed the problem, so check the official repository later today for the 1.4.0 release and then reinstall from PyPI. Thanks for your patience!

Chris Greening Author

Okay, 1.4.0 is live, go try that out and let me know if that works for you!

GovindSingh9447

Hello Chris,
I used this awesome library but my code throws an error. Please help me out!

from instascrape import Profile
profile = Profile('chris_greening')
profile.scrape()

C:\Python\Python39\python.exe "C:/Users/hp/PycharmProjects/instagram/insta reels.py"
Traceback (most recent call last):
  File "C:\Users\hp\PycharmProjects\instagram\insta reels.py", line 1, in <module>
    from instascrape import Profile
  File "C:\Python\Python39\lib\site-packages\instascrape\__init__.py", line 4, in <module>
    from helpers import extract_email
ModuleNotFoundError: No module named 'helpers'

Process finished with exit code 1

Booshnam

Hi, I am getting the following error. Could someone help me out with this?

Traceback (most recent call last):

  File "C:\Users\booshnam.d.spyder-py3\instascrape.py", line 3, in <module>
    chris = Profile('chris_greening')

TypeError: __init__() missing 1 required positional argument: 'data'

Younes

Hello Chris,
I used this awesome library to scrape comments from posts with specific hashtags, but the issue I have is that I can't get all the comments on a post. Also, it only gives me 12 posts!
How can I retrieve all the comments and all the posts?
Thanks

Aarvy • Edited

I am not even able to run those 3 lines. Getting this error

ImportError: cannot import name 'Profile' from 'instascrape' (/home/rajanverma/workspace/indie_hacks/offline/instascrape.py)

Chris Greening Author

Hello Aarvy, I saw your issue on GitHub, hope you found a solution!

Aarvy

yes. shall I remove the comment?

geo3huruf

error helpers