DEV Community

Chris Greening


Scrape data from Instagram with instascrape and Python

❓ What the heck is instascrape?

instascrape is a lightweight library designed for scraping data from Instagram using Python! It makes no assumptions about your project and is instead designed for flexibility and productivity so you can get on your way and start exploring Instagram data easily and efficiently.

Here is a quick glimpse into a scrape that used selenium and instascrape to gather how many likes a user got per post in 2020.

Chris Greening - Software Developer

Hey! My name's Chris Greening and I'm a software developer from the New York metro area with a diverse range of engineering experience - beam me a message and let's build something great!

christophergreening.com

💾 How do I get it?

You can install from PyPI with ye old

$ pip3 install insta-scrape 

or clone from the official repo with

$ git clone https://github.com/chris-greening/instascrape.git

The dependencies are light, mostly leveraging Requests for requesting the data and Beautiful Soup for parsing it.

chris-greening / instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically

instascrape: powerful Instagram data scraping toolkit

Note: This module is no longer actively maintained.

DISCLAIMER:

Instagram has gotten increasingly strict with scraping and using this library can result in getting flagged for botting AND POSSIBLE DISABLING OF YOUR INSTAGRAM ACCOUNT. This is a research project and I am not responsible for how you use it. Independently, the library is designed to be responsible and respectful and it is up to you to decide what you do with it. I don't claim any responsibility if your Instagram account is affected by how you use this library.

What is it?

instascrape is a lightweight Python package that provides an expressive and flexible API for scraping Instagram data. It is geared towards being a high-level building block on the data scientist's toolchain and can be seamlessly integrated and extended with industry standard tools for web scraping, data science, and analysis.

💻 Quickstart

Let's start by scraping some data from a totally random Instagram page that is definitely not mine 😉

from instascrape import Profile 
profile = Profile('chris_greening')
profile.scrape()

And that's it! In those 3 lines, we scraped 52 data points related to @chris_greening's page. We got how many followers, how many posts, whether they have a business profile, whether they're verified, etc.

Aside from Profile, we also have the Post and Hashtag objects, which work with almost exactly the same syntax! With methods such as

  • to_dict
  • to_json
  • to_csv

instascrape integrates nicely with tools like pandas and matplotlib so you can scrape, explore, and analyze your data with just a few lines of code. Integration with selenium is encouraged so you can get a powerful Instagram scraper going in no time!
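To give a feel for what those export methods hand you, here is a minimal sketch that uses only the standard library. The `sample` dict is a made-up stand-in for what `profile.to_dict()` might return (its keys and values are assumptions, not real scraped data), and the two helper functions are hypothetical, not part of instascrape.

```python
import csv
import io
import json

# Hypothetical stand-in for the dict returned by profile.to_dict();
# the keys and values here are invented for illustration.
sample = {
    "username": "chris_greening",
    "followers": 1234,
    "posts": 56,
    "is_verified": False,
}


def dict_to_json(data):
    """Serialize one scraped record to a JSON string."""
    return json.dumps(data, sort_keys=True)


def dict_to_csv(data):
    """Serialize one scraped record to CSV text: a header row and one value row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(data), lineterminator="\n")
    writer.writeheader()
    writer.writerow(data)
    return buffer.getvalue()


print(dict_to_json(sample))
print(dict_to_csv(sample))
```

Once the data is a plain dict, handing it to pandas or any other tool is a one-liner, which is the point of having these export methods.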

We've only just scraped the surface so dig into the docs 📘 or even better, check out the source and contribute! Being such a young library (started Hacktoberfest 2020), the sky is the limit and it's only going to get more powerful from here 🙌

If you like my content be sure to check out some of my other blog posts or reach out to me on my website

Cheers!

Top comments (16)

Athiya Rastogi

When I enter the Profile name, the code throws this warning:

MissingCookiesWarning: Request header does not contain cookies! It's recommended you pass at least a valid sessionid otherwise Instagram will likely redirect you to their login page.
MissingCookiesWarning

Could you please help with this?

Chris Greening

Hello! Instagram has started requiring a valid sessionid cookie when making HTTP requests. Check out this blog post for more information on getting your sessionid for scraping

Technically it is just a warning and you can usually get away with a couple scrapes before they have a problem but I've found after about a dozen scrapes they start redirecting to their login page
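As a rough sketch of what passing a sessionid can look like: the helper below only builds a headers dict with the cookie in it. The `build_headers` function and the placeholder values are hypothetical, and the commented-out `scrape(headers=...)` call is an assumption about how instascrape accepts headers, so check the docs for your installed version.

```python
def build_headers(session_id, user_agent="Mozilla/5.0"):
    """Build request headers that carry the sessionid cookie."""
    if not session_id:
        raise ValueError("session_id must be a non-empty string")
    return {
        "user-agent": user_agent,
        "cookie": f"sessionid={session_id};",
    }


# Placeholder value; paste the sessionid from your own logged-in browser session.
headers = build_headers("PASTE_YOUR_SESSIONID_HERE")
print(headers["cookie"])

# Assumed usage with instascrape (not run here; needs network and a real sessionid):
# profile.scrape(headers=headers)
```

Treat the sessionid like a password: keep it out of version control and shared notebooks.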

Athiya Rastogi

Thank you!

Vagner Belfort • Edited

Hi Chris, how are you?

I am trying to run your tutorials but I get an error. Could you help me?

i.ibb.co/52DX7hN/Screenshot-from-2...

Chris Greening

Hey Vagner! That's an error I'm working on literally as we speak. I think Instagram tightened their restrictions overnight or something; I woke up to this error as well. What seems to be happening is that every GET request is being returned a 429 HTTP status code. I'm adding more robust headers to the requests.get calls that instascrape makes and it seems to have fixed the problem, so check the official repository later today for the 1.4.0 release and then reinstall from PyPI. Thanks for the patience!
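Separately from the header fix described above, a common way to cope with 429 responses on the caller's side is to wait and retry with a growing delay. The sketch below is a different technique from what the library does, and everything in it is hypothetical: `fetch_with_backoff` is not an instascrape function, and `fetch` stands for any zero-argument callable you supply that returns a `(status_code, body)` pair.

```python
import time


def fetch_with_backoff(fetch, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a status other than 429 or retries run out.

    The delay doubles after every 429; the last response is returned either way.
    """
    delay = base_delay
    for attempt in range(retries + 1):
        status, body = fetch()
        if status != 429:
            return status, body
        if attempt < retries:
            sleep(delay)
            delay *= 2
    return status, body


# Demo with a fake fetch so nothing touches the network or actually sleeps.
responses = iter([(429, ""), (200, "ok")])
print(fetch_with_backoff(lambda: next(responses), sleep=lambda seconds: None))
```

Backing off will not help if the requests are being rejected for missing headers or cookies, so it complements a fix like the one in 1.4.0 and does not replace it.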

Chris Greening

Okay, 1.4.0 is live, go try that out and let me know if that works for you!

Booshnam

Hi, I am getting the following error. Could someone help me out with this?

Traceback (most recent call last):

File "C:\Users\booshnam.d.spyder-py3\instascrape.py", line 3, in <module>
chris = Profile('chris_greening')

TypeError: __init__() missing 1 required positional argument: 'data'

Joemol94

Getting the same error.
The issue is still open - github.com/chris-greening/instascr...

GovindSingh9447

Hello Chris,
I used this awesome library but my code shows an error. Please help me out!

from instascrape import Profile
profile = Profile('chris_greening')
profile.scrape()

C:\Python\Python39\python.exe "C:/Users/hp/PycharmProjects/instagram/insta reels.py"
Traceback (most recent call last):
File "C:\Users\hp\PycharmProjects\instagram\insta reels.py", line 1, in <module>
from instascrape import Profile
File "C:\Python\Python39\lib\site-packages\instascrape\__init__.py", line 4, in <module>
from helpers import extract_email
ModuleNotFoundError: No module named 'helpers'

Process finished with exit code 1

Younes

Hello Chris,
I used this awesome library to scrape comments from posts with specific hashtags, but I can't get all the comments on a post, and it only gives me 12 posts!
How can I retrieve all the comments and all the posts?
Thanks

Aarvy • Edited

I am not even able to run those 3 lines. Getting this error

ImportError: cannot import name 'Profile' from 'instascrape' (/home/rajanverma/workspace/indie_hacks/offline/instascrape.py)
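The path in that error points at a local file named instascrape.py and not at the installed package, which suggests the script itself may be shadowing the library on import. Here is a small, hedged sketch for checking that; `is_shadowed_by_local_file` is a hypothetical helper, not something instascrape provides, and it only covers the simple case of a single `.py` file sitting in the script's directory.

```python
import importlib.util
from pathlib import Path


def is_shadowed_by_local_file(module_name, script_dir):
    """Return True if module_name resolves to a single .py file inside script_dir."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or not spec.has_location or spec.origin is None:
        return False
    origin = Path(spec.origin).resolve()
    same_dir = origin.parent == Path(script_dir).resolve()
    return same_dir and origin.name == f"{module_name}.py"


# Check whether "instascrape" would be imported from the current directory.
print(is_shadowed_by_local_file("instascrape", Path.cwd()))
```

If this prints True, renaming the local script to something other than instascrape.py (and deleting any stale `__pycache__` next to it) is the usual way out.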

Chris Greening

Hello Aarvy, I saw your issue on GitHub, hope you found a solution!

Aarvy

Yes. Shall I remove the comment?

geo3huruf

error helpers

Jean Nascimento

I am not even able to run those 3 lines. See?

Image description

Fatima-2309

Unable to run even these 3 lines :(

Getting this error:
JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)

Please help!