DEV Community

Chris Greening

The Instagram Profile scraper

In this blog post, we're going to take a quick peek at the data points that instascrape's Profile scraper collects from an Instagram profile page.

chris-greening / instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically

instascrape: powerful Instagram data scraping toolkit

DISCLAIMER:

Instagram has become increasingly strict about scraping, and using this library can get you flagged for botting AND POSSIBLY GET YOUR INSTAGRAM ACCOUNT DISABLED. This is a research project and I am not responsible for how you use it. The library itself is designed to be used responsibly and respectfully, but it is up to you to decide what you do with it. I don't claim any responsibility if your Instagram account is affected by how you use this library.

What is it?

instascrape is a lightweight Python package that provides an expressive and flexible API for scraping Instagram data. It is geared towards being a high-level building block in the data scientist's toolchain and can be seamlessly integrated and extended with industry-standard tools for web scraping, data science, and analysis.

Key features

Here are a few of the things that…

The Profile scraper scrapes 51 data points associated with an Instagram profile page.

Instance attribute names have been chosen to be semantic and easy to understand.
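Because the names follow a few consistent patterns, you can sort them programmatically. Below is a minimal sketch that uses a subset of the keys from the @google example in the next section. The grouping rules are my own reading of the names and are not something instascrape defines. They assume `is_`/`has_` prefixes mark boolean flags, and that anything mentioning `viewer` describes the relationship to whoever is viewing the page. `group_keys` is a hypothetical helper.

```python
# A subset of the keys returned by Profile.to_dict(), copied from the
# @google example output.
KEYS = [
    "username", "full_name", "biography", "followers", "following", "posts",
    "is_private", "is_verified", "is_business_account",
    "has_clips", "has_guides", "has_channel",
    "viewer_id", "followed_by_viewer", "follows_viewer", "blocked_by_viewer",
]


def group_keys(keys):
    """Bucket attribute names by naming convention.

    Hypothetical helper: the rules are a heuristic based on the names,
    not a classification provided by instascrape.
    """
    groups = {"viewer_relative": [], "flags": [], "other": []}
    for key in keys:
        if "viewer" in key:
            # Relationship between the profile and whoever is viewing it
            groups["viewer_relative"].append(key)
        elif key.startswith(("is_", "has_")):
            # Yes/no properties of the profile itself
            groups["flags"].append(key)
        else:
            groups["other"].append(key)
    return groups


for group, names in group_keys(KEYS).items():
    print(f"{group}: {', '.join(names)}")
```

The `viewer` check runs first, so a name like `has_blocked_viewer` lands in the viewer-relative bucket rather than among the flags.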

The data points

The best way to learn is by example, so we'll take a look at the data scraped from @google's Instagram profile.

All instascrape scrapers have a to_dict method that returns every scraped data point as a dictionary, so we can see everything in one shot.

```python
>>> from instascrape import Profile
>>> google = Profile("google")
>>> google.scrape()
>>> google.to_dict()
{'csrf_token': 'HNSDJKGNFDJKGFDKJGNDFKSJ239048329084UJSKLDF',
 'viewer': None,
 'viewer_id': None,
 'country_code': 'US',
 'language_code': 'en',
 'locale': 'en_US',
 'device_id': '12345678-1234-1234-1234-123456789012',
 'browser_push_pub_key': 'BIBn3E_rWTci8Xn6P9Xj3btShT85Wdtne0LtwNUyRQ5XjFNkuTq9j4MPAVLvAFhXrUU1A9UxyxBA7YIOjqDIDHI',
 'key_id': '139',
 'public_key': 'a7db85ba1f0c3bdc5be6aeff1faadcbb8082bfb9f757990b90afd0e9f9619e7f',
 'version': '10',
 'is_dev': False,
 'rollout_hash': 'b10813bd9030',
 'bundle_variant': 'es6',
 'frontend_dev': 'prod',
 'logging_page_id': 'profilePage_1067259270',
 'show_suggested_profiles': False,
 'show_follow_dialog': False,
 'biography': 'Google unfiltered—sometimes with filters.',
 'blocked_by_viewer': False,
 'restricted_by_viewer': None,
 'country_block': False,
 'external_url': 'https://linkin.bio/google',
 'external_url_linkshimmed': 'https://l.instagram.com/?u=https%3A%2F%2Flinkin.bio%2Fgoogle&e=ATPiguryyJW2meNAk2LxG0-KfnYmPPQE4rdXSycwxdOiF9E_PjTnd56L4QqvftSldBYslIw1BcHJIhlF&s=1',
 'followers': 12364011,
 'followed_by_viewer': False,
 'following': 31,
 'follows_viewer': False,
 'full_name': 'Google',
 'has_ar_effects': False,
 'has_clips': True,
 'has_guides': True,
 'has_channel': False,
 'has_blocked_viewer': False,
 'highlight_reel_count': 6,
 'has_requested_viewer': False,
 'id': '1067259270',
 'is_business_account': True,
 'is_joined_recently': False,
 'business_category_name': 'Business & Utility Services',
 'overall_category_name': None,
 'category_enum': 'INTERNET_COMPANY',
 'is_private': False,
 'is_verified': True,
 'mutual_followers': 0,
 'profile_pic_url': 'https://scontent-lga3-1.cdninstagram.com/v/t51.2885-19/s150x150/126151620_3420222801423283_6498777152086077438_n.jpg?_nc_ht=scontent-lga3-1.cdninstagram.com&_nc_ohc=lXdEi27jxecAX9hUsVW&tp=1&oh=da5dc7c6bb5f223255450522aa3ea3cf&oe=600FEC68',
 'profile_pic_url_hd': 'https://scontent-lga3-1.cdninstagram.com/v/t51.2885-19/s320x320/126151620_3420222801423283_6498777152086077438_n.jpg?_nc_ht=scontent-lga3-1.cdninstagram.com&_nc_ohc=lXdEi27jxecAX9hUsVW&tp=1&oh=21880564a3688c7650948b63aca5c895&oe=6011C871',
 'requested_by_viewer': False,
 'username': 'google',
 'connected_fb_page': None,
 'posts': 1457}
```
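Since to_dict hands back a plain Python dictionary, the scraped data plugs straight into standard tooling. Below is a minimal sketch that assumes you only care about a handful of public profile fields. The sample values are copied from the @google output above and hard-coded, so the example runs without contacting Instagram. `PROFILE_FIELDS` and `summarize` are hypothetical names chosen for illustration.

```python
import csv
import io

# A few values copied from the google.to_dict() output above, hard-coded so
# this sketch runs offline. In practice you would use the dict returned by
# google.to_dict() after calling google.scrape().
data = {
    "csrf_token": "HNSDJKGNFDJKGFDKJGNDFKSJ239048329084UJSKLDF",
    "username": "google",
    "full_name": "Google",
    "followers": 12364011,
    "following": 31,
    "posts": 1457,
    "is_verified": True,
    "is_private": False,
}

# Hypothetical choice of fields worth keeping; page/session metadata such as
# csrf_token is dropped.
PROFILE_FIELDS = ["username", "full_name", "followers", "following", "posts", "is_verified"]


def summarize(profile):
    """Keep the chosen profile fields and add a followers-per-following ratio."""
    summary = {field: profile.get(field) for field in PROFILE_FIELDS}
    following = profile.get("following") or 0
    # Avoid dividing by zero for accounts that follow nobody
    summary["follower_ratio"] = round(profile["followers"] / following, 2) if following else None
    return summary


summary = summarize(data)

# Write the summary as a one-row CSV (to an in-memory buffer here).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(summary))
writer.writeheader()
writer.writerow(summary)
print(buffer.getvalue())
```

Swapping the `io.StringIO` buffer for an open file, or feeding a list of summaries from several profiles to `writer.writerows`, would give you a small dataset to analyze.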

And there we have it! If you're interested in seeing instascrape in action, check out some of my other posts that explore practical examples:

Top comments (1)

farahsedd

Hi Chris,

Thank you so much for sharing. I tried to use your code, but I always get NaN values in all attributes.
Code:
```python
from instascrape import Profile
google = Profile("instagram.com/google/")
google.scrape()
google.to_dict()
```
or
```python
from instascrape import Profile
google = Profile("google")
google.scrape()
google.to_dict()
```

Do you know what could be the cause? Thanks in advance.