
kaifcodec
Created InstaScrape, an Async Instagram Comment Scraper; looking for feedback and contributions.

🚀 InstaScrape → Async Instagram Comment Scraper

Visit: GitHub (https://github.com/kaifcodec/InstaScrape)


Scrape all parent comments from any Instagram Reel with automated login, async speed, real-time progress, and clean exports; no manual cookie copying required.


✨ Features

  • ✅ Automated Login: cookie.json persistence with iat + expiry, no manual cookies needed.
  • 🔄 Self-healing Auth: detects expired cookies mid-run, prompts relogin, resumes automatically.
  • ⚡ Async Engine: powered by httpx.AsyncClient with requests-per-second throttling.
  • 📊 Progress Tracking: accurate percent and ETA from Instagram's comment count.
  • 📁 Dual Exports: TXT and JSON files saved in timestamped folders.
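
The requests-per-second throttling listed above could be sketched roughly as follows. This is a minimal illustration using only asyncio, not the code InstaScrape actually ships: RateLimiter, fake_fetch, and fetch_pages are hypothetical names, and fake_fetch stands in for a real httpx.AsyncClient request.

```python
import asyncio
import time


class RateLimiter:
    """Spaces out calls so that at most `rps` of them start per second.

    Hypothetical helper for illustration; not taken from the InstaScrape source.
    """

    def __init__(self, rps: float) -> None:
        if rps <= 0:
            raise ValueError("rps must be positive")
        self._interval = 1.0 / rps
        self._lock = asyncio.Lock()
        self._next_slot = 0.0  # monotonic time at which the next call may start

    async def wait(self) -> None:
        # The lock serializes waiters, so each caller gets its own time slot.
        async with self._lock:
            delay = self._next_slot - time.monotonic()
            if delay > 0:
                await asyncio.sleep(delay)
            self._next_slot = max(time.monotonic(), self._next_slot) + self._interval


async def fake_fetch(limiter: RateLimiter, page: int) -> int:
    """Stand-in for a network call (assumption: a real client call would go here)."""
    await limiter.wait()
    return page


async def fetch_pages(pages: int, rps: float) -> list:
    # Build the limiter inside the running event loop, which also keeps
    # asyncio.Lock happy on Python 3.9.
    limiter = RateLimiter(rps)
    return await asyncio.gather(*(fake_fetch(limiter, p) for p in range(pages)))
```

Calling asyncio.run(fetch_pages(10, rps=5)) would take roughly two seconds, since the ten calls are spaced 0.2 s apart.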

📦 Requirements

  • Python 3.9+
  • Dependencies:
pip install -r requirements.txt

🛠️ Installation

git clone https://github.com/kaifcodec/InstaScrape
cd InstaScrape
pip install -r requirements.txt

▶️ Usage

python3 main.py
  • Enter the Instagram Reel URL (e.g., https://www.instagram.com/reel/SHORTCODE/).
  • Set Max requests per second (5-7 recommended). Adjust for stability.
  • On first run, provide username/password; cookie.json is created and reused until expiry.
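
As an illustration of the first step, the shortcode could be pulled out of the Reel URL along these lines. This is only a sketch: extract_shortcode is a hypothetical helper, and the accepted URL shape and shortcode alphabet (letters, digits, "_" and "-") are assumptions rather than something confirmed by the InstaScrape source.

```python
import re
from typing import Optional

# Assumption: Reel URLs look like https://www.instagram.com/reel/<shortcode>/,
# optionally followed by a query string.
_REEL_RE = re.compile(
    r"^https?://(?:www\.)?instagram\.com/reel/([A-Za-z0-9_-]+)/?(?:\?.*)?$"
)


def extract_shortcode(url: str) -> Optional[str]:
    """Return the shortcode from a Reel URL, or None if the URL does not match."""
    match = _REEL_RE.match(url.strip())
    return match.group(1) if match else None
```

Returning None instead of raising lets the caller re-prompt for a valid URL.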

📁 Output

  • TXT: download_comments/txt/reel_comments_YYYYMMDD_HHMMSS.txt
  • JSON: download_comments/json/reel_comments_YYYYMMDD_HHMMSS.json

Example JSON structure:
{
  "generated_at": 1700000000,
  "count": 123,
  "comments": [
    { "username": "user1", "text": "Nice!", "created_at": 1699999000 }
  ]
}
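
A writer that produces files in this layout might look roughly like the sketch below. The paths and JSON keys follow the examples above; export_comments is a hypothetical name, and the "username: text" line format used for the TXT file is an assumption.

```python
import json
import time
from datetime import datetime
from pathlib import Path


def export_comments(comments, base_dir="download_comments"):
    """Write comments to timestamped TXT and JSON files; return both paths."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")  # local time
    base = Path(base_dir)
    txt_path = base / "txt" / f"reel_comments_{stamp}.txt"
    json_path = base / "json" / f"reel_comments_{stamp}.json"
    for path in (txt_path, json_path):
        path.parent.mkdir(parents=True, exist_ok=True)

    payload = {
        "generated_at": int(time.time()),
        "count": len(comments),
        "comments": comments,
    }
    json_path.write_text(
        json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    # Assumed TXT layout: one "username: text" line per comment.
    lines = [f"{c['username']}: {c['text']}" for c in comments]
    txt_path.write_text("\n".join(lines), encoding="utf-8")
    return txt_path, json_path
```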

🔧 How it Works

  • Cookie Lifecycle: cookie.json stores iat and expiry; validated on startup & during requests.
  • Error Resilience: retries transient errors and refreshes cookies on 401/redirect-to-login.
  • Progress Accuracy: uses Instagram's comment count to calculate percent & ETA.
  • Async Efficiency: httpx.AsyncClient with HTTP/2, keep-alive, and RPS limiter.
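
The cookie lifecycle described above could be sketched as follows. Only the iat and expiry fields come from the description; the rest of the file layout (a "cookies" key holding a name-to-value mapping) and the function names save_cookies and load_valid_cookies are assumptions made for illustration.

```python
import json
import time
from pathlib import Path
from typing import Optional


def save_cookies(path, cookies: dict, ttl_seconds: int, now: Optional[float] = None) -> None:
    """Persist cookies together with issued-at (iat) and expiry timestamps."""
    issued = int(time.time() if now is None else now)
    record = {"iat": issued, "expiry": issued + ttl_seconds, "cookies": cookies}
    Path(path).write_text(json.dumps(record), encoding="utf-8")


def load_valid_cookies(path, now: Optional[float] = None) -> Optional[dict]:
    """Return stored cookies if the file exists, parses, and has not expired."""
    file = Path(path)
    if not file.exists():
        return None
    try:
        record = json.loads(file.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return None  # treat a corrupt file like a missing one
    if not isinstance(record, dict):
        return None
    current = time.time() if now is None else now
    if current >= record.get("expiry", 0):
        return None  # expired: the caller should prompt for a fresh login
    return record.get("cookies")
```

A None result covers the missing, corrupt, and expired cases alike, so the caller has a single "log in again" branch to handle at startup and mid-run.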

💡 Tips

  • Start with 5-7 RPS to minimize throttling; increase gradually.
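  • Filenames use local time; switch to UTC by replacing datetime.now() with datetime.now(timezone.utc) in main.py (import timezone from datetime). datetime.utcnow() also works but is deprecated since Python 3.12.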
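
For the UTC tip, a small helper along these lines would produce the same YYYYMMDD_HHMMSS stamp in either time zone. export_stamp is a hypothetical name, and how main.py actually builds its filenames is not shown here, so treat this as a sketch.

```python
from datetime import datetime, timezone
from typing import Optional


def export_stamp(use_utc: bool = False, now: Optional[datetime] = None) -> str:
    """Return a YYYYMMDD_HHMMSS stamp for export filenames."""
    if now is None:
        # datetime.now(timezone.utc) is the non-deprecated way to get UTC time.
        now = datetime.now(timezone.utc) if use_utc else datetime.now()
    return now.strftime("%Y%m%d_%H%M%S")


# A fixed datetime makes the format easy to check:
# export_stamp(now=datetime(2023, 11, 14, 22, 13, 20)) returns "20231114_221320"
```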

⚠️ Disclaimer

Use responsibly. Comply with Instagram's Terms of Service. Intended for personal or permitted use only.

Top comments (1)

kaifcodec

It's a bit slow right now because it uses Instagram's /graphql API endpoint, which loads comments dynamically, one page at a time. Feel free to suggest fixes and improvements.