🚀 InstaScrape → Async Instagram Comment Scraper
Visit: GitHub
Scrape all parent comments from any Instagram Reel with automated login, async speed, real-time progress, and clean exports — no manual cookie copying required.
✨ Features
- ✅ Automated Login: cookie.json persistence with iat + expiry, no manual cookies needed.
- 🔄 Self-healing Auth: detects expired cookies mid-run, prompts relogin, resumes automatically.
- ⚡ Async Engine: powered by httpx.AsyncClient with requests-per-second throttling.
- 📊 Progress Tracking: accurate percent and ETA from Instagram’s comment count.
- 📁 Dual Exports: TXT and JSON files saved in timestamped folders.
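The cookie persistence described above could be checked roughly as follows. This is a minimal sketch, not the project's actual code: it assumes `iat` and `expiry` are Unix timestamps in seconds stored at the top level of cookie.json, and the helper names `cookie_is_fresh` and `load_cookie_record` are hypothetical.

```python
import json
import time
from pathlib import Path
from typing import Optional


def cookie_is_fresh(record: dict, now: Optional[float] = None) -> bool:
    """Return True if the stored cookie record has not expired yet.

    Assumption: `iat` (issued-at) and `expiry` are Unix timestamps in
    seconds. The real cookie.json layout may differ.
    """
    now = time.time() if now is None else now
    iat = record.get("iat")
    expiry = record.get("expiry")
    if not isinstance(iat, (int, float)) or not isinstance(expiry, (int, float)):
        return False  # missing or malformed fields: treat as expired
    return iat <= now < expiry


def load_cookie_record(path: Path) -> Optional[dict]:
    """Load cookie.json, returning None if it is missing or unreadable."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    return data if isinstance(data, dict) else None


if __name__ == "__main__":
    record = load_cookie_record(Path("cookie.json"))
    if record is None or not cookie_is_fresh(record):
        print("No valid cookie found: a fresh login would be needed.")
    else:
        print("Stored cookie is still valid.")
```

Treating a missing or malformed record as expired keeps the failure mode simple: the worst case is one extra login prompt.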
📦 Requirements
- Python 3.9+
- Dependencies:
pip install -r requirements.txt
🛠️ Installation
git clone https://github.com/kaifcodec/InstaScrape
cd InstaScrape
pip install -r requirements.txt
▶️ Usage
python3 main.py
- Enter the Instagram Reel URL (e.g., https://www.instagram.com/reel/SHORTCODE/).
- Set Max requests per second (5-7 recommended). Adjust for stability.
- On first run, provide username/password; cookie.json is created and reused until expiry.
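For illustration, the SHORTCODE part of a Reel URL could be pulled out like this. It is a sketch under assumptions: the function name `extract_shortcode` is hypothetical, the accepted character set (letters, digits, `_`, `-`) is an assumption about shortcodes, and accepting `/reels/` alongside `/reel/` is a convenience choice, not something the tool is confirmed to do.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumed URL shape: /reel/<shortcode>/ (also tolerating /reels/<shortcode>/).
_REEL_PATH = re.compile(r"^/(?:reel|reels)/([A-Za-z0-9_-]+)/?$")


def extract_shortcode(url: str) -> Optional[str]:
    """Return the shortcode from an Instagram Reel URL, or None if it does not match."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host not in {"instagram.com", "www.instagram.com"}:
        return None
    match = _REEL_PATH.match(parsed.path)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_shortcode("https://www.instagram.com/reel/SHORTCODE/"))
```

Parsing with `urlparse` first means query strings such as tracking parameters are ignored rather than breaking the match.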
📁 Output
- TXT: download_comments/txt/reel_comments_YYYYMMDD_HHMMSS.txt
- JSON: download_comments/json/reel_comments_YYYYMMDD_HHMMSS.json
Example JSON structure:
{
  "generated_at": 1700000000,
  "count": 123,
  "comments": [
    { "username": "user1", "text": "Nice!", "created_at": 1699999000 }
  ]
}
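A JSON export with this structure can be post-processed with the standard library alone. The sketch below assumes `generated_at` and `created_at` are Unix timestamps in seconds (consistent with the example values); `summarize_export` is a hypothetical helper, not part of InstaScrape.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def summarize_export(raw: str) -> dict:
    """Summarize an export: generation time, count consistency, busiest users."""
    data = json.loads(raw)
    comments = data["comments"]
    per_user = Counter(comment["username"] for comment in comments)
    generated = datetime.fromtimestamp(data["generated_at"], tz=timezone.utc)
    return {
        "generated_at": generated.isoformat(),
        # True when the declared count equals the number of stored comments.
        "count_matches": data["count"] == len(comments),
        "top_users": per_user.most_common(3),
    }


if __name__ == "__main__":
    sample = json.dumps({
        "generated_at": 1700000000,
        "count": 2,
        "comments": [
            {"username": "user1", "text": "Nice!", "created_at": 1699999000},
            {"username": "user1", "text": "Again!", "created_at": 1699999500},
        ],
    })
    print(summarize_export(sample))
```

To run it on a real export, read the file with `Path(...).read_text(encoding="utf-8")` and pass the result in.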
🔧 How it Works
- Cookie Lifecycle: cookie.json stores iat and expiry; validated on startup & during requests.
- Error Resilience: retries transient errors and refreshes cookies on 401/redirect-to-login.
- Progress Accuracy: uses Instagram’s comment count to calculate percent & ETA.
- Async Efficiency: httpx.AsyncClient with HTTP/2, keep-alive, and RPS limiter.
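The RPS limiter and the percent/ETA calculation can be sketched together as below. This is an illustration under assumptions, not the project's implementation: `RateLimiter`, `progress`, `scrape`, and `fake_fetch_page` are hypothetical names, and `fake_fetch_page` stands in for the real httpx request so the example runs without network access or login.

```python
import asyncio
import time
from typing import Optional, Tuple


class RateLimiter:
    """Space out calls so that at most `rps` of them start per second."""

    def __init__(self, rps: float) -> None:
        if rps <= 0:
            raise ValueError("rps must be positive")
        self._interval = 1.0 / rps
        self._lock = asyncio.Lock()
        self._next_slot = 0.0  # monotonic time of the next free slot

    async def wait(self) -> None:
        async with self._lock:
            now = time.monotonic()
            delay = self._next_slot - now
            if delay > 0:
                await asyncio.sleep(delay)
                now = time.monotonic()
            self._next_slot = max(now, self._next_slot) + self._interval


def progress(done: int, total: int, elapsed: float) -> Tuple[float, Optional[float]]:
    """Return (percent, eta_seconds); ETA is None until a rate can be estimated."""
    if total <= 0 or done <= 0:
        return 0.0, None
    done = min(done, total)
    percent = 100.0 * done / total
    rate = done / elapsed if elapsed > 0 else 0.0
    eta = (total - done) / rate if rate > 0 else None
    return percent, eta


async def fake_fetch_page(page: int) -> int:
    """Stand-in for one paginated request; pretends each page holds 10 comments."""
    await asyncio.sleep(0)
    return 10


async def scrape(total_comments: int, rps: float) -> int:
    limiter = RateLimiter(rps)  # created inside the running event loop
    start = time.monotonic()
    done, page = 0, 0
    while done < total_comments:
        await limiter.wait()
        done += await fake_fetch_page(page)
        page += 1
        percent, eta = progress(done, total_comments, time.monotonic() - start)
        eta_text = "?" if eta is None else f"{eta:.1f}s"
        print(f"{percent:5.1f}% done, ETA {eta_text}")
    return done


if __name__ == "__main__":
    asyncio.run(scrape(total_comments=50, rps=20))
```

The ETA here is a simple average-rate estimate (remaining comments divided by comments per second so far), so it settles down after the first few pages.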
💡 Tips
- Start with 5-7 RPS to minimize throttling; increase gradually.
- Filenames use local time; switch to UTC by replacing datetime.now() with datetime.now(timezone.utc) in main.py (datetime.utcnow() still works but is deprecated since Python 3.12).
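As a sketch of the UTC variant, export paths could be built from a timezone-aware timestamp like this. The helper `export_path` is hypothetical; only the folder layout and the `reel_comments_YYYYMMDD_HHMMSS` pattern come from the Output section.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def export_path(kind: str, when: Optional[datetime] = None) -> Path:
    """Build download_comments/<kind>/reel_comments_<UTC stamp>.<kind>."""
    when = when or datetime.now(timezone.utc)
    # Normalize to UTC so the stamp does not depend on the machine's time zone.
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%d_%H%M%S")
    return Path("download_comments") / kind / f"reel_comments_{stamp}.{kind}"


if __name__ == "__main__":
    print(export_path("txt"))
    print(export_path("json"))
```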
⚠️ Disclaimer
Use responsibly. Comply with Instagram’s Terms of Service. Intended for personal or permitted use only.
It's a bit slow right now because it uses Instagram's /graphql API endpoint, which loads comments dynamically, one page at a time. Feel free to suggest fixes and improvements.