Every automation engineer has hit this wall. Your headless browser can scrape 10,000 pages, but it can't solve a CAPTCHA.
You build the perfect scraper. It handles pagination, retries, rate limiting — everything. Then you hit a login page with a CAPTCHA, and your entire pipeline falls apart.
I got tired of this, so I built SessionKeeper.
## The Problem Nobody Talks About
Modern websites have layered defenses:
- CAPTCHAs that block automated logins
- Bot detection (Cloudflare, DataDome, PerimeterX) that fingerprints headless browsers
- Session expiry that forces re-authentication every few hours
- MFA flows that require human interaction
The usual workarounds all have drawbacks:
| Approach | Problem |
|---|---|
| CAPTCHA solving services | $2-3 per 1,000 solves, unreliable, ethically questionable |
| Stealing cookies from your real browser | Breaks when cookies expire, fragile |
| Keeping a browser open 24/7 | Resource hog, sessions still expire |
| Rotating proxies + new accounts | Expensive, against most ToS |
What if you could just log in once, by hand, and then automate everything until the session actually expires?
## Enter SessionKeeper
SessionKeeper is a Python tool that manages browser sessions for automation. The core idea is simple:
- Detect when a session is expired
- Open a visible browser so a human can log in (solve CAPTCHAs, do MFA, whatever)
- Save the authenticated session
- Return to headless automation using the saved session
- Only bother you again when the session actually expires
You solve the CAPTCHA once. SessionKeeper handles the rest.
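The loop above is small enough to sketch in plain Python. The helper names below are hypothetical stand-ins to show the control flow, not SessionKeeper's real internals:

```python
# Hypothetical sketch of the SessionKeeper decision loop.
# load_saved / session_valid / human_login / save are stand-in callables,
# not the library's actual API.
def get_session(site, load_saved, session_valid, human_login, save):
    """Return 'reused' when a saved session still works; otherwise have a
    human log in once (visible browser) and return 'refreshed'."""
    if load_saved(site) and session_valid(site):
        return "reused"  # headless automation continues, no human needed
    # Session missing or expired: a human completes login/CAPTCHA/MFA,
    # then the fresh state is saved for future headless runs.
    save(site, human_login(site))
    return "refreshed"
```

The point is that the expensive step (a human at a visible browser) only runs on the expired/missing branch; every other run stays fully headless.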
## Quick Start
```shell
pip install playwright && playwright install firefox
```
Use it in your automation:
```python
import asyncio

from sessionkeeper import SessionKeeper

async def main():
    async with SessionKeeper("reddit") as sk:
        page = await sk.get_authenticated_page("https://reddit.com")
        # You're logged in. Do your automation.
        await page.goto("https://reddit.com/r/blender/submit")

asyncio.run(main())
```
The first time you run this, a browser window pops up. You log into Reddit normally — solve the CAPTCHA, enter your credentials, do whatever the site asks. Once you're in, SessionKeeper saves the session and closes the visible browser.
Every subsequent run uses the saved session. No browser window. No CAPTCHA. Pure headless automation.
When the session eventually expires, SessionKeeper detects it and opens the browser again. One login, and you're good for another session cycle.
## CLI Usage
Pre-authenticate from the command line, then use sessions in your scripts:
```shell
# Authenticate with a site
python sessionkeeper.py auth reddit

# Check if a session is still valid
python sessionkeeper.py check reddit

# List all saved sessions
python sessionkeeper.py status

# Clear an expired session
python sessionkeeper.py clear reddit
```
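A CLI with this shape is straightforward to wire up with `argparse` subcommands. This is an illustrative sketch, not SessionKeeper's actual implementation:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the four commands shown above."""
    parser = argparse.ArgumentParser(prog="sessionkeeper.py")
    sub = parser.add_subparsers(dest="command", required=True)
    # auth/check/clear all operate on a single named site
    for cmd in ("auth", "check", "clear"):
        sub.add_parser(cmd).add_argument("site")
    sub.add_parser("status")  # lists all sessions; takes no site argument
    return parser

args = build_parser().parse_args(["check", "reddit"])
print(args.command, args.site)  # → check reddit
```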
## Built-in Site Configs
SessionKeeper ships with configurations for 5 sites out of the box:
- Reddit — detects login state via user menu elements
- Gumroad — handles reCAPTCHA on login
- DEV.to — dashboard detection
- Twitter/X — multi-step login flow
- note.com — Japanese blogging platform
Each config defines the login URL, a check URL to verify auth, and CSS selectors for success/failure states.
## Custom Site Configuration
Need to automate a site that isn't built in? Pass a config dict:
```python
config = {
    "login_url": "https://mysite.com/login",
    "check_url": "https://mysite.com/dashboard",
    "success_indicator": ".user-avatar, a[href*='settings']",
    "failure_indicator": "input[type='password']",
    "display_name": "My Site",
}

async with SessionKeeper("mysite", config=config) as sk:
    page = await sk.get_authenticated_page("https://mysite.com/dashboard")
```
The success_indicator and failure_indicator are CSS selectors that SessionKeeper checks after navigating to check_url. If the success selector matches, the session is valid. If the failure selector matches (or success doesn't), it's time to re-authenticate.
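Reduced to a pure decision rule, my reading of that logic looks like this (with the failure selector winning if both somehow match, since a visible password field means you aren't logged in):

```python
def session_state(success_matched: bool, failure_matched: bool) -> str:
    """Classify a session from the two indicator checks described above.
    A session is valid only if the success selector matched and the
    failure selector did not; anything else means re-authenticate."""
    if success_matched and not failure_matched:
        return "valid"
    return "reauthenticate"
```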
## How It Works Under the Hood
SessionKeeper is built on Playwright and uses its storage_state persistence:
1. Check for saved session file (~/.sessionkeeper/reddit_session.json)
2. If exists → load into headless browser → navigate to check_url → verify auth
3. If valid → return authenticated page (headless)
4. If expired/missing → launch VISIBLE browser → navigate to login_url
5. Wait for human to complete login + CAPTCHA
6. On success → save storage_state → close visible browser → return headless page
The saved state includes all cookies (including httpOnly ones) and localStorage; note that Playwright's storage_state does not capture sessionStorage. Because Playwright manages a real Firefox instance, sites see a normal browser.
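For reference, a storage_state file is plain JSON in the shape Playwright documents: a `cookies` array plus per-origin `localStorage` entries. The values below are made up, but the field names follow that documented format, and they show how a cheap expiry pre-check is possible without launching a browser at all:

```python
import json
import os
import tempfile
import time

# Made-up example data in Playwright's documented storage_state shape.
state = {
    "cookies": [{
        "name": "session_id",
        "value": "abc123",
        "domain": ".reddit.com",
        "path": "/",
        "expires": time.time() + 86400,  # a day from now
        "httpOnly": True,
        "secure": True,
        "sameSite": "Lax",
    }],
    "origins": [{
        "origin": "https://reddit.com",
        "localStorage": [{"name": "token", "value": "xyz"}],
    }],
}

path = os.path.join(tempfile.gettempdir(), "reddit_session.json")
with open(path, "w") as f:
    json.dump(state, f)

# Pre-check cookie expiry straight from the JSON file:
with open(path) as f:
    loaded = json.load(f)
expired = [c for c in loaded["cookies"] if c["expires"] < time.time()]
print(len(expired))  # → 0 (the cookie is still a day from expiring)
```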
## Why Not Just Use CAPTCHA Solving Services?
Cost adds up fast. At $2-3 per 1,000 solves, running daily automation across multiple sites costs $50-100/month. SessionKeeper costs you 30 seconds of manual login per session cycle (sessions typically last hours to days).
Reliability is inconsistent. CAPTCHA services have solve rates of 85-95%. SessionKeeper's solve rate is 100% because a human is doing it.
New CAPTCHA types break services. Every time Google updates reCAPTCHA or Cloudflare changes Turnstile, solving services lag behind. A human doesn't have this problem.
## Real-World Use Cases
- Social media automation — posting to Reddit, Twitter without re-authenticating every run
- E-commerce monitoring — price tracking on sites that require login
- Content management — automated publishing to platforms with CAPTCHA walls
- Internal tools — logging into dashboards for automated reporting
## Get Started
SessionKeeper is open source (MIT) and available now:
GitHub: github.com/vesper-astrena/sessionkeeper
Single Python file, zero dependencies beyond Playwright. Drop it into your project and never fight a CAPTCHA twice.
Star the repo if this solves a problem you've had. Issues and PRs are welcome.
What automation task has CAPTCHAs been blocking you from? Drop a comment — I'd love to hear your use case.