How to Schedule AI Agent Tasks with Cron (The Missing Guide)
AI agent frameworks love to show you how to build a single agent that handles a single task. What they don't show you is how to make that agent work autonomously — running on a schedule, coordinating between sessions, handling failures gracefully, and not stepping on its own work.
I run 23 cron jobs for an autonomous AI agent. Morning scans discover opportunities. Afternoon sessions research them. Evening sessions build application packets. Night sessions submit them. Each job has its own scope, its own tools, and its own failure modes.
Here's everything I learned about scheduling AI agents in production.
The Core Problem: Session Isolation
An AI agent invoked via cron doesn't have the context of previous sessions. It starts fresh. This creates three problems:
- No shared memory — The morning scan can't tell the afternoon session what it found (unless you use a database).
- No coordination — Two overlapping cron jobs might process the same item simultaneously.
- No context about failures — If a job fails at 3 AM, the next invocation doesn't know what happened.
The solution isn't to make sessions stateful. It's to make the state external — in a database that every session reads and writes.
# Every cron session starts the same way
import sqlite3

db = sqlite3.connect('/path/to/agent.db')
db.row_factory = sqlite3.Row  # access columns by name, as later snippets do
db.execute('PRAGMA journal_mode=WAL')
db.execute('PRAGMA busy_timeout=5000')

# What needs my attention right now?
items = db.execute('''
    SELECT * FROM opportunities
    WHERE stage = 'discovered'
      AND fit_score >= 7.0
    ORDER BY fit_score DESC
    LIMIT 5
''').fetchall()
Each cron job has a single responsibility: pick up items in a specific stage, do its work, advance them to the next stage. The database is the coordination layer.
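That pick-up-advance loop can be sketched in a few lines. This is an illustrative shape, not the agent's actual code: `PIPELINE`, `advance_stage`, and the exact stage names are assumptions, and the schema is reduced to an `opportunities` table with `id` and `stage` columns.

```python
import sqlite3

# Assumed stage order for illustration.
PIPELINE = ['discovered', 'researched', 'packet_ready', 'submitted']

def advance_stage(db: sqlite3.Connection, opp_id: int, work_fn) -> str:
    """Run one job's work on an item, then move it to the next stage."""
    row = db.execute(
        'SELECT stage FROM opportunities WHERE id = ?', (opp_id,)
    ).fetchone()
    work_fn(opp_id)  # the session's actual work (research, packet building, ...)
    nxt = PIPELINE[PIPELINE.index(row[0]) + 1]
    db.execute('UPDATE opportunities SET stage = ? WHERE id = ?', (nxt, opp_id))
    db.commit()
    return nxt
```

Because each job only ever selects items in its own stage and only ever advances them one step, a crashed session leaves items exactly where the next run expects to find them.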
Lock Files Prevent Collisions
The simplest concurrency problem: your research cron runs at 2 PM and takes 45 minutes. Your packet-builder cron runs at 2:30 PM. Both try to write to the same database rows.
File-based locks solve this:
import os
import sys

LOCK_DIR = '/tmp/agent-locks'
os.makedirs(LOCK_DIR, exist_ok=True)

def acquire_lock(lock_type: str) -> bool:
    """Try to acquire a lock. Returns False if another job of the same type is running."""
    lock_file = os.path.join(LOCK_DIR, f'{lock_type}.lock')
    if os.path.exists(lock_file):
        # Check if the PID in the lock file is still alive
        try:
            with open(lock_file) as f:
                pid = int(f.read().strip())
            os.kill(pid, 0)  # Signal 0 = check if process exists
            return False  # Process is still running
        except (ProcessLookupError, ValueError, FileNotFoundError):
            pass  # Stale lock, safe to take over
    with open(lock_file, 'w') as f:
        f.write(str(os.getpid()))
    return True

def release_lock(lock_type: str):
    lock_file = os.path.join(LOCK_DIR, f'{lock_type}.lock')
    try:
        os.remove(lock_file)
    except FileNotFoundError:
        pass

# Usage
if not acquire_lock('research'):
    print('Research session already running, skipping')
    sys.exit(0)

try:
    # Do research work...
    pass
finally:
    release_lock('research')
I group related jobs under the same lock type. Morning scan and afternoon scan both use the scan lock. Research sessions all use the research lock. This prevents overlapping work without preventing different types of work from running in parallel.
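A minimal sketch of that grouping. The job names here are hypothetical, and `lock_type_for` is an illustrative helper, not part of the code above:

```python
# Hypothetical job names; the point is that several jobs share one lock type.
JOB_LOCKS = {
    'morning_scan': 'scan',
    'afternoon_scan': 'scan',
    'research': 'research',
    'morning_packets': 'packets',
    'evening_packets': 'packets',
}

def lock_type_for(job: str) -> str:
    # Jobs without an explicit group get their own per-job lock.
    return JOB_LOCKS.get(job, job)
```

A scan and a packet build can run side by side, but two scans never can.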
The Schedule
Here's the actual schedule for a job-hunting agent that runs autonomously:
# Discovery (find new opportunities)
0 7 * * * agent scan # Morning: job boards + APIs
0 14 * * * agent scan # Afternoon: different sources
# Research (investigate discoveries)
0 8 * * * agent research # Deep research on top 5
# Conversion (build application materials)
0 9 * * * agent packets # Build resume + cover letter
0 20 * * * agent packets # Evening batch
# Submission (actually apply)
0 11 * * * agent submit # Submit ready packets
0 17 * * * agent submit # Afternoon submissions
# Maintenance
0 10 * * * agent followup # Check for overdue follow-ups
30 17 * * * agent stale # Close dead listings
0 23 * * * agent review # Nightly metrics + improvement
Key design decisions:
- Scan before research, research before packets, packets before submit. The pipeline flows downward through the day.
- Duplicate time slots for high-volume stages. Two scan sessions and two submit sessions because those have the most throughput.
- Maintenance runs last. Stale checking and metrics don't block the pipeline.
- No jobs between midnight and 7 AM. Not because the agent sleeps — because rate limits reset, API quotas refresh, and you don't want to burn through them at 3 AM.
Limiting Scope Per Session
A common mistake: loading every tool and every capability into every cron session. This causes two problems:
Context pollution. An LLM agent with 44 tools available is worse at using each individual tool than one with 5. The more tools available, the more likely the agent is to pick the wrong one or get confused about capabilities.
Runaway sessions. An agent that can do everything will try to do everything. A scan session that discovers an exciting opportunity might try to research it, write a cover letter, and submit an application — all in one session. This breaks the pipeline model and creates unpredictable runtime.
The fix: each cron job specifies exactly which tools are available.
# Scan session: ONLY these tools
SCAN_TOOLS = ['web_search', 'job_api', 'db_insert', 'dedup_check']
# Research session: ONLY these tools
RESEARCH_TOOLS = ['web_fetch', 'twitter_search', 'db_read', 'db_update']
# Submission session: ONLY these tools
SUBMIT_TOOLS = ['browser_automation', 'email_send', 'db_update', 'file_read']
Each session is a specialist, not a generalist. The scan session finds opportunities but can't apply to them. The submit session applies but can't discover new ones. The database handles the handoff.
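One way to enforce that scoping is a single lookup that fails loudly instead of falling back to "all tools". The `tools_for_session` helper is a sketch built on the lists above:

```python
SESSION_TOOLS = {
    'scan': ['web_search', 'job_api', 'db_insert', 'dedup_check'],
    'research': ['web_fetch', 'twitter_search', 'db_read', 'db_update'],
    'submit': ['browser_automation', 'email_send', 'db_update', 'file_read'],
}

def tools_for_session(session: str) -> list:
    """Return the tool whitelist for a session type; never default to everything."""
    if session not in SESSION_TOOLS:
        raise ValueError(f'unknown session type: {session!r}')
    return SESSION_TOOLS[session]
```

An unknown session name is a configuration bug, and raising immediately is cheaper than letting a mis-scoped agent run for an hour.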
Failure Recovery
Cron jobs fail. APIs go down. Rate limits hit. Browser automation encounters unexpected CAPTCHAs. The question isn't whether failures happen — it's whether they're recoverable.
Pattern 1: Idempotent Operations
Every operation should be safe to retry. If a research session crashes halfway through processing 5 opportunities, the next session should be able to pick up where it left off without duplicating work.
# Before doing work, check if it's already done
for opp_id in pending_ids:  # the IDs this session picked up
    opp = db.execute('SELECT * FROM opportunities WHERE id = ?', (opp_id,)).fetchone()
    if opp['company_research'] is not None:
        continue  # Already researched, skip
    # Do the research...
    db.execute(
        "UPDATE opportunities SET company_research = ?, stage = 'researched' WHERE id = ?",
        (research_text, opp_id),
    )
    db.commit()  # Commit after each item, not in a batch
Committing after each item (not in a batch) means a crash only loses the current item's work, not the entire session's.
Pattern 2: Failure Logging
import logging

log = logging.getLogger('agent')
log.setLevel(logging.INFO)  # default is WARNING, which would silently drop info logs
handler = logging.FileHandler('/var/log/agent/cron.log')
handler.setFormatter(logging.Formatter(
    '%(asctime)s [%(name)s] %(levelname)s %(message)s'
))
log.addHandler(handler)

try:
    result = submit_application(opp)
    log.info(f'Submitted [{opp["id"]}] {opp["company"]} — {opp["role"]}')
except CaptchaError as e:
    log.warning(f'CAPTCHA blocked [{opp["id"]}] {opp["company"]} — needs manual submission')
    db.execute('UPDATE opportunities SET notes = ? WHERE id = ?',
               (f'CAPTCHA blocked: {e}', opp['id']))
except Exception as e:
    log.error(f'Failed [{opp["id"]}] {opp["company"]}: {e}')
    # Don't advance the stage — let the next session retry
Failures that are retryable (network timeout, rate limit) should leave the opportunity in its current stage so the next session picks it up. Failures that are permanent (CAPTCHA, requires human) should be logged with a clear note about what's needed.
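That policy can be made explicit in code. A sketch, where the retryable exception types are assumptions standing in for whatever your tool layer actually raises:

```python
# Exceptions treated as transient. Adjust to your tool layer; these are assumed.
RETRYABLE = (TimeoutError, ConnectionError)

def classify_failure(exc: Exception) -> str:
    """'retry' leaves the stage unchanged so the next session picks the item up;
    'needs_human' means log a note and stop advancing the item."""
    return 'retry' if isinstance(exc, RETRYABLE) else 'needs_human'
```

The key property: the default is `needs_human`. An unrecognized failure should surface to you, not silently loop forever.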
Pattern 3: Heartbeat Monitoring
import time

HEARTBEAT_FILE = '/tmp/agent-heartbeat'

def heartbeat():
    with open(HEARTBEAT_FILE, 'w') as f:
        f.write(str(time.time()))

# In your monitoring script:
def check_agent_health():
    try:
        with open(HEARTBEAT_FILE) as f:
            last_beat = float(f.read().strip())
        if time.time() - last_beat > 3600:  # No heartbeat in 1 hour
            alert('Agent appears stuck')
    except FileNotFoundError:
        alert('Agent has never run')
Log Rotation
Autonomous agents generate a lot of logs. Without rotation, they'll fill your disk.
# /etc/logrotate.d/agent
/var/log/agent/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    copytruncate
}
copytruncate is important — instead of renaming the log and creating a new one, logrotate copies it and then truncates the original in place. Your running agent process keeps its open file handle and keeps writing to the same file, so it never needs a restart.
The Meta-Pattern
The broader insight: scheduling AI agents is a solved problem. The same patterns that work for any long-running service work for AI agents. Cron for scheduling. Lock files for concurrency. Databases for state. Log files for debugging. Heartbeats for monitoring.
The AI agent ecosystem keeps reinventing these patterns — building custom orchestration layers, custom state management, custom scheduling. Most of the time, you don't need any of that. You need cron, SQLite, and discipline about session scope.
Build the agent intelligence. Let Unix handle the scheduling.
Nathan Hamlett builds autonomous AI systems. His current agent runs 23 cron jobs managing a complete job pipeline from discovery to submission. More at nathanhamlett.com.