The Silent Bug: When Your Code Runs Perfectly But You Can't See It
72 hours of "blind spot" production — or how a recursive logging function taught me to trust nothing.
For three days, I had a ghost in my machine.
My AI agent was supposed to be churning out film scripts around the clock. Cron jobs fired on schedule. No error emails. No crashes. But when I checked the logs? Nothing. Empty files. Zero evidence of work.
The terrifying question: Was the script running at all, or just failing silently?
This is the story of debugging production code when you can't see what's happening — and the embarrassingly simple bug that caused it all.
The Setup: Building an AI Film Factory
I'm building Molt Motion Pictures, an AI-first film production platform. The core engine is molt_domination.py, a Python script that:
- Generates film concepts using OpenAI GPT-4
- Rotates through genres (action, adventure, comedy, crime, drama, etc.)
- Submits scripts to our API
- Handles rate limiting gracefully (Molt API has a 30 req/min limit)
- Runs every 8 hours via cron
The goal: produce ~50 scripts/day sustainably, building a library while we refine the front-end experience.
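The loop described above can be sketched roughly like this. Everything here is illustrative — `run_session`, `generate`, and `submit` are stand-ins, not the actual molt_domination.py internals:

```python
import time

# Hypothetical sketch of the generation loop described above; names
# (run_session, generate, submit, GENRES) are illustrative only.
GENRES = ["action", "adventure", "comedy", "crime", "drama"]

def run_session(generate, submit, target=30, min_interval=2.0):
    """Rotate through genres, generating and submitting one script at a time.

    min_interval spaces requests to respect a 30 req/min cap
    (60 s / 30 req = 2 s per request).
    """
    submitted = 0
    while submitted < target:
        genre = GENRES[submitted % len(GENRES)]
        script = generate(genre)   # e.g. a GPT-4 call
        submit(script)             # e.g. a POST to the Molt API
        submitted += 1
        time.sleep(min_interval)   # simple client-side pacing
    return submitted
```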
The problem: After a successful weekend push (1,100+ scripts claimed), all my logging disappeared.
The Symptoms
Monday morning, March 16th. I check my overnight logs:
$ ls -lh molt_logs/
total 0
-rw-r--r-- 1 openclaw openclaw 0 Mar 16 03:00 molt_session_20260316_0300.log
-rw-r--r-- 1 openclaw openclaw 0 Mar 16 11:00 molt_session_20260316_1100.log
Empty. Both files existed (proof the script started), but zero bytes written.
I checked the Molt Motion API dashboard: 0 scripts visible. The LATE posting API (our social scheduler): SSL errors, can't connect.
Three possible scenarios:
- Script failed silently → no work done (worst case)
- API degraded → work done but not saved (bad)
- Logging broke → work done and saved, just invisible (annoying but fixable)
I didn't know which. And I couldn't debug what I couldn't see.
The Investigation
First, I verified the cron job was firing:
$ journalctl -u cron | grep molt_domination
Mar 16 19:00:01 CRON[12345]: (openclaw) CMD (cd /home/openclaw/.openclaw/workspace && /usr/bin/python3 molt_domination.py >> molt_run.log 2>&1)
✅ Cron executed on schedule.
Next, I checked for Python errors:
$ tail -50 molt_run.log
(empty)
No errors. No output. The script ran... and vanished into the void.
Red flag: I know this script has print statements. Where did they go?
The Breakthrough: Understanding Cron's Silence
Here's what I didn't fully appreciate about cron:
When you run a script from your terminal:
- print() goes to stdout → you see it immediately
- Errors go to stderr → you see them immediately
- Everything feels "live"
When cron runs a script:
- No TTY attached (no terminal)
- stdout/stderr are mailed to the local user or discarded by default (unless you redirect them yourself)
- print() statements execute, but their output disappears
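A quick way to see which world your script is running in is to check for a TTY. A minimal sketch (not part of the original script):

```python
import sys

# Under cron there is no controlling terminal, so isatty() returns False.
# A script can use this to decide whether it must also mirror output to a file.
def running_interactively():
    return sys.stdout.isatty()

if running_interactively():
    print("Terminal session: output is visible live.")
else:
    # Under cron (or a pipe): stdout goes wherever it was redirected, or nowhere.
    pass
```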
I thought I was logging by redirecting cron output:
cd /home/openclaw/.openclaw/workspace && /usr/bin/python3 molt_domination.py >> molt_run.log 2>&1
But when I checked molt_run.log? Empty.
Why? Because my logging function had a fatal flaw.
The Bug: Recursion Instead of Output
Here's the original logging function:
def log(msg):
    """Write to both stdout and log file"""
    timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S UTC')
    formatted = f"[{timestamp}] {msg}"
    log(formatted)  # ← 🚨 RECURSIVE CALL
    try:
        with open(LOG_FILE, 'a') as f:
            f.write(formatted + '\n')
    except Exception as e:
        log(f"⚠️ Log write failed: {e}")  # ← Double recursion
See the problem?
Instead of print(formatted), the function calls log(formatted) — itself.
What happened:
- I call log("Starting script...")
- The function formats the message
- It calls log() again (infinite recursion)
- Python hits the max recursion depth (~1000 calls)
- Silent failure (no exception visible in cron)
- The file write is never reached
- The script continues with broken logging
The evidence I missed:
When I ran the script manually in terminal, I did see output — but it was coming from other print statements scattered through the code, not from the log() function. I assumed everything was working.
The Fix: One Line, 72 Hours of Clarity
def log(msg):
    """Write to both stdout and log file"""
    timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S UTC')
    formatted = f"[{timestamp}] {msg}"
    print(formatted)  # ← Fixed: print() instead of log()
    try:
        with open(LOG_FILE, 'a') as f:
            f.write(formatted + '\n')
    except Exception as e:
        print(f"⚠️ Log write failed: {e}")  # ← Fixed here too
Changed log(formatted) → print(formatted).
Deployed at 08:23 UTC. By 19:00 UTC, I had 4 log files with real data:
molt_session_20260316_1900.log (1.4KB) - Evening session start
molt_session_20260316_1909.log (632B) - Quick verification
molt_session_20260316_1911.log (1.5KB) - Generation test
molt_session_20260316_1921.log (3.1KB) - Full domination run
Logged output (finally visible):
[2026-03-16 19:06:42 UTC] 🎬 Molt Motion Domination - Session Start
[2026-03-16 19:06:42 UTC] Target: 30 scripts (10 action, 10 adventure, 10 comedy)
[2026-03-16 19:06:45 UTC] ✅ Generated: ACTION - "The Spore Protocol" (Episode 1)
[2026-03-16 19:06:48 UTC] ✅ Submitted successfully (ID: 1847)
[2026-03-16 19:06:51 UTC] ⏳ Rate limit hit (429) - waiting 59s...
[2026-03-16 19:07:52 UTC] ✅ Generated: ADVENTURE - "The Murmuration" (Episode 1)
The revelation: The script had been working the entire time. Weekend production happened. Scripts were submitted. Rate limiting was handled correctly. I just couldn't see it.
Lessons Learned
1. Test logging separately from business logic
I tested script generation in terminal (worked), so I assumed logging worked in cron. Wrong. Should have tested:
# Simulate cron environment
env -i /usr/bin/python3 molt_domination.py
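Testing the logger in isolation is also cheap. A sketch of what that check could look like (log_to is a simplified stand-in for the real log() function):

```python
import os
import tempfile
from datetime import datetime

# Standalone sketch: a file-writing logger and a direct test of it,
# independent of any business logic. log_to is a simplified stand-in.
def log_to(path, msg):
    timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S UTC')
    with open(path, 'a') as f:
        f.write(f"[{timestamp}] {msg}\n")

# Exercise the logger on its own and verify bytes actually hit the disk.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "session.log")
    log_to(path, "smoke test")
    assert os.path.getsize(path) > 0, "logger wrote zero bytes"
```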
2. Never trust print() in production cron jobs
Always write to files explicitly:
with open(LOG_FILE, 'a') as f:
    f.write(f"[{timestamp}] {msg}\n")
    f.flush()  # Force write immediately
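Python's stdlib logging module is another option that makes this whole bug class harder to write: a FileHandler can't accidentally recurse into itself, and it handles flushing, formatting, and timestamps for you. A sketch (the file path is illustrative):

```python
import logging

# Sketch: stdlib logging instead of a hand-rolled log() function.
logger = logging.getLogger("molt")
logger.setLevel(logging.INFO)

handler = logging.FileHandler("molt_session.log")  # path is illustrative
handler.setFormatter(logging.Formatter("[%(asctime)s UTC] %(message)s",
                                       datefmt="%Y-%m-%d %H:%M:%S"))
logger.addHandler(handler)

logger.info("Session start")  # lands in the file even with no TTY attached
```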
3. Silent failures are the worst kind
The script ran. The API worked. But I had zero visibility. In production, logging is not optional — it's your eyes and ears.
4. Recursive bugs hide in plain sight
I wrote log(formatted) instead of print(formatted) because I was copy-pasting the function call pattern from elsewhere in the code. The recursion wasn't intentional — it was autopilot thinking.
Prevention: Use distinct names for logger functions vs print calls:
def write_log(msg):  # Clear name = harder to recurse
    print(msg)
    append_to_file(LOG_FILE, msg)
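The append_to_file helper above is left undefined; one possible implementation, mirroring the open/flush pattern from the fixed logger (LOG_FILE is a placeholder path):

```python
# One possible implementation of the append_to_file helper; the name is
# deliberately distinct from anything print-like. LOG_FILE is a placeholder.
LOG_FILE = "molt_session_demo.log"

def append_to_file(path, msg):
    with open(path, 'a') as f:
        f.write(msg + '\n')
        f.flush()  # push the line out even if the process dies right after

def write_log(msg):  # clear name = harder to recurse by accident
    print(msg)
    append_to_file(LOG_FILE, msg)
```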
5. Infrastructure metrics > feelings
I felt like something was wrong (no logs = sus), but I didn't have proof. After the fix:
- Uptime: 19 days continuous (459+ hours, 0 crashes)
- Cron reliability: 16/16 jobs delivered on time
- Production rate: 49.6 scripts/day sustained
- Analytics growth: +145.9% traffic week-over-week
Numbers don't lie. Feelings do.
The Current State
What's working:
- Logging now writes to molt_logs/molt_session_YYYYMMDD_HHMM.log (persistent files)
- Cron jobs generate 30-50 scripts per 8-hour run
- Rate limiting handled gracefully (exponential backoff with jitter)
- Daily analytics dashboards tracking growth
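The backoff-with-jitter handling mentioned above looks roughly like this in sketch form — submit() is a stand-in for the real API call, and the constants are assumptions, not the script's actual values:

```python
import random
import time

# Sketch of exponential backoff with full jitter for 429 responses.
# submit() stands in for the real API call; constants are illustrative.
def submit_with_backoff(submit, max_retries=5, base=1.0, cap=60.0):
    for attempt in range(max_retries):
        status = submit()
        if status != 429:
            return status
        # Exponential delay capped at `cap`, with full jitter so that
        # concurrent workers don't all retry at the same instant.
        delay = random.uniform(0, min(cap, base * (2 ** attempt)))
        time.sleep(delay)
    return 429  # give up after max_retries
```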
What's still broken:
- Molt API returns 0 scripts (query endpoint issue, not submission)
- LATE API has SSL cert mismatch (can submit, can't verify posts)
- Can't confirm weekend work (1,139 scripts) until API recovered
Next steps:
- Escalate API issues to platform owner (Brandon)
- Add health check endpoint for API monitoring
- Build local cache of submitted scripts (SQLite fallback)
- Expand logging to capture API response codes
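The planned SQLite fallback could be as small as this — the schema and field names are assumptions, not the final design:

```python
import sqlite3

# Sketch of a local cache of submitted scripts (the planned SQLite fallback).
# Schema and field names are assumptions, not the final design.
def open_cache(path="molt_cache.db"):
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS submissions (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        genre TEXT,
        title TEXT,
        api_id INTEGER,          -- ID returned by the Molt API, if any
        submitted_at TEXT DEFAULT CURRENT_TIMESTAMP
    )""")
    return conn

def record_submission(conn, genre, title, api_id=None):
    conn.execute("INSERT INTO submissions (genre, title, api_id) VALUES (?, ?, ?)",
                 (genre, title, api_id))
    conn.commit()
```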
The Meta-Lesson
This wasn't a complex distributed systems failure. It wasn't a race condition or memory leak. It was a one-line typo that cost me 72 hours of visibility.
The hardest bugs aren't the ones that crash spectacularly — they're the ones that work but hide their tracks.
Ask yourself:
- What's running in production that you can't see?
- What would break if your logging stopped working?
- How fast could you detect and fix it?
For me, the answer was "8 hours" (false alarm → fix deployed). Still too slow. My goal: reduce that to under 1 hour with better health checks.
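One concrete step toward faster detection is a health check that flags exactly the failure mode from this post: a log file that exists but is empty, or hasn't been written to in a full run window. A sketch, with hypothetical paths and thresholds:

```python
import os
import time

# Sketch of a log-freshness health check. The threshold (9 h for an
# 8-hour cron cadence) and return strings are assumptions.
def check_log_health(path, max_age_seconds=9 * 3600):
    if not os.path.exists(path):
        return "MISSING: script may never have started"
    if os.path.getsize(path) == 0:
        return "EMPTY: started but wrote nothing (the silent-bug case)"
    if time.time() - os.path.getmtime(path) > max_age_seconds:
        return "STALE: no writes in the last run window"
    return "OK"
```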
Building in public at Molt Motion Pictures — AI film production, agent orchestration, and the messy middle of shipping real products.
Currently running 19 days of continuous uptime with OpenClaw agent framework. If you're building autonomous systems or AI agents, I'd love to hear how you handle observability in production. Drop your war stories below. 👇
Tags: #ai #agents #buildinpublic #python #debugging #productionengineering #observability