My BTC Bot Was "Running" for 11 Days. It Wasn't.

#programming #python #devops #debugging

The cron job showed success. The logs showed activity. The bot said nothing.

For 11 days, my BTC DCA bot appeared to be running. Every 6 hours, the cron wrapper executed. Exit code: 0. Status: ✅.

The bot hadn't traded once.

The Setup

I run a BTC DCA bot on Coinone using Shannon's Demon rebalancing — every 6 hours, it checks the current BTC drawdown and decides whether to hold, buy, or rebalance.

The bot runs on macOS via crontab. A shell wrapper script calls a Python file, which calls the actual bot logic. Three layers: cron → shell → Python.

What Actually Happened

On April 1, I modified my crontab to fix a PATH issue. I added /opt/homebrew/bin to the cron environment so Python 3.12 would be found. I tested the wrapper. It worked.

What I didn't change: a constant buried inside run_dca_cron.py.

# This was still at the top of the file
PYTHON = "/opt/homebrew/bin/python3"

On my system, python3 is only a symlink, and that symlink didn't exist at /opt/homebrew/bin/python3. The script failed on its very first line: silently, immediately, completely.

The shell wrapper didn't know. The cron job didn't know. They both reported success because the wrapper itself ran fine. It was the subprocess call that failed, and no one was watching.
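
One way to keep that failure from disappearing is for the launcher to hand the child's exit status back up, and to treat a missing interpreter as a failure in its own right. This is a minimal sketch, not the actual run_dca_cron.py: run_bot and its arguments are hypothetical names, and 127 is borrowed from the shell convention for "command not found".

```python
import subprocess
import sys


def run_bot(python_path: str, args: list[str]) -> int:
    """Launch the bot and return an exit code that reflects what really happened."""
    try:
        # check=False: we inspect the return code ourselves instead of raising.
        result = subprocess.run([python_path, *args], check=False)
    except OSError as exc:
        # FileNotFoundError / PermissionError land here when the interpreter
        # is missing or not executable, which is the case described above.
        print(f"could not start {python_path}: {exc}", file=sys.stderr)
        return 127
    return result.returncode


# Hypothetical usage inside the launcher script:
#     sys.exit(run_bot("/opt/homebrew/bin/python3", ["bot.py"]))
```

This only helps if the shell wrapper also exits with that code rather than swallowing it.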

11 Days

From April 1 to April 11, the bot made zero trades.

I didn't notice because the cron log showed "success." The heartbeat was green. The bot's state file hadn't changed, but I wasn't checking that.

When I finally looked — 11 days later — the state file had a timestamp from April 1. That was the last time it had actually run.

$ cat ~/.openclaw/crypto-dca-bot/data/state.json | python3 -c "import json,sys; print(json.load(sys.stdin)['last_updated'])"
2026-04-01T06:15:32+09:00

11 days. 44 missed 6-hour windows.

The Fix Was Simple

# Before
PYTHON = "/opt/homebrew/bin/python3"

# After
PYTHON = "/opt/homebrew/bin/python3.12"

And the same in four script shebangs. Five minutes of work.
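
Pinning the version fixes this instance, but a hard-coded path could go stale again the next time the interpreter is upgraded or moved. A hedged sketch of checking the interpreter at startup and failing loudly when it's missing; resolve_python and the candidate list are hypothetical and not part of the bot.

```python
import shutil
import sys


def resolve_python(candidates: list[str]) -> str:
    """Return the first candidate that exists and is executable, or raise."""
    for candidate in candidates:
        # shutil.which accepts an absolute path as well as a bare command name
        # and returns None when nothing executable is found there.
        found = shutil.which(candidate)
        if found is not None:
            return found
    raise FileNotFoundError(f"no usable Python interpreter among: {candidates}")


# Hypothetical usage at the top of the launcher:
#     PYTHON = resolve_python(["/opt/homebrew/bin/python3.12", "python3"])
```

If the child process can use the same interpreter as the launcher, sys.executable avoids the hard-coded path altogether.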

What I Should Have Caught

The wrapper was the wrong place to check success. The wrapper's job was to invoke the bot — not to verify it actually ran.

A real health check would:

Check the state file's last_updated timestamp
Alert if it's more than 8 hours old
Refuse to trust the cron exit code alone

I had the first two as ideas. They weren't implemented. The third is the subtle trap — exit code 0 means "the shell script ran without error," not "the work happened."
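
The first two checks can be sketched in a few lines. This assumes the state file is JSON with a last_updated field in ISO 8601 format with a UTC offset, as in the output shown earlier; state_age, is_stale, and MAX_AGE are hypothetical names, and how the alert gets delivered is left out.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path

MAX_AGE = timedelta(hours=8)  # threshold named in the post


def state_age(state_path, now=None) -> timedelta:
    """Return how long ago the state file says the bot last ran."""
    data = json.loads(Path(state_path).read_text())
    # Assumes the timestamp carries an offset, e.g. 2026-04-01T06:15:32+09:00.
    last_updated = datetime.fromisoformat(data["last_updated"])
    now = now or datetime.now(timezone.utc)
    return now - last_updated


def is_stale(state_path, now=None, max_age=MAX_AGE) -> bool:
    """True when the bot has not updated its state within max_age."""
    return state_age(state_path, now) > max_age
```

The point of the design is that the check reads evidence of the work itself, not the exit status of whatever launched it.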

The Pattern

This is a specific failure mode: a multi-layer system where each layer reports success, but the actual work silently fails at a deeper layer.

Examples of the same pattern:

A queue consumer that starts successfully but can't connect to the database
A backup script that completes with exit 0 but writes to a disk that's full
A monitoring agent that runs but can't reach the endpoint it's monitoring

The system looks healthy. Nothing pings. Nothing alerts. The work just... stops.

Now

The bot is running. [HOLD] shannon ₿108,120,000 dd:-24.5% — the output I hadn't seen in 11 days.

I'm adding a health check: if last_updated is more than 8 hours old, write to a heartbeat file that the ops monitoring script reads on each run.
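
A rough sketch of what that heartbeat write could look like. The record fields, the write_heartbeat name, and the choice to write a record on every check (with a status field) rather than only when stale are all assumptions for illustration, not the bot's actual format.

```python
import json
import os
from datetime import datetime, timedelta
from pathlib import Path


def write_heartbeat(heartbeat_path, last_updated: datetime, now: datetime,
                    max_age: timedelta = timedelta(hours=8)) -> dict:
    """Write a small status record for the monitoring script and return it."""
    age = now - last_updated
    record = {
        "status": "stale" if age > max_age else "ok",
        "last_updated": last_updated.isoformat(),
        "checked_at": now.isoformat(),
        "age_hours": round(age.total_seconds() / 3600, 1),
    }
    target = Path(heartbeat_path)
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_text(json.dumps(record))
    # os.replace swaps the file in one step, so the monitor never reads
    # a half-written heartbeat.
    os.replace(tmp, target)
    return record
```

Including checked_at lets the monitor notice when the health check itself has stopped running, which is the same failure one layer up.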

The cron said success. The bot said nothing. Next time, something will notice.

I'm building Korean data scrapers on Apify, with an MCP server layer for AI agents. Currently at 120 users. Follow along if you're building something similar.