Cron was a crutch. I knew it the second I started running more than three bots.
My whole fleet was held together by crontab -e and a prayer. Every bot had its own line. Some ran every minute, some every five, some every fifteen. When one of them silently died, I found out hours later because the dashboard stopped updating. There was no restart policy. There was no log aggregation. There was a guy on his couch in Sacramento refreshing a Grafana panel and squinting.
I had a laptop, an email account, and social media accounts when I started this thing. I learned what cron was maybe six weeks ago. So when I tell you I was scared to migrate twelve production services off of it in one sitting, understand that I am also still learning what "production" actually means in a serious shop. Senior devs reading this will spot ten amateur moves before they finish the second paragraph.
I called it Winter Soldier. Phases 1 through 3. The plan, written in a markdown file at 9pm on the 21st, was simple on paper:
- Write a unit file template for each bot class.
- Write a matching timer for the scheduled ones, leave the long-running ones as plain services with Restart=always.
- Disable the cron entries. Don't delete them. Comment them out so I can roll back.
- systemctl daemon-reload. Pray.
- Enable and start, one by one. Watch journalctl.
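For anyone who wants to picture the first two steps, here is a minimal sketch of a template renderer for the two bot classes. It is not the actual Winter Soldier template: the unit name `bot-safety-pack`, the script paths, the `RestartSec=5` delay, and the five-minute schedule are all assumptions made up for illustration. The systemd directives themselves (`Type=oneshot`, `Restart=always`, `OnCalendar=`, `Persistent=`, `WantedBy=`) are standard.

```python
def render_service(name: str, exec_start: str, long_running: bool) -> str:
    """Return .service unit text for one bot (name and command are hypothetical)."""
    if long_running:
        # Long-running bots: systemd restarts them whenever they exit.
        service = [f"ExecStart={exec_start}", "Restart=always", "RestartSec=5"]
        install = ["", "[Install]", "WantedBy=multi-user.target"]
    else:
        # Scheduled bots: run once per timer activation, then exit.
        # No [Install] section, because the matching timer starts the service.
        service = ["Type=oneshot", f"ExecStart={exec_start}"]
        install = []
    lines = ["[Unit]", f"Description=Bot {name}", "", "[Service]", *service, *install]
    return "\n".join(lines) + "\n"


def render_timer(name: str, on_calendar: str) -> str:
    """Return .timer unit text; it activates the .service with the same base name."""
    lines = [
        "[Unit]",
        f"Description=Schedule for bot {name}",
        "",
        "[Timer]",
        f"OnCalendar={on_calendar}",
        "Persistent=true",  # catch up on a run missed while the machine was off
        "",
        "[Install]",
        "WantedBy=timers.target",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical examples: one long-running worker, one bot on a five-minute schedule.
worker_unit = render_service(
    "bot-safety-pack", "/usr/bin/python3 /opt/bots/safety_pack.py", long_running=True
)
scheduled_unit = render_service(
    "bot-intel", "/usr/bin/python3 /opt/bots/intel.py", long_running=False
)
scheduled_timer = render_timer("bot-intel", "*:0/5")
```

Each rendered string would be saved as `/etc/systemd/system/<name>.service` or `<name>.timer` before running daemon-reload. For a scheduled bot, the timer is the unit that gets enabled and started, not the service.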
The first one went in around 10:30pm. Safety Pack worker. I wrote the unit file, dropped it in /etc/systemd/system/, ran daemon-reload, and just stared at the terminal for a full thirty seconds before typing systemctl start. The cursor blinked. systemctl status came back green. journalctl -fu showed the bot picking up exactly where the cron version left off. I exhaled so hard I startled the dog.
Then I did it eleven more times.
Somewhere around bot seven I hit the wall. The intel route went green on systemd but the dashboard showed it as offline. Stared at it for forty minutes. The unit was running. The logs were clean. The endpoint was responding. The dashboard just refused to believe it existed.
The registry. The dashboard read from a service registry JSON file, and the registry got populated by a script that parsed crontab -l. I'd commented out the cron lines. So as far as the registry was concerned, half my bots had vanished.
One-line fix. I pointed the parser at systemctl list-units --type=service 'bot-*' instead of crontab -l. Restarted the registry writer. Dashboard lit up like a Christmas tree.
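The registry writer itself isn't shown here, so the following is only a sketch of how a parser over systemctl output could work, assuming the registry is a JSON list of services. The function names and the registry fields (`unit`, `active`, `sub`) are hypothetical. The extra flags `--all`, `--plain`, `--no-legend`, and `--no-pager` are real systemctl options that make the output easier to parse: no header, no footer, no status bullets, and inactive units included (a timer-driven bot sits inactive between runs).

```python
import json
import subprocess


def parse_list_units(text: str) -> list[dict]:
    """Parse `systemctl list-units --plain --no-legend` output into registry rows.

    Each line has the columns: UNIT LOAD ACTIVE SUB DESCRIPTION.
    """
    services = []
    for line in text.splitlines():
        fields = line.split(None, 4)  # the description may contain spaces
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        unit, _load, active, sub = fields[:4]
        services.append({"unit": unit, "active": active, "sub": sub})
    return services


def read_bot_units(pattern: str = "bot-*") -> list[dict]:
    """Ask systemd for matching service units (requires a systemd host)."""
    result = subprocess.run(
        ["systemctl", "list-units", "--type=service", "--all",
         "--plain", "--no-legend", "--no-pager", pattern],
        capture_output=True, text=True, check=True,
    )
    return parse_list_units(result.stdout)


# Made-up sample output, used only to show the shape of the parsed rows.
sample = (
    "bot-intel.service       loaded active running Bot intel route\n"
    "bot-safety-pack.service loaded failed failed  Bot Safety Pack worker\n"
)
registry_json = json.dumps(parse_list_units(sample), indent=2)
```

On the real machine, `json.dumps(read_bot_units())` would produce the registry contents. A bot that shows as `failed` is still listed, so the dashboard can flag it instead of treating it as missing.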
The last bot came up at 5:47am. I know because I took a screenshot. Twelve services. All green. All with restart policies. All with proper logs in journalctl. The scheduled ones with timers I could inspect with systemctl list-timers.
No more guessing. No more refreshing. No more crontab spaghetti.
The sun was coming up over the parking lot outside my window. I made coffee. I sat on the floor and just looked at the dashboard for a while. Twelve green dots. I built that.
A Safety Pack went out the door around 6:15am while I was still sitting there. Then another one. They add up little by little like pennies in a jar.
The scary part wasn't daemon-reload in production. The scary part was realizing that for the first time in my life I'd designed something, planned it in phases, and executed all three phases without anything catching fire.
What would you have called Phase 4?