Two launchd plists, two lifecycle models: KeepAlive and StartInterval

#automation #devops #python #tutorial

The server gets KeepAlive. The generator gets StartInterval=14400. Same tool, same plist format, different lifecycle semantics.

I added these two jobs to get a local content dashboard running reliably on macOS without babysitting it. Here is what the setup actually looks like and where it falls short.

The server: KeepAlive

The dashboard backend is uvicorn on port 8766. The relevant part of the plist:

<key>KeepAlive</key>
<true/>

That one key tells launchd: if this process exits for any reason, restart it. No watchdog script, no cron job polling the port, no supervisor process. The OS handles it.

The tradeoff is no real backoff. If the server crashes immediately on startup because of a bad import, a missing env var, or a port conflict, launchd will keep restarting it indefinitely. By default it will not respawn a job more than once every 10 seconds, and that delay is fixed, not escalating. You can raise it with ThrottleInterval. I did not on the first pass. It is on the list.
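For context, here is a rough sketch of how the whole server plist could be generated with Python's plistlib, with ThrottleInterval as an optional extra. The label, the uvicorn path, and the app:app module name are placeholders for illustration, not the real values from my setup.

```python
import plistlib

# Hypothetical label; substitute your own reverse-DNS name.
LABEL = "com.example.dashboard-server"


def build_server_plist(throttle_seconds=None):
    """Return the launchd job definition for the always-on server as XML bytes."""
    job = {
        "Label": LABEL,
        # Assumed invocation: the uvicorn path and "app:app" are placeholders.
        "ProgramArguments": [
            "/usr/local/bin/uvicorn", "app:app", "--port", "8766",
        ],
        # Restart the process whenever it exits.
        "KeepAlive": True,
    }
    if throttle_seconds is not None:
        # Minimum number of seconds between respawns.
        job["ThrottleInterval"] = throttle_seconds
    return plistlib.dumps(job)


if __name__ == "__main__":
    print(build_server_plist(throttle_seconds=30).decode())
```

Writing the output to a file under ~/Library/LaunchAgents and loading it with launchctl is the usual next step. Generating the plist from a dict mostly saves you from hand-editing XML.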

The generator: StartInterval

The draft generator should not run continuously. It should fire, do work, and exit. StartInterval=14400 means launchd wakes it every four hours.

<key>StartInterval</key>
<integer>14400</integer>

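The catch: StartInterval counts from when launchd loaded the job, not from when the last run finished. As far as I can tell launchd treats a job as a single instance, so it should not start a second copy while one is still running. But a run that gets stuck on a network hang or a rate limit can still overlap with one I start by hand, or with any other entry point into the same script. The right fix is a lock file at the top of the script: check if a previous run is still going, exit early if so. Not there yet.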
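A minimal sketch of that lock, assuming the generator is a Python script. It uses an advisory flock on the lock file rather than a create-and-delete marker file, so the lock goes away on its own if the process dies. The lock path and the main() layout are hypothetical.

```python
import fcntl
import sys

# Hypothetical lock path; any location the job can write to works.
LOCK_PATH = "/tmp/draft-generator.lock"


def acquire_lock(path=LOCK_PATH):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes this fail immediately instead of waiting for the holder.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def main():
    lock = acquire_lock()
    if lock is None:
        print("previous run still active, exiting", file=sys.stderr)
        return
    try:
        print("lock acquired, generating drafts")  # placeholder for the real work
    finally:
        lock.close()  # closing the file releases the lock


if __name__ == "__main__":
    main()
```

Because the kernel drops the lock when the file is closed or the process exits, a crashed run cannot leave a stale lock behind, which is the usual failure mode of the marker-file approach.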

The dependency problem

These two jobs have no relationship in launchd. The generator will POST to the server regardless of whether the server is up. In practice this is fine because KeepAlive keeps the server alive and it comes up fast. But if the server enters a crash loop, the generator just logs failures and keeps going. Not a disaster, but worth knowing.

There is no built-in way in launchd to express "only run job B if job A is healthy." You would need a health check inside the generator script that exits early if the server is not responding on the expected port.
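A health check along those lines could be as small as a TCP connect. This sketch assumes the server listens on 127.0.0.1, and it only checks that the port accepts connections, not that the app behind it is healthy. The function name and the timeout are my own choices for illustration.

```python
import socket
import sys

SERVER_HOST = "127.0.0.1"  # assumed: the server is bound to localhost
SERVER_PORT = 8766         # the port uvicorn listens on


def server_is_up(host=SERVER_HOST, port=SERVER_PORT, timeout=2.0):
    """Return True if a TCP connection to the server succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def main():
    if not server_is_up():
        print("server not reachable, skipping this run", file=sys.stderr)
        return
    print("server reachable, generating drafts")  # placeholder for the real work


if __name__ == "__main__":
    main()
```

A stricter version would request a real endpoint and check the status code, but a plain connect is enough to skip a run while the server is stuck in a crash loop.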

What I would change

Three things, in order of priority.

Add ThrottleInterval to the server plist. Thirty seconds is enough. That raises launchd's default 10-second minimum between respawns, so a server that is broken at startup restarts less often and thrashes the machine and the logs less.

Add a lock file to the generator. One check at the top of the script, one cleanup at the end. Prevents the overlapping run problem entirely and costs nothing.

Switch from StartInterval to StartCalendarInterval. Interval-based scheduling is anchored to whenever the job was loaded, and an interval that comes due while the machine is asleep is simply missed. Calendar-based scheduling fires at fixed clock times, and a run missed during sleep fires once when the machine wakes. The logs are also easier to read: you know the job runs at 06:00, 10:00, 14:00, and 18:00 rather than "four hours after whatever time I last ran launchctl load."
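StartCalendarInterval takes either one dict or an array of dicts, one per firing time, so four clock times means four entries. Here is a rough sketch that builds that key with plistlib. The function names are mine, and the output is only the schedule fragment, not a complete job definition.

```python
import plistlib


def calendar_entries(hours, minute=0):
    """Build one StartCalendarInterval dict per clock hour."""
    return [{"Hour": hour, "Minute": minute} for hour in hours]


def build_generator_schedule(hours=(6, 10, 14, 18)):
    """Return a plist (XML bytes) holding only the StartCalendarInterval key."""
    return plistlib.dumps({"StartCalendarInterval": calendar_entries(hours)})


if __name__ == "__main__":
    print(build_generator_schedule().decode())
```

Note that these four times give four runs a day with a long gap overnight, while StartInterval=14400 gives six. Passing range(2, 24, 4) instead would keep the four-hour cadence around the clock.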

The basic model is sound. KeepAlive for things that need to stay up, periodic scheduling for things that need to run on a cadence. Two plists, clean separation of concerns. The gaps are all fixable with a few extra lines.

DEV Community
