DEV Community

Deva
Deva

Posted on

P8: Plugging the warm up ceiling hole in comments.tick

The tick function kept calling client.post_reply even after the warm up ceiling for the day was already hit. Nothing gated it. If the scheduler fired twice in a row, or someone called tick, n 5 when only two budget slots remained, the ceiling was advisory at best.

This is the kind of bug that does not blow up loudly. It just silently overshoots your own rate limits during the exact window when overshooting matters most: the early warm up phase where every write is supposed to be conservative.

What P8 actually needed

Two distinct failure modes required two distinct fixes.

Natural ticks running past the ceiling. A natural tick is the scheduler's routine fire with no explicit n. It has no business posting anything once warmup.over_ceiling(s) returns True. So the fix there is a hard early return:

if not explicit_n and warmup.over_ceiling(s):
 print("tick: warmup ceiling reached, skipping (over_ceiling)")
 return []
Enter fullscreen mode Exit fullscreen mode

No clamp, no partial run. Just stop. Returning an empty list means callers see a clean no op, not an error.

Explicit n ticks requesting more than what remains. Explicit n is for manual runs and tests. You might call tick, n 5 legitimately, but if only two slots remain in today's warm up budget you should get two replies, not five. So after the early return check, n gets clamped:

if warmup.enabled():
 wu_remaining = warmup.remaining(s)
 if wu_remaining <= 0:
 print("tick: warmup ceiling reached, nothing remaining this tick")
 return []
 n = min(n, wu_remaining)
Enter fullscreen mode Exit fullscreen mode

This path runs for both explicit and natural ticks when warm up is on. The natural tick hits the over_ceiling guard first, so by the time you reach the clamp, you are always in the explicit n case. The symmetry is intentional: client.post_reply should never be reachable when wu_remaining <= 0, full stop.

The tradeoff worth naming

You could collapse both guards into one. Check remaining, if it is zero return early, otherwise clamp n. Fewer code paths, same outcome.

I kept them separate because they express different intent. The early return on natural ticks is a policy decision: the scheduler should not post at all past the ceiling, even if technically one slot somehow remained. The clamp on explicit n is a safety net: a human or test runner asking for more than budget allows should get whatever is left, not an error and not a silent skip.

Mixing those two semantics into one block makes the policy harder to read and harder to change later if you want to tighten one without touching the other.

What I would do differently

Putting the ceiling check inside tick() works, but it is the wrong altitude for a constraint this important. The warm up ceiling is a daily cap that belongs at the scheduler layer, checked before tick is even called. If the launchd job or whatever is driving the clock knew about the ceiling, it could skip the tick entirely and save the overhead of spinning up the target pool, running discovery, and doing all the pre flight checks before hitting the guard.

Inside tick() is the right fallback, not the primary line of defense. The primary should be one level up. P8 fixed the hole; the cleaner version of this fix closes it earlier in the call chain and leaves tick() as the last resort guard, not the first.

Still shipping P8 as is. The ceiling is enforced where it matters and client.post_reply cannot be reached once budget is exhausted. That is the actual requirement.

Top comments (0)