My OAuth token expired mid-job. Took 2 hours to figure out why.
Had this happen twice now. First time I blamed the API. Second time I blamed myself. Both times I was wrong about why it happened.
What the job was doing
Running a daily sync between a client's CRM and their email marketing platform. Nothing fancy. Pull contacts from CRM, push to email platform, update statuses. Cron job runs at 6 AM, done by 6:15, everyone happy.
Except one Tuesday the sync just stopped. Partial data made it over. The CRM showed 1,847 contacts synced. The email platform showed 1,203. Missing 644 contacts. Client noticed by 9 AM when their campaign went to the wrong list.
First instinct: the API broke. Classic move.
The debugging detour
Spent an hour checking everything. API docs, rate limits, webhook logs. Everything looked fine. The CRM API returned success for those 644 contacts. As far as I could tell, the email platform API did too. But the data wasn't there.
Pulled the actual HTTP responses I had logged. All 200 OK. All valid JSON. The contacts just weren't in the email platform.
Then I noticed something in the logs. The email platform had started returning 401 errors at 6:08 AM. Right in the middle of the sync. But the script kept running and hitting it anyway, getting 401s for every remaining contact.
Wait, 401? That's unauthorized. But we were getting 200s earlier.
The actual problem
Turns out the email platform's OAuth tokens expire after 1 hour. We were getting new tokens at the start of each sync run (good). But the sync was taking longer than expected because of a CRM migration the client did the week before. More contacts = longer sync = token expired mid-run.
The 200 responses? Those were from the CRM side. The email platform was rejecting everything with 401, but I wasn't logging the email platform responses properly. Only the CRM side. So it looked like everything was working when half of it was failing silently.
The fix was embarrassingly simple. Refresh the token every 30 minutes during long syncs instead of once at the start. Added a check before each batch of 500 contacts that refreshes the token if it's more than 30 minutes old.
I refresh tokens proactively now. If a sync takes more than 20 minutes, I assume the token might be stale. Better to refresh early than debug another silent failure.
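Here's a minimal sketch of what a per-batch refresh check like that could look like, in Python. It is not the actual job's code. `TokenManager`, `fetch_token`, `push_batch`, and `sync` are hypothetical names, and `fetch_token` stands in for whatever call gets a fresh token from the email platform. The 500-contact batch size and 30-minute threshold are the numbers from above.

```python
import time
from typing import Callable, Iterator, List, Optional

BATCH_SIZE = 500                  # batch size mentioned in the post
REFRESH_AFTER_SECONDS = 30 * 60   # refresh at 30 minutes, well under the 1-hour expiry


class TokenManager:
    """Caches an access token and re-fetches it once it is older than max_age."""

    def __init__(
        self,
        fetch_token: Callable[[], str],   # hypothetical: wraps the platform's token endpoint
        max_age: float = REFRESH_AFTER_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_token = fetch_token
        self._max_age = max_age
        self._clock = clock
        self._token: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        """Return a token, refreshing it first if it is missing or too old."""
        now = self._clock()
        if self._token is None or now - self._fetched_at >= self._max_age:
            self._token = self._fetch_token()
            self._fetched_at = now
        return self._token


def batches(items: List, size: int = BATCH_SIZE) -> Iterator[List]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def sync(contacts: List, tokens: TokenManager,
         push_batch: Callable[[List, str], None]) -> int:
    """Push contacts batch by batch, running the refresh check before each batch."""
    pushed = 0
    for batch in batches(contacts):
        token = tokens.get()       # the refresh check happens here
        push_batch(batch, token)   # hypothetical: sends one batch to the email platform
        pushed += len(batch)
    return pushed
```

Passing the clock in as a parameter is just a convenience so the age check can be tested without waiting 30 minutes. A single batch that runs longer than the remaining token lifetime would still fail, so this only helps when batches are short relative to the expiry.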
Logging both sides matters too. If I had logged the email platform responses properly, the 401s would have been obvious within 5 minutes instead of an hour.
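A rough sketch of what logging both sides could look like, again in Python with made-up names (`check_response`, `SyncError`, and the `"crm"` / `"email_platform"` labels are illustrative, not from the real script). It uses only the standard `logging` module. Raising on any non-2xx status is an assumption on my part about how the script should behave. The point is that a 401 gets logged and stops the run instead of being hit again for every remaining contact.

```python
import logging

logger = logging.getLogger("sync")


class SyncError(RuntimeError):
    """Raised when either API answers with a non-2xx status."""


def check_response(side: str, status_code: int, contact_id: str) -> None:
    """Log the response from one side of the sync and fail loudly on errors.

    `side` is a free-form label such as "crm" or "email_platform".
    """
    if 200 <= status_code < 300:
        logger.info("%s ok status=%d contact=%s", side, status_code, contact_id)
        return
    # Log before raising so the failure is in the logs even if the caller swallows it.
    logger.error("%s failed status=%d contact=%s", side, status_code, contact_id)
    raise SyncError(f"{side} returned {status_code} for contact {contact_id}")
```

Calling this after every request on both sides, for example `check_response("email_platform", resp_status, contact_id)`, would put the first 401 in the logs as an error with the side that produced it.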
Still annoyed about that Tuesday though. Two hours for a token refresh check.