Cutting a 26-Minute Supabase Batch Job to Near Zero
The Problem
A horse-racing prediction batch job was taking 26 minutes on every run. The culprit: re-fetching historical race data for every horse on every run — even for horses we'd already fetched data for.
The data doesn't change. Why keep re-fetching?
The Fix: A Single Boolean Column
ALTER TABLE horse_entries
ADD COLUMN prev_history_fetched boolean DEFAULT false;
Python batch:
def fetch_horse_histories(conn, race_id: int) -> None:
cur = conn.cursor()
# Only fetch unfetched horses
cur.execute("""
SELECT he.id, he.horse_id
FROM horse_entries he
WHERE he.race_id = %s
AND he.prev_history_fetched = false
""", (race_id,))
entries = cur.fetchall()
for entry_id, horse_id in entries:
try:
history = scrape_previous_race(horse_id)
upsert_history(conn, horse_id, history)
except Exception as e:
print(f"Horse {horse_id} 404/skip: {e}")
finally:
# Mark as fetched regardless of success/failure
cur.execute("""
UPDATE horse_entries
SET prev_history_fetched = true
WHERE id = %s
""", (entry_id,))
conn.commit()
Key insight: mark as fetched even on 404 errors. A horse that doesn't exist won't exist next run either — no point retrying.
Results
| Run | Before | After |
|---|---|---|
| First run | 26 min | 26 min (same) |
| Second run onward | 26 min | ~0 sec |
| 404 horses retried | Every run | Never |
Generalizing the Pattern
Any "fetch once, never changes" data benefits from this pattern:
ALTER TABLE <table>
ADD COLUMN <field>_fetched boolean DEFAULT false,
ADD COLUMN <field>_fetched_at timestamptz;
Criteria:
- Data is idempotent (same input → same result)
- Re-fetching is expensive or slow
- Failures don't need retries (or you want explicit retry logic)
For data that does need refreshing, add an expiry:
WHERE NOT prev_history_fetched
OR prev_history_fetched_at < NOW() - INTERVAL '30 days'
Conclusion
"Process everything every run" is a common batch antipattern. One boolean column makes subsequent runs instant. The cost is a single extra column and one UPDATE per row on first fetch.
Building in public: https://my-web-app-b67f4.web.app/
Top comments (0)