My Production App Was Down for 24 Hours and Nobody Told Me

#observability #webdev #monitoring #devops

I built an AI assessment app for a consulting firm prospect. Deployed it on Supabase free tier. Sent them the link. Then I waited for their review.

What I didn't know: Supabase auto-pauses free-tier projects after 7 days of inactivity. My prospect opened the link and saw an error page. For up to 24 hours, my best lead thought my work was broken. I found out by accident when I checked the dashboard myself.

No alert. No email I noticed. No monitoring. Just silence while my credibility evaporated.

The failure mode nobody warns you about

Most monitoring advice assumes you're running your own servers. "Set up Prometheus. Configure Grafana dashboards. Integrate PagerDuty." That's great if you're running Kubernetes at scale.

But if you're an indie developer shipping on free tiers, the failure mode is different. Your platform shuts you down deliberately because you're not generating enough activity.

This isn't a Supabase-specific problem. The same pattern shows up across free-tier platforms:

Supabase free: auto-pauses after 7 days of inactivity
Fly.io free: machines stop after ~5 minutes idle
Render free: services spin down after 15 minutes of inactivity
Railway free: $5 credit cap, then full stop
Vercel hobby: bandwidth and serverless execution limits

Every one of these can take your app offline while you sleep. And none of them page you when it happens.

The 5 signals I actually monitor now

After losing that prospect, I built a monitoring checklist. Nothing fancy. No SaaS subscription required. Just the bare minimum that would have caught the problem.

1. Availability ping (is the thing alive?)

The most basic check. Hit your API endpoint. If the HTTP status isn't 200 or another code you know means "alive," something is wrong.

#!/bin/bash
URL="https://your-project.supabase.co/rest/v1/"
STATUS=$(curl -s -o /dev/null -w "%{http_code}" --max-time 15 "$URL")

case "$STATUS" in
  200|401|404|405) echo "ALIVE" ;;
  *) echo "DOWN: HTTP $STATUS" ;;
esac

Why 401 counts as alive: Supabase returns 401 when you hit the REST endpoint without an API key. That's fine. It means the server is running. A paused project returns a 5xx or times out entirely.
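The check above lumps every non-alive result into one "DOWN" bucket. A possible refinement, sketched here under the assumption that you want to tell "no response at all" apart from "the server answered with an error": curl reports 000 as the status code when it never received an HTTP response (timeout, DNS failure, refused connection). The helper name classify_status is made up for this example.

```shell
#!/bin/bash
# Sketch: map a curl status code to a coarse state.
# classify_status is a hypothetical helper name, not part of any tool.
classify_status() {
  case "$1" in
    000)             echo "UNREACHABLE" ;;   # curl got no HTTP response at all
    200|401|404|405) echo "ALIVE" ;;         # same "alive" set as the check above
    5??)             echo "SERVER_ERROR" ;;  # any 5xx status
    *)               echo "UNKNOWN" ;;       # anything else: worth a manual look
  esac
}

# Example usage (not executed here):
#   STATUS=$(curl -s -o /dev/null -w "%{http_code}" --max-time 15 "$URL")
#   classify_status "$STATUS"
```

Whether a paused project shows up as UNREACHABLE or SERVER_ERROR depends on the platform, so treat both as "go look at the dashboard."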

2. Critical endpoint health (does it return real data?)

An availability ping tells you the server boots. It doesn't tell you the database migrated correctly or the API returns valid responses.

Pick your most critical endpoint. Hit it with real parameters. Validate the response shape.

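# URL comes from the availability check; strip its trailing slash before appending.
RESPONSE=$(curl -s --max-time 15 -H "apikey: $SUPABASE_ANON_KEY" \
  "${URL%/}/your_table?select=id&limit=1")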

if echo "$RESPONSE" | jq -e '.[0].id' > /dev/null 2>&1; then
  echo "HEALTHY"
else
  echo "DEGRADED: unexpected response shape"
fi

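This catches migrations that broke a column name, RLS policies that started blocking reads, and connection pool exhaustion. All things I've hit in production that a simple ping would have missed. One caveat: an empty table fails this check too, so point it at a table that always has at least one row.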
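The shape check reports DEGRADED without saying what the server actually answered. One way to keep both pieces of information, sketched here, is to have curl append the status code as a final line and then split the output. The helper names split_status and split_body are hypothetical.

```shell
#!/bin/bash
# Sketch: fetch body and status in one request by asking curl to print the
# status code on its own final line, then split the two apart.
# split_status and split_body are hypothetical helper names.

# Print the last line of the combined output (the HTTP status code).
split_status() {
  printf '%s\n' "$1" | tail -n 1
}

# Print everything except the last line (the response body).
split_body() {
  printf '%s\n' "$1" | sed '$d'
}

# Example usage (not executed here):
#   OUT=$(curl -s --max-time 15 -H "apikey: $SUPABASE_ANON_KEY" \
#     -w '\n%{http_code}' \
#     "https://your-project.supabase.co/rest/v1/your_table?select=id&limit=1")
#   echo "status=$(split_status "$OUT")"
#   split_body "$OUT" | jq -e '.[0].id' > /dev/null || echo "DEGRADED"
```

With the status in hand, a 401 with an error body and a 200 with an empty array stop looking like the same failure.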

3. Platform-specific tripwires

Every platform has a "we're about to shut you down" signal. Find it and watch for it.

For Supabase, they email you 24 hours before auto-pausing. So I added this to my morning Gmail scan:

from:supabase.com subject:(paused OR pausing OR inactive)

For AWS, it's budget alerts. This one emails you once actual spend passes 50% of a $10 monthly budget:

aws budgets create-budget --account-id "$ACCOUNT_ID" \
  --budget '{"BudgetName":"monthly-cap","BudgetLimit":{"Amount":"10","Unit":"USD"},"TimeUnit":"MONTHLY","BudgetType":"COST"}' \
  --notifications-with-subscribers '[{"Notification":{"NotificationType":"ACTUAL","ComparisonOperator":"GREATER_THAN","Threshold":50,"ThresholdType":"PERCENTAGE"},"Subscribers":[{"SubscriptionType":"EMAIL","Address":"you@example.com"}]}]'

The specifics vary by platform. The principle doesn't: find the signal your platform sends before it kills you, and make sure you're listening.
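If the platform's warning email is easy to miss, a local tripwire can cover the same ground. This is a rough sketch, assuming you record the time of the last known activity yourself as epoch seconds (for example, from a keep-alive run). The helper name idle_state and its arguments are invented for illustration; the 7-day limit and 1-day warning window in the usage line mirror the Supabase numbers mentioned earlier.

```shell
#!/bin/bash
# Sketch: warn when a project is close to an inactivity limit.
# idle_state is a hypothetical helper; times are epoch seconds, windows are days.
# Usage: idle_state LAST_ACTIVITY_EPOCH NOW_EPOCH LIMIT_DAYS WARN_DAYS
idle_state() {
  idle_days=$(( ($2 - $1) / 86400 ))   # whole days since last activity
  if [ "$idle_days" -ge "$3" ]; then
    echo "EXPIRED idle_days=$idle_days"
  elif [ "$idle_days" -ge $(( $3 - $4 )) ]; then
    echo "WARN idle_days=$idle_days"
  else
    echo "OK idle_days=$idle_days"
  fi
}

# Example usage (not executed here; last_ping.epoch is a hypothetical file):
#   idle_state "$(cat last_ping.epoch)" "$(date +%s)" 7 1
```

It only knows about activity you recorded, so it complements the platform's own signal rather than replacing it.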

4. Post-deploy smoke test

Every deployment should end with a health check. Not "the build succeeded." Not "the tests passed." Did the deployed version actually respond correctly?

# .github/workflows/smoke.yml
name: Post-deploy smoke test
on:
  workflow_run:
    workflows: ["Deploy"]
    types: [completed]

jobs:
  smoke:
    runs-on: ubuntu-latest
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    steps:
      - name: Check production health
        run: |
          STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
            --max-time 30 "https://your-app.vercel.app/api/health")
          if [ "$STATUS" != "200" ]; then
            echo "SMOKE TEST FAILED: HTTP $STATUS"
            exit 1
          fi

I've had deployments where the build was green, tests passed, Vercel reported success, and the app was broken because an environment variable wasn't set in the production environment. A 10-second curl after deploy would have caught it.
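A single curl fired the instant a deploy finishes can also fail for boring reasons, such as the new version still warming up. A minimal sketch of a retry wrapper, assuming a few attempts with a fixed delay is acceptable for your setup; retry and check_health are hypothetical helper names, and the attempt count and delay in the usage line are arbitrary.

```shell
#!/bin/bash
# Sketch: retry a command a few times before declaring failure.
# retry is a hypothetical helper.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  retry_attempts=$1
  retry_delay=$2
  shift 2
  retry_n=1
  while :; do
    if "$@"; then
      return 0                      # command succeeded, stop retrying
    fi
    if [ "$retry_n" -ge "$retry_attempts" ]; then
      echo "FAILED after $retry_n attempts: $*" >&2
      return 1
    fi
    retry_n=$((retry_n + 1))
    sleep "$retry_delay"
  done
}

# Example check: succeeds only when the given URL answers with HTTP 200.
check_health() {
  [ "$(curl -s -o /dev/null -w "%{http_code}" --max-time 30 "$1")" = "200" ]
}

# Example usage (not executed here):
#   retry 5 10 check_health "https://your-app.vercel.app/api/health" || exit 1
```

The smoke test still fails the workflow when the app is really broken; it just stops a slow start from looking like an outage.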

5. Keep-alive cron for free tiers

This is the one that would have saved me. A cron job that pings your free-tier services twice a week, resetting the inactivity timer before the platform shuts you down.

#!/bin/bash
# Keep free-tier backends alive. Run Mon+Thu via cron.
# Any request resets the inactivity timer.

PROJECTS=(
  "project-id-1:my-saas-demo"
  "project-id-2:portfolio-api"
)

for entry in "${PROJECTS[@]}"; do
  ID="${entry%%:*}"
  NAME="${entry#*:}"
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
    --max-time 15 "https://${ID}.supabase.co/rest/v1/")
  echo "$(date -Iseconds) $NAME HTTP=$STATUS"
done

# crontab
0 9 * * 1,4 /home/you/bin/keepalive.sh >> /var/log/keepalive.log 2>&1

Two requests per week. Zero cost. Prevents a class of outage that no amount of application-level error handling can catch.
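The keep-alive script logs its results but nobody reads logs. A small sketch of a filter that surfaces only the bad lines, assuming the log format written by the script above ("<timestamp> <name> HTTP=<status>") and the same "alive" status set used in the availability check. The helper name failed_pings is made up.

```shell
#!/bin/bash
# Sketch: print keep-alive log lines whose status is outside the "alive" set.
# failed_pings is a hypothetical helper; it reads log lines on stdin.
# Exit status follows grep: 0 if at least one failing line was printed,
# 1 if every line looked alive.
failed_pings() {
  grep -Ev 'HTTP=(200|401|404|405)$'
}

# Example usage (not executed here):
#   tail -n 20 /var/log/keepalive.log | failed_pings && echo "keep-alive failures found"
```

Hook the result into whatever you already check daily. The point is that a failed ping should reach you without you opening the log.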

What I'd do differently

If I were starting a new project today, I'd set up monitoring before I deploy, not after the first outage.

The full checklist takes about 30 minutes:

Write a /api/health endpoint that checks database connectivity
Add a post-deploy smoke test in CI
Set up platform-specific alerts (budget, pause warnings, rate limits)
Add a keep-alive cron for any free-tier dependency
Put the monitoring script in the same repo as the app

None of this requires a monitoring SaaS. A bash script, a cron job, and a GitHub Actions workflow cover 90% of what a solo developer needs.

The remaining 10%? That's where proper observability tools earn their keep. Distributed tracing, error aggregation, performance profiling. But you can't justify those until you've nailed the basics.

Start with a cron job and a curl. It's boring. It works. And it would have saved me from explaining to a prospect why my demo was showing an error page.