Your Django REST API is serving hundreds of requests a minute. But is it actually healthy? A 200 from curl on your laptop doesn't tell you whether the background workers are processing jobs, whether the database connection pool is exhausted, or whether a recent deploy quietly broke your serializers. Vigilmon closes that gap. This tutorial shows you how to instrument a DRF app end-to-end with uptime checks, Celery heartbeats, and alert routing.
What You'll Build
- A /api/health/ endpoint that checks the database, cache, and queue
- Vigilmon HTTP monitors watching your DRF routes
- A Celery Beat heartbeat so you know your workers are running
- A Django management command pattern for background-job heartbeats
- Email and Slack alert channels
Prerequisites
- Django 4.x+ with Django REST Framework installed
- A free Vigilmon account
- Optionally: Celery + Redis (or RabbitMQ) for async task monitoring
Step 1: Create the Health Check Endpoint
You can expose a health endpoint without touching your existing serializers, either with the django-health-check package or with a small DRF view of your own.
Install django-health-check
pip install django-health-check
Add to INSTALLED_APPS:
INSTALLED_APPS = [
    ...
    "health_check",
    "health_check.db",
    "health_check.cache",
    "health_check.storage",
    # optional: "health_check.contrib.celery",
]
Wire up the URL in urls.py:
from django.urls import path, include

urlpatterns = [
    ...
    path("api/health/", include("health_check.urls")),
]
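GET /api/health/ now returns a 200 when all checks pass, or a 500 when any check fails. The response is an HTML page by default; to get a JSON body, send an Accept: application/json header or append ?format=json.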
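To see what a monitor would receive, you can request the JSON form of the report yourself. The sketch below uses only the standard library and assumes the package returns JSON when the request carries an `Accept: application/json` header. The helper names `build_health_request` and `fetch_health` are illustrative, not part of django-health-check.

```python
import json
import urllib.error
import urllib.request
from typing import Tuple


def build_health_request(url: str) -> urllib.request.Request:
    """Build a GET request that asks for the JSON form of the report."""
    return urllib.request.Request(url, headers={"Accept": "application/json"})


def fetch_health(url: str, timeout: float = 5.0) -> Tuple[int, dict]:
    """Return (status_code, parsed_body) for a health endpoint."""
    request = build_health_request(url)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, json.loads(response.read())
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx, but the error object still carries the body,
        # which is where the failing checks are listed.
        return err.code, json.loads(err.read())
```

With the dev server running, `fetch_health("http://localhost:8000/api/health/")` should return the status code alongside the per-check report.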
Roll Your Own (Lightweight Alternative)
If you'd rather not add a dependency, a plain DRF view works just as well:
# myapp/views.py
from django.core.cache import cache
from django.db import connections
from rest_framework.decorators import api_view
from rest_framework.response import Response


@api_view(["GET"])
def health_check(request):
    checks = {}
    status_code = 200

    # Database: run a trivial query and close the cursor afterwards
    try:
        with connections["default"].cursor() as cursor:
            cursor.execute("SELECT 1")
        checks["database"] = "ok"
    except Exception as exc:
        checks["database"] = f"error: {exc}"
        status_code = 503

    # Cache: write a short-lived key and read it back
    try:
        cache.set("_health", "1", timeout=5)
        if cache.get("_health") != "1":
            # Explicit check; a bare assert would be skipped under python -O
            raise RuntimeError("cache round-trip returned an unexpected value")
        checks["cache"] = "ok"
    except Exception as exc:
        checks["cache"] = f"error: {exc}"
        status_code = 503

    return Response(
        {"status": "ok" if status_code == 200 else "degraded", "checks": checks},
        status=status_code,
    )
Register it:
# urls.py
from myapp.views import health_check

urlpatterns = [
    ...
    path("api/health/", health_check, name="health-check"),
]
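The hand-rolled view repeats the same try/except for every dependency. One way to factor that out, sketched here with the standard library only, is a small runner that takes named check callables and aggregates their results. The function name `run_checks` and the per-check `ms` timing field are assumptions for illustration; note that this variant reports each check as a small dict rather than a bare string.

```python
import time
from typing import Callable, Dict, Tuple


def run_checks(checks: Dict[str, Callable[[], None]]) -> Tuple[dict, int]:
    """Run each named check; a check signals failure by raising."""
    results = {}
    status_code = 200
    for name, check in checks.items():
        started = time.perf_counter()
        try:
            check()
            outcome = "ok"
        except Exception as exc:
            outcome = f"error: {exc}"
            status_code = 503  # any failing dependency degrades the whole report
        elapsed_ms = round((time.perf_counter() - started) * 1000, 1)
        results[name] = {"status": outcome, "ms": elapsed_ms}
    body = {"status": "ok" if status_code == 200 else "degraded", "checks": results}
    return body, status_code
```

Inside the view this would be called as `body, code = run_checks({"database": check_db, "cache": check_cache})` and returned with `Response(body, status=code)`, where `check_db` and `check_cache` are hypothetical functions wrapping the two probes. A queue check could be added the same way without touching the aggregation logic.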
Step 2: Add a Vigilmon HTTP Monitor
Log in to Vigilmon and create a new HTTP monitor:
| Field | Value |
|---|---|
| URL | https://api.yourdomain.com/api/health/ |
| Method | GET |
| Expected status | 200 |
| Check interval | 1 minute |
| Regions | Select 2–3 regions |
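Vigilmon polls your endpoint from the regions you selected. If a check returns a non-200 status or times out, it sends an alert; with a 1-minute check interval, that means a failure is noticed roughly within a minute rather than instantly.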
Tip: also add a second monitor for your main API base URL (/api/) with expected status = 200 OR 401. A 401 proves the server is running; a 404 or 500 means something broke.
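The "200 OR 401" rule is easy to reproduce locally when you want to sanity-check a route before creating the monitor. This is a rough standard-library sketch of that logic, not Vigilmon's implementation; `probe` and `is_up` are hypothetical names.

```python
import urllib.error
import urllib.request
from typing import FrozenSet, Optional


def is_up(status: Optional[int], allowed: FrozenSet[int] = frozenset({200, 401})) -> bool:
    """A route counts as up only if it answered with an allowed status."""
    return status is not None and status in allowed


def probe(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code, or None if no HTTP answer arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 401, 404, 500 and friends are still real answers from the server.
        return err.code
    except OSError:
        # Connection refused, DNS failure, or timeout: nothing answered.
        return None
```

Calling `is_up(probe("https://api.yourdomain.com/api/"))` treats an unauthenticated 401 as healthy while still flagging a 404, a 500, or a dead server.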
Step 3: Monitor Celery Workers with a Heartbeat
This is the part most tutorials skip. Your HTTP endpoint can be healthy while Celery workers are silently dead — queued jobs pile up, emails stop sending, and you find out from a user complaint.
Create a Vigilmon Heartbeat Monitor
In Vigilmon, add a new Heartbeat monitor:
- Choose Heartbeat / Cron monitor type
- Set the expected ping interval to 10 minutes
- Copy the generated ping URL:
https://vigilmon.online/api/heartbeat/xxxxxxxx
Ping the Heartbeat from Celery Beat
# myapp/tasks.py
import requests
from celery import shared_task
from django.conf import settings


@shared_task(name="myapp.ping_heartbeat")
def ping_heartbeat():
    url = settings.VIGILMON_HEARTBEAT_URL
    if url:
        try:
            requests.get(url, timeout=5)
        except requests.RequestException:
            pass  # Don't let a monitoring failure crash the task
Schedule it in CELERY_BEAT_SCHEDULE:
# settings.py
from celery.schedules import crontab

CELERY_BEAT_SCHEDULE = {
    "ping-heartbeat": {
        "task": "myapp.ping_heartbeat",
        "schedule": crontab(minute="*/5"),  # every 5 minutes
    },
    # ... your existing tasks
}

# `env` is assumed to be a django-environ Env instance created earlier in settings.py
VIGILMON_HEARTBEAT_URL = env("VIGILMON_HEARTBEAT_URL", default="")
Set VIGILMON_HEARTBEAT_URL in your .env:
VIGILMON_HEARTBEAT_URL=https://vigilmon.online/api/heartbeat/xxxxxxxx
Now, if your Celery workers die or beat stops scheduling, Vigilmon stops receiving pings and fires an alert after 10 minutes.
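The relationship between the 5-minute ping schedule and the 10-minute expected interval is worth spelling out: the gap leaves room for one late or lost ping before the monitor goes red. The sketch below models that decision in a few lines. It is an illustration of the general heartbeat idea, not Vigilmon's actual logic, and `is_overdue` is a made-up name.

```python
from datetime import datetime, timedelta, timezone


def is_overdue(last_ping: datetime, now: datetime, expected_interval: timedelta) -> bool:
    """A heartbeat is overdue once the silence exceeds the expected interval."""
    return now - last_ping > expected_interval


last_ping = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
window = timedelta(minutes=10)

# One ping was skipped, the next one is about to arrive: still healthy.
print(is_overdue(last_ping, last_ping + timedelta(minutes=9), window))   # False
# Two scheduled pings in a row never arrived: alert.
print(is_overdue(last_ping, last_ping + timedelta(minutes=11), window))  # True
```

Setting the expected interval equal to the ping schedule would turn every slightly delayed task into a false alarm, which is why the window is wider than the schedule.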
Step 4: Management Command Heartbeat Pattern
Many Django projects use manage.py commands for long-running jobs (data imports, report generation, nightly syncs). Here's the pattern for monitoring them:
# myapp/management/commands/import_data.py
import requests
from django.core.management.base import BaseCommand
from django.conf import settings


class Command(BaseCommand):
    help = "Import data from upstream source"

    def handle(self, *args, **options):
        self.stdout.write("Starting import...")
        try:
            self._run_import()
            self._ping_success()
            self.stdout.write(self.style.SUCCESS("Import complete."))
        except Exception as exc:
            self.stderr.write(f"Import failed: {exc}")
            raise  # let the process exit non-zero

    def _run_import(self):
        # your actual import logic here
        pass

    def _ping_success(self):
        url = getattr(settings, "VIGILMON_IMPORT_HEARTBEAT_URL", "")
        if url:
            try:
                requests.get(url, timeout=5)
            except requests.RequestException:
                pass
Create a separate Vigilmon heartbeat monitor for each critical management command, then call its URL at the end of a successful run. If the command starts crashing or isn't being invoked by cron, Vigilmon catches the silence.
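If several commands need the same "ping only after a clean run" behaviour, the pattern can be pulled into a context manager so each command does not reimplement `_ping_success`. This is a minimal sketch under that assumption; `heartbeat_on_success` and `http_ping` are hypothetical helpers, and the standard library's urllib stands in for requests to keep the example dependency-free.

```python
import urllib.request
from contextlib import contextmanager
from typing import Callable, Iterator


def http_ping(url: str, timeout: float = 5.0) -> None:
    """Best-effort GET; a monitoring failure must not fail the job."""
    try:
        urllib.request.urlopen(url, timeout=timeout).close()
    except (OSError, ValueError):
        pass


@contextmanager
def heartbeat_on_success(url: str, ping: Callable[[str], None] = http_ping) -> Iterator[None]:
    """Ping the heartbeat URL only if the wrapped block finishes without raising."""
    yield  # an exception in the block propagates from here, skipping the ping
    if url:
        ping(url)
```

A command's `handle` could then wrap its work as `with heartbeat_on_success(url): self._run_import()`. A crash still exits non-zero and leaves the heartbeat silent, which is the signal the monitor relies on.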
Step 5: Configure Alert Channels
In Vigilmon's Alerts settings:
Email Alerts
- Go to Alert Channels → Email
- Add ops@yourdomain.com (or your personal email)
- Attach the channel to both your HTTP monitor and heartbeat monitor
Slack Webhook
- In Slack, create an Incoming Webhook for your #ops or #alerts channel
- In Vigilmon, go to Alert Channels → Webhook
- Paste the Slack Webhook URL
- Set the payload template to:
{
"text": "🚨 *{{ monitor.name }}* is DOWN\nStatus: {{ event.status }}\nURL: {{ monitor.url }}\n<https://vigilmon.online|View in Vigilmon>"
}
Now you'll get an alert in Slack when the API goes down and a separate alert when the Celery heartbeat goes silent.
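Before relying on the template, it can help to preview the body Slack will actually receive once the placeholders are filled in. The sketch below builds an equivalent payload by hand and serializes it; the function name and sample values are assumptions, and the real substitution is done by Vigilmon, not by this code.

```python
import json


def build_slack_payload(monitor_name: str, status: str, monitor_url: str) -> str:
    """Render the alert text the way the template would, as a JSON string."""
    text = (
        f"🚨 *{monitor_name}* is DOWN\n"
        f"Status: {status}\n"
        f"URL: {monitor_url}\n"
        "<https://vigilmon.online|View in Vigilmon>"
    )
    # json.dumps escapes the newlines, so the result is a valid JSON document.
    return json.dumps({"text": text}, ensure_ascii=False)


print(build_slack_payload("API health", "down", "https://api.yourdomain.com/api/health/"))
```

Slack Incoming Webhooks accept this kind of `{"text": ...}` body as a JSON POST, so posting the printed string to your webhook URL is a quick way to confirm the formatting before a real incident.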
Step 6: Test It End-to-End
# 1. Verify your health endpoint locally
curl -s http://localhost:8000/api/health/ | python -m json.tool
# 2. Simulate a failure: stop your database and hit the endpoint
#    Expected (hand-rolled view): 503 with {"status": "degraded", "checks": {"database": "error: ..."}}
#    With django-health-check the failing status is 500 instead.
# 3. Stop Celery beat and wait 10 minutes
# Expected: Vigilmon fires a heartbeat alert
# 4. Trigger a test alert from Vigilmon UI → confirm Slack/email delivery
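For step 2, reading the JSON by eye works, but a tiny helper makes the failure case easier to assert on in a script or CI job. This sketch assumes the response shape of the hand-rolled view (a `checks` object mapping names to `"ok"` or an error string); `failing_checks` is an illustrative name.

```python
import json
from typing import List


def failing_checks(raw_body: str) -> List[str]:
    """Return the sorted names of checks whose value is anything other than "ok"."""
    body = json.loads(raw_body)
    return sorted(name for name, result in body.get("checks", {}).items() if result != "ok")


degraded = '{"status": "degraded", "checks": {"database": "error: timeout", "cache": "ok"}}'
print(failing_checks(degraded))  # ['database']
```

Feeding it the body captured from curl gives a list you can compare against an empty list for the healthy case.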
What You Now Have
| Monitor | Catches |
|---|---|
| HTTP /api/health/ | Server crashes, bad deploys, DB/cache failures |
| HTTP /api/ | Routing failures, auth middleware breakage |
| Celery heartbeat | Dead workers, stalled queues, beat scheduler crashes |
| Management command heartbeat | Silent cron failures, import crashes |
Your team now gets alerted within minutes — not hours — when any of these fail.
Next Steps
- Add a public status page on Vigilmon and link it from your API docs
- Set up per-environment monitors (staging vs production) so you catch issues in staging before prod
- Use DRF's throttle_classes to rate-limit the /api/health/ endpoint if it's publicly accessible
Found this useful? Vigilmon is free to start — sign up here and have your first monitor live in under 5 minutes.