Monitoring Your Phoenix LiveView App with Vigilmon
Elixir is famous for "let it crash." Supervisors restart failed processes, supervision trees stay healthy, and your app recovers from most failures without human intervention.
But some failures don't self-heal: the node is unreachable from the internet, the Postgres connection pool is exhausted, a LiveView deploy left a port misbound. These require external eyes — a monitoring service that hits your app from outside and tells you when it can't get through.
This tutorial adds production observability to a Phoenix app:
- A health check plug (zero-dependency, framework-idiomatic)
- HTTP uptime monitoring with Vigilmon
- Multi-region checks that benefit distributed Elixir deployments
- Heartbeat monitoring for Oban jobs and GenServer-based workers
- Slack alerts and a status page
Step 1: Add a health check plug
Phoenix doesn't need a library for a basic health check. A plug is idiomatic, fast, and has zero dependencies.
# lib/my_app_web/plugs/health_check.ex
defmodule MyAppWeb.Plugs.HealthCheck do
  import Plug.Conn

  def init(opts), do: opts

  # Answer /health directly; every other request falls through to the next plug.
  def call(%Plug.Conn{request_path: "/health"} = conn, _opts) do
    checks = run_checks()
    status = if Enum.all?(checks, fn {_, v} -> v == :ok end), do: 200, else: 503

    conn
    |> put_resp_content_type("application/json")
    |> send_resp(status, Jason.encode!(%{status: status_label(status), checks: checks}))
    |> halt()
  end

  def call(conn, _opts), do: conn

  defp run_checks do
    %{
      database: check_database(),
      memory: check_memory()
    }
  end

  defp check_database do
    case Ecto.Adapters.SQL.query(MyApp.Repo, "SELECT 1", []) do
      {:ok, _} -> :ok
      {:error, _} -> :error
    end
  end

  # Fails the check when memory usage reaches 90%.
  # :memsup ships with :os_mon, which must be listed in :extra_applications in mix.exs.
  defp check_memory do
    case :memsup.get_system_memory_data() do
      [] ->
        :ok

      data ->
        total = Keyword.get(data, :total_memory, 1)
        free = Keyword.get(data, :free_memory, total)
        used_pct = (total - free) / total * 100
        if used_pct < 90, do: :ok, else: :error
    end
  end

  defp status_label(200), do: "ok"
  defp status_label(_), do: "degraded"
end
Plug it in before your router (so it bypasses authentication middleware):
# lib/my_app_web/endpoint.ex
defmodule MyAppWeb.Endpoint do
  use Phoenix.Endpoint, otp_app: :my_app

  plug MyAppWeb.Plugs.HealthCheck # ← add before the router
  # ... rest of your plugs
  plug MyAppWeb.Router
end
Test it:
mix phx.server
curl http://localhost:4000/health
# {"status":"ok","checks":{"database":"ok","memory":"ok"}}
A non-200 response body tells you exactly which check failed. That precision matters when you're triaging at 2 AM.
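The 200/503 decision is easier to unit-test when it lives in a pure function. The following is a minimal sketch of that idea, assuming a hypothetical `HealthSummary` module that is not part of the plug above. It mirrors the plug's rule (any non-`:ok` check means 503) and also lists which checks failed.

```elixir
defmodule HealthSummary do
  @moduledoc """
  Hypothetical helper (not part of the tutorial's plug): turns a map of
  check results into an HTTP status, a label, and the failing check names.
  """

  @spec summarize(%{optional(atom) => :ok | :error}) ::
          %{status: 200 | 503, label: String.t(), failed: [atom]}
  def summarize(checks) when is_map(checks) do
    # Collect the names of every check that did not return :ok
    failed =
      checks
      |> Enum.reject(fn {_name, result} -> result == :ok end)
      |> Enum.map(fn {name, _result} -> name end)
      |> Enum.sort()

    case failed do
      [] -> %{status: 200, label: "ok", failed: []}
      _ -> %{status: 503, label: "degraded", failed: failed}
    end
  end
end

# Example: the database check failed, memory is fine
IO.inspect(HealthSummary.summarize(%{database: :error, memory: :ok}))
```

In the plug, `run_checks/0` would feed this function and the result would drive `send_resp/3`. Keeping the logic pure means it can be tested without a database or a running endpoint.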
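Optional: use the plug_checkup library
If you'd rather not hand-roll the plug, plug_checkup runs check functions you supply and reports their results as JSON. The snippet below follows the API as described in the library's README; verify it against the version you install:
# mix.exs
{:plug_checkup, "~> 0.6"}
# lib/my_app/checks.ex (each check function returns :ok or {:error, reason})
defmodule MyApp.Checks do
  def database do
    case Ecto.Adapters.SQL.query(MyApp.Repo, "SELECT 1", []) do
      {:ok, _} -> :ok
      {:error, error} -> {:error, Exception.message(error)}
    end
  end

  def redis, do: :ok # your logic
end
# lib/my_app_web/router.ex
checks = [
  %PlugCheckup.Check{name: "db", module: MyApp.Checks, function: :database},
  %PlugCheckup.Check{name: "redis", module: MyApp.Checks, function: :redis}
]
forward "/health", PlugCheckup, PlugCheckup.Options.new(json_encoder: Jason, checks: checks)
Either approach gives you a URL that returns 200 when healthy and an error status (503 from the plug above) with details when not.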
Step 2: Set up HTTP monitoring in Vigilmon
Point Vigilmon at your health endpoint:
- Sign up at vigilmon.online
- Click New Monitor → HTTP
- Enter https://yourdomain.com/health
- Set check interval: 1 minute (paid) or 5 minutes (free)
- Save
Vigilmon pings from multiple geographic regions. This is particularly valuable for Phoenix apps:
Why multi-region checks matter for Elixir:
Phoenix and LiveView apps often run as distributed clusters (libcluster, Fly.io regions, Render multi-region). Multi-region monitoring catches regional outages and routing failures, where your app is reachable from one region but not another. A single probe would miss these.
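If you're deploying to Fly.io with multiple regions, add a monitor for each regional endpoint you expose. The regional hostnames below are placeholders; substitute the per-region URLs your deployment actually serves: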
https://app-name.fly.dev/health # primary
https://lhr.app-name.fly.dev/health # London
https://ord.app-name.fly.dev/health # Chicago
Each regional failure alerts independently, so you know whether a failure is local or global.
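To make the local-versus-global distinction concrete, here is a small sketch of how a set of per-region results could be classified. The `RegionStatus` module and the `:up`/`:down` result shape are assumptions for illustration; they are not part of Vigilmon's API.

```elixir
defmodule RegionStatus do
  @moduledoc """
  Hypothetical helper for interpreting per-region probe results.
  Keys are region names, values are :up or :down.
  """

  @spec classify(%{optional(String.t()) => :up | :down}) ::
          :healthy | {:partial_outage, [String.t()]} | :global_outage
  def classify(results) when map_size(results) > 0 do
    # Keep only the regions whose probe reported :down
    down = for {region, :down} <- results, do: region

    cond do
      down == [] -> :healthy
      length(down) == map_size(results) -> :global_outage
      true -> {:partial_outage, Enum.sort(down)}
    end
  end
end

RegionStatus.classify(%{"lhr" => :down, "ord" => :up})
# → {:partial_outage, ["lhr"]}
```

A partial outage usually points at routing, DNS, or a single regional node. A global outage points at the app itself or a shared dependency such as the database.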
Step 3: Alerts via Slack
In Vigilmon, go to Notifications → New Channel → Slack and paste your Slack incoming webhook URL.
To create a webhook in Slack:
- api.slack.com/apps → Create New App → From scratch
- Enable Incoming Webhooks → Add New Webhook
- Pick your alerts channel and copy the URL
Enable the Slack channel on your monitor. When Phoenix is unreachable, Vigilmon sends:
🔴 DOWN: yourdomain.com/health
Status: 503 Service Unavailable
Detected from: EU-West, US-East
5 minutes ago
And when it recovers:
✅ RECOVERED: yourdomain.com/health
Downtime: 12 minutes
The recovery notification is often the most important one — it tells you when it's safe to stop firefighting.
Step 4: Heartbeat monitoring for Oban jobs and GenServers
LiveView handles its own process restarts. But scheduled Oban jobs and long-running GenServers can fail silently: the process stays up, the supervisor is happy, but work has stopped happening.
Heartbeat pattern: your job or GenServer pings a unique URL at the end of every successful execution cycle. If Vigilmon doesn't receive a ping within the expected window, it alerts you.
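The rule a heartbeat monitor applies can be sketched in a few lines. This is a generic illustration of the pattern, not Vigilmon's implementation; the `HeartbeatWindow` module and the grace-period parameter are assumptions.

```elixir
defmodule HeartbeatWindow do
  @moduledoc """
  Sketch of the generic heartbeat rule: a monitor is overdue when no ping
  has arrived within the expected interval plus an optional grace period.
  """

  @spec overdue?(DateTime.t(), DateTime.t(), pos_integer, non_neg_integer) :: boolean
  def overdue?(last_ping, now, interval_s, grace_s \\ 0) do
    # Seconds elapsed since the last successful ping
    DateTime.diff(now, last_ping, :second) > interval_s + grace_s
  end
end

last_ping = ~U[2024-01-01 12:00:00Z]

# 5-minute interval with a 1-minute grace period
HeartbeatWindow.overdue?(last_ping, ~U[2024-01-01 12:04:00Z], 300, 60)
# → false (240 s elapsed, limit is 360 s)

HeartbeatWindow.overdue?(last_ping, ~U[2024-01-01 12:07:00Z], 300, 60)
# → true (420 s elapsed)
```

This is why the ping belongs at the end of a successful cycle: a job that runs but fails stops pinging and goes overdue, just like a job that never ran.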
Oban job heartbeat
# lib/my_app/workers/daily_digest_worker.ex
defmodule MyApp.Workers.DailyDigestWorker do
  use Oban.Worker, queue: :default
  require Logger

  @impl Oban.Worker
  def perform(%Oban.Job{}) do
    with :ok <- generate_digest(),
         :ok <- send_digest() do
      # Ping only after the whole job has succeeded
      ping_heartbeat()
      :ok
    else
      # Assumes the helpers return :ok or {:error, reason}
      {:error, reason} ->
        Logger.error("DailyDigestWorker failed: #{inspect(reason)}")
        # Return the tuple as-is so Oban records the original reason
        {:error, reason}
    end
  end

  # Uses Req for the HTTP call; add :req to your deps if it isn't there already
  defp ping_heartbeat do
    url = Application.get_env(:my_app, :vigilmon)[:digest_heartbeat_url]

    if url do
      case Req.get(url, receive_timeout: 5_000) do
        {:ok, _} -> :ok
        {:error, reason} -> Logger.warning("Heartbeat ping failed: #{inspect(reason)}")
      end
    end
  end

  defp generate_digest, do: :ok # your logic
  defp send_digest, do: :ok # your logic
end
Add the config:
# config/runtime.exs
config :my_app, :vigilmon,
  digest_heartbeat_url: System.get_env("VIGILMON_DIGEST_HEARTBEAT_URL")
GenServer heartbeat
For long-running GenServers (polling external APIs, syncing data), add a heartbeat on each successful tick:
# lib/my_app/sync_server.ex
defmodule MyApp.SyncServer do
  use GenServer
  require Logger

  @interval :timer.minutes(5)

  def start_link(_), do: GenServer.start_link(__MODULE__, %{}, name: __MODULE__)

  @impl true
  def init(state) do
    schedule_tick()
    {:ok, state}
  end

  @impl true
  def handle_info(:tick, state) do
    case sync_data() do
      :ok ->
        ping_heartbeat()

      {:error, reason} ->
        Logger.error("Sync failed: #{inspect(reason)}")
        # No ping → Vigilmon alerts after the window expires
    end

    schedule_tick()
    {:noreply, state}
  end

  defp schedule_tick, do: Process.send_after(self(), :tick, @interval)

  defp ping_heartbeat do
    url = Application.get_env(:my_app, :vigilmon)[:sync_heartbeat_url]
    if url, do: Req.get(url, receive_timeout: 5_000)
  end

  defp sync_data, do: :ok # your logic
end
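Both workers carry their own `ping_heartbeat/0`. One way to avoid the duplication is a small shared helper. The sketch below assumes a hypothetical `MyApp.Heartbeat` module that takes the HTTP client as a function argument, so the helper has no dependency of its own and can be stubbed in tests. In the app you would pass `&Req.get/2`.

```elixir
defmodule MyApp.Heartbeat do
  @moduledoc """
  Hypothetical shared helper so workers don't each reimplement the ping.
  The HTTP client is injected as a 2-arity function (url, opts).
  """
  require Logger

  @type http_get :: (String.t(), keyword -> {:ok, term} | {:error, term})

  @spec ping(String.t() | nil, http_get) :: :ok
  # No URL configured (e.g. in dev or test): silently skip the ping
  def ping(nil, _http_get), do: :ok

  def ping(url, http_get) when is_binary(url) and is_function(http_get, 2) do
    case http_get.(url, receive_timeout: 5_000) do
      {:ok, _response} ->
        :ok

      {:error, reason} ->
        # A failed ping must not fail the job itself; log it and move on
        Logger.warning("Heartbeat ping failed: #{inspect(reason)}")
        :ok
    end
  end
end
```

A worker would then call something like `MyApp.Heartbeat.ping(url, &Req.get/2)` after a successful run. Returning `:ok` in every branch is deliberate: a monitoring outage should never turn a healthy job into a failed one.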
In Vigilmon, create a Heartbeat Monitor for each critical worker:
- Click New Monitor → Heartbeat
- Set expected interval (e.g. 5 minutes for the sync worker, 24 hours for the digest)
- Copy the ping URL
- Set it as an env variable in your release config
Now if a worker crashes and its supervisor gives up retrying, you get an alert rather than a silent gap in your data.
Step 5: LiveView deployment health
LiveView keeps a persistent WebSocket connection to each client, with long polling available as an optional fallback. After a deploy, existing clients reconnect to the new node, and if something goes wrong during that reconnect window, users see a broken interface.
Add a monitor specifically for your LiveView websocket endpoint:
https://yourdomain.com/live/websocket
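Vigilmon's HTTP monitor will verify the endpoint responds. Because the probe is a plain HTTP request rather than a WebSocket handshake, this path typically answers with a non-200 status even when healthy, so check what your endpoint returns and set the monitor's expected status accordingly if that option is available. This catches port binding failures, SSL termination issues, and misconfigured nginx/Caddy upstreams after a deploy.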
You can also use a keyword check: in Vigilmon's HTTP monitor, add a body keyword match for content that should appear on your homepage (like your app name or a page title). If the response is a 200 but the wrong page, the keyword check fails.
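The logic of a keyword check is simple enough to state as code. This is a sketch of the general idea under assumed names (`KeywordCheck` is hypothetical), not a description of how Vigilmon implements it.

```elixir
defmodule KeywordCheck do
  @moduledoc """
  Hypothetical illustration of a body keyword check: the probe passes only
  when the status is 2xx and the expected text appears in the body.
  """

  @spec passes?(integer, String.t(), String.t()) :: boolean
  def passes?(status, body, keyword) do
    status in 200..299 and String.contains?(body, keyword)
  end
end

# A proxy's default page returns 200 but lacks the app name, so the check fails
KeywordCheck.passes?(200, "<h1>Welcome to nginx!</h1>", "MyApp")
# → false
```

Pick a keyword that only your real page renders, such as the app name in the title, so that a proxy's default page or a maintenance placeholder does not pass by accident.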
Step 6: Status page and badge
Status page:
- Go to Status Pages → New Status Page in Vigilmon
- Add your monitors
- Copy the public URL
Share it in your README, error pages, or in your Slack channel topic so the team can check it first when users report issues.
README badge:
[![Uptime](https://vigilmon.online/badge/your-monitor-slug)](https://status.yourdomain.com)
As an HTML embed:
<a href="https://status.yourdomain.com">
<img src="https://vigilmon.online/badge/your-monitor-slug" alt="Uptime">
</a>
The badge shows live status and response time.
What you've built
| What | How |
|---|---|
| Health check endpoint | Custom HealthCheck plug, zero dependencies |
| DB + memory checks | Ecto.Adapters.SQL.query/3, :memsup |
| HTTP uptime monitoring | Vigilmon HTTP monitor → /health |
| Multi-region coverage | Vigilmon multi-probe checks |
| Slack downtime alerts | Vigilmon Slack notification channel |
| Oban job monitoring | Heartbeat ping on perform/1 success |
| GenServer monitoring | Heartbeat ping on each successful tick |
| Status page | Vigilmon public status page |
| README badge | /badge/{slug} SVG embed |
BEAM keeps your processes alive. Vigilmon keeps external eyes on the result.
Next steps
- Add :memsup and :cpu_sup data to your health response for richer monitoring context
- Use Vigilmon's response time history to catch slow Ecto queries before they cause timeouts
- Add separate heartbeat monitors for every Oban queue that processes business-critical jobs
- If you run multiple Fly.io regions, add a monitor per region to catch failures that affect only one region
Get started free at vigilmon.online.