DEV Community

vaibhav

I Built an API Monitoring Platform Because My Own API Went Down and I Had No Idea

TL;DR: My deployed API went down silently. I found out hours later. UptimeRobot felt like overkill for a student project. So I built Monitorly — a real-time API monitoring platform with a live dashboard, email alerts (without spamming you), and uptime tracking. It's live and open source.


The Problem

I deployed my first production backend and felt great about it.

Then it went down. Silently. No alert, no notification, nothing. I found out hours later when I went to check it manually.

That's when I realised two things:

  1. Every deployed project needs uptime monitoring
  2. Existing tools like UptimeRobot felt like overkill — too many settings, too much noise, not built for someone just learning how production works

So I built my own. And in the process, I learned more about real-time systems, cron jobs, and backend architecture than any tutorial had taught me.


What is Monitorly?

Monitorly is a production-grade API uptime monitoring platform. You add your endpoints, it checks them on a schedule, shows you a live dashboard, and emails you when something goes down — without flooding your inbox.

Live: urlzap.me/pulsewatch
GitHub: github.com/vbv0507/api-monitoring-system


Features

1. Real-Time Dashboard with Socket.io

No page refresh needed. When a monitor check completes, the result pushes instantly to your dashboard via WebSocket. You see status changes the moment they happen.

2. Email Alerts — Without Spamming You

This was a deliberate product decision. A naive monitor emails you on every failed check. If your API stays down for an hour on a 1-minute interval, that's 60 emails about the same outage.

Monitorly only sends an alert when the status changes: when a monitor goes down, one alert is sent; when it comes back up, one recovery email is sent. One email per status change, not one per check.

3. Uptime Percentage Calculation

Every monitor logs each check, and the uptime percentage is calculated from those logs over a rolling window (the last 7 days by default). You can see at a glance whether your API has been 99.9% up or has quietly degraded over time.

4. Configurable Check Intervals

Checks run every 1, 5, or 15 minutes depending on how closely you need to watch an endpoint. The cron engine handles all of it automatically.

5. Per-User Monitor Isolation

Every user sees only their own monitors. Requests are authenticated with JWT, and each monitor is tied to its owner's user ID, so you can't accidentally see or affect someone else's data.


Tech Stack

Layer | Technology
Runtime | Node.js
Framework | Express.js
Database | MongoDB + Mongoose
Real-time | Socket.io
Scheduling | node-cron
HTTP checks | Axios
Email alerts | Nodemailer
Auth | JWT
Deployment | Azure App Service

How It Works — The Core Architecture

The Monitoring Engine

The heart of Monitorly is the cron-based monitoring engine. When a user adds a monitor, it registers a cron job that fires at the configured interval.

const cron = require('node-cron');
const axios = require('axios');

function scheduleMonitor(monitor) {
  const intervals = {
    1:  '* * * * *',      // every 1 minute
    5:  '*/5 * * * *',    // every 5 minutes
    15: '*/15 * * * *'    // every 15 minutes
  };

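  // Fall back to 5 minutes if the stored interval is not recognised
  const expression = intervals[monitor.interval] || '*/5 * * * *';

  // Return the task so the caller can stop() it when the monitor is deleted
  return cron.schedule(expression, async () => {
    try {
      await runCheck(monitor);
    } catch (err) {
      // A failed check must not surface as an unhandled rejection
      console.error(`Check failed for ${monitor.url}:`, err);
    }
  });
}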
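
Scheduling is only half of the job: a monitor also needs to be stopped when it is deleted, and everything has to be rescheduled after a restart. One way to handle both is a small registry keyed by monitor ID. This is a sketch, not Monitorly's actual code: `createMonitorRegistry` and its method names are made up for illustration, and the scheduler is passed in as `schedule` (with node-cron this would be `cron.schedule`, whose return value has a `stop()` method).

```javascript
// Sketch of a registry that tracks one scheduled task per monitor.
// `schedule(expression, callback)` is injected and must return an object
// with a stop() method, which is what node-cron's cron.schedule does.
function createMonitorRegistry(schedule, runCheck) {
  const tasks = new Map(); // monitor ID (string) -> scheduled task

  const expressions = { 1: '* * * * *', 5: '*/5 * * * *', 15: '*/15 * * * *' };

  function stop(monitorId) {
    const id = String(monitorId);
    const task = tasks.get(id);
    if (task) {
      task.stop();
      tasks.delete(id);
    }
  }

  function start(monitor) {
    const id = String(monitor._id);
    stop(id); // avoid duplicate jobs if the monitor is rescheduled
    const expression = expressions[monitor.interval] || expressions[5];
    const task = schedule(expression, async () => {
      try {
        await runCheck(monitor);
      } catch (err) {
        console.error(`Check failed for monitor ${id}:`, err);
      }
    });
    tasks.set(id, task);
  }

  // Call once on boot with the monitors loaded from the database.
  // Returns the number of jobs now scheduled.
  function startAll(monitors) {
    monitors.forEach((monitor) => start(monitor));
    return tasks.size;
  }

  return { start, stop, startAll };
}
```

On startup this could be wired up as `createMonitorRegistry(cron.schedule, runCheck)` followed by `registry.startAll(await Monitor.find())`. Calling `registry.stop(id)` when a monitor is deleted keeps orphaned jobs from running.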

Running a Health Check

Each check measures response time, status code, and whether the endpoint is reachable at all.

async function runCheck(monitor) {
  const startTime = Date.now();
  let status = 'down';
  let responseTime = null;
  let statusCode = null;

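  try {
    // Axios follows redirects and, by default, rejects on any non-2xx
    // response, so 4xx/5xx responses land in the catch block below.
    const response = await axios.get(monitor.url, { timeout: 10000 });
    responseTime = Date.now() - startTime;
    statusCode = response.status;
    status = statusCode >= 200 && statusCode < 400 ? 'up' : 'down';
  } catch (err) {
    // Timeout, network error, or error status: status stays 'down'.
    // err.response is only set when the server actually replied.
    responseTime = Date.now() - startTime;
    statusCode = err.response?.status || null;
  }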

  // Save the log
  await MonitorLog.create({
    monitorId: monitor._id,
    status,
    responseTime,
    statusCode,
    checkedAt: new Date()
  });

  // Push real-time update to dashboard
  io.to(monitor.userId.toString()).emit('monitor-update', {
    monitorId: monitor._id,
    status,
    responseTime,
    statusCode
  });

  // Handle alert logic
  await handleAlerts(monitor, status);
}

The Alert Logic — No Spam

This is the part I'm most proud of. Instead of alerting on every failed check, Monitorly tracks the previous status and only triggers an email when the state changes.

async function handleAlerts(monitor, newStatus) {
  // Re-read the monitor: the object captured by the cron callback is a
  // snapshot from scheduling time, so its lastStatus goes stale.
  // populate() loads the owner's email from the referenced User document.
  const current = await Monitor.findById(monitor._id).populate('userId', 'email');
  if (!current) return; // monitor was deleted since it was scheduled

  const previousStatus = current.lastStatus;

  // Only act on status change
  if (newStatus === previousStatus) return;

  // Update stored status
  await Monitor.findByIdAndUpdate(monitor._id, { lastStatus: newStatus });

  const recipient = current.userId.email;

  if (newStatus === 'down') {
    // Send downtime alert
    await sendEmail({
      to: recipient,
      subject: `🔴 ${monitor.name} is down`,
      body: `Your endpoint ${monitor.url} is not responding. Detected at ${new Date().toISOString()}.`
    });
  }

  if (newStatus === 'up' && previousStatus === 'down') {
    // Send recovery alert
    await sendEmail({
      to: recipient,
      subject: `🟢 ${monitor.name} is back up`,
      body: `Your endpoint ${monitor.url} has recovered. Downtime ended at ${new Date().toISOString()}.`
    });
  }
}

One status change = one email. That's it.
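
To see the rule in isolation, here is the same decision written as a pure function and replayed over a sequence of check results. This is an illustrative sketch: `alertForTransition`, `replay`, and the labels `'down-alert'` and `'recovery'` are names invented for this example, not part of Monitorly.

```javascript
// Returns which email (if any) a status transition should trigger.
// Mirrors the rules above: alert on entering 'down', and send a
// recovery notice only on 'down' -> 'up'.
function alertForTransition(previousStatus, newStatus) {
  if (newStatus === previousStatus) return null; // no change, no email
  if (newStatus === 'down') return 'down-alert'; // pending/up -> down
  if (newStatus === 'up' && previousStatus === 'down') return 'recovery';
  return null; // e.g. pending -> up on the very first check
}

// Replay a sequence of check results and collect the emails sent.
function replay(initialStatus, results) {
  const emails = [];
  let last = initialStatus;
  for (const status of results) {
    const email = alertForTransition(last, status);
    if (email) emails.push(email);
    last = status;
  }
  return emails;
}

// Six checks, one outage lasting three checks: two emails in total.
console.log(replay('pending', ['up', 'up', 'down', 'down', 'down', 'up']));
// prints [ 'down-alert', 'recovery' ]
```

Note that an endpoint that flips state on every check would still produce one email per flip under these rules, since each flip is a status change.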

Uptime Percentage Calculation

async function getUptimePercentage(monitorId, days = 7) {
  const since = new Date();
  since.setDate(since.getDate() - days);

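  const filter = { monitorId, checkedAt: { $gte: since } };

  // Count in the database instead of loading every log document into memory
  const [total, upCount] = await Promise.all([
    MonitorLog.countDocuments(filter),
    MonitorLog.countDocuments({ ...filter, status: 'up' })
  ]);

  if (total === 0) return null; // no checks recorded in this window

  // toFixed returns a string, so convert back to a number
  return Number(((upCount / total) * 100).toFixed(2));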
}

Simple, accurate, and runs off the existing log data with no extra storage.
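
To make the arithmetic concrete: at a 5-minute interval, a monitor produces 12 × 24 × 7 = 2016 checks in a 7-day window. Below is a small sketch of the same ratio computed on an in-memory array. The `uptimeFromLogs` helper and the sample data are illustrative only and not part of Monitorly.

```javascript
// Same ratio as getUptimePercentage (up checks / all checks), computed
// over a plain array of { status } objects instead of a MongoDB query.
function uptimeFromLogs(logs) {
  if (logs.length === 0) return null; // no data yet
  const upCount = logs.filter((log) => log.status === 'up').length;
  return Number(((upCount / logs.length) * 100).toFixed(2));
}

// 7 days of checks at a 5-minute interval: 12 per hour * 24 * 7 = 2016.
const sample = Array.from({ length: 2016 }, (_, i) => ({
  status: i < 3 ? 'down' : 'up' // three failed checks, about 15 minutes
}));

console.log(uptimeFromLogs(sample)); // 99.85
```

Because the percentage counts checks rather than elapsed time, three failed checks at a 5-minute interval are treated as roughly 15 minutes of downtime, whatever the real outage length was.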


Real-Time Dashboard with Socket.io

When the monitoring engine runs a check, it emits a monitor-update event to the user's socket room. On the frontend, the dashboard listens and updates the UI instantly.

// Server — emit to user's room
io.to(monitor.userId.toString()).emit('monitor-update', updatePayload);

// Client — listen and update UI
socket.on('monitor-update', (data) => {
  updateMonitorCard(data.monitorId, data.status, data.responseTime);
});

No polling. No page refresh. The dashboard just stays live.
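
For the room to be private, the server has to decide which room a socket joins, based on a verified token rather than on anything the client claims. A minimal sketch, assuming Socket.io v4 (where middleware receives `socket.handshake.auth` and each socket has a `socket.data` object). The names `authenticateSocket`, `joinUserRoom`, and the injected `verifyToken` function are hypothetical; `verifyToken` stands in for whatever JWT verification the app already uses.

```javascript
// Sketch: authenticate the socket during the handshake.
// `verifyToken(token)` is injected; it must return an object with an `id`
// field, or throw if the token is missing or invalid.
function authenticateSocket(verifyToken) {
  return function middleware(socket, next) {
    try {
      const token = socket.handshake.auth?.token;
      const user = verifyToken(token);
      socket.data.userId = String(user.id);
      next();
    } catch (err) {
      next(new Error('Unauthorized')); // rejects the connection
    }
  };
}

// Runs for each accepted connection. The room name matches the one the
// server emits to: io.to(monitor.userId.toString())
function joinUserRoom(socket) {
  socket.join(socket.data.userId);
}

// Wiring with a Socket.io v4 server instance named `io`:
//   io.use(authenticateSocket(verifyToken));
//   io.on('connection', joinUserRoom);
```

On the client, the token would be passed in the connection options, for example `io(url, { auth: { token } })`.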


MongoDB Schema Design

// Monitor — what the user wants to track
const monitorSchema = new mongoose.Schema({
  userId:      { type: mongoose.Schema.Types.ObjectId, ref: 'User', required: true },
  name:        { type: String, required: true },
  url:         { type: String, required: true },
  interval:    { type: Number, enum: [1, 5, 15], default: 5 },
  lastStatus:  { type: String, enum: ['up', 'down', 'pending'], default: 'pending' },
  createdAt:   { type: Date, default: Date.now }
});

// Log — individual check result
const monitorLogSchema = new mongoose.Schema({
  monitorId:    { type: mongoose.Schema.Types.ObjectId, ref: 'Monitor', required: true },
  status:       { type: String, enum: ['up', 'down'], required: true },
  responseTime: { type: Number },
  statusCode:   { type: Number },
  checkedAt:    { type: Date, default: Date.now }
});

// Index for fast log queries per monitor
monitorLogSchema.index({ monitorId: 1, checkedAt: -1 });

Separating monitors from logs is important — logs grow fast, and you don't want a single document ballooning with embedded arrays.
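
The userId field on the monitor schema is also what per-user isolation rests on: the JWT only identifies the caller, and the query does the actual filtering. A minimal sketch, assuming an Express-style handler where earlier auth middleware has put the verified user ID on `req.userId`. That property name, `findOwnedMonitor`, and `makeGetMonitorHandler` are hypothetical, and `MonitorModel` is any object with a Mongoose-style `findOne(filter)` method.

```javascript
// Hypothetical helper: look up a monitor only if it belongs to the caller.
async function findOwnedMonitor(MonitorModel, userId, monitorId) {
  // Filtering on both _id and userId means another user's monitor
  // simply does not match, even if its ID is guessed correctly.
  return MonitorModel.findOne({ _id: monitorId, userId });
}

// Hypothetical Express-style handler built on the helper above.
function makeGetMonitorHandler(MonitorModel) {
  return async function getMonitor(req, res) {
    const monitor = await findOwnedMonitor(MonitorModel, req.userId, req.params.id);
    if (!monitor) {
      // 404 rather than 403, so the response does not reveal that the ID exists
      return res.status(404).json({ error: 'Monitor not found' });
    }
    return res.json(monitor);
  };
}
```

The same filter shape applies to updates and deletes: including userId in the query keeps the ownership check and the operation in a single step.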


What I Learned Building This

1. Socket.io rooms are perfect for multi-user real-time apps.
Instead of broadcasting every update to every connected client, I put each user in their own room (socket.join(userId)). Clean isolation with one line of code.

2. Cron jobs need restart handling.
Jobs scheduled with node-cron live in process memory, so when the server restarts they are all gone. I reload all active monitors from MongoDB on server startup and reschedule them. Always think about what happens on restart.

3. Alert logic is a UX problem, not just a technical one.
The "no spam" decision came from thinking about what I'd actually want as a user. Technical correctness (alert on every failure) is not the same as a good user experience. Think about both.

4. Separate your logs from your main documents.
Early on I stored check results as an embedded array inside the Monitor document. It hit MongoDB's 16MB document limit faster than expected. Separate collections with indexes are the right pattern for time-series data.

5. Timeouts are not optional.
Without a timeout on Axios requests, a hanging endpoint would block the check indefinitely. Always set a timeout on outbound HTTP calls in a monitoring system.


What's Next

  • [ ] SMS alerts via Twilio
  • [ ] Response body validation (not just status codes)
  • [ ] Public status pages per user
  • [ ] Webhook support for Slack / Discord notifications
  • [ ] Multi-region checks

Try It

Live: urlzap.me/pulsewatch
GitHub: github.com/vbv0507/api-monitoring-system

Add your own endpoints and watch the dashboard update in real time. If something breaks, you'll know immediately — not hours later like I did.

Questions about any part of the build? Drop them in the comments.


Also built: URLzap — a free URL shortener with custom aliases, because Bit.ly paywalls them. And yes, the short links in this post are from my own shortener.


Tags: #node #javascript #webdev #showdev
