Every day, millions of Minecraft players search for multiplayer servers. But finding one that's actually online, has players, and runs the right version? That's surprisingly hard.
I built Minecraft ServerHub to solve this. It pings 5000+ servers every 5 minutes and shows real-time player counts, uptime history, and version info. Here's how the monitoring system works under the hood.
The Architecture
The stack:
- Next.js 14 (App Router) for the frontend and API routes
- PostgreSQL with Prisma ORM for persistent storage
- Redis for caching and rate limiting
- minecraft-server-util for direct server pings
The key insight: instead of relying on third-party APIs (which have rate limits), we ping Minecraft servers directly using the Minecraft protocol.
Direct Server Pinging
Minecraft servers speak a well-defined protocol. When you connect to a server, the first thing that happens is a "status" handshake that returns the server's MOTD, player count, version, and favicon.
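Under the hood, the Java Edition protocol frames every packet with a VarInt length prefix. We never build these by hand (the library handles framing for us), but a minimal VarInt encoder shows what the wire format looks like — this is a sketch for illustration, not code from the project:

```typescript
// Encode a non-negative integer as a Minecraft-protocol VarInt:
// 7 bits of payload per byte, high bit set while more bytes follow.
function writeVarInt(value: number): Buffer {
  const bytes: number[] = [];
  do {
    let temp = value & 0x7f;
    value >>>= 7; // unsigned shift: consume 7 bits
    if (value !== 0) temp |= 0x80; // continuation bit
    bytes.push(temp);
  } while (value !== 0);
  return Buffer.from(bytes);
}

// The default Java Edition port, 25565, encodes to three bytes: dd c7 01
```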
We use the minecraft-server-util library to handle this:
```typescript
import { status, statusBedrock } from "minecraft-server-util";

async function pingServer(ip: string, port: number, edition: "java" | "bedrock") {
  const timeout = 5000; // 5 second timeout

  if (edition === "bedrock") {
    const response = await statusBedrock(ip, port, { timeout });
    return {
      online: true,
      players: response.players.online,
      maxPlayers: response.players.max,
      version: response.version.name,
      motd: response.motd.clean,
    };
  }

  const response = await status(ip, port, { timeout });
  return {
    online: true,
    players: response.players.online,
    maxPlayers: response.players.max,
    version: response.version.name,
    motd: response.motd.clean,
    favicon: response.favicon ?? null,
  };
}
```
This gives us zero API rate limits. The only constraint is network throughput.
Batch Processing at Scale
Pinging 5000 servers sequentially would take forever. So we batch them with controlled concurrency:
```typescript
const BATCH_SIZE = 400;
const CONCURRENT_PINGS = 15;

async function processBatch(servers: Server[]) {
  const results = [];
  for (let i = 0; i < servers.length; i += CONCURRENT_PINGS) {
    const chunk = servers.slice(i, i + CONCURRENT_PINGS);
    const chunkResults = await Promise.allSettled(
      chunk.map((server) =>
        pingServer(server.ipAddress, server.port, server.edition)
      )
    );
    results.push(...chunkResults);
  }
  return results;
}
```
Why Promise.allSettled instead of Promise.all? Because servers go offline. We don't want one timeout to kill the entire batch.
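Since pingServer throws on timeout or refusal, a fulfilled result means the server answered. A small helper — hypothetical, not from the codebase — turns a settled batch into online/offline counts:

```typescript
// Summarize a batch of settled ping promises.
// Fulfilled = the server responded; rejected = timeout or connection error.
function countOnline(results: PromiseSettledResult<unknown>[]) {
  const online = results.filter((r) => r.status === "fulfilled").length;
  return { online, offline: results.length - online };
}
```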
With 15 concurrent connections and a 5-second timeout, the worst case is 15 servers every 5 seconds, or 180 servers per minute. At that rate the entire database needs under 30 minutes of total ping time, which we spread across cron invocations: each 400-server batch finishes in a little over 2 minutes, comfortably inside the 5-minute cron window. In practice most pings return well before the timeout, so real throughput is much higher.
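The worst-case arithmetic is easy to sanity-check (the function name is just for illustration):

```typescript
// Worst case: every ping runs to the full timeout, so one wave of
// `concurrent` pings completes per `timeoutMs`.
function worstCaseServersPerMinute(concurrent: number, timeoutMs: number): number {
  return concurrent * (60_000 / timeoutMs);
}

const perMinute = worstCaseServersPerMinute(15, 5000); // 180
const minutesForAll = Math.ceil(5000 / perMinute); // 28 minutes for 5000 servers
```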
Multi-Layer Caching with Redis
Every ping result is cached to avoid hitting the database on every page load:
```typescript
const CACHE_TTLS = {
  SERVER_LIST: 300, // 5 minutes
  SERVER_DETAIL: 300, // 5 minutes
  SERVER_PING: 360, // 6 minutes (slightly longer than the ping interval)
  TAGS: 3600, // 1 hour
  GLOBAL_STATS: 86400, // 1 day
};
```
```typescript
async function getCachedServer(id: string) {
  const cacheKey = `server:${id}`;

  // Try Redis first
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  // Fall back to the database
  const server = await prisma.server.findUnique({ where: { id } });

  // Cache for the next request
  if (server) {
    await redis.setex(cacheKey, CACHE_TTLS.SERVER_DETAIL, JSON.stringify(server));
  }
  return server;
}
```
We also have an in-memory fallback cache for when Redis is unavailable. It uses a stale-while-revalidate pattern with 2x TTL:
```typescript
class MemoryCache {
  private cache = new Map<string, { data: any; expires: number; stale: number }>();

  set(key: string, data: any, ttlMs: number) {
    const now = Date.now();
    // Entries stay servable for up to 2x TTL while a refresh runs
    this.cache.set(key, { data, expires: now + ttlMs, stale: now + 2 * ttlMs });
  }

  get(key: string) {
    const entry = this.cache.get(key);
    if (!entry) return null;
    if (Date.now() < entry.expires) return entry.data; // Fresh
    if (Date.now() < entry.stale) return entry.data; // Stale but usable
    this.cache.delete(key);
    return null;
  }
}
```
Storing Ping History for Uptime Calculation
Every ping creates a PingHistory record:
```prisma
model PingHistory {
  id        String   @id @default(cuid())
  serverId  String
  isOnline  Boolean
  players   Int      @default(0)
  latency   Int?     // round-trip ms
  createdAt DateTime @default(now())
  server    Server   @relation(fields: [serverId], references: [id])

  @@index([serverId, createdAt]) // Composite index for uptime queries
}
```
Uptime is calculated from a 30-day rolling window:
```typescript
async function calculateUptime(serverId: string): Promise<number> {
  const thirtyDaysAgo = new Date(Date.now() - 30 * 24 * 60 * 60 * 1000);

  const pings = await prisma.pingHistory.groupBy({
    by: ["isOnline"],
    where: {
      serverId,
      createdAt: { gte: thirtyDaysAgo },
    },
    _count: true,
  });

  const online = pings.find((p) => p.isOnline)?._count ?? 0;
  const total = pings.reduce((sum, p) => sum + p._count, 0);
  return total > 0 ? Math.round((online / total) * 10000) / 100 : 0;
}
```
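The rounding trick in the return statement keeps two decimal places. As a worked example with hypothetical counts (a 5-minute cycle yields at most 8,640 pings in 30 days):

```typescript
// Same rounding as calculateUptime: percentage with two decimal places.
function uptimePercent(online: number, total: number): number {
  return total > 0 ? Math.round((online / total) * 10000) / 100 : 0;
}

uptimePercent(8612, 8640); // 99.68 — a server that missed 28 pings in 30 days
uptimePercent(0, 0); // 0 — no data yet, avoid dividing by zero
```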
The composite index on [serverId, createdAt] is critical — without it, this query would do a full table scan on millions of rows.
The Cron Setup
We use Next.js API routes as cron endpoints, authenticated with a secret:
```typescript
// /api/cron/ping-servers/route.ts
import { NextRequest, NextResponse } from "next/server";

export async function GET(request: NextRequest) {
  const authHeader = request.headers.get("authorization");
  if (authHeader !== `Bearer ${process.env.CRON_SECRET}`) {
    return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
  }

  const servers = await getNextBatch(); // Gets servers not pinged recently
  const results = await processBatch(servers);
  await savePingResults(results);

  return NextResponse.json({
    processed: results.length,
    // A fulfilled ping means the server responded, i.e. it's online
    online: results.filter((r) => r.status === "fulfilled").length,
  });
}
```
An external cron service (we use cron-job.org) calls this endpoint every 5 minutes. The batch system ensures each invocation processes a different set of servers.
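getNextBatch isn't shown above; conceptually it's a least-recently-pinged rotation. Here's a simplified in-memory sketch of that logic (the real version is presumably a Prisma query ordering by last ping time — names and fields here are illustrative):

```typescript
interface ServerRow {
  id: string;
  lastPingedAt: number; // epoch ms; 0 if never pinged
}

// Pick the servers that have gone longest without a ping, so every
// cron invocation advances through a different slice of the database.
function nextBatch(servers: ServerRow[], batchSize: number): ServerRow[] {
  return [...servers]
    .sort((a, b) => a.lastPingedAt - b.lastPingedAt)
    .slice(0, batchSize);
}
```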
Embeddable Status Badges
One feature that turned out to be really popular: embeddable SVG badges that server owners put on their websites.
https://minecraft-serverhub.com/api/badge/play.example.com?style=rounded&players=true
This returns a dynamic SVG that shows online/offline status and player count. The SVG is regenerated on each request with a 2-minute cache. Server owners embed it on their websites, forums, and Discord — and each badge links back to the server's page on ServerHub.
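The real renderer supports styles and themes; a stripped-down sketch of the idea looks like this (dimensions, colors, and the function name are illustrative):

```typescript
// Render a minimal status badge as an SVG string. The route handler
// would serve this with Content-Type: image/svg+xml and a short max-age.
function renderBadge(online: boolean, players: number): string {
  const label = online ? `Online · ${players} players` : "Offline";
  const color = online ? "#3fb950" : "#f85149";
  return [
    `<svg xmlns="http://www.w3.org/2000/svg" width="170" height="28">`,
    `  <rect rx="6" width="170" height="28" fill="${color}"/>`,
    `  <text x="85" y="18" text-anchor="middle" font-family="sans-serif" font-size="12" fill="#fff">${label}</text>`,
    `</svg>`,
  ].join("\n");
}
```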
Results
After 3 months of running this system:
- 5000+ servers monitored across Java and Bedrock editions
- 170,000+ players tracked simultaneously at peak
- 99.7% uptime on the monitoring system itself
- < 200ms average response time thanks to Redis caching
The full platform is live at minecraft-serverhub.com with a live statistics dashboard showing real-time ecosystem data.
Key Takeaways
- Direct protocol access beats APIs — no rate limits, faster, more reliable
- Batch with concurrency control — Promise.allSettled with chunking handles failures gracefully
- Multi-layer caching is essential — Redis + in-memory prevents cascade failures
- Composite indexes matter — adding the index on ping history cut a 2-second query to 200ms
- SVG badges are link magnets — server owners embed them everywhere
If you're building any kind of monitoring system, the pattern of "direct protocol ping → batch process → cache aggressively → serve from cache" scales surprisingly well with minimal infrastructure.
Check out the server badge generator if you run a Minecraft server — it's free and takes 30 seconds to set up.