DEV Community

Draginox

How We Index 5000+ Minecraft Servers: Database Design and Search Architecture

When you're building a platform that indexes thousands of game servers with real-time data, your database design makes or breaks you. A bad schema means slow queries, missing data, and frustrated users. A good one scales effortlessly.

Here's how I designed the database behind Minecraft ServerHub, a server discovery platform that tracks 5000+ servers with live player counts, uptime history, and community voting.

The Core Problem

A Minecraft server listing needs to:

  1. Store server metadata (name, IP, port, version, description)
  2. Track real-time status (online/offline, player count, latency)
  3. Calculate uptime from historical pings
  4. Support voting, reviews, and rankings
  5. Handle search with filters (gamemode, version, country, player count)
  6. Aggregate statistics across the entire ecosystem

All of this needs to feel fast — sub-200ms response times for listing pages.

Schema Design: The Server Model

The Server model is the core of everything. Here's the simplified version:

model Server {
  id             String       @id @default(cuid())
  name           String
  slug           String       @unique
  ipAddress      String
  port           Int          @default(25565)
  edition        String       @default("java")
  country        String?

  // Status fields updated by ping system
  isOnline       Boolean      @default(false)
  playersOnline  Int          @default(0)
  maxPlayers     Int          @default(0)
  version        String?
  software       String?
  uptime         Float        @default(0)
  latency        Int?

  // Ranking signals
  votesThisMonth Int          @default(0)
  totalClicks    Int          @default(0)
  uniqueVisitors Int          @default(0)

  // Lifecycle
  status         ServerStatus @default(PENDING)
  tier           ServerTier   @default(FREE)
  createdAt      DateTime     @default(now())

  // Relations
  pingHistory    PingHistory[]
  reviews        Review[]
  votes          Vote[]
  tags           ServerTag[]

  @@index([status, votesThisMonth])
  @@index([status, country])
  @@index([status, createdAt])
}

Why These Indexes?

The three composite indexes cover our most common query patterns:

  • [status, votesThisMonth] — The default server listing sorted by votes. This is the homepage query.
  • [status, country] — Country-specific pages like /servers/country/us. We have 30+ country landing pages.
  • [status, createdAt] — "Recently added" sorting.

Every query filters by status = ACTIVE first, which is why status is always the first field in the composite index.
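The reasoning here is the leftmost-prefix rule: a composite index serves a query when the equality filters match a prefix of the index columns and the sort column comes right after that prefix. A simplified sketch (the `indexCovers` helper is illustrative, not part of the codebase, and ignores range filters and sort direction):

```typescript
// Illustrative only: does a composite index cover a query that
// equality-filters on `filters` and then sorts on `sortField`?
// True when the filters form a prefix of the index and the sort
// column is the next index column after that prefix.
function indexCovers(index: string[], filters: string[], sortField: string): boolean {
  const prefix = index.slice(0, filters.length);
  const prefixMatches = filters.every((f) => prefix.includes(f));
  return prefixMatches && index[filters.length] === sortField;
}

// Homepage query: WHERE status = 'ACTIVE' ORDER BY votesThisMonth DESC
indexCovers(["status", "votesThisMonth"], ["status"], "votesThisMonth"); // true
// Reversed column order cannot serve the same query
indexCovers(["votesThisMonth", "status"], ["status"], "votesThisMonth"); // false
```

This is why swapping the column order would break the homepage query: the index would be sorted by votes first, scattering the `ACTIVE` rows.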

The Dual-Table Strategy

We have two server tables: Server (user-submitted, with ownership) and SeedServer (pre-populated from discovery crawlers).

Why two tables instead of a source field on one table?

  1. Different data models — Server has ownership, verification, premium tiers, and votes. SeedServer is just raw data.
  2. Query isolation — Listing queries only hit Server. Search queries both tables and merges the results in application code.
  3. No contamination — Seed data never appears in rankings until a user claims and verifies it.

Search across both tables:

async function searchServers(query: string) {
  const [servers, seedServers] = await Promise.all([
    prisma.server.findMany({
      where: {
        status: "ACTIVE",
        OR: [
          { name: { contains: query, mode: "insensitive" } },
          { ipAddress: { contains: query } },
        ],
      },
      take: 20,
      orderBy: { votesThisMonth: "desc" },
    }),
    prisma.seedServer.findMany({
      where: {
        OR: [
          { name: { contains: query, mode: "insensitive" } },
          { ipAddress: { contains: query } },
        ],
      },
      take: 10,
    }),
  ]);

  return [...servers, ...seedServers.filter(
    (seed) => !servers.some((s) => s.ipAddress === seed.ipAddress)
  )];
}
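The final filter deduplicates seed servers that already exist as claimed servers. Extracted into a standalone helper so it can be tested with plain objects (the `Listed` shape and `mergeResults` name are illustrative, not from the codebase), and using a Set so the lookup is O(1) per seed row instead of the nested `.some()` scan:

```typescript
// Illustrative: dedup seed results against user-submitted results by IP.
// User-submitted servers win ties, matching the spread order above.
interface Listed {
  ipAddress: string;
  name: string;
}

function mergeResults(servers: Listed[], seedServers: Listed[]): Listed[] {
  const known = new Set(servers.map((s) => s.ipAddress));
  return [...servers, ...seedServers.filter((seed) => !known.has(seed.ipAddress))];
}
```

For 20 + 10 rows the difference is negligible, but the Set version stays linear if the page sizes ever grow.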

Ping History: Time-Series Without TimescaleDB

We track every ping in a PingHistory table. With 5000 servers pinged every 5 minutes, that's ~1.4 million rows per day.
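The volume estimate is easy to verify: one ping every 5 minutes is 288 pings per server per day.

```typescript
// Back-of-envelope check on the daily row count.
const servers = 5000;
const pingsPerServerPerDay = (24 * 60) / 5; // one ping every 5 minutes = 288
const rowsPerDay = servers * pingsPerServerPerDay; // 1,440,000
```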

model PingHistory {
  id        String   @id @default(cuid())
  serverId  String
  isOnline  Boolean
  players   Int      @default(0)
  latency   Int?
  createdAt DateTime @default(now())

  server    Server   @relation(fields: [serverId], references: [id])

  @@index([serverId, createdAt])
}

The composite index on [serverId, createdAt] is essential. Without it, calculating uptime for a single server would scan millions of rows.

For uptime, we use groupBy instead of loading all rows:

const pings = await prisma.pingHistory.groupBy({
  by: ["isOnline"],
  where: {
    serverId,
    createdAt: { gte: thirtyDaysAgo },
  },
  _count: true,
});
Enter fullscreen mode Exit fullscreen mode

This returns just 2 rows (online count and offline count) regardless of how many pings exist. PostgreSQL handles the aggregation.
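Turning those two rows into an uptime percentage is then pure arithmetic. A minimal sketch (the `uptimePercent` helper and `PingGroup` shape are illustrative; the input mirrors what the groupBy query returns):

```typescript
// Illustrative: convert the two-row groupBy result into an uptime percentage.
interface PingGroup {
  isOnline: boolean;
  _count: number;
}

function uptimePercent(groups: PingGroup[]): number {
  const total = groups.reduce((sum, g) => sum + g._count, 0);
  if (total === 0) return 0; // no pings yet: report 0% rather than divide by zero
  const online = groups.find((g) => g.isOnline)?._count ?? 0;
  return (online * 100) / total;
}
```

Note the empty-result guard: a freshly added server has no ping history, and the groupBy returns zero rows rather than rows with zero counts.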

The Voting System: Preventing Abuse

Votes determine server ranking, so they're a target for manipulation:

model Vote {
  id                String   @id @default(cuid())
  serverId          String
  ipHash            String   // SHA256 of IP — no raw IPs stored
  minecraftUsername String?
  fingerprint       String?
  votedAt           DateTime @default(now())

  @@unique([serverId, ipHash, votedAt])
  @@index([serverId, votedAt])
}

Key design decisions:

  1. IP hashing — We never store raw IP addresses. SHA256 hash is enough for deduplication.
  2. Composite unique constraint — One vote per IP per server per day. For the constraint to enforce a daily limit, votedAt must be stored at day granularity; a raw timestamp would make every row unique.
  3. Fingerprinting — Browser fingerprint catches multi-account voting.
  4. Votifier integration — Servers can verify votes and reward players.
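The IP hashing itself is a one-liner with Node's crypto module. The sketch below also folds a UTC day bucket into the hash, which is one way (an assumption, not necessarily how the production system enforces it) to get day-granular dedup directly from the hash value:

```typescript
import { createHash } from "node:crypto";

// Illustrative: hash the voter's IP together with a UTC day bucket.
// The stored value supports one-vote-per-day dedup without keeping the
// raw IP. The day-bucketing is an assumption for this sketch.
function voteIpHash(ip: string, when: Date = new Date()): string {
  const dayBucket = when.toISOString().slice(0, 10); // e.g. "2024-01-15"
  return createHash("sha256").update(`${ip}:${dayBucket}`).digest("hex");
}
```

Two votes from the same IP on the same UTC day produce the same hash and collide with the unique constraint; the next day, the hash changes and the vote goes through.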

Aggregated Statistics

The statistics page shows ecosystem-wide data. We pre-aggregate this instead of computing it live:

const [
  totalServers,
  onlineServers,
  totalPlayers,
  gamemodeDistribution,
  versionDistribution,
] = await Promise.all([
  prisma.server.count({ where: { status: "ACTIVE" } }),
  prisma.server.count({ where: { status: "ACTIVE", isOnline: true } }),
  prisma.server.aggregate({ where: { status: "ACTIVE" }, _sum: { playersOnline: true } }),
  prisma.serverTag.groupBy({ by: ["tagId"], _count: true }),
  prisma.server.groupBy({ by: ["version"], _count: true, where: { status: "ACTIVE" } }),
]);

With Redis caching at a 5-minute TTL, the statistics page loads in under 100ms.
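The caching layer follows the standard cache-aside pattern. A minimal sketch, with an in-memory Map standing in for Redis (the real thing would use GET/SETEX, but the flow is the same: serve from cache, recompute on miss, store with a TTL; the `cached` helper name is illustrative):

```typescript
// Illustrative cache-aside wrapper. A Map stands in for Redis here.
const cache = new Map<string, { value: unknown; expiresAt: number }>();

async function cached<T>(
  key: string,
  ttlMs: number,
  compute: () => Promise<T>
): Promise<T> {
  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) return hit.value as T; // fresh hit
  const value = await compute(); // miss or expired: recompute
  cache.set(key, { value, expiresAt: Date.now() + ttlMs });
  return value;
}
```

Usage for the statistics page would look like `cached("ecosystem-stats", 5 * 60 * 1000, computeStatistics)`: the first request each 5-minute window pays the full query cost, everyone else gets the cached result.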

Lessons Learned

1. Composite indexes are not optional at scale. Our most important optimization was adding @@index([status, votesThisMonth]). Before that, the homepage query took 800ms. After: 12ms.

2. Two tables > one table with a type field. The Server/SeedServer split keeps queries clean and prevents accidental mixing of verified and unverified data.

3. Aggregate, don't scan. Using groupBy for uptime calculation instead of loading all ping records reduced memory usage by 100x.

4. Hash sensitive data at write time. Storing IP hashes instead of raw IPs means a database breach doesn't expose user locations.

5. Pre-compute expensive stats. The statistics page runs 18 queries, but they're all served from cache 99% of the time.

The Numbers

The full schema has 70+ models covering servers, users, analytics, payments, and moderation. At peak:

  • 5000+ servers indexed
  • 170,000+ players tracked
  • ~1.4M ping records/day
  • < 200ms p95 response time
  • PostgreSQL handling it all on a single instance

You can see the live data on the Minecraft ServerHub statistics dashboard.


Building something similar? The key is starting with the right indexes. Profile your top 5 queries, add composite indexes for each, and cache aggressively. Your database will thank you.
