DEV Community

Draginox

How We Index 5000+ Minecraft Servers: Database Design and Search Architecture

When you're building a platform that indexes thousands of game servers with real-time data, your database design makes or breaks you. A bad schema means slow queries, missing data, and frustrated users. A good one scales effortlessly.

Here's how I designed the database behind Minecraft ServerHub, a server discovery platform that tracks 5000+ servers with live player counts, uptime history, and community voting.

The Core Problem

A Minecraft server listing needs to:

  1. Store server metadata (name, IP, port, version, description)
  2. Track real-time status (online/offline, player count, latency)
  3. Calculate uptime from historical pings
  4. Support voting, reviews, and rankings
  5. Handle search with filters (gamemode, version, country, player count)
  6. Aggregate statistics across the entire ecosystem

All of this needs to feel fast — sub-200ms response times for listing pages.

Schema Design: The Server Model

The Server model is the core of everything. Here's the simplified version:

model Server {
  id             String       @id @default(cuid())
  name           String
  slug           String       @unique
  ipAddress      String
  port           Int          @default(25565)
  edition        String       @default("java")
  country        String?

  // Status fields updated by ping system
  isOnline       Boolean      @default(false)
  playersOnline  Int          @default(0)
  maxPlayers     Int          @default(0)
  version        String?
  software       String?
  uptime         Float        @default(0)
  latency        Int?

  // Ranking signals
  votesThisMonth Int          @default(0)
  totalClicks    Int          @default(0)
  uniqueVisitors Int          @default(0)

  // Lifecycle
  status         ServerStatus @default(PENDING)
  tier           ServerTier   @default(FREE)
  createdAt      DateTime     @default(now())

  // Relations
  pingHistory    PingHistory[]
  reviews        Review[]
  votes          Vote[]
  tags           ServerTag[]

  @@index([status, votesThisMonth])
  @@index([status, country])
  @@index([status, createdAt])
}

Why These Indexes?

The three composite indexes cover our most common query patterns:

  • [status, votesThisMonth] — The default server listing sorted by votes. This is the homepage query.
  • [status, country] — Country-specific pages like /servers/country/us. We have 30+ country landing pages.
  • [status, createdAt] — "Recently added" sorting.

Every query filters by status = ACTIVE first, which is why status is always the first field in the composite index.
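The reasoning here is the leftmost-prefix rule: a composite index serves a query when the equality filters match a prefix of the index columns and the sort column comes right after that prefix. A simplified sketch (the `indexCovers` helper is illustrative, not part of the codebase, and ignores range filters and sort direction):

```typescript
// Illustrative only: does a composite index cover a query that
// equality-filters on `filters` and then sorts on `sortField`?
// True when the filters form a prefix of the index and the sort
// column is the next index column after that prefix.
function indexCovers(index: string[], filters: string[], sortField: string): boolean {
  const prefix = index.slice(0, filters.length);
  const prefixMatches = filters.every((f) => prefix.includes(f));
  return prefixMatches && index[filters.length] === sortField;
}

// Homepage query: WHERE status = 'ACTIVE' ORDER BY votesThisMonth DESC
indexCovers(["status", "votesThisMonth"], ["status"], "votesThisMonth"); // true
// Reversed column order cannot serve the same query
indexCovers(["votesThisMonth", "status"], ["status"], "votesThisMonth"); // false
```

This is why swapping the column order would break the homepage query: the index would be sorted by votes first, scattering the `ACTIVE` rows.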

The Dual-Table Strategy

We have two server tables: Server (user-submitted, with ownership) and SeedServer (pre-populated from discovery crawlers).

Why two tables instead of a source field on one table?

  1. Different data models — Server has ownership, verification, premium tiers, and votes. SeedServer is just raw data.
  2. Query isolation — Listing queries only hit Server. Search queries both tables and merges the results in application code.
  3. No contamination — Seed data never appears in rankings until a user claims and verifies it.

Search across both tables:

async function searchServers(query: string) {
  const [servers, seedServers] = await Promise.all([
    prisma.server.findMany({
      where: {
        status: "ACTIVE",
        OR: [
          { name: { contains: query, mode: "insensitive" } },
          { ipAddress: { contains: query } },
        ],
      },
      take: 20,
      orderBy: { votesThisMonth: "desc" },
    }),
    prisma.seedServer.findMany({
      where: {
        OR: [
          { name: { contains: query, mode: "insensitive" } },
          { ipAddress: { contains: query } },
        ],
      },
      take: 10,
    }),
  ]);

  return [...servers, ...seedServers.filter(
    (seed) => !servers.some((s) => s.ipAddress === seed.ipAddress)
  )];
}
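The final filter deduplicates seed servers that already exist as claimed servers. Extracted into a standalone helper so it can be tested with plain objects (the `Listed` shape and `mergeResults` name are illustrative, not from the codebase), and using a Set so the lookup is O(1) per seed row instead of the nested `.some()` scan:

```typescript
// Illustrative: dedup seed results against user-submitted results by IP.
// User-submitted servers win ties, matching the spread order above.
interface Listed {
  ipAddress: string;
  name: string;
}

function mergeResults(servers: Listed[], seedServers: Listed[]): Listed[] {
  const known = new Set(servers.map((s) => s.ipAddress));
  return [...servers, ...seedServers.filter((seed) => !known.has(seed.ipAddress))];
}
```

For 20 + 10 rows the difference is negligible, but the Set version stays linear if the page sizes ever grow.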

Ping History: Time-Series Without TimescaleDB

We track every ping in a PingHistory table. With 5000 servers pinged every 5 minutes, that's ~1.4 million rows per day.
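The volume estimate is easy to verify: one ping every 5 minutes is 288 pings per server per day.

```typescript
// Back-of-envelope check on the daily row count.
const servers = 5000;
const pingsPerServerPerDay = (24 * 60) / 5; // one ping every 5 minutes = 288
const rowsPerDay = servers * pingsPerServerPerDay; // 1,440,000
```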

model PingHistory {
  id        String   @id @default(cuid())
  serverId  String
  isOnline  Boolean
  players   Int      @default(0)
  latency   Int?
  createdAt DateTime @default(now())

  server    Server   @relation(fields: [serverId], references: [id])

  @@index([serverId, createdAt])
}

The composite index on [serverId, createdAt] is essential. Without it, calculating uptime for a single server would scan millions of rows.

For uptime, we use groupBy instead of loading all rows:

const pings = await prisma.pingHistory.groupBy({
  by: ["isOnline"],
  where: {
    serverId,
    createdAt: { gte: thirtyDaysAgo },
  },
  _count: true,
});
Enter fullscreen mode Exit fullscreen mode

This returns just 2 rows (online count and offline count) regardless of how many pings exist. PostgreSQL handles the aggregation.
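Turning those two rows into an uptime percentage is then pure arithmetic. A minimal sketch (the `uptimePercent` helper and `PingGroup` shape are illustrative; the input mirrors what the groupBy query returns):

```typescript
// Illustrative: convert the two-row groupBy result into an uptime percentage.
interface PingGroup {
  isOnline: boolean;
  _count: number;
}

function uptimePercent(groups: PingGroup[]): number {
  const total = groups.reduce((sum, g) => sum + g._count, 0);
  if (total === 0) return 0; // no pings yet: report 0% rather than divide by zero
  const online = groups.find((g) => g.isOnline)?._count ?? 0;
  return (online * 100) / total;
}
```

Note the empty-result guard: a freshly added server has no ping history, and the groupBy returns zero rows rather than rows with zero counts.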

The Voting System: Preventing Abuse

Votes determine server ranking, so they're a target for manipulation:

model Vote {
  id                String   @id @default(cuid())
  serverId          String
  ipHash            String   // SHA256 of IP — no raw IPs stored
  minecraftUsername String?
  fingerprint       String?
  votedAt           DateTime @default(now())

  @@unique([serverId, ipHash, votedAt])
  @@index([serverId, votedAt])
}

Key design decisions:

  1. IP hashing — We never store raw IP addresses. SHA256 hash is enough for deduplication.
  2. Composite unique constraint — One vote per IP per server per day. For the constraint to enforce a daily limit, votedAt must be stored at day granularity; a raw timestamp would make every row unique.
  3. Fingerprinting — Browser fingerprint catches multi-account voting.
  4. Votifier integration — Servers can verify votes and reward players.
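The IP hashing itself is a one-liner with Node's crypto module. The sketch below also folds a UTC day bucket into the hash, which is one way (an assumption, not necessarily how the production system enforces it) to get day-granular dedup directly from the hash value:

```typescript
import { createHash } from "node:crypto";

// Illustrative: hash the voter's IP together with a UTC day bucket.
// The stored value supports one-vote-per-day dedup without keeping the
// raw IP. The day-bucketing is an assumption for this sketch.
function voteIpHash(ip: string, when: Date = new Date()): string {
  const dayBucket = when.toISOString().slice(0, 10); // e.g. "2024-01-15"
  return createHash("sha256").update(`${ip}:${dayBucket}`).digest("hex");
}
```

Two votes from the same IP on the same UTC day produce the same hash and collide with the unique constraint; the next day, the hash changes and the vote goes through.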

Aggregated Statistics

The statistics page shows ecosystem-wide data. We pre-aggregate this instead of computing it live:

const [
  totalServers,
  onlineServers,
  totalPlayers,
  gamemodeDistribution,
  versionDistribution,
] = await Promise.all([
  prisma.server.count({ where: { status: "ACTIVE" } }),
  prisma.server.count({ where: { status: "ACTIVE", isOnline: true } }),
  prisma.server.aggregate({ where: { status: "ACTIVE" }, _sum: { playersOnline: true } }),
  prisma.serverTag.groupBy({ by: ["tagId"], _count: true }),
  prisma.server.groupBy({ by: ["version"], _count: true, where: { status: "ACTIVE" } }),
]);

With Redis caching at a 5-minute TTL, the statistics page loads in under 100ms.
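The caching layer follows the standard cache-aside pattern. A minimal sketch, with an in-memory Map standing in for Redis (the real thing would use GET/SETEX, but the flow is the same: serve from cache, recompute on miss, store with a TTL; the `cached` helper name is illustrative):

```typescript
// Illustrative cache-aside wrapper. A Map stands in for Redis here.
const cache = new Map<string, { value: unknown; expiresAt: number }>();

async function cached<T>(
  key: string,
  ttlMs: number,
  compute: () => Promise<T>
): Promise<T> {
  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) return hit.value as T; // fresh hit
  const value = await compute(); // miss or expired: recompute
  cache.set(key, { value, expiresAt: Date.now() + ttlMs });
  return value;
}
```

Usage for the statistics page would look like `cached("ecosystem-stats", 5 * 60 * 1000, computeStatistics)`: the first request each 5-minute window pays the full query cost, everyone else gets the cached result.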

Lessons Learned

1. Composite indexes are not optional at scale. Our most important optimization was adding @@index([status, votesThisMonth]). Before that, the homepage query took 800ms. After: 12ms.

2. Two tables > one table with a type field. The Server/SeedServer split keeps queries clean and prevents accidental mixing of verified and unverified data.

3. Aggregate, don't scan. Using groupBy for uptime calculation instead of loading all ping records reduced memory usage by 100x.

4. Hash sensitive data at write time. Storing IP hashes instead of raw IPs means a database breach doesn't expose user locations.

5. Pre-compute expensive stats. The statistics page runs 18 queries, but they're all served from cache 99% of the time.

The Numbers

The full schema has 70+ models covering servers, users, analytics, payments, and moderation. At peak:

  • 5000+ servers indexed
  • 170,000+ players tracked
  • ~1.4M ping records/day
  • < 200ms p95 response time
  • PostgreSQL handling it all on a single instance

You can see the live data on the Minecraft ServerHub statistics dashboard.


Building something similar? The key is starting with the right indexes. Profile your top 5 queries, add composite indexes for each, and cache aggressively. Your database will thank you.
