Every API has an auth problem. Usually it's just one: how do you verify that the person calling your API is allowed to call it?
Building MyAirports — a real-time flight data API covering 1,000+ airports — gave me two auth problems. The first is the standard one: verifying developers who use my API. The second is messier: authenticating against 1,000+ different airport APIs, each of which expects something different.
Here's how I solved both.
The outward-facing problem: clean developer auth
When a developer hits the MyAirports API, I want the experience to be familiar. You sign up, you get a key, you put it in a header, it works.
API key format
Keys follow this pattern:
ma_live_<32 random hex characters>
The ma_live_ prefix makes keys identifiable at a glance — in log files, in config dumps, in error messages. If you accidentally paste your key into a public repo, a secret scanning tool can spot it. This is the same pattern Stripe, Resend, and most modern API providers use, and for good reason.
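A prefix this rigid also makes keys cheap to validate before touching the database. A minimal sketch, assuming the exact shape described above (the regex is mine, not the production code):

```javascript
// Reject anything that doesn't match the documented key shape up front,
// before doing a hash + database lookup. Assumed pattern: ma_live_ + 32 hex.
const KEY_PATTERN = /^ma_live_[0-9a-f]{32}$/;

function looksLikeApiKey(value) {
  return KEY_PATTERN.test(value);
}
```

The same pattern is what a secret scanner would grep for in commits and logs.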
Storage: never store the plaintext key
The key is generated once, shown to the user once, and then discarded. What gets stored in the database is the SHA-256 hash of the key:
// generation
const rawKey = `ma_live_${crypto.randomBytes(16).toString('hex')}`;
const hashedKey = crypto.createHash('sha256').update(rawKey).digest('hex');
// store only hashedKey in ApiKey table
// return rawKey to the user (once)
// lookup
const incomingHash = crypto.createHash('sha256').update(incomingKey).digest('hex');
const record = await prisma.apiKey.findFirst({ where: { hashedKey: incomingHash } });
This means a database breach doesn't expose live keys. The attacker gets hashes that are useless without the original plaintext.
Auth middleware: two paths into the system
The middleware checks in priority order:
// api/src/auth/middleware.js
export async function authMiddleware(req, res, next) {
  // Path 1: API key (developer access)
  const apiKey = req.headers['x-api-key'];
  if (apiKey) {
    const hashed = sha256(apiKey);
    const record = await prisma.apiKey.findFirst({ where: { hashedKey: hashed } });
    if (record) {
      req.user = await prisma.user.findUnique({ where: { id: record.userId } });
      return next();
    }
  }

  // Path 2: JWT cookie (dashboard/browser access)
  const token = req.cookies.accessToken;
  if (token) {
    try {
      const payload = jwt.verify(token, process.env.JWT_SECRET);
      req.user = payload;
      return next();
    } catch (_) {
      // invalid or expired token: fall through to unauthenticated
    }
  }

  // Unauthenticated — public endpoints proceed, protected endpoints return 401
  next();
}
API keys handle programmatic access. JWTs handle the browser dashboard. The same middleware serves both, and there's no ambiguity about which wins — API key always takes precedence.
The JWT setup is standard: 1-hour access tokens, 30-day refresh tokens, both in httpOnly cookies. HS256, signed with JWT_SECRET. The short access token TTL limits blast radius if a token leaks; the refresh token means users don't re-authenticate constantly.
Rate limiting: in-memory counters, not DB hits
Rate limiting has to run on every request, which makes it a performance concern: hitting PostgreSQL on every API call just to check a counter would add a database round-trip to every response. Instead:
// In-memory store — fast lookup
const counters = new Map(); // userId -> { count, date }

export function checkRateLimit(userId, plan) {
  const today = new Date().toISOString().split('T')[0];
  const entry = counters.get(userId) || { count: 0, date: today };
  if (entry.date !== today) {
    entry.count = 0;
    entry.date = today;
  }
  const limit = plan === 'pro' ? 1000 : 100;
  if (entry.count >= limit) {
    return { allowed: false, remaining: 0, limit };
  }
  entry.count++;
  counters.set(userId, entry);
  return { allowed: true, remaining: limit - entry.count, limit };
}
The in-memory counters are flushed to the ApiUsage PostgreSQL table every hour. This gives you persistence across restarts and a usage history for the dashboard, without a DB round-trip on every request.
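The flush itself can be split so the serialization step is pure and testable. A sketch, assuming the in-memory `counters` Map from above and an ApiUsage table keyed by user and date (the Prisma call is indicative only):

```javascript
// Turn the in-memory Map into rows ready for the ApiUsage table.
// Pure function, so it can be tested without a database.
function snapshotCounters(counters) {
  // counters: Map<userId, { count, date }>
  return [...counters.entries()].map(([userId, { count, date }]) => ({
    userId,
    date,
    requests: count,
  }));
}

// Hourly flush — indicative shape only; the upsert keys depend on the schema:
// setInterval(async () => {
//   for (const row of snapshotCounters(counters)) {
//     await prisma.apiUsage.upsert(/* where (userId, date), set requests */);
//   }
// }, 60 * 60 * 1000);
```

Upserting rather than inserting means repeated flushes within the same day accumulate into one row per user per day.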
Rate limit state is communicated back to callers via standard headers:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 47
X-RateLimit-Reset: 1712275200
And when the limit is exceeded:
HTTP 429 Too Many Requests
{
  "error": "Rate limit exceeded",
  "limit": 100,
  "remaining": 0,
  "resetAt": "2026-04-15T00:00:00Z"
}
The inward-facing problem: 1,000 different auth schemes
Now for the harder problem.
MyAirports scrapes flight data from airport websites to serve its API. Each airport website is different. And "different" doesn't just mean different data formats — it means different auth requirements too.
Here's a sample of what the scraper encounters:
- No auth required — most airports. The flight data API is unauthenticated or only requires session cookies the browser already has.
- Cloudflare JS challenge — the page won't load without executing a JavaScript fingerprinting challenge first. No challenge, no data.
- Incapsula WAF — similar JS challenge, different vendor.
- CSRF tokens — some airport APIs require a valid CSRF token extracted from the page before accepting data requests.
- Leaked third-party API keys — during browser discovery, the scraper sometimes intercepts network requests that include API keys in headers. Airports that license flight data software sometimes include the vendor's API key in client-side requests.
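The CSRF case from the list above can be sketched in a few lines: fetch the page, pull the token out of the HTML, attach it to the data request. The meta-tag and hidden-input patterns here are assumptions; each airport embeds its token differently:

```javascript
// Extract a CSRF token from page HTML before calling the data endpoint.
// The two patterns below are common conventions, not a guarantee — real
// airport pages vary and some need per-site handling.
function extractCsrfToken(html) {
  const patterns = [
    /<meta\s+name=["']csrf-token["']\s+content=["']([^"']+)["']/i,
    /<input[^>]+name=["']_csrf["'][^>]+value=["']([^"']+)["']/i,
  ];
  for (const re of patterns) {
    const m = html.match(re);
    if (m) return m[1];
  }
  return null; // no token found — this airport needs a different strategy
}
```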
The leaked-keys case deserves its own Prisma model:
model LeakedKey {
  id          Int      @id @default(autoincrement())
  airportIata String
  keyName     String   // e.g. "X-API-Key", "Authorization"
  keyValue    String   // the actual leaked key value
  source      String   // URL it was found in
  tested      Boolean  @default(false)
  working     Boolean  @default(false)
  createdAt   DateTime @default(now())
}
When the interceptor finds an API key embedded in a request header during discovery, it logs it here. Some of these keys work for direct calls to the upstream flight data vendor — an interesting optimization, though using them raises its own questions.
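The filter the interceptor applies to each captured request can be kept simple: pick out the headers that look like credentials. The header-name list is an assumption about what's worth flagging:

```javascript
// Given the headers of an intercepted browser request, return any that look
// like credentials, shaped as LeakedKey rows (keyName/keyValue). The list of
// interesting header names is an assumption, not exhaustive.
const AUTH_HEADER_NAMES = ['authorization', 'x-api-key', 'x-auth-token', 'apikey'];

function extractAuthHeaders(headers) {
  // headers: plain object of header name -> value, as captured during discovery
  return Object.entries(headers)
    .filter(([name]) => AUTH_HEADER_NAMES.includes(name.toLowerCase()))
    .map(([keyName, keyValue]) => ({ keyName, keyValue }));
}
```

Each hit gets written to LeakedKey along with the airport's IATA code and the request URL as `source`.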
WAF session persistence
Solving a Cloudflare or Incapsula challenge is expensive: it requires a full browser session, human-like behavior simulation, and sometimes a FlareSolverr sidecar to handle the cryptographic JS challenge:
docker run -d \
  --name flaresolverr \
  -p 8191:8191 \
  ghcr.io/flaresolverr/flaresolverr:latest
Paying that cost on every scrape is wasteful. Instead, solved cookies are stored in the Session table and reused:
model Session {
  id          Int      @id @default(autoincrement())
  airportIata String   @unique
  cookies     Json     // solved WAF cookies
  userAgent   String   // must match the browser that solved the challenge
  expiresAt   DateTime
  createdAt   DateTime @default(now())
}
The session store checks for a valid unexpired session before launching a browser. A session that's still valid skips the challenge entirely — the scraper just presents the saved cookies.
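The check-then-solve flow looks roughly like this. A sketch, assuming FlareSolverr's `request.get` command on its `/v1` endpoint and the Session model above; the 24-hour validity window and the Prisma calls are assumptions:

```javascript
// Convert FlareSolverr's solution.cookies array ([{ name, value, ... }])
// into a Cookie header the plain HTTP scraper can present.
function toCookieHeader(cookies) {
  return cookies.map((c) => `${c.name}=${c.value}`).join('; ');
}

// Reuse a cached session if it hasn't expired; otherwise solve the challenge.
// Assumes `prisma` and a FlareSolverr sidecar on localhost:8191 are available.
async function getSession(iata, url) {
  const cached = await prisma.session.findUnique({ where: { airportIata: iata } });
  if (cached && cached.expiresAt > new Date()) return cached;

  const res = await fetch('http://localhost:8191/v1', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ cmd: 'request.get', url, maxTimeout: 60000 }),
  });
  const { solution } = await res.json();

  const data = {
    cookies: solution.cookies,
    userAgent: solution.userAgent, // reuse the solving browser's UA exactly
    expiresAt: new Date(Date.now() + 24 * 60 * 60 * 1000), // assumed 24h validity
  };
  return prisma.session.upsert({
    where: { airportIata: iata },
    update: data,
    create: { airportIata: iata, ...data },
  });
}
```

Sending the saved cookies with a mismatched User-Agent is a classic way to get re-challenged, which is why the Session row stores both.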
The anti-bot evasion stack
Beyond WAF cookies, the browser sessions themselves need to look human. The stealth layer manages six distinct browser profiles — each with a coherent set of navigator properties, WebGL fingerprints, HTTP headers, and behavioral patterns.
The profiles are designed to look like real returning users rather than fresh bot fingerprints. Persistent Chromium user-data directories reinforce this — the browser accumulates cookies and browsing history across scrapes, which is a signal many WAFs check.
Stale session eviction
Sessions expire. WAF cookies typically last 24-48 hours. The system evicts sessions as they expire and re-solves challenges when needed.
There's a parallel mechanism for discovered API endpoints: the ApiCache table tracks success and failure counts per endpoint. An endpoint that returns zero flights three times in a row gets flagged as stale:
model ApiCache {
  id           Int      @id @default(autoincrement())
  airportIata  String   @unique
  endpoint     String
  headers      Json
  successCount Int      @default(0)
  failCount    Int      @default(0)
  lastUsed     DateTime
  stale        Boolean  @default(false)
}
A stale endpoint triggers re-discovery — the browser fires up again, re-intercepts the network traffic, and hopefully finds the new API URL.
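The bookkeeping behind the three-strikes rule can be sketched like this. It assumes `failCount` tracks *consecutive* empty results and resets on success, which the schema above doesn't pin down:

```javascript
// Update an ApiCache record after a scrape. Assumption: failCount counts
// consecutive zero-flight results and a good result clears the streak.
const STALE_THRESHOLD = 3;

function recordResult(record, flightCount) {
  if (flightCount > 0) {
    record.successCount += 1;
    record.failCount = 0;   // a good result clears the streak
    record.stale = false;
  } else {
    record.failCount += 1;
    if (record.failCount >= STALE_THRESHOLD) {
      record.stale = true;  // flags this endpoint for browser re-discovery
    }
  }
  record.lastUsed = new Date();
  return record;
}
```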
The architecture of two auth systems
Stepping back: the MyAirports API is a thin, clean auth layer sitting on top of a chaotic, unpredictable auth environment.
Outward-facing: developers see a consistent, well-documented interface. API keys in headers, rate limits in response headers, 429s when limits are hit. Predictable.
Inward-facing: the scraper deals with a different kind of problem every hour. No auth, WAF challenges, leaked keys, CSRF tokens, expired sessions. Unpredictable by design.
The key insight is that these two problems need to be separated architecturally. The developer-facing API layer doesn't know or care about WAF sessions — that complexity is fully contained in the scraping layer. The scraping layer doesn't know anything about ma_live_* keys — that's the API layer's concern.
The split keeps each layer simple. The outward-facing auth is straightforward standard API security. The inward-facing auth is a collection of adapters for a messy world.
The API is free to try at myairports.online/developers. Free tier: 100 requests/day, no card required.