Abdul Rehman

I Built an AI Job Board Processing 10K+ Listings Daily. Here's the Real Architecture

The problem wasn't building it. It was keeping the costs from eating us alive.

A job board that scores every listing with GPT-4 is a straightforward technical problem. You fetch listings, pipe them through an LLM, filter by relevance, serve the results. Easy.

But when you're doing that for 10,000+ fresh listings every single day, the naive approach will burn through your OpenAI budget in a week. That's a real constraint I had to design around for a client's production platform.

Here's the architecture that solved it. Not theoretical. Running in production right now.

The ingestion pipeline stops being simple at scale

The first design mistake I see people make is treating all sources the same. I built this system for a client that pulls from multiple ATS platforms. Each source has its own rate limits, data shape, and reliability profile.

The raw pipe looks like this:

// Simplified version of the per-source fetcher abstraction
async function ingestFromSource(source: SourceConfig) {
  const listings = [];
  let cursor = source.initialCursor;

  while (cursor) {
    const { data, nextCursor } = await fetchPage(
      source.apiEndpoint, 
      cursor, 
      source.rateLimit // unique throttle per source
    );

    const normalized = normalizeListings(data, source.schema);
    listings.push(...normalized);
    cursor = nextCursor;

    await sleep(source.rateLimit.delayMs); // no hammering
  }

  return deduplicate(listings);
}

The key insight: every source needs its own cursor tracking, deduplication strategy, and failure handling. If one ATS API goes down, the others keep running. The system logs the failure, retries twice, then flags it for manual review.
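Here is a minimal sketch of that failure isolation, assuming each source exposes an async ingest function. The names `runWithRetries`, `ingestAll`, and the `SourceResult` shape are hypothetical and only illustrate the "retry twice, then flag for review" rule; a production version would likely add a backoff delay between attempts.

```typescript
// Hypothetical result shape for illustration; not taken from the production system.
type SourceResult<T> =
  | { source: string; ok: true; listings: T[] }
  | { source: string; ok: false; error: string; needsReview: true };

// Run one source: a first attempt plus up to `maxRetries` retries.
async function runWithRetries<T>(
  source: string,
  ingest: () => Promise<T[]>,
  maxRetries = 2,
): Promise<SourceResult<T>> {
  let lastError = 'unknown error';
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return { source, ok: true, listings: await ingest() };
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
      console.warn(`[${source}] attempt ${attempt + 1} failed: ${lastError}`);
    }
  }
  // Every attempt failed: flag for manual review instead of throwing.
  return { source, ok: false, error: lastError, needsReview: true };
}

// Sources run concurrently; a failing source never rejects the whole run.
async function ingestAll<T>(
  sources: Record<string, () => Promise<T[]>>,
): Promise<SourceResult<T>[]> {
  return Promise.all(
    Object.entries(sources).map(([name, ingest]) => runWithRetries(name, ingest)),
  );
}
```

Because failures are returned as values rather than thrown, one broken ATS shows up as a flagged entry in the results while every other source still delivers its listings.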

One common approach is to pull everything first, then deduplicate. At 10K+ listings daily, that's wasteful. Deduplication needs to happen at ingestion time, before any LLM scoring, so you're not paying to score the same job multiple times.
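One way to do this at ingestion time is to fingerprint each listing as it arrives and drop anything already seen. The sketch below assumes that title, company, and location define "the same job"; that field choice, the `RawListing` shape, and the function names are illustrative assumptions, not the production rules.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical minimal listing shape for illustration.
interface RawListing {
  title: string;
  company: string;
  location: string;
  sourceUrl: string;
}

// Fingerprint the fields that define "the same job", ignoring case and spacing.
function fingerprint(l: RawListing): string {
  const norm = (s: string) => s.trim().toLowerCase().replace(/\s+/g, ' ');
  const key = [l.title, l.company, l.location].map(norm).join('|');
  return createHash('sha256').update(key).digest('hex'); // 64 hex characters
}

// Keep a listing only if its fingerprint has not been seen yet.
// `seen` is shared across sources so cross-posted jobs collapse to one entry.
function dedupeAtIngestion(batch: RawListing[], seen: Set<string>): RawListing[] {
  const fresh: RawListing[] = [];
  for (const listing of batch) {
    const fp = fingerprint(listing);
    if (seen.has(fp)) continue;
    seen.add(fp);
    fresh.push(listing);
  }
  return fresh;
}
```

Calling `dedupeAtIngestion` on every fetched page, with one shared `seen` set, means a duplicate is discarded before it can reach the scoring queue.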

GPT-4 function calling for scoring was the right call. But only with tiering.

The core value of the platform is relevance scoring. Every listing gets evaluated against candidate profiles using GPT-4 function calling. But I don't score every listing the same way.

Here's the tiering strategy that kept LLM costs predictable:

// The tier router decides which model scores each listing
function selectScoringTier(listing: Listing): 'full' | 'fast' | 'skip' {
  // Tier 1: Full GPT-4 scoring for high-value categories
  if (listing.category === 'software-engineering' || 
      listing.salary > 150000 || 
      listing.companyTier === 'top-100') {
    return 'full';
  }

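  // Tier 2: GPT-4o-mini for everything else that needs scoring
  // (assumes listing.lastListed is a Date; "recent" means listed within the last 7 days)
  const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;
  if (Date.now() - listing.lastListed.getTime() < SEVEN_DAYS_MS &&
      listing.location !== 'remote-anywhere') {
    return 'fast';
  }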

  // Tier 3: No scoring, just keyword match
  return 'skip';
}

Full scoring uses GPT-4 with a structured function call that returns a relevance score, matched skills, and a confidence level. Fast scoring uses GPT-4o-mini with a simpler prompt. Skip listings just get a keyword-based relevance filter.
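To make the structured function call concrete, here is a sketch of what the tool definition and the validation of its result could look like. The tool name `record_relevance_score`, the field names, and the 0 to 100 scale are assumptions for illustration; the production schema is not shown in this post.

```typescript
// Hypothetical tool definition in the Chat Completions `tools` format.
const scoreListingTool = {
  type: 'function' as const,
  function: {
    name: 'record_relevance_score',
    description: 'Record how well a job listing matches a candidate profile.',
    parameters: {
      type: 'object',
      properties: {
        relevance: { type: 'number', minimum: 0, maximum: 100 },
        matchedSkills: { type: 'array', items: { type: 'string' } },
        confidence: { type: 'string', enum: ['low', 'medium', 'high'] },
      },
      required: ['relevance', 'matchedSkills', 'confidence'],
    },
  },
};

interface RelevanceScore {
  relevance: number;
  matchedSkills: string[];
  confidence: 'low' | 'medium' | 'high';
}

// Tool-call arguments arrive as a JSON string; validate before trusting them.
function parseScore(argumentsJson: string): RelevanceScore {
  const raw = JSON.parse(argumentsJson) as Partial<RelevanceScore>;
  const { relevance, matchedSkills, confidence } = raw;
  if (typeof relevance !== 'number' || relevance < 0 || relevance > 100) {
    throw new Error('relevance must be a number in [0, 100]');
  }
  if (!Array.isArray(matchedSkills) || !matchedSkills.every((s) => typeof s === 'string')) {
    throw new Error('matchedSkills must be an array of strings');
  }
  if (confidence !== 'low' && confidence !== 'medium' && confidence !== 'high') {
    throw new Error('confidence must be low, medium, or high');
  }
  return { relevance, matchedSkills, confidence };
}
```

In a request, the definition would go in `tools: [scoreListingTool]` with `tool_choice` forcing that function, and the string passed to `parseScore` would come from `choices[0].message.tool_calls[0].function.arguments`. Validating on the way in matters because the model can still return out-of-range or missing fields.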

The savings are real. A full GPT-4 scoring call costs significantly more per listing than the mini model. When you route the bulk of your volume to the fast tier, the difference adds up fast. The tiering logic pays for itself in the first day.

The Batch API is the hidden superpower for cost control

This was the single biggest optimization. OpenAI's Batch API gives you 50% off the standard rates. The tradeoff is latency: results come back in up to 24 hours instead of seconds.

For a job board, that tradeoff is almost always acceptable. New listings don't need to be scored instantly. They need to be scored before the user searches for them.

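import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Queue files for batch processing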
async function prepareBatchScoringQueue(listings: Listing[]) {
  const batchFile = listings.map((l) => ({
    custom_id: l.id,
    method: 'POST',
    url: '/v1/chat/completions',
    body: {
      model: 'gpt-4o-mini',
      messages: buildScoringPrompt(l),
      response_format: { type: 'json_object' }
    }
  }));

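  // The Batch API expects JSONL: one JSON request object per line, not a JSON array
  const jsonl = batchFile.map((req) => JSON.stringify(req)).join('\n');
  const file = await openai.files.create({
    file: new File([jsonl], 'batch.jsonl'),
    purpose: 'batch'
  });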

  const batch = await openai.batches.create({
    input_file_id: file.id,
    endpoint: '/v1/chat/completions',
    completion_window: '24h'
  });

  return batch.id; // Poll this later
}

At 10K listings per day, suppose a portion go through full GPT-4 scoring and the rest through fast scoring. Running the fast tier through the Batch API cuts that tier's cost in half. Over a month, the savings are significant.
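The arithmetic is easy to model. The sketch below is a hypothetical calculator: every input (tier shares, per-listing costs) is a placeholder you would replace with your own measured numbers, and only the 50% Batch API discount comes from the text above.

```typescript
// All values passed in are placeholders, not real prices or real traffic.
interface CostInputs {
  listingsPerDay: number;
  fullShare: number;          // fraction routed to the full tier (0..1)
  fastShare: number;          // fraction routed to the fast tier (0..1)
  fullCostPerListing: number; // standard-rate cost of one full-tier call
  fastCostPerListing: number; // standard-rate cost of one fast-tier call
  batchDiscount: number;      // 0.5 for the 50% Batch API discount
  batchFastTier: boolean;     // whether the fast tier goes through the Batch API
}

function dailyCost(c: CostInputs): number {
  const full = c.listingsPerDay * c.fullShare * c.fullCostPerListing;
  const fastRate = c.batchFastTier
    ? c.fastCostPerListing * (1 - c.batchDiscount)
    : c.fastCostPerListing;
  const fast = c.listingsPerDay * c.fastShare * fastRate;
  return full + fast; // skipped listings cost nothing in LLM calls
}

// Example with arbitrary cost units (NOT real prices).
const example: CostInputs = {
  listingsPerDay: 10_000,
  fullShare: 0.25,
  fastShare: 0.5,
  fullCostPerListing: 8,
  fastCostPerListing: 1,
  batchDiscount: 0.5,
  batchFastTier: false,
};
const standard = dailyCost(example);
const batched = dailyCost({ ...example, batchFastTier: true });
```

A model like this also makes one thing visible: batching only the fast tier halves only that tier's share of the bill, so the full tier still dominates the total unless it is batched too.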

The pattern is simple: anything that doesn't need a real-time response goes through the Batch API. Search queries, user prompts, and real-time chat all stay on standard endpoints. The scoring pipeline that runs in the background? Batch all the way.
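The other half of the pattern is collecting results. Once `openai.batches.retrieve(batchId)` reports a `completed` status, the batch's `output_file_id` can be downloaded with `openai.files.content(...)` and read as text. The sketch below covers only the parsing step for that text; the `ParsedBatch` shape and the function name are my own for illustration, and it assumes each response body holds the JSON score as message content.

```typescript
// Shape of one line in a Batch API output file (only the fields used here).
interface BatchOutputLine {
  custom_id: string;
  response: {
    status_code: number;
    body: { choices: { message: { content: string | null } }[] };
  } | null;
  error: { code: string; message: string } | null;
}

interface ParsedBatch {
  scores: Map<string, unknown>; // custom_id -> parsed JSON score
  failed: string[];             // custom_ids to retry or review
}

// Output order is not guaranteed to match input order, so key by custom_id.
function parseBatchOutput(jsonl: string): ParsedBatch {
  const scores = new Map<string, unknown>();
  const failed: string[] = [];
  for (const line of jsonl.split('\n')) {
    if (line.trim() === '') continue;
    const row = JSON.parse(line) as BatchOutputLine;
    const content = row.response?.body.choices[0]?.message.content;
    if (row.error || row.response?.status_code !== 200 || !content) {
      failed.push(row.custom_id);
      continue;
    }
    try {
      scores.set(row.custom_id, JSON.parse(content));
    } catch {
      failed.push(row.custom_id); // the model returned malformed JSON
    }
  }
  return { scores, failed };
}
```

Since `custom_id` was set to the listing ID when the batch was built, the resulting map can be written straight back to the matching listings, and the `failed` list can be resubmitted in the next batch.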

Caching isn't just for performance. It's for cost.

I learned this the hard way. If you rescore every listing every time new data comes in, you'll waste calls on identical job postings from different sources. The fix was a deduplication cache that stores the hash of each listing's core attributes plus the last scoring result:

CREATE TABLE listing_scores (
  listing_hash VARCHAR(64) PRIMARY KEY,
  source_url TEXT,
  score JSONB,
  model_used VARCHAR(32),
  scored_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_scores_recency ON listing_scores(scored_at);

Before scoring any listing, the pipeline checks if an identical listing was scored recently. If yes, reuse the score. If the listing content changed or enough time passed, rescore.
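The reuse-or-rescore decision can be sketched as a small pure function. Here an in-memory `Map` stands in for the `listing_scores` table, and the 14-day maximum age is an assumed value, since the post does not say how long "recently" is.

```typescript
// Mirrors the listing_scores columns; field names are adapted to camelCase.
interface CachedScore {
  listingHash: string;
  score: unknown;
  modelUsed: string;
  scoredAt: Date;
}

// Assumed freshness window; tune to how quickly listings go stale.
const MAX_AGE_MS = 14 * 24 * 60 * 60 * 1000;

type Decision =
  | { action: 'reuse'; score: unknown }
  | { action: 'rescore'; reason: string };

function decide(
  currentHash: string,
  cache: Map<string, CachedScore>,
  now: Date = new Date(),
): Decision {
  const hit = cache.get(currentHash);
  // A content change produces a new hash, so it shows up here as a miss.
  if (!hit) return { action: 'rescore', reason: 'no score for this content hash' };
  if (now.getTime() - hit.scoredAt.getTime() > MAX_AGE_MS) {
    return { action: 'rescore', reason: 'cached score is stale' };
  }
  return { action: 'reuse', score: hit.score };
}
```

Keying on the content hash means "the listing changed" needs no separate check: any edit to the hashed attributes simply misses the cache.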

At scale, the cache hit rate is meaningful. You're avoiding many API calls each day. At full GPT-4 pricing, that adds up fast.

The API-first design saved the project from itself

This wasn't a choice. The client needed the platform to serve both the web UI and external integrations. So I built the REST API layer first, then built the frontend on top of it.

import { NextResponse } from 'next/server';

// The listings endpoint serves web and API consumers identically
export async function GET(request: Request) {
  const { searchParams } = new URL(request.url);
  const query = searchParams.get('q');
  const location = searchParams.get('location');
  const cursor = searchParams.get('cursor');

  // Reuse the same service layer for both web and API calls
  const results = await listingsService.search({
    query,
    location,
    cursor: cursor || undefined,
    limit: 25
  });

  return NextResponse.json(results);
}

This forced me to handle pagination, filtering, and error states properly from day one. The web frontend just consumed the same endpoints. When the client later needed to expose data to external partners, the API was already there. No rewrite needed.

The cursor-based pagination from the ingestion pipeline applies here too. Deep skip-based pagination on the serving side created the same database CPU spikes we fought on the ingestion side. Cursor-based pagination on the outbound API resolved that cleanly.
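For readers who haven't used it, here is a minimal in-memory sketch of cursor (keyset) pagination. The `Row` shape and the `postedAt|id` cursor format are assumptions for illustration, and it presumes IDs never contain a `|`. In a real database the filter would be a `WHERE (posted_at, id) < (...)` clause backed by an index rather than an array scan.

```typescript
// Hypothetical row shape; `postedAt` is an ISO-8601 UTC string, so string order is time order.
interface Row { id: string; postedAt: string; title: string }

interface Page { items: Row[]; nextCursor: string | null }

// The cursor records the position of the last row returned, not an offset.
const encodeCursor = (r: Row): string => `${r.postedAt}|${r.id}`;

// Newest first, with id as a tie-breaker so the ordering is total.
const newestFirst = (a: Row, b: Row): number =>
  a.postedAt !== b.postedAt
    ? (a.postedAt < b.postedAt ? 1 : -1)
    : a.id < b.id ? 1 : a.id > b.id ? -1 : 0;

function searchPage(rows: Row[], cursor: string | undefined, limit: number): Page {
  const sorted = [...rows].sort(newestFirst);
  let start = 0;
  if (cursor) {
    const [ts, id] = cursor.split('|');
    // First row strictly after the cursor position in the sort order.
    start = sorted.findIndex((r) => r.postedAt < ts || (r.postedAt === ts && r.id < id));
    if (start === -1) return { items: [], nextCursor: null };
  }
  const items = sorted.slice(start, start + limit);
  const hasMore = start + limit < sorted.length;
  return { items, nextCursor: hasMore ? encodeCursor(items[items.length - 1]) : null };
}
```

The point of the design is that page 400 costs the same as page 1: the database seeks directly to the cursor position instead of counting and discarding every skipped row.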

The one thing I'd change if I built this again

I'd evaluate cheaper models sooner. Right now the pipeline is blocked from running AI-powered job description rewrites because GPT-4-class models make the cost prohibitive at 1M+ listing scale. I'm evaluating DeepSeek V4 Flash as a much cheaper alternative that might make the economics work.

If your team is running a similar pipeline and hitting cost walls on LLM scoring, that's exactly the kind of thing I help with. Happy to compare notes on model tiering strategies or Batch API patterns that keep your costs predictable at scale.


Written by Abdul Rehman, full-stack AI engineer building production SaaS, MVPs, and AI automation. More at PrimeStrides.
