How I Built a Swedish Crossword Solver with Astro and 400,000+ Words

Evy Lundel

I recently launched Korsordsakuten — a free Swedish crossword solver — and learned a lot about building SEO-driven content sites with Astro. Here's what I built and what I learned along the way.

What It Does

The site lets you:

  • Search 400,000+ Swedish word forms by clue, pattern, or length
  • Filter answers by letter count (e.g. "give me 6-letter synonyms only")
  • Browse prefix/suffix indexes (words starting with SK, ending with ERA, etc.)
  • Solve anagrams
  • Play a daily Wordle-style word game
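
To illustrate the pattern search, here is a minimal sketch (a hypothetical `matchPattern` helper, not the site's actual code) where a `?` wildcard is compiled to a regex character class covering the Swedish alphabet:

```javascript
// Hypothetical pattern search: '?' stands for any single letter,
// so 'PLA??' matches every 5-letter word starting with PLA.
function matchPattern(words, pattern) {
  const re = new RegExp(
    '^' + pattern.toUpperCase().replace(/\?/g, '[A-ZÅÄÖ]') + '$'
  );
  return words.filter(w => re.test(w.toUpperCase()));
}
```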

The Stack

  • Astro 6 (SSR mode, Node adapter) — perfect for content-heavy sites. Each route is server-rendered but the build is still fast.
  • Node.js 22 on Render (free tier)
  • GitHub Pages as a static sitemap mirror — more on this below
  • Zero client-side JS frameworks. Just vanilla JS where needed.

The Data Pipeline

The word database is built from public Swedish word lists, processed through a Node.js pipeline:

scripts/
  build-worddb.mjs      # Build words.json, synonyms.json, related.json
  build-phrases.mjs     # Process multi-word crossword entries
  build-sitemaps.mjs    # Generate 278 sitemap files × 1000 URLs each
  publish-sitemaps.mjs  # Push sitemaps to GitHub Pages mirror

The synonym/related data is derived from Swedish lexical resources, giving each word a list of crossword-appropriate answers.

The Trickiest Part: Memory on a Free Tier

The site has some large JSON files:

  • clue-index.json — 18 MB
  • related.json — 8.5 MB
  • synonyms.json — 6.75 MB

Loading all of these at module startup caused OOM crashes on Render's 256 MB free tier. The fix: lazy loading via getter functions.

// Before — crashes on startup: the whole JSON file is parsed at module load
import wordsData from '../data/words.json';

// After — the file is read and parsed only on first access
import { readFileSync } from 'node:fs';

let _words: string[] | null = null;
export function getWords(): string[] {
  if (!_words) {
    _words = JSON.parse(
      readFileSync(new URL('../data/words.json', import.meta.url), 'utf-8')
    ) as string[];
  }
  return _words;
}

I also added --max-old-space-size=460 to the start script (Render allows slightly more than the nominal 256 MB before killing the process).
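
In package.json, that looks roughly like this (the entry path assumes the Astro Node adapter's default standalone build output):

```json
{
  "scripts": {
    "start": "node --max-old-space-size=460 ./dist/server/entry.mjs"
  }
}
```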

The Sitemap Problem

With 227,000+ URLs, Google wouldn't read our sitemaps. Two issues:

  1. Files were too large — 45,000 URLs × ~120 bytes = 5.4 MB per file. Google has an unofficial ~1 MB soft limit. Fixed by reducing to 1,000 URLs per file (278 files).

  2. The host went down — Render free tier sleeps on inactivity. When it's down, Google can't fetch the sitemap and backs off for weeks.
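
The first fix boils down to a simple chunker. A sketch (hypothetical helper; the real build-sitemaps.mjs also writes the index file):

```javascript
// Split a flat URL list into sitemap files of `perFile` URLs each.
function chunkSitemaps(urls, perFile = 1000) {
  const files = [];
  for (let i = 0; i < urls.length; i += perFile) {
    const body = urls
      .slice(i, i + perFile)
      .map(u => `  <url><loc>${u}</loc></url>`)
      .join('\n');
    files.push({
      name: `sitemap-${files.length + 1}.xml`,
      xml:
        `<?xml version="1.0" encoding="UTF-8"?>\n` +
        `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n` +
        body +
        '\n</urlset>\n',
    });
  }
  return files;
}
```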

Solution: Push sitemaps to GitHub Pages as a static mirror. A simple script clones the repo, copies the XML files, rewrites the sitemap index URLs to point to the mirror, and pushes:

// publish-sitemaps.mjs (simplified)
import { readdirSync, readFileSync, writeFileSync, copyFileSync } from 'node:fs';
import { join } from 'node:path';

const files = readdirSync(SRC_DIR).filter(f => /^sitemap.*\.xml$/.test(f));

for (const f of files) {
  if (f === 'sitemap.xml') {
    // Rewrite the index so sub-sitemap URLs point to the mirror
    const content = readFileSync(join(SRC_DIR, f), 'utf-8');
    const rewritten = content.replace(
      /https:\/\/www\.korsordsakuten\.se\/(sitemap-\d+\.xml)/g,
      'https://sitemaps.korsordsakuten.se/$1'
    );
    writeFileSync(join(WORK_DIR, f), rewritten);
  } else {
    copyFileSync(join(SRC_DIR, f), join(WORK_DIR, f));
  }
}

A custom subdomain (sitemaps.korsordsakuten.se → GitHub Pages via CNAME) makes the URLs clean. Now Google can always fetch sitemaps even if the main site is sleeping.

Structured Data That Actually Helps

For clue pages (/korsord/[word]), I added QAPage schema instead of just FAQPage:

{
  "@type": "QAPage",
  "mainEntity": {
    "@type": "Question",
    "name": "avslutningsvis — korsordssvar",
    "answerCount": 8,
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "SLUTLIGEN (9 bokstäver)",
      "upvoteCount": 8
    },
    "suggestedAnswer": []
  }
}

This signals to Google that each clue page is structured Q&A content — similar to how Q&A forum sites are interpreted — rather than auto-generated thin content.
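
Building that object per page can be sketched like this (a hypothetical `buildQaSchema` helper taking the clue and its answer words; the real code isn't shown in the post):

```javascript
// Build QAPage JSON-LD for a clue page from the clue text and answer words.
function buildQaSchema(clue, answers) {
  const [best, ...rest] = answers;
  return {
    '@context': 'https://schema.org',
    '@type': 'QAPage',
    mainEntity: {
      '@type': 'Question',
      name: `${clue} — korsordssvar`,
      answerCount: answers.length,
      acceptedAnswer: {
        '@type': 'Answer',
        text: `${best} (${best.length} bokstäver)`,
        upvoteCount: answers.length,
      },
      suggestedAnswer: rest.map(a => ({
        '@type': 'Answer',
        text: `${a} (${a.length} bokstäver)`,
      })),
    },
  };
}
```

In an Astro page the result can then be emitted with `<script type="application/ld+json" set:html={JSON.stringify(schema)} />`.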

For word pages, DefinedTerm with synonyms as alternateName marks the page as a lexical resource:

{
  "@type": "DefinedTerm",
  "name": "PLATS",
  "alternateName": ["STÄLLE", "POSITION", "LÄGE"],
  "inDefinedTermSet": {
    "@type": "DefinedTermSet",
    "name": "Korsordsakuten ordlista"
  }
}

Daily Fresh Content

One thing competitor sites have that purely static sites lack: freshness signals. Google re-crawls active sites more frequently.

Solution: /dagens-ledtradar — a page showing 24 curated crossword clues that rotates every day using a deterministic seed:

const dayNum = Math.floor((Date.now() - epoch) / 86400000);
const rng = mulberry32(dayNum * 7919 + 13);
// Pick 24 unique entries from top-2000 clues

Same date = same picks (so the page is cacheable), but crawling the URL the next day returns different content, which triggers Google's freshness heuristic without any database or cron job.
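
For reference, mulberry32 is a tiny public-domain seedable PRNG; a self-contained sketch of the whole rotation (the `pickDaily` helper and its day-number epoch are my assumptions, not the site's exact code):

```javascript
// mulberry32: small, fast, seedable 32-bit PRNG returning floats in [0, 1).
function mulberry32(seed) {
  return function () {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Pick `count` unique clues for a given date, deterministically.
function pickDaily(clues, count, date = new Date()) {
  const dayNum = Math.floor(date.getTime() / 86400000); // days since Unix epoch
  const rng = mulberry32(dayNum * 7919 + 13);
  const chosen = new Set();
  while (chosen.size < Math.min(count, clues.length)) {
    chosen.add(Math.floor(rng() * clues.length)); // Set dedupes repeat indices
  }
  return [...chosen].map(i => clues[i]);
}
```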

What I'd Do Differently

  • Start with a paid host. Render free tier sleeping + OOM issues cost weeks of SEO recovery time.
  • Plan for JSON size early. Lazy loading was the fix but the problem was predictable.
  • Submit 10 hand-picked URLs to GSC from day one. Don't wait for the sitemap crawler to discover everything.

Happy to answer questions about any part of the stack!
