How I Built a Swedish Crossword Solver with Astro and 400,000+ Words

Evy Lundel

I recently launched Korsordsakuten — a free Swedish crossword solver — and learned a lot about building SEO-driven content sites with Astro. Here's what I built and what I learned along the way.

What It Does

The site lets you:

  • Search 400,000+ Swedish word forms by clue, pattern, or length
  • Filter answers by letter count (e.g. "give me 6-letter synonyms only")
  • Browse prefix/suffix indexes (words starting with SK, ending with ERA, etc.)
  • Solve anagrams
  • Play a daily Wordle-style word game
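
To illustrate the pattern search, here is a minimal sketch (a hypothetical `matchPattern` helper, not the site's actual code) where a `?` wildcard is compiled to a regex character class covering the Swedish alphabet:

```javascript
// Hypothetical pattern search: '?' stands for any single letter,
// so 'PLA??' matches every 5-letter word starting with PLA.
function matchPattern(words, pattern) {
  const re = new RegExp(
    '^' + pattern.toUpperCase().replace(/\?/g, '[A-ZÅÄÖ]') + '$'
  );
  return words.filter(w => re.test(w.toUpperCase()));
}
```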

The Stack

  • Astro 6 (SSR mode, Node adapter) — perfect for content-heavy sites. Each route is server-rendered but the build is still fast.
  • Node.js 22 on Render (free tier)
  • GitHub Pages as a static sitemap mirror — more on this below
  • Zero client-side JS frameworks. Just vanilla JS where needed.

The Data Pipeline

The word database is built from public Swedish word lists, processed through a Node.js pipeline:

scripts/
  build-worddb.mjs      # Build words.json, synonyms.json, related.json
  build-phrases.mjs     # Process multi-word crossword entries
  build-sitemaps.mjs    # Generate 278 sitemap files × 1000 URLs each
  publish-sitemaps.mjs  # Push sitemaps to GitHub Pages mirror

The synonym/related data is derived from Swedish lexical resources, giving each word a list of crossword-appropriate answers.

The Trickiest Part: Memory on a Free Tier

The site has some large JSON files:

  • clue-index.json — 18 MB
  • related.json — 8.5 MB
  • synonyms.json — 6.75 MB

Loading all of these at module startup caused OOM crashes on Render's 256 MB free tier. The fix: lazy loading via getter functions.

// Before — crashes on startup: the whole JSON file is parsed at module load
import wordsData from '../data/words.json';

// After — the file is read and parsed only on first access
import { readFileSync } from 'node:fs';

let _words: string[] | null = null;
export function getWords(): string[] {
  if (!_words) {
    _words = JSON.parse(
      readFileSync(new URL('../data/words.json', import.meta.url), 'utf-8')
    ) as string[];
  }
  return _words;
}

I also added --max-old-space-size=460 to the start script (Render allows slightly more than the nominal 256 MB before killing the process).
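
In package.json, that looks roughly like this (the entry path assumes the Astro Node adapter's default standalone build output):

```json
{
  "scripts": {
    "start": "node --max-old-space-size=460 ./dist/server/entry.mjs"
  }
}
```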

The Sitemap Problem

With 227,000+ URLs, Google wouldn't read our sitemaps. Two issues:

  1. Files were too large — 45,000 URLs × ~120 bytes = 5.4 MB per file. Google has an unofficial ~1 MB soft limit. Fixed by reducing to 1,000 URLs per file (278 files).

  2. The host went down — Render free tier sleeps on inactivity. When it's down, Google can't fetch the sitemap and backs off for weeks.
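
The first fix boils down to a simple chunker. A sketch (hypothetical helper; the real build-sitemaps.mjs also writes the index file):

```javascript
// Split a flat URL list into sitemap files of `perFile` URLs each.
function chunkSitemaps(urls, perFile = 1000) {
  const files = [];
  for (let i = 0; i < urls.length; i += perFile) {
    const body = urls
      .slice(i, i + perFile)
      .map(u => `  <url><loc>${u}</loc></url>`)
      .join('\n');
    files.push({
      name: `sitemap-${files.length + 1}.xml`,
      xml:
        `<?xml version="1.0" encoding="UTF-8"?>\n` +
        `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n` +
        body +
        '\n</urlset>\n',
    });
  }
  return files;
}
```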

Solution: Push sitemaps to GitHub Pages as a static mirror. A simple script clones the repo, copies the XML files, rewrites the sitemap index URLs to point to the mirror, and pushes:

// publish-sitemaps.mjs (simplified)
import { readdirSync, readFileSync, writeFileSync, copyFileSync } from 'node:fs';
import { join } from 'node:path';

const files = readdirSync(SRC_DIR).filter(f => /^sitemap.*\.xml$/.test(f));

for (const f of files) {
  if (f === 'sitemap.xml') {
    // Rewrite the index so sub-sitemap URLs point to the mirror
    const content = readFileSync(join(SRC_DIR, f), 'utf-8');
    const rewritten = content.replace(
      /https:\/\/www\.korsordsakuten\.se\/(sitemap-\d+\.xml)/g,
      'https://sitemaps.korsordsakuten.se/$1'
    );
    writeFileSync(join(WORK_DIR, f), rewritten);
  } else {
    copyFileSync(join(SRC_DIR, f), join(WORK_DIR, f));
  }
}

A custom subdomain (sitemaps.korsordsakuten.se → GitHub Pages via CNAME) makes the URLs clean. Now Google can always fetch sitemaps even if the main site is sleeping.

Structured Data That Actually Helps

For clue pages (/korsord/[word]), I added QAPage schema instead of just FAQPage:

{
  "@type": "QAPage",
  "mainEntity": {
    "@type": "Question",
    "name": "avslutningsvis — korsordssvar",
    "answerCount": 8,
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "SLUTLIGEN (9 bokstäver)",
      "upvoteCount": 8
    },
    "suggestedAnswer": []
  }
}

This signals to Google that each clue page is structured Q&A content — similar to how Q&A forum sites are interpreted — rather than auto-generated thin content.
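
Building that object per page can be sketched like this (a hypothetical `buildQaSchema` helper taking the clue and its answer words; the real code isn't shown in the post):

```javascript
// Build QAPage JSON-LD for a clue page from the clue text and answer words.
function buildQaSchema(clue, answers) {
  const [best, ...rest] = answers;
  return {
    '@context': 'https://schema.org',
    '@type': 'QAPage',
    mainEntity: {
      '@type': 'Question',
      name: `${clue} — korsordssvar`,
      answerCount: answers.length,
      acceptedAnswer: {
        '@type': 'Answer',
        text: `${best} (${best.length} bokstäver)`,
        upvoteCount: answers.length,
      },
      suggestedAnswer: rest.map(a => ({
        '@type': 'Answer',
        text: `${a} (${a.length} bokstäver)`,
      })),
    },
  };
}
```

In an Astro page the result can then be emitted with `<script type="application/ld+json" set:html={JSON.stringify(schema)} />`.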

For word pages, DefinedTerm with synonyms as alternateName marks the page as a lexical resource:

{
  "@type": "DefinedTerm",
  "name": "PLATS",
  "alternateName": ["STÄLLE", "POSITION", "LÄGE"],
  "inDefinedTermSet": {
    "@type": "DefinedTermSet",
    "name": "Korsordsakuten ordlista"
  }
}

Daily Fresh Content

One thing competitor sites have that purely static sites lack: freshness signals. Google re-crawls active sites more frequently.

Solution: /dagens-ledtradar — a page showing 24 curated crossword clues that rotates every day using a deterministic seed:

const dayNum = Math.floor((Date.now() - epoch) / 86400000);
const rng = mulberry32(dayNum * 7919 + 13);
// Pick 24 unique entries from top-2000 clues

Same date = same picks (so the page is cacheable), but crawling the URL the next day returns different content, which triggers Google's freshness heuristic without any database or cron job.
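
For reference, mulberry32 is a tiny public-domain seedable PRNG; a self-contained sketch of the whole rotation (the `pickDaily` helper and its day-number epoch are my assumptions, not the site's exact code):

```javascript
// mulberry32: small, fast, seedable 32-bit PRNG returning floats in [0, 1).
function mulberry32(seed) {
  return function () {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Pick `count` unique clues for a given date, deterministically.
function pickDaily(clues, count, date = new Date()) {
  const dayNum = Math.floor(date.getTime() / 86400000); // days since Unix epoch
  const rng = mulberry32(dayNum * 7919 + 13);
  const chosen = new Set();
  while (chosen.size < Math.min(count, clues.length)) {
    chosen.add(Math.floor(rng() * clues.length)); // Set dedupes repeat indices
  }
  return [...chosen].map(i => clues[i]);
}
```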

What I'd Do Differently

  • Start with a paid host. Render free tier sleeping + OOM issues cost weeks of SEO recovery time.
  • Plan for JSON size early. Lazy loading was the fix but the problem was predictable.
  • Submit 10 hand-picked URLs to GSC from day one. Don't wait for the sitemap crawler to discover everything.

Happy to answer questions about any part of the stack!
