How I Built a Swedish Crossword Solver with Astro and 400,000+ Words
I recently launched Korsordsakuten — a free Swedish crossword solver — and learned a lot about building SEO-driven content sites with Astro. Here's what I built and what I learned along the way.
What It Does
The site lets you:
- Search 400,000+ Swedish word forms by clue, pattern, or length (a pattern-matching sketch follows this list)
- Filter answers by letter count (e.g. "give me 6-letter synonyms only")
- Browse prefix/suffix indexes (words starting with SK, ending with ERA, etc.)
- Solve anagrams
- Play a daily Wordle-style word game
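To give a feel for the pattern search, here's a minimal sketch, where ? stands for an unknown letter. matchPattern is a hypothetical helper for illustration, not the production code:

// Hypothetical helper: match a crossword pattern like "L??E" against the word list
function matchPattern(pattern, words) {
  const re = new RegExp('^' + pattern.replaceAll('?', '.') + '$', 'i');
  return words.filter(w => re.test(w));
}
// matchPattern('L??E', ['LÄGE', 'PLATS']) returns ['LÄGE']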
The Stack
- Astro 6 (SSR mode, Node adapter) — perfect for content-heavy sites. Each route is server-rendered but the build is still fast.
- Node.js 22 on Render (free tier)
- GitHub Pages as a static sitemap mirror — more on this below
- Zero client-side JS frameworks. Just vanilla JS where needed.
The Data Pipeline
The word database is built from public Swedish word lists, processed through a Node.js pipeline:
scripts/
build-worddb.mjs # Build words.json, synonyms.json, related.json
build-phrases.mjs # Process multi-word crossword entries
build-sitemaps.mjs # Generate 278 sitemap files × 1000 URLs each
publish-sitemaps.mjs # Push sitemaps to GitHub Pages mirror
The synonym/related data is derived from Swedish lexical resources, giving each word a list of crossword-appropriate answers.
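Roughly, the derived files map each headword to its candidate answers. These shapes are illustrative only, inferred from the schema examples further down; the real format may differ:

// Illustrative shapes only; the actual schema may differ
// words.json:    ["PLATS", "STÄLLE", "LÄGE", ...]
// synonyms.json: { "PLATS": ["STÄLLE", "POSITION", "LÄGE"], ... }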
The Trickiest Part: Memory on a Free Tier
The site has some large JSON files:
- clue-index.json (18 MB)
- related.json (8.5 MB)
- synonyms.json (6.75 MB)
Loading all of these at module startup caused OOM crashes on Render's 256 MB free tier. The fix: lazy loading via getter functions.
// Before: a static JSON import parses the whole file at module load, crashing on startup
// import wordsData from '../data/words.json';

// After: read and parse the file only when first accessed
import { readFileSync } from 'node:fs';

let _words: string[] | null = null;
export function getWords(): string[] {
  if (!_words) {
    _words = JSON.parse(
      readFileSync(new URL('../data/words.json', import.meta.url), 'utf-8')
    ) as string[];
  }
  return _words;
}
I also added --max-old-space-size=460 to the start script (Render lets a process exceed the nominal 256 MB somewhat before killing it).
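For reference, the start script then looks something like this (assuming the standalone Node adapter's default entry path):

{
  "scripts": {
    "start": "node --max-old-space-size=460 ./dist/server/entry.mjs"
  }
}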
The Sitemap Problem
With 227,000+ URLs, Google wouldn't read my sitemaps. Two issues:
1. Files were too large. 45,000 URLs × ~120 bytes ≈ 5.4 MB per file, and Google has an unofficial ~1 MB soft limit. Fixed by dropping to 1,000 URLs per file (278 files); a sketch of the chunking follows this list.
2. The host went down. Render's free tier sleeps on inactivity. While it's asleep, Google can't fetch the sitemap and backs off for weeks.
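A minimal sketch of that chunking (urls and the output path are assumptions; the real build-sitemaps.mjs isn't shown here):

import { writeFileSync } from 'node:fs';

// urls is assumed to hold all 227,000+ absolute URLs; the output path is a guess
const CHUNK = 1000; // keeps each file far below the ~1 MB threshold
for (let i = 0; i * CHUNK < urls.length; i++) {
  const entries = urls
    .slice(i * CHUNK, (i + 1) * CHUNK)
    .map(u => `  <url><loc>${u}</loc></url>`)
    .join('\n');
  writeFileSync(
    `dist/client/sitemap-${i}.xml`,
    '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    entries + '\n</urlset>\n'
  );
}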
The fix for the second issue: push the sitemaps to GitHub Pages as a static mirror. A simple script clones the mirror repo, copies the XML files over, rewrites the sitemap index URLs to point at the mirror, and pushes:
// publish-sitemaps.mjs (simplified)
import { readdirSync, readFileSync, writeFileSync, copyFileSync } from 'node:fs';
import { join } from 'node:path';

const SRC_DIR = 'dist/client';      // where the build emits sitemaps (path may differ)
const WORK_DIR = 'sitemaps-mirror'; // local clone of the GitHub Pages repo

const files = readdirSync(SRC_DIR).filter(f => /^sitemap.*\.xml$/.test(f));
for (const f of files) {
  if (f === 'sitemap.xml') {
    // Rewrite the index so sub-sitemap URLs point to the mirror
    const content = readFileSync(join(SRC_DIR, f), 'utf-8');
    const rewritten = content.replace(
      /https:\/\/www\.korsordsakuten\.se\/(sitemap-\d+\.xml)/g,
      'https://sitemaps.korsordsakuten.se/$1'
    );
    writeFileSync(join(WORK_DIR, f), rewritten);
  } else {
    copyFileSync(join(SRC_DIR, f), join(WORK_DIR, f));
  }
}
A custom subdomain (sitemaps.korsordsakuten.se → GitHub Pages via CNAME) makes the URLs clean. Now Google can always fetch sitemaps even if the main site is sleeping.
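For reference, the DNS side is a single record (the Pages hostname below is a placeholder):

sitemaps.korsordsakuten.se.  CNAME  <username>.github.io.

plus a CNAME file in the mirror repo containing sitemaps.korsordsakuten.se, so GitHub Pages accepts the custom domain.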
Structured Data That Actually Helps
For clue pages (/korsord/[word]), I added QAPage schema instead of just FAQPage:
{
"@type": "QAPage",
"mainEntity": {
"@type": "Question",
"name": "avslutningsvis — korsordssvar",
"answerCount": 8,
"acceptedAnswer": {
"@type": "Answer",
"text": "SLUTLIGEN (9 bokstäver)",
"upvoteCount": 8
},
"suggestedAnswer": []
}
}
This signals to Google that each clue page is structured Q&A content — similar to how Q&A forum sites are interpreted — rather than auto-generated thin content.
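Emitting it from an Astro page is a one-liner in the template; a minimal sketch, where schema is the object above plus the top-level @context that JSON-LD requires:

---
// Build the JSON-LD object in the page's frontmatter
const schema = {
  "@context": "https://schema.org",
  "@type": "QAPage",
  // ...mainEntity as in the snippet above
};
---
<script type="application/ld+json" set:html={JSON.stringify(schema)} />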
For word pages, DefinedTerm with synonyms as alternateName marks the page as a lexical resource:
{
"@type": "DefinedTerm",
"name": "PLATS",
"alternateName": ["STÄLLE", "POSITION", "LÄGE"],
"inDefinedTermSet": {
"@type": "DefinedTermSet",
"name": "Korsordsakuten ordlista"
}
}
Daily Fresh Content
One thing competitor sites have that purely static sites lack: freshness signals. Google re-crawls active sites more frequently.
Solution: /dagens-ledtradar — a page showing 24 curated crossword clues that rotates every day using a deterministic seed:
const dayNum = Math.floor((Date.now() - epoch) / 86400000);
const rng = mulberry32(dayNum * 7919 + 13);
// Pick 24 unique entries from top-2000 clues
Same date = same picks (the page is cacheable), but crawling the URL the next day returns different content. Triggers Google's freshness heuristic without any database or cron job.
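For completeness, a self-contained sketch of the daily pick. mulberry32 is the well-known public-domain 32-bit PRNG; epoch and topClues are assumptions (any fixed start date, and the top-2000 clue array):

// mulberry32: tiny deterministic PRNG (public-domain algorithm)
function mulberry32(seed) {
  return function () {
    seed = (seed + 0x6D2B79F5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// epoch and topClues are assumed: any fixed start date and the top-2000 clue list
const epoch = Date.UTC(2024, 0, 1);
const dayNum = Math.floor((Date.now() - epoch) / 86400000);
const rng = mulberry32(dayNum * 7919 + 13);

// Draw until we have 24 distinct indices, then map them to clues
const picked = new Set();
while (picked.size < 24) picked.add(Math.floor(rng() * topClues.length));
const todaysClues = [...picked].map(i => topClues[i]);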
What I'd Do Differently
- Start with a paid host. Render free tier sleeping + OOM issues cost weeks of SEO recovery time.
- Plan for JSON size early. Lazy loading was the fix but the problem was predictable.
- Submit 10 hand-picked URLs to Google Search Console from day one. Don't wait for the sitemap crawler to discover everything.
Links
- Site: korsordsakuten.se
- Daily clues: /dagens-ledtradar
Happy to answer questions about any part of the stack!