How This Site Is Built

#infrastructure #buildlog

TL;DR

Astro on S3 + CloudFront with OAC, DNS on Cloudflare (saves $49/year vs Route 53 on a .ai domain), email via SES + Lambda forward — total cost under $5/month with no managed platform dependency.
Every post URL returns a 404 out of the box: CloudFront's DefaultRootObject only applies to /, not subdirectories — fix is a 15-line CloudFront Function that appends index.html before S3 sees the request.
llms-full.txt auto-generated at build time concatenates every post into one Markdown file — any agent gets the complete site corpus in a single HTTP GET, zero crawling required.
GitHub Actions deploys run two-pass S3 syncs: static assets get max-age=31536000,immutable; HTML and llms*.txt get no-cache — because Astro hashes asset filenames but not HTML.
AWS credentials never touch the repo — load-secrets-action resolves op:// references at CI runtime; raw keys stay out of git entirely.

I spent a day building this site from scratch. Not because there aren't easier options — there are plenty — but because I wanted to own the infrastructure and understand every layer. Here's what I built and why.

The constraint that drove every decision

I didn't want a managed platform. Ghost, Squarespace, Substack — they're all fine until they're not. Pricing changes. Features get enshittified. Export formats break. The moment you depend on someone else's persistence layer for your writing, you're renting, not owning.

The goal: my content in plain markdown files, my infrastructure, my control. Total cost under $5/month.

The stack

Static site generator: Astro

Astro compiles everything to static HTML at build time. No server, no runtime, no database. The site is just files. I used the Astro Paper theme as a base — dark mode default, clean typography, built-in search.

The build script does three things: generates llms-full.txt (more on that below), runs the Astro build, then generates the search index with Pagefind. One command.
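The three-step order above can be sketched as a small Node runner. This is an illustration only: the script path scripts/generate-llms-full.js, the dist output directory, and the helper name runBuild are assumptions, not the site's actual build script.

```javascript
const { execSync } = require("node:child_process");

// Assumed commands; the real script names and flags may differ.
const BUILD_STEPS = [
  "node scripts/generate-llms-full.js", // hypothetical generator path
  "npx astro build",                    // static build into dist/ (Astro's default)
  "npx pagefind --site dist",           // index the built HTML for search
];

// Run each step in order. The default runner uses execSync, which throws on a
// non-zero exit code, so a failed step stops the steps after it.
function runBuild(steps = BUILD_STEPS, run = (cmd) => execSync(cmd, { stdio: "inherit" })) {
  const completed = [];
  for (const step of steps) {
    run(step);
    completed.push(step);
  }
  return completed;
}
```

Ordering matters here: the generator runs first so its output is part of the Astro build, and Pagefind runs last because it indexes the built HTML.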

Hosting: S3 + CloudFront

The built files go into an S3 bucket with public access completely disabled. CloudFront sits in front of it using Origin Access Control — only CloudFront can read from S3, nothing else. ACM handles the SSL cert.
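For readers wiring this up themselves, the lock-down comes from a bucket policy that names the CloudFront service principal and pins it to one distribution. The sketch below builds such a policy document; the helper name and the bucket, account, and distribution values are placeholders, not this site's configuration.

```javascript
// Hypothetical helper: build an S3 bucket policy that lets a single CloudFront
// distribution (via Origin Access Control) read objects, and nothing else.
function buildOacBucketPolicy(bucketName, accountId, distributionId) {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Sid: "AllowCloudFrontServicePrincipalReadOnly",
        Effect: "Allow",
        Principal: { Service: "cloudfront.amazonaws.com" },
        Action: "s3:GetObject",
        Resource: `arn:aws:s3:::${bucketName}/*`,
        // Without this condition, any CloudFront distribution could read the bucket.
        Condition: {
          StringEquals: {
            "AWS:SourceArn": `arn:aws:cloudfront::${accountId}:distribution/${distributionId}`,
          },
        },
      },
    ],
  };
}

// Placeholder values, not the real bucket, account, or distribution.
const policy = buildOacBucketPolicy("example-site-bucket", "111122223333", "EDFDVBD6EXAMPLE");
console.log(JSON.stringify(policy, null, 2));
```

Note that the policy grants s3:GetObject only. With no s3:ListBucket permission, S3 answers requests for missing keys with a 403 rather than a 404.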

DNS: Cloudflare

DNS is on Cloudflare, not Route 53. Route 53 charges $129/year for a .ai domain; Cloudflare charges $80, a $49/year difference. That's the main reason for the split. The apex domain and www point at the CloudFront distribution via Cloudflare's DNS, so the hosting layer doesn't change. A secondary benefit: Cloudflare's token model makes it easy to give external tools (MCP servers, automation) scoped API access without handing over full account credentials.

Result: HTTPS enforced, HTTP redirects automatically, global CDN edge caching, and the bucket itself is locked down.

One gotcha I hit immediately: every post URL returned a 404. The files were in S3, the deploy worked fine — but /posts/my-post/ returned nothing.

The problem is subtle. Astro (like most static site generators) builds each post as /posts/my-post/index.html: an index.html file inside a subdirectory. The S3 REST endpoint that OAC uses doesn't resolve directories to index files. CloudFront's DefaultRootObject setting only applies to the root URL /; it doesn't cascade to subdirectories. So when CloudFront asked S3 for /posts/my-post/, S3 found no object at that exact key and returned a 403 (S3 typically answers 403 rather than 404 when the caller lacks s3:ListBucket, as with a standard OAC policy), and CloudFront served the 404 page.

The fix is a CloudFront Function — a small JavaScript function that runs at the CDN edge on every incoming request before it reaches S3. It checks the URI: if it ends with /, append index.html. If it has no file extension, append /index.html. That's it — about 15 lines of code.

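function handler(event) {
  var request = event.request;
  var uri = request.uri;

  // Directory-style URL (/posts/my-post/): serve the index.html inside it.
  if (uri.endsWith('/')) {
    request.uri += 'index.html';
  // No dot anywhere in the path, so no file extension (/posts/my-post):
  // treat it as a directory and append /index.html.
  } else if (!uri.includes('.')) {
    request.uri += '/index.html';
  }

  // Anything else (/styles.css, /llms-full.txt) passes through unchanged.
  return request;
}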

Attach it to the distribution as a viewer-request handler and every URL on the site resolves correctly. This is a one-time infrastructure fix — not something you repeat per deploy.

If you're building a static site on S3 + CloudFront with OAC, add this function before you go live. It's the kind of thing that works fine locally (dev servers handle it automatically) and breaks silently in production.
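Because the failure only shows up in production, it can help to exercise the function locally before attaching it. The sketch below copies the handler from this post and runs it against a few sample paths under Node; the rewrite helper and the sample paths are illustrative.

```javascript
// Copy of the handler from this post, so the sketch is self-contained.
function handler(event) {
  var request = event.request;
  var uri = request.uri;

  if (uri.endsWith('/')) {
    request.uri += 'index.html';
  } else if (!uri.includes('.')) {
    request.uri += '/index.html';
  }

  return request;
}

// Hypothetical helper: wrap a path in the minimal event shape the handler reads.
function rewrite(uri) {
  return handler({ request: { uri: uri } }).uri;
}

// Sample paths (illustrative) and the key S3 should be asked for.
var cases = {
  '/': '/index.html',
  '/posts/my-post/': '/posts/my-post/index.html',
  '/posts/my-post': '/posts/my-post/index.html',
  '/llms-full.txt': '/llms-full.txt',
  // Limitation: a dot anywhere in the path counts as a file extension,
  // so an extensionless slug containing a dot is left untouched.
  '/posts/v1.2-notes': '/posts/v1.2-notes'
};

Object.keys(cases).forEach(function (input) {
  var actual = rewrite(input);
  console.log(input + ' -> ' + actual + (actual === cases[input] ? '' : '  MISMATCH'));
});
```

The last case is worth knowing about: if a post slug ever contains a dot, the function will pass it through unchanged and the URL without a trailing slash will not resolve.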

Email: SES + Lambda

(Note: email forwarding infra was set up for the domain, but the address is not currently listed as a public contact method — use X or LinkedIn instead. The setup details below are kept for the build history.)

I wanted info@artificialcuriositylabs.ai to work as a real email address without running a mail server. The setup:

SES receives inbound email for the domain
A receipt rule stores incoming messages in S3
A Lambda function rewrites the headers and forwards to Gmail, preserving the Reply-To so replies look native

It took about an hour to wire up. The DKIM CNAME records and the MX record live in Cloudflare DNS, with the MX record pointing at SES inbound. The Lambda function is about 80 lines of Node.js. It works exactly like having an inbox without actually having one.
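The header rewrite is the part of the forwarder that carries the logic. SES will only send from a verified address, so a forwarder generally replaces From with an address on its own domain and points Reply-To at the original sender. The sketch below shows that idea as a pure function. It is not the actual 80-line Lambda: the function name, the set of dropped headers, and the example addresses are assumptions.

```javascript
// Hypothetical helper: rewrite the headers of a raw email so it can be re-sent
// from a verified address while replies still go to the original sender.
function rewriteHeaders(rawEmail, verifiedFrom) {
  // Headers end at the first blank line; everything after it is the body.
  const splitAt = rawEmail.search(/\r?\n\r?\n/);
  const headerText = splitAt === -1 ? rawEmail : rawEmail.slice(0, splitAt);
  const rest = splitAt === -1 ? "" : rawEmail.slice(splitAt);

  // Unfold continuation lines (leading space or tab) so each header is one line.
  const lines = headerText.replace(/\r?\n[ \t]+/g, " ").split(/\r?\n/);

  // Assumed drop list: headers tied to the original sender's domain.
  const dropped = ["return-path", "sender", "dkim-signature"];
  let originalFrom = null;
  let hasReplyTo = false;
  const kept = [];

  for (const line of lines) {
    const colon = line.indexOf(":");
    const name = line.slice(0, colon).toLowerCase();
    if (name === "from") {
      originalFrom = line.slice(colon + 1).trim();
      continue; // replaced below with the verified address
    }
    if (name === "reply-to") hasReplyTo = true;
    if (dropped.includes(name)) continue;
    kept.push(line);
  }

  kept.unshift(`From: ${verifiedFrom}`);
  // Preserve an existing Reply-To; otherwise reply to the original sender.
  if (!hasReplyTo && originalFrom) kept.push(`Reply-To: ${originalFrom}`);

  return kept.join("\r\n") + rest;
}
```

The body is passed through untouched, and header lines are re-joined with CRLF regardless of the line endings in the input.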

The decision I'm most glad I made: llms.txt

There's an emerging standard for AI-readable content — llms.txt as an index file, similar to robots.txt, that tells AI crawlers what's on the site and where. I added two files:

/llms.txt — a curated index of pages, topics, and permissions
/llms-full.txt — every blog post concatenated into a single file, auto-generated at build time

The second one is the interesting one. Any AI system that fetches llms-full.txt gets the complete text of everything I've published, in one request, structured and clean. It's a better interface for AI consumption than crawling individual HTML pages.

The generator script reads from the blog content directory, strips frontmatter, and concatenates everything with headers separating posts. Runs in under a second as part of the normal build.
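A minimal sketch of a generator along these lines follows. It assumes posts are .md files with ----delimited frontmatter, uses the filename as the header for each post, and takes the input directory and output path as arguments; the function names and those choices are illustrative, not the site's actual script.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Remove a leading frontmatter block delimited by --- lines, if present.
function stripFrontmatter(markdown) {
  const match = markdown.match(/^---\r?\n[\s\S]*?\r?\n---\r?\n?/);
  return (match ? markdown.slice(match[0].length) : markdown).trim();
}

// Join posts into one document, with a header line separating each post.
function buildCorpus(posts) {
  return posts
    .map(({ name, markdown }) => `# ${name}\n\n${stripFrontmatter(markdown)}\n`)
    .join("\n");
}

// Read every .md file in contentDir (sorted for stable output) and write the corpus.
function generate(contentDir, outFile) {
  const posts = fs
    .readdirSync(contentDir)
    .filter((file) => file.endsWith(".md"))
    .sort()
    .map((file) => ({
      name: path.basename(file, ".md"), // assumption: filename stands in for the title
      markdown: fs.readFileSync(path.join(contentDir, file), "utf8"),
    }));
  fs.writeFileSync(outFile, buildCorpus(posts));
}
```

A call such as generate("src/data/blog", "public/llms-full.txt") would place the file where the static build picks it up. The output path is a guess.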

I don't know exactly how this will get used — but making the content machine-readable is a zero-cost decision with asymmetric upside.

The deploy workflow

Production deploy is GitHub Actions. Push to main triggers .github/workflows/deploy.yml when content or build inputs change (src/**, public/**, scripts/**, astro.config.ts, package.json). The workflow:

Validates configuration — fails fast if required repository secrets or variables are missing
Loads AWS credentials from 1Password — load-secrets-action resolves op:// references stored in GitHub secrets (not raw keys in the repo)
npm run build — generates llms-full.txt, runs the Astro build, generates the Pagefind search index
Two S3 syncs — static assets (JS, CSS, images) get long-lived cache headers (max-age=31536000,immutable); HTML, XML, and llms*.txt get no-cache headers so browsers always fetch the latest
CloudFront invalidation — /* to flush the CDN edge cache
Blog markdown sync — src/data/blog/*.md copied to an S3 blog/ prefix (source for a Bedrock knowledge base)
KB ingestion trigger — starts a Bedrock re-sync; marked non-blocking so a job already in progress doesn't fail the deploy

The two-pass sync matters because the cache strategy is different per file type. Static assets are content-addressed (Astro hashes filenames), so they can be cached indefinitely. HTML is not — you want readers to see the new post immediately.
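The rule behind the two passes can be written as a single function from object key to Cache-Control value. This sketch is only a restatement of the policy described above: the extension list is a guess at what counts as a static asset, and defaulting unknown types to no-cache is an assumption.

```javascript
// Hypothetical helper: choose the Cache-Control header for an S3 object key.
function cacheControlFor(key) {
  const name = key.split("/").pop();

  // HTML, XML, and llms*.txt keep stable URLs, so they must always be revalidated.
  if (/\.(html|xml)$/.test(name) || /^llms.*\.txt$/.test(name)) {
    return "no-cache";
  }

  // Assumed asset extensions. Hashed filenames change when content changes,
  // so these can be cached for a year and marked immutable.
  if (/\.(js|css|png|jpe?g|gif|webp|svg)$/.test(name)) {
    return "max-age=31536000,immutable";
  }

  // Assumption: anything unrecognized errs on the side of freshness.
  return "no-cache";
}

console.log(cacheControlFor("_astro/index.abc123.js"));
console.log(cacheControlFor("posts/my-post/index.html"));
```

One caveat with any rule like this: files that are not fingerprinted, such as images copied as-is from public/, would also match the long-lived branch, so renaming them on change is the safe habit.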

Local deploy is the same pipeline, different credential path. ./deploy.sh runs the build and S3/CloudFront steps on your machine. It requires three environment variables — AWS_PROFILE, S3_BUCKET, CLOUDFRONT_DISTRIBUTION_ID — and uses your local AWS CLI profile instead of 1Password-in-CI. Use it when you want to deploy without pushing to main, or when debugging a failed Actions run.
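The fail-fast check on those three variables is simple enough to sketch. The version below is written in JavaScript for consistency with the other examples in this post; deploy.sh itself is a shell script, so treat this as an illustration of the check rather than its implementation. The function names are made up.

```javascript
const REQUIRED_VARS = ["AWS_PROFILE", "S3_BUCKET", "CLOUDFRONT_DISTRIBUTION_ID"];

// Return the names of required variables that are unset or blank.
function missingVars(env, required = REQUIRED_VARS) {
  return required.filter((name) => !(env[name] || "").trim());
}

// Throw before any build or upload work starts if configuration is incomplete.
function assertDeployConfig(env) {
  const missing = missingVars(env);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
}
```

Calling assertDeployConfig(process.env) as the first step means a typo in the environment fails in milliseconds rather than halfway through an S3 sync.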

The split between Cloudflare DNS and CloudFront hosting means Cloudflare never touches the content. DNS resolves to the CloudFront distribution, CloudFront pulls from S3, and deploy invalidates CloudFront directly. Cloudflare sees none of this; it only resolves the domain to the CloudFront distribution.

What is not in the repo: bucket name, distribution ID, and 1Password item references live in GitHub repository variables and secrets. The workflow file names what must exist; the values stay out of git.

The question this raises

Static sites feel like going backward until you realize what you're trading away: runtime complexity, database dependencies, server costs, someone else's uptime SLA. The question isn't "why would you use a static site in 2026" — it's "why would you add a server if you don't need one?"

Timeline (initial setup)

2026-05-05 — Day one

Domain registered: artificialcuriositylabs.dev via Route 53 ($17/year).
AWS personal account setup: root MFA enabled, root keys deleted, IAM admin user, CloudTrail (multi-region), IAM Access Analyzer, monthly budget alerts.
Static site stack: Astro Paper theme (dark default + toggle), S3 bucket (private), CloudFront with OAC, ACM SSL, initial Route 53 records, llms.txt + llms-full.txt (AI-readable layer, build-time generated).
Email forwarding: SES inbound + Lambda to forward for info@artificialcuriositylabs.ai (infra set up; not currently listed for public contact — use socials), DKIM/MX in DNS.
Homepage: custom hero + "Now" section (3 cards for writing/building/experimenting).

Subsequent updates (captured in posts and deploys)

DNS switched to Cloudflare (cost + token model for agent/MCP access).
GitHub Actions deploy workflow (validates vars, loads secrets from 1Password at runtime, two-pass S3 sync for cache headers, CloudFront invalidation, blog markdown sync to S3 for Bedrock KB, non-blocking KB ingestion trigger).
Local ./deploy.sh parity with CI.
Ongoing: component refreshes (e.g. Surface for Now cards), messaging alignment (Now + About), syndication pipelines added for distribution, site structure simplification (retired redundant Build Log and Archives pages to reduce duplication).

The full change history lives in the repo. The dedicated Build Log page has been retired — its value is now distributed across this post, the blog posts themselves (many tagged build-log), the homepage "Now" cards, and the agent-infra work documented elsewhere.