1) Executive Summary
Current state has critical crawl/index/content gaps:
- No SEO metadata system (titles, descriptions, canonicals, OG/Twitter tags) in shared layout.
- No sitemap, no robots strategy, no structured data.
- Important nav links point to missing routes (`/blog`, `/pricing`, `/int`), causing internal crawl waste.
- Multiple thin pages contain placeholder text only (`docs`, `blogs`, `integrations`, `components`, `privacy-policy`, `terms-of-service`).
- Public content discoverability is weak: collection papers are mostly hidden behind client interactions, and there is no sitemap.
- Public pages have duplicate URL risk due to handle/slug normalization without canonical redirects.
- Public API payload/query patterns can be faster (redundant calls, no pagination, no response caching headers at the API layer).
2) High-Priority Findings (Code Evidence)
Critical
- No global metadata framework
  - File: `astro/src/layouts/Layout.astro`
  - Only `<title>` exists; no meta description, canonical, robots, OG, Twitter, or JSON-LD support.
- Broken internal links in home nav
  - File: `astro/src/pages/index.astro`
  - Links currently include `/int`, `/blog`, `/pricing`, but these pages do not exist.
- Thin/placeholder pages are an indexable quality risk
  - Files: `astro/src/pages/blogs.astro`, `astro/src/pages/docs.astro`, `astro/src/pages/integrations.astro`, `astro/src/pages/components.astro`, `astro/src/pages/privacy-policy.astro`, `astro/src/pages/terms-of-service.astro`
  - Each file currently contains only one line of plain text.
- Soft-404 behavior risk from redirects to `/404`
  - Files: `astro/src/pages/[handle]/index.astro`, `astro/src/pages/[handle]/p/[projectSlug]/index.astro`
  - Not-found cases use redirects instead of rendering a direct 404 response.
- No sitemap/robots endpoints or config
  - File: `astro/astro.config.mjs`
  - No `site` config and no sitemap integration.
  - No robots page/file in `astro/src/pages`.
High
- Public project page performs redundant data retrieval
  - File: `astro/src/pages/[handle]/p/[projectSlug]/index.astro`
  - Fetches project data, then separately fetches the owner profile (`getPublicProfile`), even though both are public payload concerns.
- URL duplication risk from normalization without redirect
  - Files: `astro/src/pages/[handle]/index.astro`, `astro/src/pages/[handle]/[slug].astro`, `astro/src/pages/[handle]/p/[projectSlug]/index.astro`
  - `@handle`, uppercase handles, and slug variants resolve to the same content but are not canonically redirected.
- Collection paper discovery is weak for bots
  - File: `astro/src/components/project/ProjectCollectionsViewer.tsx`
  - Collection papers are fetched only on accordion interaction, limiting crawl discovery without strong sitemap support.
- Public pages hydrate large React islands
  - Files:
    - `astro/src/pages/[handle]/index.astro` (`ProfilePage` with `client:load`)
    - `astro/src/pages/[handle]/p/[projectSlug]/index.astro` (`PublicProjectPage` with `client:load`)
  - Ships more JavaScript to clients than these mostly-content pages need.
- No image SEO baseline
  - Files: `astro/src/components/paperCardComponent.tsx`, `astro/src/components/project/PublicProjectPage.tsx`
  - Several images lack `alt` text, explicit dimensions, and an optimized delivery path.
Medium
- Public API has no pagination for potentially large lists
  - File: `fastapi/app/api/v1/endpoints/public.py`
  - Endpoints return full arrays for projects/papers/collections.
- Public API list methods rely on unbounded field scans
  - Files: `fastapi/app/services/papers_service.py`, `fastapi/app/services/projects_service.py`, `fastapi/app/core/firestore_store.py`
  - `find_by_fields(...).stream()` without cursor pagination or ordering for public feed endpoints.
- Missing API-level response caching/compression headers
  - Files: `fastapi/app/main.py`, `fastapi/app/api/v1/endpoints/public.py`
  - No gzip middleware and no cache-control/ETag policy for public endpoints.
3) Target SEO + GEO Architecture
3.1 Metadata + Canonical System (Global)
Implement a shared SEO props model in the layout:
- `title`, `description`, `canonical`, `robots`
- `ogType`, `ogImage`, `ogSiteName`
- `twitterCard`, `twitterSite`
- `jsonLd` (array support)
Files to change:
- `astro/src/layouts/Layout.astro`
- New helper: `astro/src/lib/seo.ts`
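A minimal sketch of what that helper could look like. All names, defaults, and the function signature here are proposals, not existing code:

```typescript
// astro/src/lib/seo.ts (sketch; names and defaults are proposals)
export interface SeoProps {
  title: string;
  description: string;
  canonical?: string;
  robots?: string;          // e.g. "noindex, nofollow" for thin pages
  ogType?: string;
  ogImage?: string;
  ogSiteName?: string;
  twitterCard?: string;
  twitterSite?: string;
  jsonLd?: object[];        // array support: multiple schemas per page
}

// Fill in site-wide defaults so every page ships complete metadata.
// Real default strings would come from product copy, not these placeholders.
export function withSeoDefaults(
  props: Partial<SeoProps>,
  siteUrl: string,
  path: string,
): SeoProps {
  return {
    ...props,
    title: props.title ?? "Default Site Title",
    description: props.description ?? "Default site description.",
    canonical: props.canonical ?? new URL(path, siteUrl).href,
    robots: props.robots ?? "index, follow",
    ogType: props.ogType ?? "website",
    twitterCard: props.twitterCard ?? "summary_large_image",
    jsonLd: props.jsonLd ?? [],
  };
}
```

`Layout.astro` would accept `SeoProps` and render the corresponding `<meta>`, `<link rel="canonical">`, and JSON-LD `<script>` tags from the filled-in object.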
3.2 Robots + Sitemap + Feeds
Add:
- `astro/src/pages/robots.txt.ts`
- `astro/src/pages/sitemap-index.xml.ts`
- `astro/src/pages/sitemaps/public-pages.xml.ts`
- `astro/src/pages/sitemaps/public-papers.xml.ts`
- `astro/src/pages/sitemaps/public-projects.xml.ts`
- `astro/src/pages/rss.xml.ts` (marketing/blog feed)
Use segmented sitemaps for scaling and easier monitoring.
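The robots endpoint is the smallest of these. A sketch, assuming the example host (in Astro, the exported `GET` in `src/pages/robots.txt.ts` serves the route):

```typescript
// Sketch of astro/src/pages/robots.txt.ts; the host is a placeholder and
// must match the `site` value configured in astro.config.mjs.
export function robotsTxt(siteUrl: string): string {
  return [
    "User-agent: *",
    "Allow: /",
    `Sitemap: ${siteUrl}/sitemap-index.xml`,
  ].join("\n");
}

export const GET = () =>
  new Response(robotsTxt("https://example.com"), {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
```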
3.3 Structured Data (JSON-LD)
Add JSON-LD by page type:
- Homepage: `Organization`, `WebSite`
- User page: `Person`, `ProfilePage`
- Project page: `CollectionPage` or `CreativeWorkSeries`
- Paper page: `Article` + `BreadcrumbList`
- Blog post pages: `BlogPosting`
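For the user page case, the builder could look like this sketch. The `PublicProfile` field names are assumptions about the profile payload, not the actual API shape:

```typescript
// Sketch: build Person + ProfilePage JSON-LD for a /[handle] page.
interface PublicProfile {
  handle: string;
  displayName: string;
  bio?: string;
}

export function profileJsonLd(profile: PublicProfile, siteUrl: string): object[] {
  const url = `${siteUrl}/${profile.handle}`;
  const person = {
    "@context": "https://schema.org",
    "@type": "Person",
    name: profile.displayName,
    url,
    ...(profile.bio ? { description: profile.bio } : {}),
  };
  return [
    person,
    {
      "@context": "https://schema.org",
      "@type": "ProfilePage",
      mainEntity: person,
      url,
    },
  ];
}
```

In the layout, each object would be serialized into its own `<script type="application/ld+json">` tag.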
3.4 GEO (AI Search) Layer
Add:
- `/llms.txt` (concise map of high-value URLs + product definition)
- `/llms-full.txt` (expanded, machine-friendly knowledge document)
- Q&A blocks on key pages (problem -> approach -> examples -> constraints)
- Strong author/entity signals (real author cards, updated dates, source citations)
- Comparison pages and use-case pages with structured, factual answers
4) Public User/Project/Paper Page Upgrade Plan
4.1 /[handle] user page
Current issues:
- Minimal metadata, no Person schema, potential duplicate URLs.
Changes:
- Add unique title/description from user profile.
- Add a canonical URL and normalized redirects (`/@name` -> `/name`, uppercase -> lowercase).
- Add `Person` + `ProfilePage` JSON-LD.
- Add server-rendered links to all public content (standalone + collection papers, via dedicated pages or a sitemap guarantee).
- Keep only a small interactive island for tab switching, if needed.
4.2 /[handle]/p/[projectSlug] project page
Current issues:
- No metadata/schema.
- Redundant owner fetch.
- Collection paper links are loaded lazily.
Changes:
- Include owner in project API payload; remove extra profile request.
- Add `CollectionPage` schema and rich metadata.
- Pre-render top papers and collection links server-side.
- Add paginated collection pages if the project is large.
4.3 /[handle]/[slug] paper page
Current issues:
- No article metadata/schema.
- Duplicate URL normalization risk.
Changes:
- Add `Article` JSON-LD and a `BreadcrumbList`.
- Add reading time, updated-at, an author link, and related-papers internal links.
- Add canonical redirect rules for slug normalization.
- Add server-side excerpt generation for description when missing.
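The excerpt fallback can be a small pure function. This sketch strips markdown deliberately roughly (a real implementation might run a proper markdown parser first), and 155 characters is an assumed meta-description budget:

```typescript
// Sketch: derive a meta description from the paper body when none is set.
export function excerptFromMarkdown(markdown: string, maxLen = 155): string {
  const text = markdown
    .replace(/```[\s\S]*?```/g, " ")    // drop fenced code blocks
    .replace(/[#>*_`\[\]()!-]/g, " ")   // drop common markdown punctuation
    .replace(/\s+/g, " ")               // collapse whitespace
    .trim();
  if (text.length <= maxLen) return text;
  // Cut at a word boundary and append an ellipsis.
  return text.slice(0, maxLen).replace(/\s+\S*$/, "") + "…";
}
```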
5) Pages To Add and Modify
5.1 Must Add (Revenue + Authority + GEO)
- `/pricing`
- `/blog` (index)
- `/blog/[slug]` (marketing posts)
- `/features`
- `/use-cases/[segment]` (at least 4 initial segments)
- `/compare/[alternative]` (at least 3 initial alternatives)
- `/changelog`
- `/about`
- `/contact`
- `/llms.txt`
- `/llms-full.txt`
- `/robots.txt`
- Sitemap endpoints (index + segmented maps)
5.2 Must Fix Existing Routes
- Fix `index.astro` nav links (`/int`, `/blog`, `/pricing`) to point at valid URLs.
- Expand all one-line thin pages, or set a temporary `noindex` until they are complete.
- The footer must expose crawlable legal/support links.
- Reserve new root paths in `astro/src/lib/reservedPaths.ts` and `fastapi/app/core/reserved_paths.py`.
6) Blog Strategy (Topics + Information Architecture)
6.1 Recommended blog clusters
Cluster A: Programmatic SEO and content operations
- Programmatic SEO fundamentals for API-first CMS
- Building content hubs that avoid cannibalization
- Scaling internal linking with structured content
Cluster B: Developer publishing workflows
- Markdown-first publishing architecture
- Multi-channel distribution automation
- CMS API design patterns for teams
Cluster C: AI search readiness (GEO)
- How LLMs retrieve and cite web content
- Designing pages for AI overview inclusion
- Entity SEO and structured data for developer products
Cluster D: Technical SEO for content-heavy products
- Core Web Vitals for content platforms
- Crawl budget and pagination in dynamic sites
- Canonicalization patterns for user-generated content
6.2 Suggested first 20 posts
Create 5 posts per cluster above, with one pillar page per cluster and 4 supporting posts each. Interlink pillar <-> supporting posts bi-directionally.
7) Where To Store Blog Data (Yes, Database Is Fine)
Yes, you can store blogs in a database. Recommended approach:
Option A (Recommended for your stack): Firestore-backed blog content
New collections:
`marketingPosts`, `marketingAuthors`, `marketingCategories`, `marketingTags`
marketingPosts fields:
- `postId`, `slug`, `title`, `excerpt`, `bodyMarkdown`
- `authorId`, `categoryId`, `tagIds[]`
- `status` (draft|published)
- `publishedAt`, `updatedAt`, `canonicalUrl`
- `coverImageUrl`, `coverImageAlt`
- `metaTitle`, `metaDescription`, `ogImageUrl`
Rules:
- Precompute and store `readingTime`, `toc`, `wordCount`.
- Cache list/detail responses in Redis.
- Serve paginated APIs (`cursor`, `limit`).
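The field list above translates into a document type like this sketch, together with the write-time precompute rule. The ~220 words-per-minute figure is an assumption, and the type names are proposals:

```typescript
// Sketch of a marketingPosts document shape (field names from the list above).
export interface MarketingPost {
  postId: string;
  slug: string;
  title: string;
  excerpt: string;
  bodyMarkdown: string;
  authorId: string;
  categoryId: string;
  tagIds: string[];
  status: "draft" | "published";
  publishedAt: string;      // ISO timestamp
  updatedAt: string;
  canonicalUrl?: string;
  coverImageUrl?: string;
  coverImageAlt?: string;
  metaTitle?: string;
  metaDescription?: string;
  ogImageUrl?: string;
  // Precomputed at write time, per the rules above:
  wordCount: number;
  readingTime: number;      // minutes, assuming ~220 wpm
}

export function precompute(bodyMarkdown: string): { wordCount: number; readingTime: number } {
  const wordCount = bodyMarkdown.split(/\s+/).filter(Boolean).length;
  return { wordCount, readingTime: Math.max(1, Math.round(wordCount / 220)) };
}
```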
Option B: MDX files in repo for marketing pages
Best for editorial versioning and static pre-render speed.
Hybrid recommendation
- Marketing blog/docs pages in MDX (high control, fast builds).
- User-generated papers/projects remain in Firestore.
8) Data Fetching and Speed Improvement Plan
Frontend (Astro)
- Remove redundant API calls on project page by extending one backend payload.
- Convert large public pages to mostly server-rendered HTML with small client islands.
- Avoid loading collection papers only after click if discoverability matters; render crawlable links.
- Add image optimization strategy (dimensions, modern format, priority only for LCP image).
- Add explicit cache policy per route and avoid inconsistent headers.
Backend (FastAPI + Firestore)
- Add paginated public list endpoints:
  - `/public/{handle}?paper_limit=...&paper_cursor=...`
  - `/public/{handle}/projects/{project_slug}?...`
- Add pre-sorted query support in the store layer (`order_by`, `limit`, `start_after`).
- Add aggregate cache keys for public profile/project payloads.
- Add response compression middleware.
- Add `Cache-Control` and an optional `ETag` on public responses.
- Add lightweight list DTOs for cards (avoid large body fields unless needed).
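From the Astro side, the paginated contract might be consumed as in this sketch. The query parameter names mirror the endpoints above, but the response shape (`items`/`nextCursor`) is an assumption that has to be agreed with the FastAPI implementation:

```typescript
// Sketch of cursor-paginated consumption during server-side rendering
// (e.g. when generating the public-papers sitemap segment).
interface Page<T> {
  items: T[];
  nextCursor: string | null; // opaque start_after token from Firestore
}

// Pure URL builder, kept separate so it is easy to test.
export function pageUrl(apiBase: string, handle: string, limit: number, cursor: string | null): string {
  const url = new URL(`${apiBase}/public/${handle}`);
  url.searchParams.set("paper_limit", String(limit));
  if (cursor) url.searchParams.set("paper_cursor", cursor);
  return url.toString();
}

// Walk every page until the cursor runs out.
export async function fetchAllPapers<T>(apiBase: string, handle: string, limit = 50): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | null = null;
  do {
    const page: Page<T> = await (await fetch(pageUrl(apiBase, handle, limit, cursor))).json();
    all.push(...page.items);
    cursor = page.nextCursor;
  } while (cursor);
  return all;
}
```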
9) Phase-by-Phase Implementation
Phase 0 (Day 1-2): Critical crawl/index foundation
- Build global SEO metadata system in layout.
- Add robots + sitemap endpoints.
- Fix nav links and route mismatches.
- Decide canonical host and enforce HTTPS/non-www policy.
- Replace 302-to-404 pattern with proper 404 responses.
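The 302-to-404 replacement in the last item can be as small as returning a real 404 `Response` from the page frontmatter. The helper name is a proposal; in Astro SSR, a 404-status `Response` typically causes the custom 404 route to render while crawlers see the correct status on the requested URL:

```typescript
// Sketch: turn a missing-resource case into a direct 404 Response,
// avoiding the soft-404 signal that a 302 to /404 produces.
export function notFound(): Response {
  return new Response(null, { status: 404 });
}

// Usage in astro/src/pages/[handle]/index.astro frontmatter (sketch):
//   const profile = await getPublicProfile(handle);
//   if (!profile) return notFound();
```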
Success criteria:
- Every indexable URL has unique title + description + canonical.
- Sitemaps live and robots references them.
Phase 1 (Day 3-5): Public page SEO + schema
- Implement metadata + JSON-LD for user/project/paper pages.
- Normalize URLs with redirect rules.
- Improve heading structure and on-page content snippets.
- Add related-content internal linking on paper pages.
Success criteria:
- Rich Results validation passes for article/profile pages.
- No duplicate URL variants in crawl exports.
Phase 2 (Week 2): Content and GEO expansion
- Launch `/blog`, `/pricing`, `/features`, `/use-cases`, `/compare`.
- Publish the first 20 posts across the 4 clusters.
- Add `/llms.txt` + `/llms-full.txt`.
- Add author pages and E-E-A-T elements.
Success criteria:
- The count of indexed pages in Search Console grows steadily.
- AI assistants can retrieve clean product definitions and citations.
Phase 3 (Week 3): Performance and scale
- Add pagination and caching for heavy public endpoints.
- Reduce hydration JS on public pages.
- Introduce query-level optimization in Firestore access layer.
- Add monitoring dashboards and SLOs.
Success criteria:
- Lower TTFB and faster LCP on public pages.
- Stable response times under larger datasets.
10) Manual Tasks Outside This Project
- Google Search Console
- Verify domain property.
- Submit sitemap index.
- Inspect and request indexing for key new pages.
- Monitor coverage, CWV, and enhancement reports weekly.
- Bing Webmaster Tools
- Verify site and submit sitemap.
- Analytics and monitoring
- GA4 + conversion events for signups and content-to-signup paths.
- Track organic landing pages, CTR, and assisted conversions.
- CDN and hosting
- Ensure Brotli/gzip enabled at edge.
- Confirm caching behavior for HTML vs static assets.
- Editorial process
- Assign author owners per cluster.
- Publish cadence: minimum 2 posts/week for first 10 weeks.
- Quarterly content refresh for top pages.
- Authority building
- Acquire links from developer communities and partner integrations.
- Publish benchmark/case-study posts with original data.
- Brand/entity consistency
- Keep organization name, social profiles, and product description consistent across site and external profiles.
11) KPI Dashboard (Track Weekly)
Primary:
- Indexed pages
- Non-brand impressions and clicks
- Avg position for target clusters
- Organic signup conversions
Technical:
- LCP, INP, CLS for top templates
- Crawl errors and duplicate/canonical issues
- Sitemap indexed-to-submitted ratio
GEO:
- Brand/entity mentions in AI answers
- Citation frequency of your domain in AI outputs
- Referral traffic from AI assistants (when detectable)
12) Immediate Next 10 Engineering Tasks
- Implement the SEO prop contract in `Layout.astro`.
- Add a `seo.ts` helper to generate canonical/meta defaults.
- Create `robots.txt.ts` and the sitemap routes.
- Fix broken nav URLs in `index.astro`.
- Replace one-line thin pages with real content or a temporary `noindex`.
- Add page metadata + JSON-LD to `[handle]/index.astro`, `[handle]/p/[projectSlug]/index.astro`, and `[handle]/[slug].astro`.
- Add normalized redirect logic for handle/slug variants.
- Extend the public project API to include an owner summary in one response.
- Add pagination params to the public profile/project APIs.
- Add FastAPI compression and response cache headers for public endpoints.
If you execute Phases 0 and 1 completely, you should see meaningful crawl/index quality improvement quickly. Phases 2 and 3 are where long-term SEO + GEO compounding happens.