1) Executive Summary
Current state has critical crawl/index/content gaps:
- No SEO metadata system (titles, descriptions, canonicals, OG/Twitter tags) in shared layout.
- No sitemap, no robots strategy, no structured data.
- Important nav links point to missing routes (`/blog`, `/pricing`, `/int`), causing internal crawl waste.
- Multiple thin pages contain placeholder text only (`docs`, `blogs`, `integrations`, `components`, `privacy-policy`, `terms-of-service`).
- Public content discoverability is weak: collection papers are mostly hidden behind client interactions, and there is no sitemap.
- Public pages have duplicate URL risk due to handle/slug normalization without canonical redirects.
- Public API payload/query patterns can be faster (redundant calls, no pagination, no response caching headers at the API layer).
2) High-Priority Findings (Code Evidence)
Critical
- No global metadata framework
  - File: `astro/src/layouts/Layout.astro`
  - Only `<title>` exists; no meta description, canonical, robots, OG, Twitter, or JSON-LD support.
- Broken internal links in home nav
  - File: `astro/src/pages/index.astro`
  - Links currently include `/int`, `/blog`, `/pricing`, but these pages do not exist.
- Thin/placeholder pages are an indexable quality risk
  - Files: `astro/src/pages/blogs.astro`, `astro/src/pages/docs.astro`, `astro/src/pages/integrations.astro`, `astro/src/pages/components.astro`, `astro/src/pages/privacy-policy.astro`, `astro/src/pages/terms-of-service.astro`
  - Each file currently contains only one line of plain text.
- Soft-404 behavior risk from redirects to `/404`
  - Files: `astro/src/pages/[handle]/index.astro`, `astro/src/pages/[handle]/p/[projectSlug]/index.astro`
  - Not-found cases use redirects instead of rendering a direct 404 response.
- No sitemap/robots endpoints or config
  - File: `astro/astro.config.mjs`
  - No `site` config and no sitemap integration.
  - No robots page/file in `astro/src/pages`.
High
- Public project page performs redundant data retrieval
  - File: `astro/src/pages/[handle]/p/[projectSlug]/index.astro`
  - Fetches project data, then separately fetches the owner profile (`getPublicProfile`), even though both are public payload concerns.
- URL duplication risk from normalization without redirect
  - Files: `astro/src/pages/[handle]/index.astro`, `astro/src/pages/[handle]/[slug].astro`, `astro/src/pages/[handle]/p/[projectSlug]/index.astro`
  - `@handle`, uppercase handles, and slug variants resolve to the same content but are not canonically redirected.
- Collection paper discovery is weak for bots
  - File: `astro/src/components/project/ProjectCollectionsViewer.tsx`
  - Collection papers are fetched only on accordion interaction, limiting crawl discovery without strong sitemap support.
- Public pages hydrate large React islands
  - Files:
    - `astro/src/pages/[handle]/index.astro` (`ProfilePage` with `client:load`)
    - `astro/src/pages/[handle]/p/[projectSlug]/index.astro` (`PublicProjectPage` with `client:load`)
  - Ships more JavaScript to clients than these mostly-content pages need.
- No image SEO baseline
  - Files: `astro/src/components/paperCardComponent.tsx`, `astro/src/components/project/PublicProjectPage.tsx`
  - Several images lack `alt` text, explicit dimensions, and an optimized delivery path.
Medium
- Public API has no pagination for potentially large lists
  - File: `fastapi/app/api/v1/endpoints/public.py`
  - Endpoints return full arrays for projects/papers/collections.
- Public API list methods rely on unbounded field scans
  - Files: `fastapi/app/services/papers_service.py`, `fastapi/app/services/projects_service.py`, `fastapi/app/core/firestore_store.py`
  - `find_by_fields(...).stream()` without cursor pagination or ordering for public feed endpoints.
- Missing API-level response caching/compression headers
  - Files: `fastapi/app/main.py`, `fastapi/app/api/v1/endpoints/public.py`
  - No gzip middleware and no cache-control/ETag policy for public endpoints.
3) Target SEO + GEO Architecture
3.1 Metadata + Canonical System (Global)
Implement a shared SEO props model in the layout:
- `title`, `description`, `canonical`, `robots`
- `ogType`, `ogImage`, `ogSiteName`
- `twitterCard`, `twitterSite`
- `jsonLd` (array support)
Files to change:
- `astro/src/layouts/Layout.astro`
- New helper: `astro/src/lib/seo.ts`
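A minimal sketch of what that helper could look like. All names, defaults, and the function signature here are proposals, not existing code:

```typescript
// astro/src/lib/seo.ts (sketch; names and defaults are proposals)
export interface SeoProps {
  title: string;
  description: string;
  canonical?: string;
  robots?: string;          // e.g. "noindex, nofollow" for thin pages
  ogType?: string;
  ogImage?: string;
  ogSiteName?: string;
  twitterCard?: string;
  twitterSite?: string;
  jsonLd?: object[];        // array support: multiple schemas per page
}

// Fill in site-wide defaults so every page ships complete metadata.
// Real default strings would come from product copy, not these placeholders.
export function withSeoDefaults(
  props: Partial<SeoProps>,
  siteUrl: string,
  path: string,
): SeoProps {
  return {
    ...props,
    title: props.title ?? "Default Site Title",
    description: props.description ?? "Default site description.",
    canonical: props.canonical ?? new URL(path, siteUrl).href,
    robots: props.robots ?? "index, follow",
    ogType: props.ogType ?? "website",
    twitterCard: props.twitterCard ?? "summary_large_image",
    jsonLd: props.jsonLd ?? [],
  };
}
```

`Layout.astro` would accept `SeoProps` and render the corresponding `<meta>`, `<link rel="canonical">`, and JSON-LD `<script>` tags from the filled-in object.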
3.2 Robots + Sitemap + Feeds
Add:
- `astro/src/pages/robots.txt.ts`
- `astro/src/pages/sitemap-index.xml.ts`
- `astro/src/pages/sitemaps/public-pages.xml.ts`
- `astro/src/pages/sitemaps/public-papers.xml.ts`
- `astro/src/pages/sitemaps/public-projects.xml.ts`
- `astro/src/pages/rss.xml.ts` (marketing/blog feed)
Use segmented sitemaps for scaling and easier monitoring.
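The robots endpoint is the smallest of these. A sketch, assuming the example host (in Astro, the exported `GET` in `src/pages/robots.txt.ts` serves the route):

```typescript
// Sketch of astro/src/pages/robots.txt.ts; the host is a placeholder and
// must match the `site` value configured in astro.config.mjs.
export function robotsTxt(siteUrl: string): string {
  return [
    "User-agent: *",
    "Allow: /",
    `Sitemap: ${siteUrl}/sitemap-index.xml`,
  ].join("\n");
}

export const GET = () =>
  new Response(robotsTxt("https://example.com"), {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
```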
3.3 Structured Data (JSON-LD)
Add JSON-LD by page type:
- Homepage: `Organization`, `WebSite`
- User page: `Person`, `ProfilePage`
- Project page: `CollectionPage` or `CreativeWorkSeries`
- Paper page: `Article` + `BreadcrumbList`
- Blog post pages: `BlogPosting`
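For the user page case, the builder could look like this sketch. The `PublicProfile` field names are assumptions about the profile payload, not the actual API shape:

```typescript
// Sketch: build Person + ProfilePage JSON-LD for a /[handle] page.
interface PublicProfile {
  handle: string;
  displayName: string;
  bio?: string;
}

export function profileJsonLd(profile: PublicProfile, siteUrl: string): object[] {
  const url = `${siteUrl}/${profile.handle}`;
  const person = {
    "@context": "https://schema.org",
    "@type": "Person",
    name: profile.displayName,
    url,
    ...(profile.bio ? { description: profile.bio } : {}),
  };
  return [
    person,
    {
      "@context": "https://schema.org",
      "@type": "ProfilePage",
      mainEntity: person,
      url,
    },
  ];
}
```

In the layout, each object would be serialized into its own `<script type="application/ld+json">` tag.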
3.4 GEO (AI Search) Layer
Add:
- `/llms.txt` (concise map of high-value URLs + product definition)
- `/llms-full.txt` (expanded, machine-friendly knowledge document)
- Q&A blocks on key pages (problem -> approach -> examples -> constraints)
- Strong author/entity signals (real author cards, updated dates, source citations)
- Comparison pages and use-case pages with structured, factual answers
4) Public User/Project/Paper Page Upgrade Plan
4.1 /[handle] user page
Current issues:
- Minimal metadata, no Person schema, potential duplicate URLs.
Changes:
- Add unique title/description from user profile.
- Add a canonical URL and normalized redirects (`/@name` -> `/name`, uppercase -> lowercase).
- Add `Person` + `ProfilePage` JSON-LD.
- Add server-rendered links to all public content (standalone + collection papers, via dedicated pages or a sitemap guarantee).
- Keep only a small interactive island for tab switching, if needed.
4.2 /[handle]/p/[projectSlug] project page
Current issues:
- No metadata/schema.
- Redundant owner fetch.
- Collection paper links are loaded lazily.
Changes:
- Include owner in project API payload; remove extra profile request.
- Add `CollectionPage` schema and rich metadata.
- Pre-render top papers and collection links server-side.
- Add paginated collection pages if the project is large.
4.3 /[handle]/[slug] paper page
Current issues:
- No article metadata/schema.
- Duplicate URL normalization risk.
Changes:
- Add `Article` JSON-LD and a `BreadcrumbList`.
- Add reading time, updated-at, an author link, and related-papers internal links.
- Add canonical redirect rules for slug normalization.
- Add server-side excerpt generation for description when missing.
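The excerpt fallback can be a small pure function. This sketch strips markdown deliberately roughly (a real implementation might run a proper markdown parser first), and 155 characters is an assumed meta-description budget:

```typescript
// Sketch: derive a meta description from the paper body when none is set.
export function excerptFromMarkdown(markdown: string, maxLen = 155): string {
  const text = markdown
    .replace(/```[\s\S]*?```/g, " ")    // drop fenced code blocks
    .replace(/[#>*_`\[\]()!-]/g, " ")   // drop common markdown punctuation
    .replace(/\s+/g, " ")               // collapse whitespace
    .trim();
  if (text.length <= maxLen) return text;
  // Cut at a word boundary and append an ellipsis.
  return text.slice(0, maxLen).replace(/\s+\S*$/, "") + "…";
}
```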
5) Pages To Add and Modify
5.1 Must Add (Revenue + Authority + GEO)
- `/pricing`
- `/blog` (index)
- `/blog/[slug]` (marketing posts)
- `/features`
- `/use-cases/[segment]` (at least 4 initial segments)
- `/compare/[alternative]` (at least 3 initial alternatives)
- `/changelog`
- `/about`
- `/contact`
- `/llms.txt`
- `/llms-full.txt`
- `/robots.txt`
- Sitemap endpoints (index + segmented maps)
5.2 Must Fix Existing Routes
- Fix `index.astro` nav links (`/int`, `/blog`, `/pricing`) to point at valid URLs.
- Expand all one-line thin pages, or set a temporary `noindex` until they are complete.
- The footer must expose crawlable legal/support links.
- Reserve new root paths in `astro/src/lib/reservedPaths.ts` and `fastapi/app/core/reserved_paths.py`.
6) Blog Strategy (Topics + Information Architecture)
6.1 Recommended blog clusters
Cluster A: Programmatic SEO and content operations
- Programmatic SEO fundamentals for API-first CMS
- Building content hubs that avoid cannibalization
- Scaling internal linking with structured content
Cluster B: Developer publishing workflows
- Markdown-first publishing architecture
- Multi-channel distribution automation
- CMS API design patterns for teams
Cluster C: AI search readiness (GEO)
- How LLMs retrieve and cite web content
- Designing pages for AI overview inclusion
- Entity SEO and structured data for developer products
Cluster D: Technical SEO for content-heavy products
- Core Web Vitals for content platforms
- Crawl budget and pagination in dynamic sites
- Canonicalization patterns for user-generated content
6.2 Suggested first 20 posts
Create 5 posts per cluster above, with one pillar page per cluster and 4 supporting posts each. Interlink pillar <-> supporting posts bi-directionally.
7) Where To Store Blog Data (Yes, Database Is Fine)
Yes, you can store blogs in a database. Recommended approach:
Option A (Recommended for your stack): Firestore-backed blog content
New collections:
`marketingPosts`, `marketingAuthors`, `marketingCategories`, `marketingTags`
marketingPosts fields:
- `postId`, `slug`, `title`, `excerpt`, `bodyMarkdown`
- `authorId`, `categoryId`, `tagIds[]`
- `status` (draft|published)
- `publishedAt`, `updatedAt`, `canonicalUrl`
- `coverImageUrl`, `coverImageAlt`
- `metaTitle`, `metaDescription`, `ogImageUrl`
Rules:
- Precompute and store `readingTime`, `toc`, `wordCount`.
- Cache list/detail responses in Redis.
- Serve paginated APIs (`cursor`, `limit`).
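The field list above translates into a document type like this sketch, together with the write-time precompute rule. The ~220 words-per-minute figure is an assumption, and the type names are proposals:

```typescript
// Sketch of a marketingPosts document shape (field names from the list above).
export interface MarketingPost {
  postId: string;
  slug: string;
  title: string;
  excerpt: string;
  bodyMarkdown: string;
  authorId: string;
  categoryId: string;
  tagIds: string[];
  status: "draft" | "published";
  publishedAt: string;      // ISO timestamp
  updatedAt: string;
  canonicalUrl?: string;
  coverImageUrl?: string;
  coverImageAlt?: string;
  metaTitle?: string;
  metaDescription?: string;
  ogImageUrl?: string;
  // Precomputed at write time, per the rules above:
  wordCount: number;
  readingTime: number;      // minutes, assuming ~220 wpm
}

export function precompute(bodyMarkdown: string): { wordCount: number; readingTime: number } {
  const wordCount = bodyMarkdown.split(/\s+/).filter(Boolean).length;
  return { wordCount, readingTime: Math.max(1, Math.round(wordCount / 220)) };
}
```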
Option B: MDX files in repo for marketing pages
Best for editorial versioning and static pre-render speed.
Hybrid recommendation
- Marketing blog/docs pages in MDX (high control, fast builds).
- User-generated papers/projects remain in Firestore.
8) Data Fetching and Speed Improvement Plan
Frontend (Astro)
- Remove redundant API calls on project page by extending one backend payload.
- Convert large public pages to mostly server-rendered HTML with small client islands.
- Avoid loading collection papers only after click if discoverability matters; render crawlable links.
- Add image optimization strategy (dimensions, modern format, priority only for LCP image).
- Add explicit cache policy per route and avoid inconsistent headers.
Backend (FastAPI + Firestore)
- Add paginated public list endpoints:
  - `/public/{handle}?paper_limit=...&paper_cursor=...`
  - `/public/{handle}/projects/{project_slug}?...`
- Add pre-sorted query support in the store layer (`order_by`, `limit`, `start_after`).
- Add aggregate cache keys for public profile/project payloads.
- Add response compression middleware.
- Add `Cache-Control` and an optional `ETag` on public responses.
- Add lightweight list DTOs for cards (avoid large body fields unless needed).
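From the Astro side, the paginated contract might be consumed as in this sketch. The query parameter names mirror the endpoints above, but the response shape (`items`/`nextCursor`) is an assumption that has to be agreed with the FastAPI implementation:

```typescript
// Sketch of cursor-paginated consumption during server-side rendering
// (e.g. when generating the public-papers sitemap segment).
interface Page<T> {
  items: T[];
  nextCursor: string | null; // opaque start_after token from Firestore
}

// Pure URL builder, kept separate so it is easy to test.
export function pageUrl(apiBase: string, handle: string, limit: number, cursor: string | null): string {
  const url = new URL(`${apiBase}/public/${handle}`);
  url.searchParams.set("paper_limit", String(limit));
  if (cursor) url.searchParams.set("paper_cursor", cursor);
  return url.toString();
}

// Walk every page until the cursor runs out.
export async function fetchAllPapers<T>(apiBase: string, handle: string, limit = 50): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | null = null;
  do {
    const page: Page<T> = await (await fetch(pageUrl(apiBase, handle, limit, cursor))).json();
    all.push(...page.items);
    cursor = page.nextCursor;
  } while (cursor);
  return all;
}
```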
9) Phase-by-Phase Implementation
Phase 0 (Day 1-2): Critical crawl/index foundation
- Build global SEO metadata system in layout.
- Add robots + sitemap endpoints.
- Fix nav links and route mismatches.
- Decide canonical host and enforce HTTPS/non-www policy.
- Replace 302-to-404 pattern with proper 404 responses.
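The 302-to-404 replacement in the last item can be as small as returning a real 404 `Response` from the page frontmatter. The helper name is a proposal; in Astro SSR, a 404-status `Response` typically causes the custom 404 route to render while crawlers see the correct status on the requested URL:

```typescript
// Sketch: turn a missing-resource case into a direct 404 Response,
// avoiding the soft-404 signal that a 302 to /404 produces.
export function notFound(): Response {
  return new Response(null, { status: 404 });
}

// Usage in astro/src/pages/[handle]/index.astro frontmatter (sketch):
//   const profile = await getPublicProfile(handle);
//   if (!profile) return notFound();
```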
Success criteria:
- Every indexable URL has unique title + description + canonical.
- Sitemaps live and robots references them.
Phase 1 (Day 3-5): Public page SEO + schema
- Implement metadata + JSON-LD for user/project/paper pages.
- Normalize URLs with redirect rules.
- Improve heading structure and on-page content snippets.
- Add related-content internal linking on paper pages.
Success criteria:
- Rich Results validation passes for article/profile pages.
- No duplicate URL variants in crawl exports.
Phase 2 (Week 2): Content and GEO expansion
- Launch `/blog`, `/pricing`, `/features`, `/use-cases`, `/compare`.
- Publish the first 20 posts across the 4 clusters.
- Add `/llms.txt` + `/llms-full.txt`.
- Add author pages and E-E-A-T elements.
Success criteria:
- The count of indexed pages in Search Console grows steadily.
- AI assistants can retrieve clean product definitions and citations.
Phase 3 (Week 3): Performance and scale
- Add pagination and caching for heavy public endpoints.
- Reduce hydration JS on public pages.
- Introduce query-level optimization in Firestore access layer.
- Add monitoring dashboards and SLOs.
Success criteria:
- Lower TTFB and faster LCP on public pages.
- Stable response times under larger datasets.
10) Manual Tasks Outside This Project
- Google Search Console
- Verify domain property.
- Submit sitemap index.
- Inspect and request indexing for key new pages.
- Monitor coverage, CWV, and enhancement reports weekly.
- Bing Webmaster Tools
- Verify site and submit sitemap.
- Analytics and monitoring
- GA4 + conversion events for signups and content-to-signup paths.
- Track organic landing pages, CTR, and assisted conversions.
- CDN and hosting
- Ensure Brotli/gzip enabled at edge.
- Confirm caching behavior for HTML vs static assets.
- Editorial process
- Assign author owners per cluster.
- Publish cadence: minimum 2 posts/week for first 10 weeks.
- Quarterly content refresh for top pages.
- Authority building
- Acquire links from developer communities and partner integrations.
- Publish benchmark/case-study posts with original data.
- Brand/entity consistency
- Keep organization name, social profiles, and product description consistent across site and external profiles.
11) KPI Dashboard (Track Weekly)
Primary:
- Indexed pages
- Non-brand impressions and clicks
- Avg position for target clusters
- Organic signup conversions
Technical:
- LCP, INP, CLS for top templates
- Crawl errors and duplicate/canonical issues
- Sitemap indexed-to-submitted ratio
GEO:
- Brand/entity mentions in AI answers
- Citation frequency of your domain in AI outputs
- Referral traffic from AI assistants (when detectable)
12) Immediate Next 10 Engineering Tasks
- Implement the SEO prop contract in `Layout.astro`.
- Add a `seo.ts` helper to generate canonical/meta defaults.
- Create `robots.txt.ts` and the sitemap routes.
- Fix broken nav URLs in `index.astro`.
- Replace one-line thin pages with real content or a temporary `noindex`.
- Add page metadata + JSON-LD to `[handle]/index.astro`, `[handle]/p/[projectSlug]/index.astro`, and `[handle]/[slug].astro`.
- Add normalized redirect logic for handle/slug variants.
- Extend the public project API to include an owner summary in one response.
- Add pagination params to the public profile/project APIs.
- Add FastAPI compression and response cache headers for public endpoints.
If you execute Phases 0 and 1 completely, you should see meaningful crawl/index quality improvement quickly. Phases 2 and 3 are where long-term SEO + GEO compounding happens.