I replaced my $3k/mo SEO stack with one content pipeline — here's the architecture

#seo #ai #saas #webdev

For about a year I ran SEO the way most small teams do: a copywriter on retainer, an Ahrefs seat, and a graveyard of half-finished keyword spreadsheets. It cost me roughly $3k/month and produced maybe four articles. The bottleneck was never writing — it was everything around the writing.

So I did what developers do when a manual process gets annoying enough: I treated it as a pipeline and started automating the boring parts. This post is the architecture of what that turned into — and the product it became, SeoSync.

If you've ever stared at a content calendar wondering why "publish two blog posts" takes two weeks, this one's for you.

The actual problem isn't writing — it's the graph around it
When you decompose "publish an SEO article," writing is maybe 20% of the work. The rest looks like this:

keyword discovery
→ intent classification (is this even rankable?)
→ SERP analysis (what already ranks, and why)
→ clustering (which queries are the same article?)
→ outline that matches search intent
→ draft in the site's voice
→ meta tags + schema
→ internal links to/from existing posts
→ images
→ publish to the CMS
→ repeat, on a schedule, forever
Every one of those steps is a place where a human gets stuck, context-switches, or just quietly gives up. A language model can write a paragraph, but a paragraph was never the hard part. The hard part is the graph: knowing which 1,400 keywords collapse into 40 articles, and how those 40 articles should link to each other.

Step 1 — Keyword research that filters for winnable queries
Most keyword tools dump volume at you. Volume is a vanity metric if you can't rank. The interesting signal is the combination of intent + difficulty + your site's current authority.

The keyword research layer pulls the raw query universe (search suggestions, related terms, competitor overlap), then scores each query on one question: can this specific site realistically reach the top? That is a different question from "how many people search it." Low-difficulty, high-intent long-tail queries get prioritized; vanity head terms get parked.

For developers: think of it as a ranking function over the keyword universe, where the features are KD, intent class, and a site-authority prior — not a raw ORDER BY volume DESC.
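Here's a minimal sketch of what such a ranking function could look like. The intent weights, the 50-point difficulty window, and the log-damped volume term are all illustrative assumptions, not the scoring SeoSync actually uses.

```python
import math
from dataclasses import dataclass

# Hypothetical weights per intent class; tune these against your own data.
INTENT_WEIGHT = {
    "transactional": 1.0,
    "commercial": 0.8,
    "informational": 0.5,
    "navigational": 0.1,
}


@dataclass(frozen=True)
class Keyword:
    query: str
    volume: int        # monthly searches
    difficulty: float  # KD on a 0-100 scale
    intent: str        # key into INTENT_WEIGHT


def winnability(kw: Keyword, site_authority: float) -> float:
    """Score a query for one specific site (authority on the same 0-100 scale as KD)."""
    # Authority prior: 1.0 when the site is at or above the difficulty,
    # falling linearly to 0.0 once KD exceeds authority by 50 points (assumed window).
    gap = kw.difficulty - site_authority
    reach = max(0.0, min(1.0, 1.0 - gap / 50.0))
    # Volume is log-damped so it breaks ties instead of dominating the score.
    return reach * INTENT_WEIGHT.get(kw.intent, 0.0) * math.log1p(kw.volume)


keywords = [
    Keyword("running shoes", 90_000, 85, "commercial"),
    Keyword("best trail running shoes for flat feet", 700, 18, "commercial"),
    Keyword("how to clean running shoes", 5_000, 25, "informational"),
]

# Rank for a low-authority site: the head term scores 0 and gets parked.
ranked = sorted(keywords, key=lambda kw: winnability(kw, site_authority=20), reverse=True)
for kw in ranked:
    print(f"{winnability(kw, 20):6.2f}  {kw.query}")
```

The useful property is that the head term with 90,000 searches lands at the bottom for a site with authority 20, which a plain sort on volume would never do.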

Step 2 — SERP-overlap clustering (the part everyone skips)
Here's the step that separates a real content engine from "ChatGPT, write me a blog post."

Two keywords belong in the same article if Google (or Yandex) returns roughly the same set of URLs for both. That's it. If the top-10 results overlap heavily, the search engine is treating them as the same question, so you should too.

    # conceptual: cluster by SERP overlap, not string similarity
    def same_article(q1, q2, threshold=0.4):
        top_a = top_urls(q1)  # live SERP, top 10
        top_b = top_urls(q2)
        return jaccard(top_a, top_b) >= threshold
String similarity ("best running shoes" vs "top running shoes") is a trap — it merges things Google considers different and splits things Google considers identical. SERP overlap reflects how the search engine actually groups intent. Run it across your whole keyword set and the clusters fall out naturally. Each cluster = one article with a clear target.
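To make "run it across your whole keyword set" concrete, here's a rough sketch that turns the pairwise check into clusters with a union-find. It assumes the SERPs are already fetched into a dict of query to URL set. The `cluster_by_serp` name, the sample queries, and the example.com URLs are made up for illustration.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of URLs the two result sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_serp(serps: dict[str, set[str]], threshold: float = 0.4) -> list[list[str]]:
    """Group queries whose top results overlap; each group is one article."""
    parent = {q: q for q in serps}

    def find(q: str) -> str:
        # Walk up to the cluster root, shortening the path as we go.
        while parent[q] != q:
            parent[q] = parent[parent[q]]
            q = parent[q]
        return q

    # Compare every pair once and merge the clusters of overlapping queries.
    for q1, q2 in combinations(serps, 2):
        if jaccard(serps[q1], serps[q2]) >= threshold:
            parent[find(q1)] = find(q2)

    clusters: dict[str, list[str]] = {}
    for q in serps:
        clusters.setdefault(find(q), []).append(q)
    return list(clusters.values())


# Made-up SERP data: the first two queries share 3 of 5 distinct URLs (0.6).
serps = {
    "seo automation tool": {"example.com/a", "example.com/b", "example.com/c", "example.com/d"},
    "automated seo software": {"example.com/a", "example.com/b", "example.com/c", "example.com/e"},
    "what is seo automation": {"example.com/x", "example.com/y", "example.com/z", "example.com/d"},
}
clusters = cluster_by_serp(serps)
print(clusters)
```

Two caveats on this sketch: merging is transitive, so a chain of partially overlapping queries can grow into one oversized cluster, and the all-pairs loop is quadratic in the number of keywords. Both are fine for a few thousand queries, but a stricter rule (for example, requiring overlap with the cluster's main query) may suit larger sets better.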

Step 3 — Drafting in the site's voice, not a generic one
Once you have a cluster and an intent-matched outline, drafting is the easy part — if you constrain it properly. A raw model writes hype ("The Ultimate Best Top Guide"); a useful AI SEO content writer writes something a human would actually finish reading.

The constraints that matter in practice:

Analyze the existing site first to lock tone, structure, and topical scope before generating a single word.
No superlatives in headings (H2: "Best..." / "Top...") — bad for SEO, breaks the table of contents, reads like spam.
Ground claims in the SERP — pull what already ranks so the draft answers the real question instead of hallucinating around it.
Be conservative on YMYL (medical/financial) — that's where thin AI content actually gets penalized.
The model is used for judgment calls (drafting, summarizing, classifying intent). Everything deterministic — routing, retries, the clustering math — stays in code. LLMs are bad at being databases; don't make them one.
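The heading rule is a good example of a check that belongs in code, not in the prompt. A small sketch, assuming drafts arrive as Markdown; the word list and the `flag_superlative_headings` name are hypothetical.

```python
import re

# Hypothetical word list; extend it to match your own style guide.
SUPERLATIVES = re.compile(r"\b(best|top|ultimate|greatest|perfect)\b", re.IGNORECASE)
# ATX-style Markdown heading: 1-6 '#' characters, whitespace, then the heading text.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def flag_superlative_headings(markdown: str) -> list[str]:
    """Return the text of every heading that contains a banned superlative."""
    flagged = []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match and SUPERLATIVES.search(match.group(2)):
            flagged.append(match.group(2).strip())
    return flagged


draft = """# SERP clustering explained

## The Ultimate Best Top Guide

The best approach is to measure overlap directly.

## How overlap is measured
"""

# Only the heading is flagged; "best" in body text is left alone.
flagged = flag_superlative_headings(draft)
print(flagged)
```

A failed check can trigger a regeneration of just that heading, so the model stays on the judgment call and the pass/fail decision stays deterministic.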

Step 4 — The internal-link graph
This is my favorite part because it's pure graph theory wearing an SEO costume.

A new article shouldn't link randomly — it should link to published siblings in the same cluster and back up to its pillar page. As the corpus grows, the link graph has to update existing posts too, so older articles point forward to the new one. Do this well and you build topical authority; do it manually and you simply never do it.

pillar: "SEO automation"
├── spoke: "SERP clustering" ⇄ links to sibling spokes
├── spoke: "internal linking" ⇄
└── spoke: "keyword difficulty" ⇄
The whole thing is a hub-and-spoke graph that rebalances on every publish. That's not something a human edits by hand across 200 posts — it's a job for code.
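A stripped-down sketch of that rebalancing step, under the assumption that every article belongs to exactly one pillar and every sibling links to every other sibling. The `LinkGraph` class and the slugs are illustrative, not SeoSync's data model.

```python
from collections import defaultdict


class LinkGraph:
    """Hub-and-spoke internal links, updated on every publish."""

    def __init__(self) -> None:
        self.spokes: dict[str, list[str]] = defaultdict(list)  # pillar -> spokes, in publish order
        self.links: dict[str, set[str]] = defaultdict(set)     # source slug -> target slugs

    def publish(self, slug: str, pillar: str) -> list[str]:
        """Add a spoke and return the existing posts that must be re-saved."""
        siblings = list(self.spokes[pillar])
        self.spokes[pillar].append(slug)

        # The new article links up to its pillar and across to published siblings.
        self.links[slug].update([pillar, *siblings])

        # Older posts point forward to the new one.
        touched = [pillar, *siblings]
        for source in touched:
            self.links[source].add(slug)
        return touched


graph = LinkGraph()
first = graph.publish("serp-clustering", pillar="seo-automation")
second = graph.publish("internal-linking", pillar="seo-automation")

print(first)   # posts edited by the first publish
print(second)  # posts edited by the second publish
print(sorted(graph.links["serp-clustering"]))
```

The return value is the part that matters operationally: it is the list of already-published posts whose content has to be updated in the CMS. With large clusters you would likely cap the number of sibling links per post, which this sketch does not do.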

Step 5 — Publishing without a human in the loop
The last mile kills most automation projects: getting the finished article into the CMS. SeoSync ships a WordPress integration (plus Shopify, Webflow, Wix, Framer) so the pipeline pushes the post — meta tags, schema, images, internal links and all — straight to the live site on a schedule. No copy-paste, no "I'll publish it tomorrow."

Connect a site, and it analyzes the existing pages, generates a 90-day plan from real queries, then writes and publishes daily while you do literally anything else.
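If you're building the WordPress leg yourself, the core REST API covers the scheduled-post part. The sketch below only builds the request and does not send it. It assumes a site with the REST API enabled and an Application Password for authentication; the `build_wp_post_request` helper, the site URL, and the credentials are placeholders. Meta tags and schema usually depend on whichever SEO plugin the site runs, so they are left out.

```python
import base64
import json
import urllib.request
from datetime import datetime


def build_wp_post_request(
    site_url: str,
    user: str,
    app_password: str,
    title: str,
    html: str,
    publish_at: datetime,
) -> urllib.request.Request:
    """Build (but do not send) a request that schedules a post via the WP REST API."""
    payload = {
        "title": title,
        "content": html,
        # "future" plus a date schedules the post instead of publishing immediately.
        "status": "future",
        # "date" is interpreted in the site's own timezone.
        "date": publish_at.strftime("%Y-%m-%dT%H:%M:%S"),
    }
    # Application Passwords are sent as HTTP Basic auth.
    token = base64.b64encode(f"{user}:{app_password}".encode()).decode()
    return urllib.request.Request(
        url=f"{site_url.rstrip('/')}/wp-json/wp/v2/posts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Basic {token}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_wp_post_request(
    site_url="https://example.com/",           # placeholder site
    user="editor",                              # placeholder credentials
    app_password="xxxx xxxx xxxx xxxx",
    title="SERP clustering explained",
    html="<p>Draft body goes here.</p>",
    publish_at=datetime(2026, 3, 1, 9, 0, 0),
)
print(req.get_method(), req.full_url)
```

Sending it is a single `urllib.request.urlopen(req)` call. Keeping request construction separate from the network call makes this step easy to unit-test and to retry from plain code.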

Does Google actually penalize this?
The honest answer: Google penalizes useless content, not automated content. Those aren't the same thing. If each article is semantically unique, answers a real query, follows search guidelines, and ships clean to the site — the crawler treats it like any other post. The failure mode is spam and thin YMYL claims, which the pipeline is specifically designed to avoid. (This is the same reason SEO automation works in 2026 when it didn't in 2016 — the bar is quality, and you can now hit that bar at volume.)

The compounding side effect: links and AI visibility
Two things I didn't expect:

Backlink exchange as a built-in step — vetted, topical partner links placed automatically as the corpus grows. Domain Rating moved more in six weeks than it had in the prior year.
AI search visibility. Once you have broad, well-structured topical coverage, ChatGPT and other assistants start citing you in answers. It's basically SEO for the LLM era, and it falls out of the same pipeline for free.
What it actually replaced
The four-figure monthly stack — copywriter retainer + enterprise SEO tools — collapsed into one pipeline. The numbers on my own sites: organic up meaningfully inside the first 1–2 months, sustained cluster growth over the following quarter. The pricing ended up an order of magnitude below what the manual stack cost, which still feels slightly unfair.

Takeaways if you're building something similar
Model the work as a graph and a pipeline, not a prompt. The prompt is the smallest part.
Cluster by SERP overlap, not string similarity. This is the single highest-leverage idea here.
Keep the LLM on judgment calls only. Determinism belongs in code.
The internal-link graph compounds. Automate it from day one or you'll never do it.
Quality is the moat, not volume — but you can now have both.
If you'd rather not build all of this yourself, that's exactly what SeoSync is. And if you do build your own — I'd genuinely love to read that dev.to post.

What part of your content workflow is still stubbornly manual? Drop it in the comments — the clustering step surprised me most, curious what trips up others.
