WordPress AI chat plugins make 6–11 outbound requests per visitor question. Architecture writeup of an alternative.

#ai #webdev #privacy #wordpress

Originally published at pagecoder.ai/blog/why-your-ai-chatbot-is-a-tracker. Cross-posted here for the dev.to community.

Last week we sat in a café in Vienna with a friend, a freelance dev who's been shipping WordPress sites for a decade. He'd just installed an AI chat plugin for a client, a small cosmetics brand. It looked nice. Brand colors, custom name.

Then he opened the network tab.

"Three. Four. Five." He scrolled.
"Eight. Wait — eleven? Where is this one going?"

Eleven outbound requests. Per visitor question. Before the bot's reply finishes rendering.

If you ship WordPress sites for clients and you're considering an AI chatbot, this post is the architecture audit you probably haven't done yourself yet.

The architecture of the leak

visitor question
  ├─→ third-party AI provider (LLM call)
  │     - prompt logged for "abuse monitoring"
  │     - retention window = vendor-defined
  ├─→ chatbot vendor's own backend (the "data product")
  │     - browser fingerprint + IP + geolocation
  │     - full conversation + page URL
  │     - this IS the actual product the vendor sells
  └─→ widget's embedded CDNs / analytics endpoints (~4-9 of these)
        - CDN providers owe the visitor no privacy policy
        - implicit consent via embed

A single visitor question on a small WordPress site running a popular AI chatbot can leak to 6–11 different companies before the page renders the reply. None of them is listed in the site's cookie banner. None would show up in the site's response to a data-subject access request. None know who the brand is; all of them know who its visitors are.
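To repeat this count on any site, ask the bot one question with the network tab open, export the capture as a HAR file, and tally the third-party hosts. Below is a minimal sketch, assuming the standard HAR layout (`log.entries[].request.url`). The function name `third_party_hosts` and the example hostnames are hypothetical, and the first-party check is a naive suffix match rather than a public-suffix lookup.

```python
from collections import Counter
from urllib.parse import urlsplit


def third_party_hosts(har: dict, site_host: str) -> Counter:
    """Count requests per host that is neither the site nor one of its subdomains."""
    site_host = site_host.lower()
    counts: Counter = Counter()
    for entry in har["log"]["entries"]:
        host = (urlsplit(entry["request"]["url"]).hostname or "").lower()
        if not host:
            continue  # data: URLs and similar carry no hostname
        # Naive first-party test (assumption): exact match or subdomain of the site.
        if host == site_host or host.endswith("." + site_host):
            continue
        counts[host] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical capture; in practice load the exported file with json.load().
    sample = {"log": {"entries": [
        {"request": {"url": "https://shop.example/wp-json/chat"}},
        {"request": {"url": "https://api.llm-vendor.example/v1/chat"}},
        {"request": {"url": "https://cdn.widget.example/w.js"}},
    ]}}
    for host, n in third_party_hosts(sample, "shop.example").items():
        print(host, n)
```

The number of distinct keys in the result is the figure to compare against the cookie banner. Several hosts can belong to one company, so treat the count as an upper bound on companies, not an exact one.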

EU regulators are catching up. Default architecture isn't.

Why we started over

We'd been installing those plugins for clients for years. Ticked the privacy-policy boxes. Stopped noticing. Then a client asked, casually, "where exactly does that go when someone types it in?"

We didn't have a clean answer.

So we started over. Three architectural rules:

1. Vectors live on the user's server

The standard SaaS playbook says "use Pinecone / Weaviate / our managed vector DB." We store embeddings in the customer's WordPress database instead (a custom post type holding the vector, a chunk_id, and a source_post_ref). Lookup is a single SQL query, with cosine similarity computed in PHP. That's not as fast as a dedicated vector DB, but it's fast enough for the typical 10K-20K-chunk corpus of a small WP site.
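For readers who want to see what that lookup amounts to, here is a rough sketch of a brute-force cosine-similarity search. It is written in Python for brevity, not the plugin's PHP, and it is not the plugin's actual code. The row shape `(chunk_id, vector)` mirrors the fields named above; `cosine_similarity` and `top_k_chunks` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(
    query: Sequence[float],
    rows: Sequence[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every stored (chunk_id, vector) row against the query; return the k best."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Rows as they might come back from one SELECT over the chunk table (made-up data).
    rows = [("chunk-a", [1.0, 0.0]), ("chunk-b", [0.0, 1.0]), ("chunk-c", [1.0, 1.0])]
    print(top_k_chunks([1.0, 0.0], rows, k=2))
```

The scan touches every row, so cost grows linearly with the number of chunks. That is fine at tens of thousands of rows and is the reason a very large corpus would want a dedicated index.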

Tradeoff: scaling. Customers with 1M+ chunks would need a real vector DB. We bet 99% of WP customers won't hit that wall.

2. The math is stateless

The flow:

embedding (request) → similarity_search() → top_k chunks → response → done

No log. No "queries collection" admin tab. The backend literally forgets the request happened.
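In code terms, "stateless" here means the handler is a pure function of its arguments. The sketch below is an illustration under that assumption, not the real endpoint: `handle_query` and the `search` callable are hypothetical names, and the search step is passed in so the example stays self-contained.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical signature: takes a query embedding and k, returns the best chunk texts.
SearchFn = Callable[[Sequence[float], int], List[str]]


def handle_query(embedding: Sequence[float], search: SearchFn, top_k: int = 3) -> Dict[str, List[str]]:
    """Answer one request using only its arguments; nothing is written anywhere."""
    if not embedding:
        raise ValueError("empty embedding")
    chunks = search(embedding, top_k)
    # No logging call, no database write, no module-level cache: once this
    # returns, the only copy of the request is the one the caller holds.
    return {"chunks": chunks}


if __name__ == "__main__":
    def fake_search(vec: Sequence[float], k: int) -> List[str]:
        return ["chunk one", "chunk two", "chunk three"][:k]

    print(handle_query([0.1, 0.2], fake_search, top_k=2))
```

A quick review test for any similar endpoint: if the function body contains an insert, an append to a file, or a call to a logger that receives the query, it is not stateless in the sense used here.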

This costs us product features:

Can't show a "your top 10 most-asked questions" dashboard
Can't do per-user conversation history
Can't optimize answers based on aggregate signals

The only honest way to promise we won't lose your visitors' data is to never have it.

3. Zero third-party calls from the widget

Fonts subset and self-hosted (no Google Fonts CDN)
JS bundle has no external script tags
No analytics pixel (we built our own analytics backend on the same server, with daily-rotating salts and no IP storage)
No social tracking pixels for share buttons (intent URLs only)

Open the network tab on a site running our plugin and you'll count to two: WP itself, and one stateless math endpoint at our backend.
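The "daily-rotating salts, no IP storage" idea from the list above can be sketched as follows. This is one common way to do it, assumed for illustration and not a description of our backend: the class name `DailySaltHasher`, the in-memory salt, and the choice of HMAC-SHA256 are all assumptions.

```python
import hashlib
import hmac
import secrets
from datetime import date
from typing import Optional


class DailySaltHasher:
    """Turns (ip, user_agent) into an opaque key that is stable for one day only."""

    def __init__(self) -> None:
        self._day: Optional[date] = None
        self._salt: bytes = b""

    def _salt_for(self, today: date) -> bytes:
        # Assumption: the salt lives only in memory and is replaced when the date
        # changes, so yesterday's keys can no longer be recomputed or linked to today's.
        if today != self._day:
            self._day = today
            self._salt = secrets.token_bytes(32)
        return self._salt

    def visitor_key(self, ip: str, user_agent: str, today: date) -> str:
        """Return a hex digest to count unique visitors; the raw IP is never stored."""
        message = f"{ip}|{user_agent}".encode("utf-8")
        return hmac.new(self._salt_for(today), message, hashlib.sha256).hexdigest()


if __name__ == "__main__":
    hasher = DailySaltHasher()
    key = hasher.visitor_key("203.0.113.7", "ExampleBrowser/1.0", date.today())
    print(key)  # store this key for the daily unique count, never the IP
```

Within a day the same visitor maps to the same key, which is enough for a unique-visitor count. Across days the keys are unlinkable once the old salt is gone.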

The Loop

Standard chatbots forget the conversation. Visitor asks, bot answers, session closes, the site never learns. The chatbot vendor learns — they aggregate questions across all customers — but the site owner sees nothing.

We made that missing feedback loop the actual product. The plugin clusters incoming question variations, drafts a clean FAQ entry, and shows the admin two buttons: publish or discard. AI proposes; human curates. The output is a real indexed page at /faq/your-question/ with FAQPage schema markup.
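For reference, FAQPage markup is a small JSON-LD object using the schema.org types FAQPage, Question, and Answer. Here is a minimal sketch of generating it for one published entry. The function name `faq_page_jsonld` and the sample question are hypothetical, and the plugin's real output may carry more fields.

```python
import json


def faq_page_jsonld(question: str, answer: str) -> str:
    """Build schema.org FAQPage JSON-LD for a single question/answer pair."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
        ],
    }
    text = json.dumps(data, ensure_ascii=False, indent=2)
    # Escape "</" so an answer containing "</script>" cannot close the enclosing tag;
    # "\/" is a valid JSON escape, so parsers still read the original text.
    return text.replace("</", "<\\/")


if __name__ == "__main__":
    print(faq_page_jsonld("Is the serum vegan?", "Yes, every ingredient is plant-derived."))
```

The returned string goes inside a `<script type="application/ld+json">` element on the published FAQ page.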

Visitors who type the question into the bot get the answer instantly. Visitors who Google the question land on the same page. One piece of content, two jobs.

Three questions before installing any AI chatbot

Where do my visitors' questions go? If the answer involves any company name other than yours, the answer is "to that company too".
Where do you store my content? "Our cloud" means your content is part of their dataset. If the answer is "your database", ask to see the table.
What happens if I uninstall? "Your data stays with us forever" vs "the data is gone, because it was always yours".

We pass all three. We're one of the few that do.

Full manifesto with the cosmetics-brand callback and the closing scene: pagecoder.ai/blog/why-your-ai-chatbot-is-a-tracker

Plugin: pagecoder.ai/products/rag-chat

(Disclosure: I'm a co-founder. The audit pattern in this post applies regardless of which plugin you pick — even if you never use ours.)