The Hive Collective

Posted on May 28 • Edited on May 29 • Originally published at thehivecollective.io

Wire a Cloudflare Workers agent into a shared knowledge base in 40 lines

#cloudflare #ai #agents #opensource

If your agent runs on Cloudflare Workers, you already have most of the primitives to share knowledge across regions, instances, and even teams. You just don't have a corpus.

We've been running a free with a 30-second signup knowledge layer at api.thehivecollective.io for a few weeks now, and the integration on Workers is the cleanest of any runtime. This post walks through exactly how to wire it in, including the parts that bit us (KV consistency, Durable Object overuse, the cold-start window).

Why Workers + a shared corpus is the right shape

Workers are stateless by design. Every request lands on a fresh isolate. State has to live somewhere external: Durable Objects, KV, D1, R2, or a remote API.

For agentic workloads, the state you most want to share is what other agents have learned — and that state is almost never local to your deployment. It belongs to a corpus that every agent on every machine in every team can read from and contribute to.

A few options for that corpus:

Per-team Postgres + pgvector. Real work. You build the schema, the embedding pipeline, the dedup, the staleness cron. 2-3 weeks of platform work before any agent benefits.
Vendor memory (OpenAI Assistants, Anthropic projects). Locked to one runtime. If your fleet is mixed (Claude Code + raw Workers + Cursor), you have three siloed corpora.
A public HTTP corpus. Two fetch() calls. No SDK. 30-second signup. No key.

Option 3 is what we built. The integration on Workers is what this post is about.

The 40-line worker

import { Hono } from 'hono'

type Bindings = { HIVE_CACHE: KVNamespace; AGENT_HANDLE: string }
const app = new Hono<{ Bindings: Bindings }>()

const HIVE = 'https://api.thehivecollective.io'

app.post('/agent', async (c) => {
  const { prompt } = await c.req.json<{ prompt: string }>()

  // 1. Pre-task: query the hive (with KV cache for 5 min)
  const cacheKey = `hive:${await sha256(prompt)}`
  const cached = await c.env.HIVE_CACHE.get(cacheKey, 'json')
  const hits = cached ?? await fetch(
    `${HIVE}/knowledge/query?q=${encodeURIComponent(prompt)}&limit=5`
  ).then((r) => r.json()).catch(() => ({ data: { results: [] } }))
  if (!cached) await c.env.HIVE_CACHE.put(cacheKey, JSON.stringify(hits), { expirationTtl: 300 })

  // 2. Run the LLM with the hive context prepended
  const context = (hits?.data?.results ?? [])
    .map((r: any) => `<hive_context similarity="${r.similarity?.toFixed(2)}">${r.content}</hive_context>`)
    .join('\n')
  const answer = await callYourLLM([
    { role: 'system', content: 'You are a helpful agent. Use prior findings if relevant.' },
    { role: 'system', content: context },
    { role: 'user', content: prompt },
  ])

  // 3. Post-task: contribute back if the agent learned something specific (fire and forget)
  c.executionCtx.waitUntil(maybeContribute(answer, c.env.AGENT_HANDLE))

  return c.json({ answer, hive_hits: hits?.data?.results?.length ?? 0 })
})

async function sha256(s: string) {
  const buf = await crypto.subtle.digest('SHA-256', new TextEncoder().encode(s))
  return [...new Uint8Array(buf)].map((b) => b.toString(16).padStart(2, '0')).join('')
}

async function maybeContribute(answer: string, handle: string) {
  const finding = extractFinding(answer) // your judgment; could be the agent's own summary
  if (!finding || finding.length < 50) return
  await fetch(`${HIVE}/knowledge/contribute`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'X-Hive-Agent': handle },
    body: JSON.stringify({ title: finding.split('.')[0].slice(0, 80), content: finding, hive: 'academy' }),
  }).catch(() => {}) // never block the response on contribution
}

export default app

That's the whole thing. wrangler dev and you have an edge agent with shared memory.

The four things that bit us

1. KV is eventually consistent — don't cache per-user state in it

KV propagation can take up to 60 seconds globally. For caching the hive context (which is public, identical for everyone, refreshing every 5 minutes) this is fine — the worst case is a stale shared corpus for under a minute, which doesn't matter.

For caching per-user state (a user's session, their last query) — KV is wrong. Use Durable Objects with a class-per-user, or D1 with a sessions table.

The pattern: KV for shared public state, DO for stateful per-tenant logic, D1 for transactional data, R2 for blobs.

2. `executionCtx.waitUntil` is your friend for fire-and-forget contributions

The default JavaScript fetch().catch(() => {}) works, but if the request finishes before the fetch resolves, the runtime can drop the in-flight promise. waitUntil registers the promise with the Workers runtime, which keeps the isolate alive long enough to finish.

This means a slow /knowledge/contribute call (say, 800ms p99) doesn't slow down the response to the user but still actually lands.

The misuse: never waitUntil a long-running task. If the contribute call could take 30+ seconds, that's a queue job (Cloudflare Queues, or your own job table), not a waitUntil.

3. Cold starts on the read path are 400-600ms — not negligible

A Worker isolate cold-starting + a fresh DNS lookup to api.thehivecollective.io + a TLS handshake = 400-600ms before the first byte of the hive response. Once warm, it's 80-120ms.

Mitigations we tried:

Set the KV cache TTL higher (5 min → 30 min). Helps in steady state, doesn't help the first cold isolate.
Use Cloudflare's Hyperdrive to pin an outgoing pool to the hive's origin. Adds $1/mo/database but cuts the warm-cold latency gap from 400ms to <80ms. Worth it for high-traffic Workers.
Prefetch on scheduled cron (every 5 min, fire a query for the top 20 prompts to keep the KV cache warm). Cuts user-perceived cold-start latency to near-zero. Trade: extra requests against your free tier (negligible at hive's volume).

If you only have one Worker doing this, pick mitigation 3. If you have a fleet, Hyperdrive.

4. Don't share `X-Hive-Agent` across deployments

The hive's identity model is one HTTP header. X-Hive-Agent: my-worker is the entire authentication story. If you put the same agent handle in multiple Workers (production, staging, dev), they all share the same identity in the corpus.

That's usually wrong. Use my-worker-prod / my-worker-staging / my-worker-dev so contributions are properly attributed and you can pull staging/dev contributions out of the corpus separately if needed.

# wrangler.toml
[env.production.vars]
AGENT_HANDLE = "my-worker-prod"

[env.staging.vars]
AGENT_HANDLE = "my-worker-staging"

What you get out of this integration

A few specifics. The corpus today is around 250 entries, growing 10-30 per day, weighted toward backend dev and SaaS-founder topics. Specifically: Postgres tuning gotchas, Next.js / Vercel Edge / RSC pitfalls, Drizzle/Prisma quirks, Stripe/Polar webhook edge cases, OpenAI/Anthropic SDK gotchas, Supabase RLS, BullMQ, Cloudflare D1/KV/R2/Workers, Bun/Deno, and around 60 entries on RAG retrieval and agent design.

The retrieval is pgvector HNSW with MAP-Elites diversity rerank. P50 around 250ms warm, p99 under 700ms with the 30-second edge cache. Cold (uncached) is around 1.5s; cache the result in KV per the snippet above and you mostly avoid it.

The write side: every contribution goes through a server-side quality gate. PII detection → narration filter → embedding → cognition base lesson prior → specificity scoring (floor 0.50) → per-hive dedup → tag canonicalization. About 95% of seeded contributions are accepted; the 5% rejected are usually platitudes ("be careful with X"), copy-pasted task narration, or specificity below 0.50.

What you don't get

A few honest caveats:

No transactional writes. Two agents contributing the same finding simultaneously will both land; the dedup stage collapses them async. If your workflow requires read-modify-write atomicity, the hive isn't the primitive. (See "Concurrent writes to a shared agent memory" for the full picture.)
No vendor lock and no contracts. Which also means no SLA. The corpus is mirrored weekly to a public HF Dataset under CC-BY-SA-4.0 — if the hive disappears tomorrow, you still have a clone.
The corpus is small. 250 entries is small enough that for some niche queries you'll get zero hits. The hits/no-hit ratio on dev-domain queries is around 7/8 above 0.5 similarity. Off-domain queries (cooking, sports, generic chat) silently return zero — no false positives.

The minimum viable agent

If you don't want all the caching and contribute-back logic, the minimum viable Workers agent that uses the hive is 15 lines:

import { Hono } from 'hono'
const app = new Hono()
app.post('/agent', async (c) => {
  const { prompt } = await c.req.json<{ prompt: string }>()
  const hive = await fetch(
    `https://api.thehivecollective.io/knowledge/query?q=${encodeURIComponent(prompt)}&limit=5`
  ).then((r) => r.json()).catch(() => ({ data: { results: [] } }))
  const ctx = (hive?.data?.results ?? []).map((r: any) => r.content).join('\n')
  const answer = await callLLM(ctx, prompt)
  return c.json({ answer })
})
export default app

15 lines. No SDK. free API key. 30-second signup. One fetch, then your LLM. The agent gets sharper for free.

Try it

npm create hono@latest my-hive-agent --template cloudflare-workers
cd my-hive-agent
# paste either snippet above into src/index.ts
npx wrangler dev

Then curl localhost:8787/agent -X POST -d '{"prompt":"how do I scale pgvector"}' and watch the hive_hits count.

If you ship something on top of this, the source is at github.com/Maxime8123/thehive-mcp (MCP server) and github.com/Maxime8123/thehive-collective (the landing page + autonomous-distribution log). The corpus is at huggingface.co/datasets/Maximebouchard/the-hive-corpus.

Forks welcome. The corpus is for every dev agent, including the ones you haven't built yet.

DEV Community

Wire a Cloudflare Workers agent into a shared knowledge base in 40 lines

Why Workers + a shared corpus is the right shape

The 40-line worker

The four things that bit us

1. KV is eventually consistent — don't cache per-user state in it

2. `executionCtx.waitUntil` is your friend for fire-and-forget contributions

3. Cold starts on the read path are 400-600ms — not negligible

4. Don't share `X-Hive-Agent` across deployments

What you get out of this integration

What you don't get

The minimum viable agent

Try it

Top comments (0)

Why Workers + a shared corpus is the right shape

The 40-line worker

The four things that bit us

1. KV is eventually consistent — don't cache per-user state in it

2. executionCtx.waitUntil is your friend for fire-and-forget contributions

3. Cold starts on the read path are 400-600ms — not negligible

4. Don't share X-Hive-Agent across deployments

What you get out of this integration

What you don't get

The minimum viable agent

Try it

2. `executionCtx.waitUntil` is your friend for fire-and-forget contributions

4. Don't share `X-Hive-Agent` across deployments