DEV Community: 강해수

1024-token RAG chunks cut my storage cost in half — and nearly doubled my Claude bill

강해수 — Wed, 01 Jul 2026 05:29:34 +0000

Switching from 512 to 1024-token chunks saved $1.20/month on Vectorize. It cost me $92 more on Claude Sonnet. I didn't see that coming until I did the math.

I run an ad analytics SaaS with a daily agent flow that hits a RAG step on every cycle — about 400 runs a day. I'd left the chunk size at the default 512 tokens for three months before I got curious enough to actually measure it. So I indexed the same 100 ad reports three ways (256, 512, 1024 tokens), ran 20 fixed queries five times each, and tracked latency, citation accuracy, and estimated monthly cost.
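For anyone reproducing the comparison, here is a minimal sketch of indexing one report at the three sizes. It treats one whitespace-separated word as one token, which is an assumption: real counts depend on the embedding model's tokenizer. It also uses no overlap between chunks.

```typescript
// Split text into chunks of roughly `chunkSize` tokens.
// Assumption: one whitespace-separated word counts as one token. A real
// pipeline would count with the embedding model's tokenizer.
function chunkByTokens(text: string, chunkSize: number): string[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += chunkSize) {
    chunks.push(words.slice(i, i + chunkSize).join(" "));
  }
  return chunks;
}

// The three sizes compared in the experiment.
const CHUNK_SIZES = [256, 512, 1024];

// Chunk one report at every size, keyed by chunk size.
function chunkAllSizes(report: string): Map<number, string[]> {
  const bySize = new Map<number, string[]>();
  for (const size of CHUNK_SIZES) {
    bySize.set(size, chunkByTokens(report, size));
  }
  return bySize;
}
```

A 2,000-word report comes out as 8, 4, and 2 chunks at 256, 512, and 1024 tokens respectively.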

The summary table looked like 512 wins cleanly:

Chunk size	Avg latency	Citation accuracy	Monthly vector cost
256 tokens	43ms	16.2 / 20	~$4.80
512 tokens	51ms	17.8 / 20	~$2.40
1024 tokens	67ms	15.1 / 20	~$1.20

But the vector storage number is a trap. With 1024-token chunks, each top_k: 5 retrieval pulls ~5,120 tokens into Claude's context instead of ~2,560. At $3/M input tokens, that's roughly 2,560 extra tokens × 12,000 monthly calls = 30.7M tokens = $92/month. The $1.20 Vectorize saving doesn't touch it.
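The same arithmetic as a small function, handy for rerunning it at a different call volume or price. It counts only the retrieved-chunk tokens, not the rest of the prompt or the output, and it assumes every retrieval returns full-size chunks.

```typescript
// Claude input cost of the retrieved chunks alone, per month.
// Prompt scaffolding and output tokens are not included.
function monthlyContextCostUSD(
  chunkTokens: number,
  topK: number,
  callsPerMonth: number,
  usdPerMillionInputTokens: number,
): number {
  const tokensPerCall = chunkTokens * topK;
  return (tokensPerCall * callsPerMonth * usdPerMillionInputTokens) / 1_000_000;
}

// Figures from the post: 400 runs/day, top_k 5, $3 per million input tokens.
const callsPerMonth = 400 * 30; // 12,000
const cost512 = monthlyContextCostUSD(512, 5, callsPerMonth, 3);   // about $92.16
const cost1024 = monthlyContextCostUSD(1024, 5, callsPerMonth, 3); // about $184.32
const extraPerMonth = cost1024 - cost512;                          // about $92.16
```

At these numbers, the $1.20 Vectorize saving is a little over 1% of the extra Claude spend.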

1024-token chunks also produced what I'd call dilution rather than hallucination — Claude wasn't making things up, it was including too much surrounding text and missing the actual point. One campaign's 60-day performance data would land as a single dense chunk, and the model would surface averages instead of the anomaly I was asking about. Smaller chunks hurt in the opposite direction: a campaign summary split across three 256-token pieces meant top_k: 3 often came back with an incomplete picture.

What I run now is two indexes on the same data — a report-512 namespace for accuracy-first summary queries and a live-256 namespace for latency-sensitive ad-hoc questions:

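// Accuracy-first summary queries: 512-token chunks, top 5 matches
const summaryChunks = await vectorize.query(embedding, {
  topK: 5,
  namespace: 'report-512',
});

// Latency-sensitive ad-hoc questions: 256-token chunks, top 3 matches
const liveChunks = await vectorize.query(embedding, {
  topK: 3,
  namespace: 'live-256',
});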

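Storing the data twice means paying for both indexes (about $7.20/month combined, going by the table above), but the Claude savings on the live path more than cover it at my call volume. Whether that math holds at lower volumes is another question; it probably doesn't.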

I also hit a production incident during this experiment that had nothing to do with chunk size: swapping embedding models mid-index caused a dimension mismatch that killed the entire RAG step at 9am.

I wrote up the full breakdown over on riversealab.com, including that dimension mismatch incident, the chunk overlap experiments I haven't run yet, and what I'd test next with dynamic chunking by document type.

Full post →

My Durable Object processed 4 req/s instead of 40 — the culprit wasn't storage

강해수 — Wed, 01 Jul 2026 05:26:07 +0000

A 200ms outbound webhook call was serializing every single request through my Durable Object, and I spent the first hour blaming the wrong thing.

Durable Objects enforce a strict execution model: one fetch handler runs at a time. If a second request arrives while the first is still awaiting anything — storage, a network call, a sleep — it queues behind it. That's the consistency guarantee, and it's intentional. What I missed is that the queue doesn't care what you're awaiting. A storage.put() that takes 5ms and an outbound fetch() that takes 300ms both hold the same lock. At 10 concurrent callers, you're not running 10 operations in parallel — you're running them in a single-file line, each one waiting for the full execution time of the one ahead of it.

My DO was flushing a write buffer: save to storage, then POST to a webhook. Under load during a campaign spike (12K writes/minute), wrangler tail started showing queue depth errors. I assumed KV back-pressure — I've hit the ~1,000 writes/second namespace cap before — so I rewrote the buffer to batch puts. Throughput improved maybe 15%. Still serialized. A quick timer log inside the handler told the real story:

const t1 = Date.now();
await this.state.storage.put("lastSeen", t1);
console.log(`storage.put: ${Date.now() - t1}ms`); // 3–8ms

const t2 = Date.now();
await fetch("https://hooks.example.com/webhook", { method: "POST", body: JSON.stringify({ ts: t1 }) });
console.log(`outbound fetch: ${Date.now() - t2}ms`); // 180–420ms

The fix was architectural, not a micro-optimization. The DO should own state, not side effects. I moved the webhook call to a Queue binding: env.WEBHOOK_QUEUE.send(body) runs in under 5ms and doesn't block on consumer acknowledgment. The DO drops the payload and moves on immediately, and the lock is now held for single-digit milliseconds instead of up to 400.
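A minimal sketch of that split, assuming simplified stand-in interfaces for the storage and Queue bindings so it runs outside Workers. The names WriteBuffer and deliverWebhooks are hypothetical. In a deployed consumer, the delivery loop would sit inside the Worker's queue() handler and iterate over batch.messages.

```typescript
// Stand-ins for the Workers bindings so the sketch runs anywhere.
// Assumption: in a real project these are DurableObjectStorage and Queue
// from @cloudflare/workers-types.
interface StorageLike {
  put(key: string, value: unknown): Promise<void>;
}
interface QueueLike<T> {
  send(body: T): Promise<void>;
}

interface WebhookPayload {
  ts: number;
}

// Durable Object side: persist state, enqueue the side effect, return.
// No outbound fetch() is awaited while the handler runs.
class WriteBuffer {
  private readonly storage: StorageLike;
  private readonly webhookQueue: QueueLike<WebhookPayload>;

  constructor(storage: StorageLike, webhookQueue: QueueLike<WebhookPayload>) {
    this.storage = storage;
    this.webhookQueue = webhookQueue;
  }

  async flush(now: number): Promise<void> {
    await this.storage.put("lastSeen", now);
    await this.webhookQueue.send({ ts: now });
  }
}

// Consumer side (a separate Worker): performs the slow POST.
// `post` would wrap fetch("https://hooks.example.com/webhook", { method: "POST", body }).
async function deliverWebhooks(
  payloads: WebhookPayload[],
  post: (body: string) => Promise<void>,
): Promise<void> {
  for (const payload of payloads) {
    await post(JSON.stringify(payload));
  }
}
```

The design point is that the Durable Object only awaits the storage write and the enqueue; the slow POST happens in the consumer, where it holds up nobody's request.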

The second part of the fix — parallelizing read-only storage calls with Promise.all() instead of sequential await chains — shaved another chunk off p95 latency and is worth knowing about even if you never touch a webhook.
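A sketch of the read pattern, using a stand-in for the storage get() method; the key names are placeholders. Starting all reads before awaiting any of them means the handler waits roughly as long as the slowest read instead of the sum of all three.

```typescript
// Stand-in for Durable Object storage reads (assumption: mirrors the
// single-key form of get()).
interface ReadableStorage {
  get<T>(key: string): Promise<T | undefined>;
}

// Sequential: each read starts only after the previous one finishes,
// so the total wait is roughly the sum of the three latencies.
async function readSequential(s: ReadableStorage) {
  const config = await s.get<string>("config");
  const counter = await s.get<number>("counter");
  const lastSeen = await s.get<number>("lastSeen");
  return { config, counter, lastSeen };
}

// Parallel: all three reads are started before any is awaited,
// so the total wait is roughly the slowest single read.
async function readParallel(s: ReadableStorage) {
  const [config, counter, lastSeen] = await Promise.all([
    s.get<string>("config"),
    s.get<number>("counter"),
    s.get<number>("lastSeen"),
  ]);
  return { config, counter, lastSeen };
}
```

Durable Object storage also has a batch form, get() with an array of keys returning a Map, which gets the same effect in one call.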

I wrote up the full breakdown — including the Promise.all() read pattern, what the input gate actually controls in the DO event loop, and how to test serialization behavior locally with wrangler dev — over on dailymanuallab.com.

Full post →

Adding one field to Notion cost me 2.5 hours. The same change in Tana took 30 seconds.

강해수 — Wed, 01 Jul 2026 05:24:04 +0000

Adding a single property to a live Notion database with 160 rows isn't a five-minute job — it's a backfill session. I learned this the hard way in week seven of running Notion as production infrastructure for a content pipeline shipping 40 pieces a month.

I added a "Distribution Channel" property mid-project because a client requirement shifted (they always do). Notion has no default inheritance for existing records. Every row showed blank in the rollup that referenced the new field until I manually touched it. Two and a half hours of cleanup for one schema change. And that cost resets every time the schema evolves — which, if your clients are real humans with drifting requirements, is constantly.

The same change in Tana took about 30 seconds. I added a "Budget Flag" field to my Campaign supertag in month three. Every existing Campaign node inherited it immediately with a null value. No backfill required. That's not a minor UX difference — it's a fundamentally different data model. Tana's supertags propagate field definitions forward and backward across all tagged nodes. Notion's database columns are static per-row until you intervene.

Here's the trade-off that actually matters after four months of tracking 41 friction events across both tools:

Condition	Notion	Tana
Schema stable, team needs access	Holds up	Breaks down
Schema evolves frequently	Painful	Fine

Tana's collaboration model was the wall I hit hard. Two contractors needed read access to brief statuses. Tana's sharing wasn't built for that workflow — at least not during the period I was running this. I ended up exporting pipeline status to a shared Notion page daily. Two tools doing one job, which is its own kind of friction.

The honest framing isn't which tool is better. It's which failure mode is cheaper for your specific situation. Schema instability has a price in Notion that nobody in the "just plan ahead" crowd accounts for honestly.

I wrote up the full breakdown — including how the Zapier-to-Notion CRM sync held up at 300 entries a month, and the exact point where Tana's live search replaced three separate Notion databases — over on dailyfocusmag.com.

Full post →

My best-looking ROAS campaigns were quietly destroying subscription revenue

강해수 — Wed, 01 Jul 2026 05:22:28 +0000

Campaigns with the cleanest ROAS dashboards had collapsed subscription attach rates — from 40% down to under 12% — and nobody noticed for weeks.

Here's what happened: subscription checkouts and one-off purchases were firing into the same Purchase event, feeding a single tROAS target. The algorithm did exactly what it was told. It found conversions at the target ROAS, and the cheaper, more abundant conversion was the one-off buyer. Subscription LTV over 12 months in these accounts ran 3.5x–6x the one-off AOV for the same SKU. A ₩29,000/month skincare subscription is worth up to ₩348,000 in year one if the customer never cancels; the ₩38,000 one-off buyer is done after a single purchase. Blended into one ROAS target, the platform has no mechanism to weight them differently, so it doesn't.

The fix is structural, not a settings tweak. Two separate conversion actions, two separate campaign containers. On Meta, splitting at the ad set level inside one campaign doesn't hold: the delivery system still blends the optimization signal. The separation only works cleanly at the campaign level. On Google PMax, that means two separate campaigns, each with its own conversion goal assigned. I also tested value rules as a shortcut for six weeks: they adjust reported value, but the underlying audience signal the model uses for prospecting still treats both buyer types as the same conversion. Subscription rate didn't move.

The part that actually makes bid separation work — and where most implementations stop short — is what value you pass as the conversion signal. Checkout revenue is the wrong number for a subscription. I fire a separate Subscribe_Complete event with a value parameter set to projected 6-month LTV, not the transaction amount, via Cloudflare Workers intercepting the post-purchase webhook within 2 seconds of checkout. That's what the tROAS target is actually bidding against.
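A sketch of computing that value. The post doesn't say how the projection is built, so this assumes a simple constant monthly retention rate; the event object's shape is illustrative, not any ad platform's exact schema.

```typescript
// Projected value for a Subscribe_Complete event.
// Assumption (not from the post): a constant month-over-month retention
// rate. The first month is counted as paid, since it is charged at checkout.
function projectedLtv(monthlyPrice: number, monthlyRetention: number, months = 6): number {
  let expectedPaidMonths = 0;
  let stillSubscribed = 1; // probability the customer is still paying this month
  for (let m = 0; m < months; m++) {
    expectedPaidMonths += stillSubscribed;
    stillSubscribed *= monthlyRetention;
  }
  return monthlyPrice * expectedPaidMonths;
}

// Illustrative event shape only; map it onto your ad platform's real schema.
interface ConversionEvent {
  eventName: string;
  value: number;
  currency: string;
}

function subscribeCompleteEvent(monthlyPrice: number, monthlyRetention: number): ConversionEvent {
  return {
    eventName: "Subscribe_Complete",
    // Projected LTV, not the checkout transaction amount.
    value: Math.round(projectedLtv(monthlyPrice, monthlyRetention)),
    currency: "KRW",
  };
}
```

With no churn (retention 1.0), a ₩29,000/month plan projects to ₩174,000 over six months; at 50% monthly retention it drops to about ₩57,000.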

One caveat worth naming: this only makes sense when the LTV gap is real. If subscription LTV is only 1.5x one-off AOV, the volume fragmentation from separation probably costs more than the targeting precision gains. The threshold I use is 2.5x. Below that, consolidation may genuinely be the right call.
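The threshold rule as a function, using the post's 2.5x cutoff; treating exactly 2.5x as "separate" is an assumption here.

```typescript
// Separation threshold from the post. Whether the boundary is inclusive
// is an assumption.
const SEPARATION_THRESHOLD = 2.5;

// Split bidding only when subscription LTV is at least 2.5x the one-off AOV
// for the same SKU; below that, consolidation may be the better call.
function shouldSeparateBidding(subscriptionLtv: number, oneOffAov: number): boolean {
  if (oneOffAov <= 0) throw new Error("oneOffAov must be positive");
  return subscriptionLtv / oneOffAov >= SEPARATION_THRESHOLD;
}
```

Against a ₩38,000 one-off AOV, a subscription LTV of ₩133,000 (3.5x) clears the bar and ₩57,000 (1.5x) does not.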

I wrote up the full breakdown — including the exact 3-check decision flow I run on day 3 of every new campaign, and a simpler fallback for teams that can't build the webhook infra yet — over on themedilog.

Full post →

My agent dry-ran fine in staging 100 times — then wrecked production on the first real run

강해수 — Wed, 01 Jul 2026 01:12:19 +0000

A staging-to-production data bleed cost me 4 hours of rollback. That's what finally made dry-run a structural requirement, not an afterthought.

The common advice is: test in staging, promote when green. The problem is environment drift. My D1 schema changes once or twice a week, and a solo operator can't keep staging perfectly synchronized. Worse, agents don't have fixed execution paths — the same input can produce a different tool call sequence on the next run. I ran a flow 100 times in staging and still hit a fresh path on the first production execution.

The most surprising thing I learned after 6 months of running this: latency wasn't the problem I expected. KV writes averaged 12ms — basically imperceptible. The real problem was that mock responses fool the agent into treating skipped writes as real successes. I'd dry-run an R2 put, the agent would believe the file was uploaded, and then proceed to write metadata to D1 — which was not in dry-run scope. Real write, orphaned record.

The fix: once any write tool in a run hits dry-run, propagate a flag for that runId that forces all subsequent writes in the same run to dry-run too.

// after intercepting first dry-run write
await ctx.env.KV.put(`dryrun_active:${ctx.runId}`, "1", {
  expirationTtl: 3600,
});

// every subsequent hook checks this flag
const isDryRunActive =
  (await ctx.env.KV.get(`dryrun_active:${ctx.runId}`)) === "1";
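The same propagation logic as a self-contained function, with a stand-in for the KV binding so it can run anywhere. decideWrite and the toolInDryRunScope flag are hypothetical names, not part of any hook API.

```typescript
// Stand-in for the KV binding (assumption: the get/put subset used above).
interface KvLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

type WriteDecision = "execute" | "dry-run";

// Hypothetical per-call check for a write tool in run `runId`.
// `toolInDryRunScope`: whether this tool is configured for dry-run.
// Once any write in the run is dry-run, all later writes in that run are too.
async function decideWrite(
  kv: KvLike,
  runId: string,
  toolInDryRunScope: boolean,
): Promise<WriteDecision> {
  const flagKey = `dryrun_active:${runId}`;
  if ((await kv.get(flagKey)) === "1") return "dry-run";
  if (toolInDryRunScope) {
    await kv.put(flagKey, "1", { expirationTtl: 3600 });
    return "dry-run";
  }
  return "execute";
}
```

An R2 put in dry-run scope sets the flag, so the D1 metadata write that follows in the same run is skipped too. One limit is visible in the code: a write that runs before the first dry-run write in a run still executes, because the flag doesn't exist yet.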

One more thing that burned me: if the hook itself fails — say, KV goes temporarily unavailable — Claude Code's default behavior is fall-through. The tool call executes anyway, dry-run flag ignored. Last week a KV spike caused hook timeouts and 3 agents wrote directly to production. No data loss because those ops were idempotent, but it was luck. Hook failure needs its own alert, separate from agent failure.
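One way to act on that, and it goes a step beyond what the post describes, is to have the hook fail closed: if the flag lookup throws, treat the call as dry-run and fire a hook-specific alert. A minimal sketch with hypothetical function names:

```typescript
type Decision = "execute" | "dry-run";

// Hypothetical wrapper: if the decision itself fails (for example a KV
// timeout), fail closed by choosing dry-run, and raise an alert that is
// separate from agent-failure alerts.
async function decideFailClosed(
  decide: () => Promise<Decision>,
  alertHookFailure: (err: unknown) => void,
): Promise<Decision> {
  try {
    return await decide();
  } catch (err) {
    alertHookFailure(err);
    return "dry-run";
  }
}
```

This only covers errors the hook can catch. If the hook process is killed by a timeout before it answers, the call still falls through, so the hook's own time budget needs watching as well.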

I wrote up the full breakdown — including the dry-run propagation edge cases, R2 + D1 orphan scenarios, and where this pattern completely falls apart (read-modify-write loops, APIs with side-effectful reads) — over on riversealab.com.

Full post →

`wrangler dev --remote` silently writes to your production KV namespace — here's the fix

강해수 — Wed, 01 Jul 2026 01:08:50 +0000

I lost production data on a Tuesday afternoon because wrangler.toml had one missing field. Not a code bug. Not a logic error. A missing preview_id.

By default, wrangler dev uses a local SQLite simulation — safe, isolated, zero real traffic. The moment you add --remote, every KV read and write goes to the actual Cloudflare namespace over the API. If your wrangler.toml only has the id field pointing at your production namespace, those writes land in prod. No warning. No confirmation prompt. Just silent data mutation on the namespace your live users depend on.

The fix is a single extra field:

[[kv_namespaces]]
binding = "MY_STORE"
id = "PROD_NAMESPACE_ID_HERE"
preview_id = "DEV_NAMESPACE_ID_HERE"

Wrangler automatically routes --remote traffic through preview_id instead of id. Create a separate dev namespace with wrangler kv namespace create "MY_STORE_dev", drop its ID into preview_id, and your production namespace is untouched. This should probably be in the quickstart docs. It isn't, at least not prominently.

The second thing worth knowing: --remote exposes a behavioral gap that local simulation hides entirely. Local KV runs on your machine with no replication, so a put() followed by a get() on the same key always returns the fresh value. Remote KV is eventually consistent. I had a rate-limiting worker that looked completely broken under --remote: I'd write a counter, immediately read it back, and get the old value. The worker was correct. The local simulation had been lying to me about how production actually behaves. Switching to --remote (against a dev namespace, not prod) surfaced the real race condition. That's uncomfortable, but it's accurate.

There's also a write-rate ceiling worth knowing before you run any kind of seed script: hit roughly 1,000 writes/minute and you'll start seeing 429 Too Many Requests with error code 10013. A 70ms sleep between writes keeps you under the limit without dramatically slowing a seed operation down.
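A sketch of a paced seed loop. The put callback is an assumption: it would wrap whatever client you seed with, such as the Cloudflare REST API pointed at the dev namespace. A 70ms gap caps the loop at about 857 writes per minute (60,000 / 70), before counting each write's own latency.

```typescript
// Hypothetical paced seeding loop. `put` wraps whatever client you seed with;
// `sleep` is injectable so the pacing can be tested without real waits.
async function seedWithPacing(
  entries: Array<[string, string]>,
  put: (key: string, value: string) => Promise<void>,
  delayMs = 70,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(() => resolve(), ms)),
): Promise<number> {
  let written = 0;
  for (const [key, value] of entries) {
    await put(key, value);
    written++;
    // Pause between writes (not after the last one) to stay under the limit.
    if (written < entries.length) await sleep(delayMs);
  }
  return written;
}
```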

I wrote up the full breakdown — including the wrangler tail JSON truncation trap that cost me two hours, a shell script for seeding a dev namespace with representative data, and the exact cacheTtl: 0 pattern for honest read behavior — over on dailymanuallab.com.

Full post →

I audited 340 reading captures. Only 20% ever became knowledge I actually used.

강해수 — Wed, 01 Jul 2026 01:07:02 +0000

Out of 340 captures over 90 days — Readwise highlights, Obsidian quick-captures, browser bookmarks — exactly 68 ever became a note I actually used. That's a 20% completion rate. The other 272 had a timestamp and nothing else.

The uncomfortable part wasn't the number itself. It was what the data said about where things died. I assumed the bottleneck was my weekly review — not thorough enough, not consistent enough. Wrong. I tagged every capture for three weeks as either "capture-only" or "annotated-at-source." Capture-only items had an 8% chance of becoming a usable note. Items where I spent 90 seconds writing a single sentence — in my own words, not a copied highlight — completed at around 55%. The weekly review wasn't failing because I was bad at reviews. It was failing because context decays faster than a week.

By Sunday, I genuinely couldn't remember why I'd saved half the items. A highlighted paragraph looked important. The argument I was building when I saved it was gone. So I'd either re-read the source (expensive) or archive without processing (wasteful). The knowledge was perishable in a way that tasks simply aren't. What fixed it wasn't a better review template — it was a same-session annotation rule. I won't close a tab after reading something capture-worthy unless I've written one sentence into my Obsidian daily note first. The sentence doesn't have to be good. It has to be mine. Sixty to ninety seconds. That single constraint moved my annotation rate more than four months of Sunday review slots ever did.

The other piece that made this stick was a decay-date field in Obsidian Dataview — something I hadn't seen anyone write about before I built it. Every live annotation gets a date set 14 days out. A query surfaces anything expiring within 3 days. If it hasn't been promoted to a permanent note by then, a Templater script archives it automatically. Not deleted. But gone from the active workspace. The deadline is visible. The loss is real but low-stakes. It created a forcing function the inbox folder never could.
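A sketch of the decay rule as plain logic. In the author's setup this lives in Dataview fields and a Templater script; the TypeScript types and function names here are hypothetical, and only the 14-day and 3-day windows come from the post.

```typescript
// Hypothetical shape of a live annotation.
interface Annotation {
  title: string;
  created: Date;
  promoted: boolean; // already turned into a permanent note
}

const DAY_MS = 24 * 60 * 60 * 1000;
const DECAY_DAYS = 14;  // expiry is set 14 days out
const WARNING_DAYS = 3; // surfaced when expiring within 3 days

function decayDate(a: Annotation): Date {
  return new Date(a.created.getTime() + DECAY_DAYS * DAY_MS);
}

type DecayStatus = "promoted" | "active" | "expiring" | "archive";

// "archive" means move out of the active workspace, not delete.
function decayStatus(a: Annotation, now: Date): DecayStatus {
  if (a.promoted) return "promoted";
  const msLeft = decayDate(a).getTime() - now.getTime();
  if (msLeft <= 0) return "archive";
  if (msLeft <= WARNING_DAYS * DAY_MS) return "expiring";
  return "active";
}
```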

I wrote up the full breakdown — including the harder completion metric I'm now using (a capture only "completes" when it gets cited in something I shipped, not when it becomes a note) and the exact Dataview query setup — over on dailyfocusmag.com.

Full post →

Adding more Claude subagents made my pipeline slower — here's the specific reason why

강해수 — Mon, 29 Jun 2026 05:37:12 +0000

Scaling from 4 to 8 Claude Code subagents pushed my error rate from 0.8% to 4.3%. The bottleneck wasn't the model.

The culprit was a stateful MCP tool called analytics_query that held pagination cursors, mid-aggregation values, and filter chains in instance memory between calls. Cloudflare Workers routes each request to whichever PoP instance is handy — no guarantees you land on the same one twice. At 4 subagents, collisions were rare enough that sessions accidentally stayed sticky. At 8, the distribution spread out and context misses went nonlinear. The error looked like this:

Error: Tool call failed — session context not found
  session_id: "sess_7f3a9b"
  worker_instance: "worker-11"
  expected_instance: "worker-04"

The session ID existed. The worker didn't match. State was gone.

I ran two fixes side by side. KV-based session storage (serialize the whole context, read at call start, write at call end) solved the routing problem but created a new one: at 8 concurrent subagents, KV writes multiplied to ~16x my estimate. Under load, p99 latency jumped from 180ms to 620ms per tool call, and the write cost alone crossed $150/month at my volume.

Durable Objects solved it cleanly. Route by session ID and you always hit the same DO instance — session affinity handled at the platform level, not in my code. Same load, p99 dropped to 38ms. Monthly cost settled around $40–60.
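A minimal sketch of the routing step, with stand-in interfaces for the namespace binding so it runs outside Workers. The x-session-id header is a hypothetical convention; the point is that idFromName(sessionId) always resolves the same session to the same Durable Object.

```typescript
// Stand-ins for the Durable Object namespace binding (assumption: the real
// DurableObjectNamespace exposes idFromName() and get(id).fetch()).
interface StubLike {
  fetch(request: Request): Promise<Response>;
}
interface NamespaceLike<Id> {
  idFromName(name: string): Id;
  get(id: Id): StubLike;
}

// Send every tool call for a session to the object named after that session.
// idFromName() is deterministic, so the same session id always reaches the
// same instance and its in-memory state. The header name is hypothetical.
async function routeToolCall<Id>(ns: NamespaceLike<Id>, request: Request): Promise<Response> {
  const sessionId = request.headers.get("x-session-id");
  if (!sessionId) {
    return new Response("missing x-session-id", { status: 400 });
  }
  return ns.get(ns.idFromName(sessionId)).fetch(request);
}
```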

The tradeoff nobody mentions upfront: DO instances get evicted on idle, and when that happens the in-memory state silently vanishes. The agent has no idea and keeps going. That failure mode is quieter and scarier than KV latency spikes, which at least show up in dashboards immediately.

What I landed on after six months: DO memory for active sessions, DO Storage checkpoints at the end of each tool call (~$10/month extra), and KV only as a routing index — read-heavy, nearly free. Three layers, but each one has a distinct failure mode you can actually isolate.
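A sketch of the checkpoint layer, assuming a stand-in for Durable Object storage get()/put(). The SessionState fields are hypothetical. State lives in memory for speed, is written to storage at the end of every tool call, and is reloaded from storage whenever the in-memory copy is missing, which is what an idle eviction leaves behind.

```typescript
// Stand-in for Durable Object storage get/put (assumption).
interface CheckpointStorage {
  get<T>(key: string): Promise<T | undefined>;
  put<T>(key: string, value: T): Promise<void>;
}

// Hypothetical session state for a paginated analytics query.
interface SessionState {
  cursor: string | null;
  partialSum: number;
}

class SessionHolder {
  private state: SessionState | null = null; // in-memory copy; lost on eviction
  private readonly storage: CheckpointStorage;

  constructor(storage: CheckpointStorage) {
    this.storage = storage;
  }

  // Restore from the last checkpoint if memory is empty.
  private async load(): Promise<SessionState> {
    if (this.state === null) {
      this.state = (await this.storage.get<SessionState>("session")) ?? { cursor: null, partialSum: 0 };
    }
    return this.state;
  }

  async runToolCall(step: (s: SessionState) => SessionState): Promise<SessionState> {
    const next = step(await this.load());
    this.state = next;
    await this.storage.put("session", next); // checkpoint at the end of the call
    return next;
  }
}
```

This doesn't address concurrent calls on the same session; it only guarantees that the last completed call survives an eviction.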

The 6-subagent mark was my inflection point. Below it, you might not see this problem at all. Above it, the session collision math gets ugly fast.

I wrote up the full breakdown — including the checkpoint timing problem (DO idle eviction is less predictable than the docs suggest) and what happens when multiple subagents hit the same session simultaneously — over on riversealab.com.

Full post →

56% of my notes died before I ever had a chance to retrieve them

강해수 — Mon, 29 Jun 2026 05:32:02 +0000

41 out of 73 captures never left my inbox. Not because my retrieval system failed — because they never got processed in the first place.

For 21 days, I logged every note I captured and traced exactly where each one stopped moving. Every PKM framework I'd read pointed at the same culprit: retrieval. Bad tagging, weak search, no backlinks. So I'd built for that: custom Obsidian templates, a graph view, a daily note that auto-pulls open tasks. None of it mattered, because the bottleneck was upstream. The notes were dying at capture, not at search.

The finding that actually changed my setup: notes I captured with a single sentence of intent — why this matters, what problem it connects to — had roughly an 80% retrieval rate when I went back for them two weeks later. Notes without that sentence: around 20%. That's not a tagging problem or a folder problem. It's a 45-second problem at the moment of capture. I'd been optimizing the wrong end of the pipeline entirely.

The fix wasn't a better app. I swapped a frictionless Raycast snippet (fast capture, zero context) for a two-field Notion form with a mandatory 15-word minimum on the second field. Slower by 30 seconds. The survival rate difference was not subtle.

The other thing I hadn't measured: I tracked how often I actually ran the "daily review" my system was designed around. Nine out of 21 days. On the other twelve, the inbox just grew — and once it crossed roughly 25 items, I'd start skimming by title instead of reading. The most fragmentary captures (usually the most valuable) had the worst titles and kept getting skipped.

There's a third failure point I found that took me longest to see — notes that did clear capture and processing, then landed in folders that were functionally prettier inboxes. I had 340 of them with a last-reviewed date older than 30 days and zero outbound links.

I wrote up the full breakdown — including the four-label routing system I replaced my folder hierarchy with, and what "routing to a workflow" actually means in practice — over on dailyfocusmag.com.

Full post →

My Anthropic bill dropped from $312 to $156 after I added two bash hooks to Claude Code

강해수 — Mon, 29 Jun 2026 01:11:57 +0000

60% of a $312 Anthropic bill came from a single pattern: Claude Code hitting a D1 migration failure, then spinning up 7–8 retry Bash calls trying to diagnose what went wrong. Each loop burned 40–60K tokens. Three or four loops per session, and you're looking at $0.50–$0.70 just evaporating.

The fix wasn't prompt engineering. It was a PostToolUse hook that fires the moment wrangler d1 migrations apply exits non-zero — before the agent has a chance to start its retry spiral.

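#!/bin/bash
# post_bash_hook.sh
# Reads the command and its exit code from $1 and $2. Claude Code itself
# passes hook input as JSON on stdin, so these need to be extracted upstream.
COMMAND="$1"
EXIT_CODE="$2"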

if echo "$COMMAND" | grep -q "wrangler d1 migrations apply"; then
  if [ "$EXIT_CODE" != "0" ]; then
    echo "ALERT: D1 migration failed (exit $EXIT_CODE). Check schema state." >&2
    curl -s -X PUT "https://api.cloudflare.com/client/v4/accounts/$CF_ACCOUNT_ID/storage/kv/namespaces/$KV_NS/values/migration_failed" \
      -H "Authorization: Bearer $CF_API_TOKEN" \
      -d "1" > /dev/null
  fi
fi

exit 0

A Slack bot polls that KV key every 3 minutes. When it flips to 1, I get pinged and can intervene before Claude Code decides to investigate further on my token budget. The next month's bill came in at $156, and in six months of running this setup I've had zero schema-mismatch incidents.

The other half of the chain is a PreToolUse hook that blocks wrangler deploy whenever the agent is on main — learned that one the hard way after a production deploy went out from the wrong branch and left two Workers in a broken state for five minutes. The thing most people miss: when your hook returns exit 2, Claude Code reads whatever you wrote to stderr as context. A vague BLOCK does nothing useful. BLOCK: wrangler deploy on main — use staging namespace instead actually redirects the agent correctly.
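A sketch of the blocking decision as a pure function. Claude Code hands hooks their input as JSON on stdin (for the Bash tool, the command is under tool_input.command), and the hook script would read the current branch itself, for example with git rev-parse --abbrev-ref HEAD; both plumbing steps are left out. checkDeploy is a hypothetical name.

```typescript
// What the hook script would do: exit with exitCode, write stderr to stderr.
interface HookDecision {
  exitCode: 0 | 2; // 2 blocks the tool call; stderr is shown to the model
  stderr: string;
}

function checkDeploy(command: string, branch: string): HookDecision {
  // Word boundaries so `wrangler deployments list` is not caught.
  const isDeploy = /\bwrangler\s+deploy\b/.test(command);
  if (isDeploy && branch === "main") {
    return {
      exitCode: 2,
      // Specific enough to redirect the agent, not a bare BLOCK.
      stderr: "BLOCK: wrangler deploy on main. Use the staging namespace instead.",
    };
  }
  return { exitCode: 0, stderr: "" };
}
```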

There's also a pre-commit hook at the end of the chain that scans staged diffs for hardcoded production binding names and secret key patterns — a last filter before anything reaches git history.
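A sketch of the scanning step for that pre-commit hook. It looks only at added lines of a staged diff (such as the output of git diff --cached). Both patterns are placeholders, PROD_KV-style binding names and a generic long-secret assignment; substitute your own binding names and key formats.

```typescript
// Placeholder patterns: hypothetical production binding names and a
// generic "key/secret/token = long string literal" shape.
const BLOCKED_PATTERNS: Array<[string, RegExp]> = [
  ["production binding name", /\bPROD_(KV|D1|R2)\b/],
  ["secret-looking assignment", /\b(api[_-]?key|secret|token)\b\s*[:=]\s*["'][^"']{16,}["']/i],
];

// Returns one finding per matching pattern per added line.
function scanStagedDiff(diff: string): string[] {
  const findings: string[] = [];
  for (const line of diff.split("\n")) {
    // Added lines start with "+", but "+++" is the file header.
    if (!line.startsWith("+") || line.startsWith("+++")) continue;
    for (const [label, pattern] of BLOCKED_PATTERNS) {
      if (pattern.test(line)) findings.push(`${label}: ${line.slice(1).trim()}`);
    }
  }
  return findings;
}
```

A non-empty result would make the hook exit non-zero and stop the commit.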

I wrote up the full breakdown — including the exact .claude/settings.json structure, how hook matcher patterns work (and where they don't), and the FAQ on execution order guarantees — over on riversealab.com.

Full post →

800 simultaneous Workers, one cache miss, $40/mo surprise — the Cloudflare coalescing fix

강해수 — Mon, 29 Jun 2026 01:08:26 +0000

A Korean flash sale at 9PM cost me $40 in a single month — not from KV reads, but from 800 concurrent Workers all racing to fetch the same product JSON from my origin in a 120ms window.

The naive KV pattern looks fine on paper: check KV, miss, fetch origin, write KV, return. At low concurrency it works. Under burst traffic, though, KV write latency was running ~80ms and my origin fetch took ~120ms. That gap was wide enough for roughly 400 Workers to independently conclude the cache was cold and hammer the origin simultaneously. It started returning 429s by 9:03PM.

The fix isn't faster KV writes — it's ensuring only one Worker ever starts the fetch. Everything else waits on the same in-flight promise. That's request coalescing.

The trick is that Workers are stateless across isolates — you can't share a Promise between them directly. But a Durable Object runs in a single-threaded JS environment, which makes a Map of in-flight promises inside a DO completely race-condition-free. The structure is straightforward:

if (!this.inflight.has(key)) {
  const promise = this.fetchAndCache(key).finally(() => {
    this.inflight.delete(key);
  });
  this.inflight.set(key, promise);
}
const body = await this.inflight.get(key)!;

The finally delete is non-negotiable. If cleanup only runs on success (a delete inside .then()), a failed fetch permanently poisons that map entry: every future request for that key gets the same rejected promise with no recovery. One other gotcha I hit early: I initially keyed all requests to a single DO instance with idFromName("global-coalescer"). A Durable Object processes requests serially, so one slow origin fetch for product A would block product B entirely. The right move is idFromName(key): one DO instance per resource, not one global bottleneck.
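A self-contained version of the in-flight map above, pulled out of the Durable Object so it can be exercised directly. The Coalescer class and its load callback (standing in for fetchAndCache) are hypothetical names. Concurrent callers for one key share a single load, and finally() clears the entry on both success and failure, so a failed load is retried by the next caller instead of being served forever.

```typescript
class Coalescer<T> {
  // One pending promise per key; callers that arrive while it is pending share it.
  private readonly inflight = new Map<string, Promise<T>>();
  private readonly load: (key: string) => Promise<T>;

  constructor(load: (key: string) => Promise<T>) {
    this.load = load;
  }

  get(key: string): Promise<T> {
    let pending = this.inflight.get(key);
    if (pending === undefined) {
      // finally() removes the entry whether the load resolves or rejects.
      pending = this.load(key).finally(() => this.inflight.delete(key));
      this.inflight.set(key, pending);
    }
    return pending;
  }
}
```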

The Worker entry point stays simple: a KV hit returns immediately (the Workers Paid plan includes 10M KV reads per month, so that hot path is effectively free), and only on a miss does the request route to the Durable Object coalescer.
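A sketch of that entry point, with stand-in interfaces for the KV binding and the Durable Object namespace so it runs outside Workers; serveProduct is a hypothetical name. Note the per-key idFromName(key), which spreads different products across different coalescer instances.

```typescript
// Stand-ins (assumption: they mirror the KV get() and Durable Object
// namespace methods used in the post).
interface KvReader {
  get(key: string): Promise<string | null>;
}
interface CoalescerNamespace<Id> {
  idFromName(name: string): Id;
  get(id: Id): { fetch(request: Request): Promise<Response> };
}

// KV hit: answer immediately. KV miss: hand off to the per-key Durable
// Object, which coalesces concurrent origin fetches for that key.
async function serveProduct<Id>(
  key: string,
  request: Request,
  kv: KvReader,
  coalescers: CoalescerNamespace<Id>,
): Promise<Response> {
  const cached = await kv.get(key);
  if (cached !== null) {
    return new Response(cached, { headers: { "content-type": "application/json" } });
  }
  return coalescers.get(coalescers.idFromName(key)).fetch(request);
}
```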

I wrote up the full breakdown — including the wrangler.toml config, the KV write timing problem inside a DO, and the exact production numbers before and after — over on dailymanuallab.com.

Full post →

73% of my context switches came from one mistake: sharing a workspace across two businesses

강해수 — Mon, 29 Jun 2026 01:06:38 +0000

73% of my context switches last year traced back to a single root cause — not bad habits, not weak discipline, but a Notion dashboard that mixed ad ops client work with content publishing in the same view.

I run two genuinely different businesses. One is reactive: client Slack pings, campaign pacing, Meta and Google platform alerts. The other is slow and accumulative: SEO content, internal tooling, long-horizon experiments. For months I treated them as one organism inside one workspace. I even built a unified "Today" dashboard I was proud of — filtered views from both businesses, a single command center. What it actually did was guarantee the most urgent ad ops ticket was always sitting next to the most important content task. Urgency won every time. I tracked interruptions for six weeks to confirm I wasn't imagining it. I wasn't.

The fix that actually moved the needle wasn't a new tool — it was hard namespace separation. Two top-level Notion page trees with zero cross-links between them. No shared databases, no cross-references. Ad ops contractors can't see the content root; content collaborators can't see ad ops. Sounds obvious. I ignored it for four months because the shared setup felt "efficient." What it really did was create accidental visibility — a contractor would spot something from the content side, ask about it, and that was 20 minutes gone before I'd even started the task I opened Notion for.

The second shift was moving all drafting out of Notion entirely into Obsidian. Notion is good at structured status tracking. It is genuinely bad for thinking — the database UI creates a false sense of progress. A row exists, therefore work is happening. Moving first drafts to Obsidian and only syncing final status back to Notion cut my half-finished project count from 22 down to 8 in two months. That number surprised me more than anything else in 18 months of running this.

I wrote up the full breakdown — including the specific Notion properties I use in each business, how I handle the two categories that legitimately touch both sides (invoicing and contractor comms), and the CRM decision that goes against the standard "one master CRM" advice — over on dailyfocusmag.com.

Full post →