DEV Community: Daniel Nwaneri

$30 and a Lifetime of Liability

Daniel Nwaneri — Thu, 02 Jul 2026 11:57:19 +0000

co-written with UnitBuilds, who built most of this out loud in the comments of my last piece.

I recently wrote about the $30. someone in cambodia or kenya, paid under $30 to complete a biometric verification step on behalf of a stranger, so a developer somewhere could access an ai model that's geo-blocked where they live.

I framed it as exploitation. it is. but I stopped at the harvesting.

UnitBuilds didn't stop there. over a series of comments, he walked through what happens after the $30 — and it's worse than anything I'd written.

the part the verification step doesn't tell you

when you complete a biometric check — face the camera, look left, look right — you're not just proving you're human. legally, you're authorizing.

not authorizing this one transaction. authorizing the account. anything done with it, by anyone, from that point forward, is yours. that's not a loophole. that's the definition of authentication.

as UnitBuilds put it:

"they can contest in court, but they won't win, by law they can't win, because the very definition of the authentication is that you, as yourself, fully authorize yourself and anyone else by proxy, to use your account to do with, for whatever purposes, assuming full responsibility for it."

the person who took the $30 didn't sign up to be liable for whatever happens next. but the law doesn't have a category for "deceived into authorizing." it has a category for "authorized." and once you're in that category, you're not fighting the bill. you're fighting jailtime.

what "fighting jailtime" actually looks like

UnitBuilds laid out the scenarios plainly:

a bad actor uses the harvested identity to rack up charges, commit fraud, or worse. the account holder — the person who took the $30 — has no idea any of this happened. months later, maybe years later, they get a job offer overseas. they travel. at the border, there's a warrant. for a crime committed using their face, on the other side of the planet, by someone they've never met.

or the company affected sues. the debt is structured for someone earning a developer's salary in a wealthy country. the person actually liable is earning $100 a month.

"imagine that, an entire month's pay gone, on a single ai subscription they never even knew existed, from a bank account they never made. and they don't have the finances to actually fight it in court."

that's UnitBuilds describing namibia specifically — people working full contracts, 8 to 5, for $100 a month. not informal work. not gig work. contracted employment. wiped out by a bill that was never theirs, with no path to contest it, because contesting it costs more than the bill itself.

the version where you don't even get the $30

the scenario above assumes someone got paid. UnitBuilds described a worse one: phishing.

a fake overseas job offer. "all you have to do is submit your id and do the facial verification, and send the code that's sms'd to you." it looks exactly like a routine hiring process. and then:

"that's the last you ever hear of them."

no payment. no awareness that you were ever part of a supply chain. just a verification step that felt normal, and a liability that surfaces however long it takes for someone to misuse it.

this isn't new, it's just wearing new clothes

UnitBuilds has watched this pattern before ai existed. bank impersonation calls — spoofed numbers, confident voices, "confirm your account details" — targeting pensioners who grew up trusting that a call from the bank was actually the bank.

"life-savings gone from pensioners, who have no means of earning it back or fighting the bank for it. some had to choose between food on the table and paying their wifi, losing access to communication with everyone they know, for the sake of not going hungry, because someone scammed them out of 50 years worth of hard work."

whatsapp cloning works the same way — impersonate a relative, get the verification code, clone the account, spread it to the entire contact list, harvest more identities, repeat.

the throughline, in his words:

"it's a system built on accountability, not morality, and the legal system is there to defend the dollar not the person."

in namibia, you go to prison longer for poaching a cow than for murder.

the part that has nothing to do with biometrics

then UnitBuilds introduced something I hadn't considered at all: hardware identity theft.

two forms. the first is shadow proxy networks — malware that quietly routes traffic through your residential gateway, so someone else's activity travels under your ip, your network, your name.

the second is newer and stranger. you buy a windows 11 laptop. secure boot signs the hardware to your microsoft account the moment you log in. from that point, you're the authorized owner of that device — and liable for whatever it does — until you go through the process of manually removing it from your account's device list. format it, sell it, give it away: none of that breaks the link. the new owner is using hardware that's still, in microsoft's records, yours.

"a small little detail they don't tell you when they say it's 'for your data security.'"

the mechanism is identical to the biometric one. ownership and liability bound to an identity that doesn't update when the physical reality changes. the gap between who actually controls something and who's legally responsible for it is where all of this lives — bodies, devices, accounts, doesn't matter. the structure repeats.

the sentence underneath all of it

a developer going by self-correcting systems read the original piece and named the pattern precisely:

"a control that can't see its own downstream doesn't stop the harm, it relocates it."

that's what every layer of this is. kyc doesn't stop fraud — it relocates the verification burden onto someone with no stake in the outcome. secure boot doesn't stop hardware theft — it relocates ownership liability onto whoever's account it happened to be signed into. every fix moves the cost. none of them eliminate it. they just choose, by design or by accident, who absorbs it.

the people who absorb it are consistently the people least equipped to refuse, least equipped to understand what they're agreeing to, and least equipped to fight it once it lands.

UnitBuilds runs Halo Cybersecurity adjacent work and built NMCP, a rust-based mcp implementation. he gave permission to use everything quoted here directly. the quotes are his words, not mine; the paraphrase around them is smaller than what he actually said.

most of what's true in this piece, he wrote first, out loud, in a comment thread.

AI helped me research, structure, and edit this piece. The arguments, the examples, and the opinions are mine and UnitBuilds'. So is whatever's wrong with them.

Someone Else Pays for Your AI Access

Daniel Nwaneri — Tue, 30 Jun 2026 07:22:16 +0000

you probably didn't think about this when you signed up.

you entered your card details, verified your phone number, maybe uploaded a government ID and took a selfie. friction. annoying. you moved on.

somewhere in cambodia or kenya, someone did the same thing. except they weren't signing up for claude. they were being paid — under $30 — to complete a verification step on behalf of someone they'll never meet, for a service they'll never use, in a supply chain they don't fully understand.

their face is now in a database they didn't choose. it will be used again. not for claude.

every time anthropic tightens access to protect its models, the evasion doesn't stop. it migrates.

geoblocking produced vpn services. phone verification produced sms farms. credit card requirements produced stolen card networks. biometric kyc — live selfies, government id matching — produced agents traveling to lower-income countries to recruit real people willing to complete in-person verification for cash.

the controls and the evasions are a paired system. you can't have one without the other. and the cost of the evasion doesn't stay where the models are. it moves to wherever people are poor enough to trade their biometric data for $30.

the fable shutdown made this visible in a new way.

on june 12, 2026, anthropic disabled fable 5 and mythos 5 for every customer worldwide. not because of an outage, not because of a flaw they found, but because the us government issued an export control directive at 5:21pm (per anthropic's official statement) and there was no way to segment foreign nationals from us persons in real time. so they turned it off for everyone.

gabriel attal compared it to iran blockading the strait of hormuz (AI Frontiers Media). brussels talked. developers in san francisco talked about reliability. nobody talked about cambodia.

the transfer station economy, documented in may 2026 by oxford researcher zilan qian (ChinaTalk, May 5 2026), has been running this supply chain in public for years. github. taobao. telegram. chinese developers accessing claude at 10% of official price through api proxies that sit between them and anthropic's infrastructure.

the three ways the price gets that low:

first, account arbitrage — bulk-registered free credits, unused quotas, carved-up max plans.

second, model swapping — you pay for opus, you get haiku, sometimes you get glm. you can't verify which model answered you.

third, the logs. every prompt, every response, every tool call, every reasoning trace sitting on a proxy operator's server. for a developer using claude code, that's your repository context, your engineering decisions, your verified correct outputs. the markup business is customer acquisition. the logs are the margin.

but the third way isn't just data extraction. the upstream supply chain that keeps the proxy pool running needs verified accounts. verified accounts need identities. identities increasingly need biometrics. and biometrics, once verification checks get good enough to detect ai deepfakes, need real humans.

so agents go to cambodia. agents go to kenya. they find people willing to complete verification for under $30. those faces enter a database. that database doesn't stay in the claude access supply chain.

the chinese developer paying 10% for tokens didn't order this. they're trying to build something with the same tools everyone else has, priced out by geography the same way a developer in lagos is priced out by latency and infrastructure. neither of them sees the person whose face just got harvested in cambodia. neither of them chose the system that makes that harvesting profitable. they're both downstream of a fight they didn't start, between parties who will never absorb the cost themselves.

the worldcoin black market documented this pattern before anyone was paying attention. iris scans harvested in cambodia and kenya, sold for under $30. the same infrastructure. the same geography. the same people absorbing costs they didn't choose.

this isn't new. content moderators in kenya process trauma for platforms they'll never use. data labelers in colombia annotate images for models trained in san francisco. the biometric harvesting is the same supply chain, one layer deeper.

a face verified to bypass anthropic's kyc today can be resold to open a fraudulent bank account tomorrow. it can generate a deepfake. it can be used for blackmail. the original subject in the global south bears the legal and reputational consequences of a transaction that had nothing to do with them.

i build in port harcourt. every api call i make crosses an ocean and costs latency i can't engineer away. i wrote about that recently — the physics problem nobody warned you about.

this is the other side of that piece.

the infrastructure gap isn't just latency. it's who absorbs the externalities of the access war. when two parties fight over who gets to use a model, a third party — somewhere with weaker institutions, fewer legal protections, and more financial pressure to say yes to $30 — pays the cost neither of the original parties wanted to carry.

that's not a side effect.

the controls will keep tightening. fable has been offline for seventeen days. mythos was partially restored on june 27 — only for critical infrastructure organizations the us government specifically approved. general users, developers, international subscribers are still waiting. gpt-5.6 is next in line for the same review process. each new restriction produces a new evasion layer, and each evasion layer reaches further down the economic ladder to find humans willing to be part of the supply chain for cash.

the people performing outrage about ai access — in brussels, in san francisco, in policy papers — are arguing about the front of the supply chain. nobody is arguing about the back.

someone else is paying for your claude access. you won't read about them in the policy papers.

AI helped me research, structure, and edit this piece. The arguments, the examples, and the opinions are mine. So is whatever's wrong with them.

What Actually Happens When You Call an LLM API

Daniel Nwaneri — Mon, 29 Jun 2026 08:07:46 +0000

you've felt it.

you type a prompt, hit send, and the response starts streaming in under a second. smooth. instant. you feel like you're thinking out loud with a machine.

then the next day — same model, same prompt — you wait. three seconds. five. the cursor blinks. nothing. then it all comes at once.

you probably blamed your wifi.

it wasn't your wifi.

what actually happened in those extra seconds is a story that starts in a building you'll never visit, runs through a cable at the bottom of an ocean, and ends on a gpu that was busy doing someone else's thinking before it got to yours.

and if you're building in africa or anywhere that isn't virginia, ireland, or frankfurt — that story has a chapter in it specifically about you.

the journey of one api call

let's follow a single request from the moment you hit send.

your prompt leaves your device and travels as packets of data through your ISP, hits a submarine fibre cable, crosses an ocean, arrives at a data centre, gets routed to the right server, waits for a gpu to become available, gets processed, and the response travels back the same way.

that whole round trip happens in what feels like nothing.

except it isn't nothing. every step costs time. and some of those steps cost more depending on where you're sitting on the planet.

what is a data centre, actually

before we get to the interesting parts, ground this.

a data centre is a building — sometimes the size of several football pitches — filled with servers. those servers are computers without screens. stacked in metal racks. thousands of them. running twenty-four hours a day, seven days a week, never switching off.

every api call you make, every message you send on whatsapp, every google search, every youtube video — all of it is touching a server in a building like this somewhere.

the building needs three things to function: power, cooling, and connectivity. the power runs the servers. the cooling stops them melting — servers generate enormous heat at this density. the connectivity is the fibre cable that connects the building to the rest of the internet.

nigeria has 17 of these buildings. the united states has over 5,500.

that gap matters. we'll come back to it.

latency: the physics problem nobody warned you about

latency is the time it takes for data to travel from point A to point B and back.

it is bounded by physics. data moves through fibre optic cable at roughly two-thirds the speed of light. you cannot make it faster. you can only make the distance shorter.

lagos to london is approximately 5,000 kilometres. at two-thirds the speed of light, the minimum possible round-trip time is around 50 milliseconds just from the distance alone. add routing, congestion, processing and you're looking at 100 to 150ms before your request has even reached the server.

then the model has to think.

then the response travels back.

most developers building in nigeria are hitting llm servers in us-east-1 (virginia), eu-west-1 (ireland), or eu-central-1 (frankfurt). that's not a complaint; those are where the servers are. but it means every api call carries 100 to 200ms of latency just from geography, before inference even begins.

for a streaming chatbot, you feel this. that pause before the first token appears isn't the model being slow. it's the speed of light, applied to distance.
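the 50 millisecond floor quoted above can be checked with a few lines of arithmetic. this is a back-of-envelope sketch, assuming light in fibre moves at two-thirds of its vacuum speed and that the cable follows the straight-line distance (real cable routes are longer). the function name is made up for illustration.

```python
# distance-only latency floor, under the assumptions stated above
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum, km/s
FIBRE_FACTOR = 2 / 3               # rough slowdown inside glass fibre


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over fibre, in milliseconds."""
    fibre_speed_km_s = SPEED_OF_LIGHT_KM_S * FIBRE_FACTOR
    one_way_s = distance_km / fibre_speed_km_s
    return 2 * one_way_s * 1000


if __name__ == "__main__":
    # lagos to london, using the approximate 5,000 km from the text
    print(f"{min_round_trip_ms(5_000):.1f} ms")  # prints "50.0 ms"
```

everything beyond that number (routing, congestion, queueing, inference) is added on top of it.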

inference: what the gpu is actually doing

when your prompt arrives at the server, it doesn't get processed the way you might imagine — like a search engine matching keywords.

the model runs your prompt through billions of mathematical operations, layer by layer, to predict what the most likely next token should be. then the next. then the next. each token generated one at a time, sequentially, until the response is complete.

this is inference.

a token is roughly three-quarters of a word. "hello" is one token. "infrastructure" is two, though the exact split depends on the tokenizer. the response you're reading right now would be several hundred tokens.

why does this matter? because every token costs compute. a longer prompt costs more compute on the input side. a longer response costs more on the output side. and all of that compute is happening on a gpu inside a data centre consuming real electricity.
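to make the "every token costs compute" point concrete, here is a rough estimator built on the three-quarters-of-a-word rule of thumb. it is a sketch, not a tokenizer: real counts vary by model, and the per-1,000-token prices passed in are placeholders you would swap for your provider's actual rates.

```python
import math

WORDS_PER_TOKEN = 0.75  # rule of thumb from the text, not a measured value


def estimate_tokens(text: str) -> int:
    """Estimate token count from a whitespace word count."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


def estimate_cost(prompt: str, response: str,
                  input_price_per_1k: float,
                  output_price_per_1k: float) -> float:
    """Estimate the cost of one call; both prices are hypothetical inputs."""
    input_cost = estimate_tokens(prompt) / 1000 * input_price_per_1k
    output_cost = estimate_tokens(response) / 1000 * output_price_per_1k
    return input_cost + output_cost


if __name__ == "__main__":
    print(estimate_tokens("one two three"))  # prints 4
```

input and output are priced separately here because the text treats them as separate costs: a longer prompt costs more on the way in, a longer response costs more on the way out.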

the gpu: why this specific hardware

your laptop has a cpu — central processing unit. it's designed for general tasks: running your browser, compiling your code, handling your operating system. very fast at one thing at a time.

a gpu — graphics processing unit — was originally designed to render video games. thousands of smaller cores that can do many calculations simultaneously. it turns out this parallel architecture is exactly what llm inference needs: running the same mathematical operations across billions of parameters at once.

a single high-end gpu used for llm inference — an nvidia h100 — costs around $30,000. a data centre running a frontier model has thousands of them.

when you call an llm api, your request is routed to one of these gpus. if that gpu is busy processing another user's request, yours waits. that wait is real. it shows up as latency on your end.

this is what rate limits are actually enforcing: the physical capacity of the hardware.
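the "yours waits" part can be sketched as a toy queue. this assumes one gpu serving requests strictly one at a time, first come first served. real inference servers batch requests together, so treat this as an illustration of the waiting, not a model of any provider's scheduler. all names are made up for the example.

```python
from typing import List


def queue_waits(arrivals_s: List[float], service_s: List[float]) -> List[float]:
    """Return how long each request waits before the gpu starts on it.

    arrivals_s: arrival time of each request, in seconds, in order
    service_s:  how long the gpu spends on each request, in seconds
    """
    waits = []
    gpu_free_at = 0.0
    for arrival, service in zip(arrivals_s, service_s):
        start = max(arrival, gpu_free_at)  # wait if the gpu is still busy
        waits.append(start - arrival)
        gpu_free_at = start + service
    return waits


if __name__ == "__main__":
    # three requests half a second apart, each needing two seconds of gpu time
    print(queue_waits([0.0, 0.5, 1.0], [2.0, 2.0, 2.0]))  # [0.0, 1.5, 3.0]
```

the third request did nothing wrong. it just arrived while the hardware was busy, and that shows up on the caller's side as latency.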

cold starts: why the first request is slower

you've noticed that sometimes the very first call in a while takes noticeably longer.

this isn't imaginary. it's a cold start.

models are large. a frontier model can be hundreds of gigabytes of weights — the numbers that encode what the model knows. those weights need to be loaded into gpu memory before inference can happen. if no request has come in for a while, the system may have partially unloaded the model to free up memory for other things.

the first request has to wait for the model to load back in. subsequent requests hit the already-warm model and feel faster.

serverless llm deployments are especially prone to this. you pay less when traffic is low. but your users feel the first request after a quiet period.
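a toy version of the cold start, assuming a host that drops the model's weights after a fixed idle period and reloads them on the next request. the class, the timings, and the idle timeout are all invented for illustration. real platforms decide when to unload in their own ways.

```python
from typing import Optional


class LazyModel:
    """Toy model host that unloads weights after an idle period."""

    def __init__(self, load_s: float, infer_s: float, idle_timeout_s: float):
        self.load_s = load_s                  # time to load weights into gpu memory
        self.infer_s = infer_s                # time to run one request once warm
        self.idle_timeout_s = idle_timeout_s  # idle time before weights are dropped
        self.last_used_at: Optional[float] = None  # None: never loaded yet

    def request_latency(self, now_s: float) -> float:
        """Return the latency of a request arriving at now_s."""
        cold = (self.last_used_at is None
                or now_s - self.last_used_at > self.idle_timeout_s)
        latency = self.infer_s + (self.load_s if cold else 0.0)
        self.last_used_at = now_s + latency
        return latency


if __name__ == "__main__":
    model = LazyModel(load_s=8.0, infer_s=1.0, idle_timeout_s=60.0)
    # first call, a quick follow-up, then a call after a long quiet period
    print([model.request_latency(t) for t in (0.0, 10.0, 200.0)])  # [9.0, 1.0, 9.0]
```

the first and third calls pay the load time. the second one hits a warm model.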

why nigeria specifically

nigeria's 17 data centres — 14 of them in lagos — run almost entirely on diesel generators. the national grid provides on average four hours of power per day. every data centre makes up the difference with generators burning diesel around the clock.

this is expensive. it's also why local cloud infrastructure hasn't scaled the way it has in markets with stable power.

the consequence for you as a developer: every llm api call you make routes to a server that is not in nigeria. not in west africa. often not even on the continent. you are paying the latency cost of that distance on every single request, for every single user you have.

this isn't a software problem. it's a geography and infrastructure problem. and it has a direct effect on how your ai-powered products feel to the people using them.

what this means when you're building

three practical things:

stream the response. don't wait for the full response before showing anything. streaming tokens as they arrive makes the experience feel faster even when it isn't. the perceived latency drops dramatically because the user sees something happening.

cache aggressively. if you're calling the same prompt or near-identical prompts repeatedly, cache the response. inference is expensive. latency is expensive. caching eliminates both for repeated queries.

pick the right model for the job. a 70 billion parameter model is slower and more expensive than a 7 billion parameter model. for many tasks — classification, extraction, short-form generation — the smaller model is sufficient and returns results significantly faster. frontier models are not always the right tool.
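the first two points can be sketched without touching a real api. `fake_stream` stands in for a streaming endpoint and `call_model` for whatever client you actually use; both are placeholders, not any provider's sdk. the cache key lowercases and collapses whitespace so near-identical prompts share an entry, which is an assumption that only holds if case doesn't matter for your prompts.

```python
import hashlib
import time
from typing import Callable, Dict, Iterator


def fake_stream(text: str, delay_s: float = 0.0) -> Iterator[str]:
    """Stand-in for a streaming api: yields one word at a time."""
    for word in text.split():
        time.sleep(delay_s)
        yield word + " "


def prompt_key(prompt: str) -> str:
    """Normalise case and whitespace, then hash, to build a cache key."""
    normalised = " ".join(prompt.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


class ResponseCache:
    """In-memory cache keyed by normalised prompt."""

    def __init__(self, call_model: Callable[[str], str]):
        self._call_model = call_model
        self._store: Dict[str, str] = {}
        self.misses = 0  # how many times the model was actually called

    def complete(self, prompt: str) -> str:
        key = prompt_key(prompt)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._call_model(prompt)
        return self._store[key]


if __name__ == "__main__":
    # streaming: show each chunk as soon as it arrives
    for chunk in fake_stream("tokens arrive one at a time", delay_s=0.05):
        print(chunk, end="", flush=True)
    print()

    # caching: the second, near-identical prompt never reaches the model
    cache = ResponseCache(call_model=lambda p: f"answer to: {p}")
    cache.complete("What is latency?")
    cache.complete("  what is   LATENCY? ")
    print(cache.misses)  # prints 1
```

an in-memory dict is the simplest possible store. anything shared across processes or users would need a real cache and an expiry policy, which this sketch leaves out.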

the bigger picture

data centres exist because computation has to live somewhere physical. it takes power, water, land, and connectivity to run the infrastructure that makes ai feel effortless.

africa accounts for less than 1% of global data centre capacity while housing 18% of the world's population. the gap between what the continent generates as digital demand and what it owns as infrastructure is where the latency comes from, where the dependency comes from, where the value extraction happens.

knowing it's a physics problem, not a code problem, changes where you look. knowing that equinix, aws, and microsoft own most of the continent's usable capacity changes what you think about it.

it's probably not your code. it's a building somewhere running on diesel.

AI helped me research, structure, and edit this piece. The arguments, the examples, and the opinions are mine. So is whatever's wrong with them.

Everyone's Excited About Claude Tag. Nobody's Built the Trust Layer.

Daniel Nwaneri — Wed, 24 Jun 2026 13:23:02 +0000

Andrej Karpathy, OpenAI co-founder and former Tesla AI director, called Claude Tag the third major redesign of LLM UI/UX. First the LLM was a website. Then it was an app you downloaded. Now it's a persistent, asynchronous teammate that lives inside your Slack channels with org-wide context. He's right about the architecture. He's silent on what happens to the room.

Simon Smith, who'd already wired ChatGPT Workspace Agents into his team's Slack, said ambient visibility helps adoption: people watch each other use Claude in a shared channel and learn organically, no training program required. That's true for the person who turned Claude on. It's a different experience for the person who didn't get a vote.

I wrote about this eighteen hours after the announcement. Tag Claude into a five-person team and the moment it joins, every message anyone types is something an AI reads. You stop looking like someone using a tool. You start looking like the person who brought a surveillance device into the meeting. The frame you've built from there is unwinnable: good output gets read as "she's outsourcing her thinking." Mediocre output gets read as "see, this is what we were worried about." There's no third outcome that proves the skeptics wrong.

Gail Weiner replied to that thread with something simple. Bring the skeptics into the conversation. Ask how comfortable they are. Start small, let them pick the first use case, and let the small win be something they can point to and say out loud: this added value.

That's not diplomacy. It's a trust mechanism with a hard edge. The moment a skeptic says "okay, this added value" out loud, they're no longer the person blocking the rollout. They're on record as the person who approved it. Gail named it better than I did: the human trust layer. Everything Karpathy is excited about runs on top of that layer, and nobody launching Claude Tag this week is talking about who builds it.

Days before this launch I wrote production-safe-agent-loop, a small Python library for keeping single-agent loops from running away. A four-agent LangChain loop ran eleven days and cost $47,000. Claude Code recursion has burned $16,000 to $50,000 in five hours. The fix wasn't a smarter agent. It was five primitives: a spec writer that forces three answers before the loop runs, a circuit breaker with hard ceilings, an append-only ledger, the loop that respects both, and a review surface that assembles a fixed five-element frame once the run finishes: the original promise, the acceptance criteria, the diff, the evidence, and the unresolved assumptions.
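To make two of those primitives concrete, here is a minimal sketch of a circuit breaker with hard ceilings and an append-only ledger. It is not the API of production-safe-agent-loop; every class and function name below is hypothetical, and the step function is a stand-in that simply reports its own cost.

```python
from typing import Callable, List, Tuple


class BudgetExceeded(Exception):
    """Raised when a hard ceiling has been reached."""


class CircuitBreaker:
    """Hard ceilings on step count and spend (hypothetical design)."""

    def __init__(self, max_steps: int, max_cost_usd: float):
        self.max_steps = max_steps
        self.max_cost_usd = max_cost_usd
        self.steps = 0
        self.cost_usd = 0.0

    def check(self) -> None:
        """Refuse to start another step once either ceiling is reached."""
        if self.steps >= self.max_steps:
            raise BudgetExceeded(f"step ceiling of {self.max_steps} reached")
        if self.cost_usd >= self.max_cost_usd:
            raise BudgetExceeded(f"cost ceiling of ${self.max_cost_usd:.2f} reached")

    def record(self, step_cost_usd: float) -> None:
        self.steps += 1
        self.cost_usd += step_cost_usd


class Ledger:
    """Append-only record: entries can be added and read, never edited."""

    def __init__(self) -> None:
        self._entries: List[Tuple[int, str]] = []

    def append(self, event: str) -> None:
        self._entries.append((len(self._entries), event))

    def entries(self) -> Tuple[Tuple[int, str], ...]:
        return tuple(self._entries)  # callers get an immutable copy


def run_loop(step: Callable[[int], float],
             breaker: CircuitBreaker, ledger: Ledger) -> None:
    """Run step(i) until the breaker trips; step returns its cost in USD."""
    i = 0
    while True:
        try:
            breaker.check()
        except BudgetExceeded as exc:
            ledger.append(f"halted: {exc}")
            return
        cost = step(i)
        breaker.record(cost)
        ledger.append(f"step {i} cost ${cost:.2f}")
        i += 1


if __name__ == "__main__":
    breaker, ledger = CircuitBreaker(max_steps=100, max_cost_usd=1.00), Ledger()
    run_loop(lambda i: 0.40, breaker, ledger)
    for entry in ledger.entries():
        print(entry)
```

Because the ceiling is checked before each step, spend can overshoot by at most one step's cost. A stricter version would need a cost estimate before the step runs.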

The last piece is the one that matters here: attestation. A human reviews the frame, and attestation is not approval. It's a record that they reviewed exactly what's in front of them and they're taking responsibility for what happens next. The frame gets hashed. Two reviewers attesting the same session get the same hash. That's a receipt, not a vibe.
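One way to get "same session, same hash" is to serialise the frame canonically before hashing it. The sketch below assumes the frame is a plain dictionary with five keys named after the five elements. The key names and functions are illustrative, not taken from the library.

```python
import hashlib
import json

# Hypothetical field names for the five-element frame described above
FRAME_FIELDS = (
    "promise",
    "acceptance_criteria",
    "diff",
    "evidence",
    "unresolved_assumptions",
)


def frame_hash(frame: dict) -> str:
    """Hash the five frame fields in a canonical, order-independent form."""
    missing = [name for name in FRAME_FIELDS if name not in frame]
    if missing:
        raise ValueError(f"frame is missing: {missing}")
    canonical = json.dumps(
        {name: frame[name] for name in FRAME_FIELDS},
        sort_keys=True,          # key order never changes the hash
        separators=(",", ":"),   # no incidental whitespace
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def attest(frame: dict, reviewer: str) -> dict:
    """Record that this reviewer reviewed exactly this frame."""
    return {"reviewer": reviewer, "frame_sha256": frame_hash(frame)}


if __name__ == "__main__":
    frame = {
        "promise": "add retry to webhook handler",
        "acceptance_criteria": ["retries 3 times", "logs each failure"],
        "diff": "+ retry(max=3)",
        "evidence": ["tests pass"],
        "unresolved_assumptions": ["upstream is idempotent"],
    }
    a = attest(frame, "reviewer-one")
    b = attest(dict(reversed(list(frame.items()))), "reviewer-two")
    print(a["frame_sha256"] == b["frame_sha256"])  # prints True
```

The reviewer's name stays outside the hash on purpose, so two people attesting the same frame produce the same digest, and any change to the diff or the evidence produces a different one.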

Claude Tag has none of this. It's ambient, persistent, and it decides on its own initiative what's relevant across every channel it's in. The five-element frame I built for single-agent loops maps directly onto "what did Claude decide in this channel, and did a human actually sign off on it." Just at team scale, across a dozen channels, not one developer's terminal session.

What ships without it

VentureBeat is asking about data retention and vendor lock-in. Twitter is asking what it can do. The actual unresolved question is structural: when an ambient agent acts on its own initiative across a dozen channels, who gets the five-element frame, who has to attest to it, and what happens when nobody does.

Claude Tag isn't wrong to exist. It shipped the easy half. The architecture works. The trust layer is still unbuilt, and it's not a UX problem. It's an audit problem with a name and a shape, and I already wrote the code for what it looks like when someone takes it seriously.

More on agent governance at dannwaneri.com/ai-agents.

AI helped me research and edit this piece. The arguments, the examples, and the opinions are mine. So is whatever's wrong with them.

Something Changed After the Sloan Articles. I Can't Prove It.

Daniel Nwaneri — Wed, 24 Jun 2026 07:24:33 +0000

This is the third piece in a sequence. The first asked whether Sloan had flagged anyone else — it had. The second documented what I found out — Sloan is a person I know. This one is about what happened after.

I want to be precise about what I'm describing, because I can't prove what I think happened.

Here's what I can document.

The Sequence

I published two essays in June. Both generated real technical discussion — one had a five-exchange comment thread that became a production open-source repo. The founder of DEV.to liked one before it was flagged. Both got flagged by Sloan the same day.

I wrote about it. Two articles, 72 combined comments. The conversation prompted Francis to publish his own post opening the floor to community questions about moderation. Jess Lee and Ben Halpern both commented there — Jess saying DEV would be updating moderation guidelines soon, Ben calling it a big priority. Neither answered the specific question xulingfeng asked: how many Sloan warnings trigger account-level flagging, and do authors get notified when it happens?

While the Sloan articles were live, I published a sponsored piece on AI code review. It had full disclosure from the start. Sloan flagged it the same day anyway — the third flag on my account in one week.

Then I built something from it — Proof of Human, a reverse Turing Test submitted to the June Solstice Game Jam. In the first hour, people played it, dropped their scores in the comments, had real exchanges. Francis played it. Sylwia found a bug with gibberish inputs. The thread got 27 comments.

Then it stopped.

Not tapered. Stopped. The kind of stop where you go check the challenge submissions page and can't find your article. I messaged Jess. Her reply: "I'm confirming that I see your submission on my end :)"

I still don't know if it's in the judging pool.

The LLM Visibility Article

This one predates the Sloan series.

I published a piece on open source LLM visibility tracking before any of the flagging happened. Real tool, real finding, 0% citation score on one of my own domains. Named specific tools, specific costs, specific data from a Tom Capper webinar at SEJ. The AI disclosure was in it from the start. It was gaining traction.

Then the Sloan message arrived anyway. I deleted it.

After the two Sloan articles, I republished it. Same article, same content, same quality. It got neither the engagement it had been building the first time nor the outside traffic my work usually pulls when DEV.to doesn't surface it.

I'm not attributing that entirely to suppression. The article is more niche than my usual work and republished articles rarely perform like originals. But the gap between the first run and the second run is real. The first version was building. The second version went nowhere.

That's the before-and-after I can actually document: same article, two runs, one deleted after a moderation message, the other published after two articles about that moderation message. The variable that changed between the two runs is visible.

The Qodo Article

One day after the second Sloan article, I published a sponsored piece on AI code review — a real experiment, real bugs found, a Stripe webhook handler that Claude Code generated and Qodo reviewed. The disclosure was in the article from the start. The sponsorship was disclosed at the bottom.

Sloan flagged it the same day it published.

Francis resolved it the same day. But that's the third flag on my account, in the same week the Sloan articles were live. The article that had the clearest, most complete disclosure of any piece I've published here still got flagged — because the flag apparently doesn't read the article. It fires, then a human decides.

The flag landing on an article that had full disclosure from the start is the data point. Not the comment count — sponsored posts perform differently and that's a confounding variable. But a third flag in one week, on an article that did everything the policy asks, says something about how the flag mechanism works. It fires first. Disclosure is checked after.

What xulingfeng Documented

This isn't just me.

In the comments on my second Sloan article, xulingfeng ran a clean experiment: published identical content twice, once with AI disclosure and once without. Both versions got suppressed. Not the same — disappeared from feed, not showing in search, visible only to followers via direct link.

Francis confirmed that three Sloan warnings trigger account-level flagging. xulingfeng had three warnings. Francis unflagged the account after the thread.

Jess Lee also weighed in, on Francis's follow-up post, saying there are hundreds of mods with the ability to send Sloan messages. Francis had said in my thread he was essentially the only active one. Those two statements don't fully square. Which means the warning count on any given account could be coming from multiple people, and nobody's tracking it as a running total an author can see.

I had two warnings.

I don't know if two warnings triggered anything. I don't know if there's a threshold below three. I don't know if the Sloan articles themselves triggered something separate from the warning count. No one has told me. The guidelines don't say.

What I'm Not Saying

I'm not saying DEV.to is suppressing me deliberately. I'm not saying Francis made a bad call. He flagged what he thought needed flagging and said so publicly.

I'm not saying the algorithm is broken. I'm saying I can't read it, and when I try to read it from the outside, the pattern I see doesn't match the engagement signal.

27 comments and active game play in the first hour. Invisible in the challenge submissions list. Not surfaced.

An article with real first-hour traction that then flatlined in a way none of my previous work has.

That could be noise. That could be how the algorithm works now and I'm reading meaning into variance. I've been on this platform long enough to know that happens.

But I also know that xulingfeng documented the same pattern in the same week with direct evidence. And I know that Francis confirmed account-level flagging exists, has a threshold, and isn't communicated to authors when it triggers.

The Question I Actually Have

Not "was I suppressed" — I can't prove that.

Not "was the flagging fair" — I added the disclaimers, the policy is reasonable, I don't have a fight to pick there.

The question is simpler: if there's an account-level flag that affects distribution, should authors be told when it's active?

xulingfeng wasn't told. She had to run a controlled experiment to figure out why her articles disappeared. That's the answer to "should authors be told."

I might have the same flag on my account right now. I don't know. The challenge submission might be invisible to judges. I don't know.

I should be able to know.

One more instance, smaller but the same pattern: a tag moderator removed the #beginners tag from an earlier article because it was too advanced for the tag. I found out not through a notification but through a comment he left on a different article entirely — the first Sloan thread. Same pattern: distribution affected, author not told directly, discovered accidentally through a comment on a separate post.

One more data point, for context: a freeCodeCamp tutorial built on the same thinking, the same writing process, the same AI assistance in the workflow — published the same week. The freeCodeCamp editor's response: zero fixes needed. Same ideas. Same process. One platform flagged it. The other published it without a single editorial change. That's not a contradiction to resolve. It's just where we are.

This article was written with AI assistance for research and editing. All arguments, examples, and opinions are my own.

The LLM Visibility Tools Cost $79/Month. Mine is Open Source.

Daniel Nwaneri — Tue, 23 Jun 2026 12:33:11 +0000

Tom Capper at a Search Engine Journal webinar last week:

"There's no Search Console equivalent for LLMs."

Google Search Console tells you where you rank, how many people saw your result, your CTR, your average position. It tells you nothing about whether Claude mentions your domain when someone asks a question you should own.

That gap is now a product category. I checked most of them. Cheapest entry point: $39/month. The ones worth taking seriously: $79 and up.

At the same webinar, someone asked Capper directly: what are accessible ways to measure AEO/GEO visibility, since there's no equivalent of Search Console for LLMs? His answer was three manual approaches. None of them were a tool you could run.

I built mine for free. Open source. And it found something I'd missed for months on my own sites.

What the paid tools actually do

AIclicks, LLMrefs, Cairrot, Slate — the mechanics are the same across all of them. They query AI models with your target keywords, check whether your brand appears in the response, track it over time.

That's the whole product. The variance is in how many LLMs they cover, how the reports look, and whether the UI justifies the monthly fee.

None of them are doing anything technically unusual. They're calling APIs and parsing text.

So I added it to seo-agent as a standalone module: llm-visibility.

How it works

Point it at your domain and a query list — or just export from Google Search Console:

python main.py llm-visibility --domain dannwaneri.com --queries gsc-export.csv --project dannwaneri-com

It takes your top 20 queries by impressions. Sends each one to Claude Haiku. Checks whether your domain appears in the response. Writes everything to llm-visibility.md — visibility score, per-query results, gaps to address.
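
The check described above boils down to a substring match and a percentage. Here is a minimal sketch of that step only, assuming the model responses have already been collected. The `QueryResult` shape and both function names are hypothetical, not taken from seo-agent.

```typescript
// Hypothetical shape: the real module's data model may differ.
interface QueryResult {
  query: string;
  response: string; // raw text the model returned for this query
}

// True when the domain (with or without "www.") appears anywhere in the response.
function mentionsDomain(response: string, domain: string): boolean {
  const bare = domain.toLowerCase().replace(/^www\./, "");
  return response.toLowerCase().includes(bare);
}

// Percentage of queries whose response mentions the domain, as a whole number.
function visibilityScore(results: QueryResult[], domain: string): number {
  if (results.length === 0) return 0;
  const hits = results.filter((r) => mentionsDomain(r.response, domain)).length;
  return Math.round((hits / results.length) * 100);
}

// Usage: 3 mentions across 20 made-up responses
const sampleResults: QueryResult[] = Array.from({ length: 20 }, (_, i) => ({
  query: `query ${i + 1}`,
  response: i < 3 ? "See dannwaneri.com for a walkthrough." : "Here is a general answer.",
}));
const sampleScore = visibilityScore(sampleResults, "dannwaneri.com"); // 15
```

A plain substring match will miss brand-name mentions that omit the domain, so treat the number as a floor rather than a full picture.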

Cost: negligible. A 20-query run costs less than a cent.

What I found

I ran it on two of my own domains.

One scored 0%. Every query. Claude answered correctly — sometimes well — and never mentioned my site once.

The other scored 15%. Three out of twenty queries returned a mention. The other seventeen? Nothing.

Both domains have content on these exact topics. Both rank in Google for these queries. Neither is getting cited.

The same run also caught something else — a query sitting at position 9.5 with 29 impressions and 0% CTR: does twitch pay nigerians. The page was ranking. The title was answering the wrong question. That's not an LLM problem, that's a GSC problem. But llm-visibility sits on the same audit surface as gsc-insights, not separate from it. You find both in one pass.

Your Google rank tells you nothing about your LLM presence. They're different surfaces with different citation logic entirely.

The limitation I won't bury

Claude has a training data cutoff. Content published after that cutoff won't appear regardless of quality.

A score of 0% on a site less than a year old is expected — it's a data availability problem, not a content quality problem.

Run it quarterly. Watch the number move as training snapshots update.

The paid tools have this same limitation. They just don't always put it where you can see it.

What the pixel data adds

While I was building this, the same SEJ webinar surfaced pixel data from Tom Capper at STAT Search Analytics worth flagging: position 1 now sits 635 pixels down the page on desktop. On mobile, the top organic result is below the fold nearly two thirds of the time. AI Overviews consume roughly a third of above-the-fold space on informational queries. Paid and shopping units take over 60% on commercial ones. Organic gets what's left. (Source: Search Engine Journal, May 2026)

seo-agent covers both surfaces. The serp-features module hits SerpApi for each target query and maps which features are present — AI Overview, featured snippet, PAA, image pack, local pack. llm-visibility handles the other surface.

Neither module needs a paid subscription. SerpApi has a free tier: 100 searches/month, no credit card.

What's still missing

This only tests Claude. Not ChatGPT, not Perplexity, not Gemini.

The paid tools cover 6–10 models. That's a real gap.

I'm one person. Multi-model support is on the list.

If your Claude visibility score is 0%, adding Perplexity to the test won't fix the underlying problem. The content isn't strong enough to get cited anywhere. Fix that first. Then track across more models.

The tool

Everything is open source: github.com/dannwaneri/seo-agent

Full breakdown of what it found on a real site — a Nigerian creator site that went from 0.4% to 44% pass rate in one afternoon — at dannwaneri.com/seo-automation.

The full module list:

llm-visibility — LLM citation tracking
serp-features — SERP feature detection via SerpApi
gsc-insights — GSC export parser, quick wins, cannibalization
qualify-backlinks — referring domain scoring
relevance-score — internal link opportunity scoring
cluster-audit — topic clustering, orphan detection

Core audit runs in a real Chromium browser. Extracts title, meta description, H1s, canonical. Checks broken links. Resumable JSON state.

The speaker said there's no Search Console for LLMs.

There is now. And it doesn't cost $79/month.

This article was written with AI assistance for research and editing. All arguments, examples, and opinions are my own.

[Boost]

Daniel Nwaneri — Sat, 20 Jun 2026 16:06:35 +0000

June Solstice Game Jam Submission

Proof of Human: I Built a Reverse Turing Test After Getting Flagged as AI

#devchallenge #gamechallenge #gamedev #ai

Proof of Human: I Built a Reverse Turing Test After Getting Flagged as AI

Daniel Nwaneri — Thu, 18 Jun 2026 14:10:45 +0000

This is a submission for the June Solstice Game Jam

I got flagged by Sloan.

If you've been on DEV long enough, you know Sloan. I thought Sloan was a bot. Sloan is Francis — someone I've exchanged comments with for months, since before Richard left the platform. He posted about the flagging openly, tagged the founders, explained his reasoning. Then added: "This was hard to tell you for many reasons." He reads every flagged article himself, runs it through GPTZero, makes a call. He knew me. He flagged me anyway.

One of the flagged articles had sparked a five-exchange comment thread that became an open-source repo. The thinking was mine. The flag still landed.

That's the uncomfortable thing about the Turing Test in 2026: it doesn't measure origin. It measures surface texture. And if you write well enough, you sound like a machine.

What I Built

Play Proof of Human →

A reverse Turing Test. Five questions. You write your answers. Claude scores them 0–100 on how human they sound, tells you what gave you away, and at the end gives you an average.

The questions are the ones that actually separate humans from pattern-matchers:

Describe the last time something genuinely surprised you. Not shocked — surprised.
What's something you changed your mind about in the last year? What moved you?
What's a skill you have that you never bothered to put on your CV?
Name something you've read or watched that you think about more than you expected to.
What do you actually think about AI? Not what you're supposed to think — what you actually think.

The scoring prompt at the heart of it:

Human signals: named specifics, opinions that could get you in trouble, genuine uncertainty, things slightly off-topic but revealing.
AI signals: balanced framing, hedge words, smooth transitions, excessive completeness.

Score 60+: Passes. Below 60: Flagged.
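
The threshold and the final average can be sketched in a few lines. This is an illustration, not the game's actual code: the function names are made up, and rounding the average to a whole number is an assumption.

```typescript
const PASS_THRESHOLD = 60; // from the post: 60+ passes, below 60 is flagged

type Verdict = "Passes" | "Flagged";

function verdictFor(scoreOutOf100: number): Verdict {
  return scoreOutOf100 >= PASS_THRESHOLD ? "Passes" : "Flagged";
}

// Average of the per-question scores, rounded to the nearest whole number (assumed).
function averageScore(scores: number[]): number {
  if (scores.length === 0) return 0;
  const total = scores.reduce((sum, s) => sum + s, 0);
  return Math.round(total / scores.length);
}

// Usage with five made-up scores, one per question
const fiveScores = [72, 58, 90, 41, 66];
const finalAverage = averageScore(fiveScores); // 65 (327 / 5 = 65.4)
const finalVerdict = verdictFor(finalAverage); // "Passes"
```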

The June solstice is the longest day — the day the sun is most itself. Unambiguous. No hedging. That's what this game is asking for. Not your best answer. Your most you answer. Turing's original question was: can a machine think? The question we're living with now is its inversion — can a human still sound like one?

Video Demo

Code

dannwaneri / proof-of-human

Proof of Human

A reverse Turing Test. Five questions. Claude scores your answers 0–100 on how human they sound and tells you what gave you away.

Play it →

How it works

You write. Claude reads. It scores on one axis: specificity that costs you something. Named people, opinions you might regret, genuine uncertainty, things slightly off-topic but revealing. Those pass. Balanced framing, hedge words, smooth transitions — those get flagged.

Score 60+: Passes. Below 60: Flagged.

Stack

Frontend: single index.html, vanilla JS, no framework, no build step
Backend: Cloudflare Worker (keeps API key server-side)
Hosting: Cloudflare Pages
Model: claude-sonnet-4-6

Deploy your own

1. Deploy the Worker

cd worker
npm install
wrangler secret put ANTHROPIC_API_KEY
wrangler deploy

2. Update the API URL in index.html

const API_URL = "https://your-worker.your-subdomain.workers.dev/score";

3. Deploy the frontend

# From the repo root
wrangler pages project create proof-of-human --production-branch main
wrangler pages

…

How I Built It

Vanilla JS, no dependencies, one HTML file. Cloudflare Worker as proxy, Pages for hosting.

The Worker sits between the browser and Anthropic. It receives your prompt and response, calls the API, returns { score, verdict, reason }. The frontend never sees the API key. One call per question, nothing stored, model is claude-sonnet-4-6.
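
One step a proxy like this needs is turning the model's text reply into the `{ score, verdict, reason }` object. A hedged sketch of that parsing step follows. `parseScoreReply` is a hypothetical helper, and deriving the verdict from the clamped score instead of trusting the model's own verdict is a design assumption, not something the repo confirms.

```typescript
interface ScoreResult {
  score: number; // clamped to 0..100
  verdict: "Passes" | "Flagged";
  reason: string;
}

// Parse the model's text reply into the shape the frontend expects.
// Returns null when the reply is not the JSON object that was asked for.
function parseScoreReply(replyText: string): ScoreResult | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(replyText);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const candidate = parsed as Record<string, unknown>;
  const rawScore = candidate["score"];
  const rawReason = candidate["reason"];
  if (typeof rawScore !== "number" || !Number.isFinite(rawScore)) return null;
  if (typeof rawReason !== "string") return null;
  // Clamp so an out-of-range number cannot break the score bar in the UI.
  const score = Math.min(100, Math.max(0, Math.round(rawScore)));
  return { score, verdict: score >= 60 ? "Passes" : "Flagged", reason: rawReason };
}
```

Returning null on anything malformed lets the Worker answer with an error status instead of passing a half-formed object to the browser.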

The frontend is a single index.html — progress bar, animated score fill, final breakdown screen. No build step. A judge can open DevTools and follow exactly what happens on each submit.

The scoring prompt took the most iteration. The first version was too generous — everything passed. The second was too harsh — everything got flagged. The final version keys on one thing: specificity that costs you something. An answer that names a real person, admits a real mistake, or takes a position you might regret. That's what the model now reliably catches.

The irony: I had to write like an AI to build a detector for AI writing. I kept second-guessing my own prompt phrasing, smoothing transitions, hedging. The game caught me too.

Prize Category

Best Ode to Alan Turing

Turing's question was whether a machine could fool a human. This game inverts it — can a human fool the machine? The mechanic is the Turing Test itself, running live, aimed back at the player.

The submission post has a real backstory: I got flagged as AI-generated on this platform the same week I built this. The incident is documented in two public articles with 60+ comments between them. Writing that passed human editorial review at freeCodeCamp got flagged by a detector on DEV. That's not a contradiction to resolve. That's just where we are, and it's the question this game puts directly to you.

Built June 2026. Vanilla JS. One API call. No frameworks. The Sloan incident was real.

Claude Code Wrote the PR. Here's What the Code Review Actually Caught.

Daniel Nwaneri — Wed, 17 Jun 2026 16:40:06 +0000

Everyone is shipping AI-generated code right now. Most of it is going straight to main.

Quick verdict: Qodo catches production-grade bugs in AI-generated code before they ship. Claude Code generated a Stripe webhook handler that passed TypeScript, looked clean, and had six real bugs — an ack-before-processing pattern that would silently drop fulfilled orders, no replay protection, a non-atomic rate limiter, a DoS-prone body read, a timing-unsafe signature compare, and a shared rate-limit bucket for null IPs. Qodo flagged all six in 90 seconds. Two of them I hadn't planted; the review reasoned them out from how the code behaves at runtime, not what it says.

I'm not going to tell you shipping straight to main is always wrong. A lot of that code is fine. But I've been building production systems on Cloudflare Workers for six years, and I know exactly how "fine" can turn into a 2am incident. The subtle bugs, the ones that pass a quick read, pass TypeScript, pass your linter, are the ones that hurt.

So I ran an experiment. I asked Claude Code to generate a Stripe webhook handler for a Cloudflare Worker. I did not edit it. I did not second-guess it. I opened a PR and let Qodo run a code review on it.

This is what happened.

the setup

The feature: a Cloudflare Worker that receives Stripe webhooks, validates HMAC signatures, rate-limits by IP using KV, and processes checkout.session.completed and payment_intent.payment_failed events.

That's a real thing. It's the kind of feature an AI tool generates confidently and completely. Looks clean. Passes TypeScript strict mode. The logic flow makes sense on a first read.

It also had six bugs.

Here's the repo: github.com/dannwaneri/stripe-webhook-worker

The code is 173 lines of TypeScript. Signature validation, rate limiting, event dispatch, KV writes. Nothing exotic. Exactly the kind of thing you'd ship on a deadline without a second look.

the code Claude generated

The entry point looks like this:

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const ip = request.headers.get('CF-Connecting-IP') ?? 'unknown';

    const limited = await checkRateLimit(ip, env);
    if (limited) {
      return new Response(JSON.stringify({ error: 'Rate limit exceeded' }), { status: 429 });
    }

    const rawBody = await request.text();
    const signature = request.headers.get('stripe-signature');

    if (!signature) {
      return new Response(JSON.stringify({ error: 'Missing stripe-signature header' }), { status: 400 });
    }

    const isValid = await validateSignature(rawBody, signature, env.STRIPE_WEBHOOK_SECRET);
    if (!isValid) {
      return new Response(JSON.stringify({ error: 'Invalid signature' }), { status: 401 });
    }

    // ... parse JSON, dispatch event

    ctx.waitUntil(processEvent(event, env));

    return new Response(JSON.stringify({ received: true }), { status: 200 });
  },
};

Reads cleanly. Validates the signature. Rate-limits. Returns 200. Done.

Except no.

running qodo

Installing Qodo took about five minutes. Go to the GitHub Marketplace, install the app, scope it to the repo. Then connect at app.qodo.ai. Once that's linked, you trigger a review by commenting on the PR:

/agentic_review

Ninety seconds later, Qodo's response appeared in the PR thread. Six bugs. Zero rule violations.

what it found

Finding 1: ack before processing (action required — reliability)

This was the top-priority finding, and it's the one that would have caused a real incident.

The code calls ctx.waitUntil(processEvent(event, env)) and immediately returns 200 OK. Stripe sees the 2xx and stops retrying. But processEvent runs in the background — if it fails (KV timeout, unhandled exception, runtime termination), Stripe never knows. The order goes unfulfilled. No alert fires. The customer waits.

Qodo's fix: either await processEvent(event, env) and return non-2xx on failure so Stripe retries, or persist the event to a durable queue before returning 200, then process with retries separately.
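
The first option, await then acknowledge, can be sketched like this. It is an illustration under assumptions: `HandlerResult` is a plain-object stand-in for a Worker `Response` so the sketch runs anywhere, and `handleVerifiedEvent` is a hypothetical name, not a function from the repo.

```typescript
// Simplified stand-in for a Response so the sketch runs outside a Worker.
interface HandlerResult {
  status: number;
  body: string;
}

interface StripeEventLike {
  id: string;
  type: string;
}

// Await processing first; only acknowledge with 200 once it has succeeded.
async function handleVerifiedEvent(
  stripeEvent: StripeEventLike,
  processEvent: (e: StripeEventLike) => Promise<void>,
): Promise<HandlerResult> {
  try {
    await processEvent(stripeEvent);
    return { status: 200, body: JSON.stringify({ received: true }) };
  } catch (err) {
    // A non-2xx response tells Stripe the delivery failed, so it retries later.
    const message = err instanceof Error ? err.message : String(err);
    return { status: 500, body: JSON.stringify({ error: message }) };
  }
}
```

The trade-off is latency: the response now waits on processing, which is why the queue-based option exists for slow handlers.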

I knew there was no try/catch in processEvent. I hadn't framed it as an acknowledgment problem — that framing is sharper and explains the real-world failure mode directly.

Finding 2: no replay protection (action required — security)

The validateSignature function parses Stripe's t=timestamp from the header and uses it to reconstruct the signed payload. That's correct. What it doesn't do is check whether the timestamp is recent.

Stripe's own documentation says to reject any webhook where the timestamp is more than five minutes old. Without that check, a valid captured webhook can be replayed indefinitely. Same valid signature, same event ID, processed again.

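The fix is a few lines:

const age = Math.floor(Date.now() / 1000) - Number(timestamp);
// A missing or malformed timestamp yields NaN, which must not slip past the check
if (!Number.isFinite(age) || age > 300) return false; // reject events older than 5 minutes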
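
For context, here is a sketch of where that timestamp comes from. It assumes the documented `t=...,v1=...` header layout. `parseStripeSignature` and `isFresh` are hypothetical helpers, not the repo's `validateSignature`.

```typescript
interface ParsedSignature {
  timestamp: number; // Unix seconds from the t= element
  signatures: string[]; // every v1= element (more than one can be present)
}

// Split "t=...,v1=...,v0=..." into its parts; null when t or v1 is missing or malformed.
function parseStripeSignature(header: string): ParsedSignature | null {
  let timestamp = NaN;
  const signatures: string[] = [];
  for (const part of header.split(",")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue;
    const key = part.slice(0, eq).trim();
    const value = part.slice(eq + 1).trim();
    if (key === "t") timestamp = Number(value);
    if (key === "v1") signatures.push(value);
  }
  if (!Number.isInteger(timestamp) || signatures.length === 0) return null;
  return { timestamp, signatures };
}

// True when the event is no older than the tolerance window (default 5 minutes).
function isFresh(timestamp: number, nowSeconds: number, toleranceSeconds = 300): boolean {
  return nowSeconds - timestamp <= toleranceSeconds;
}
```

Passing `nowSeconds` in, rather than reading the clock inside the function, keeps the freshness check easy to test.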

Finding 3: non-atomic rate limiting (remediation recommended — reliability)

The rate limiter reads the current count from KV, checks if it's under the limit, then writes the incremented count back. Two concurrent requests both read count = 0, both pass the check, both write count = 1. Under any real burst the rate limiter is trivially bypassed.

The correct implementation uses Durable Objects for atomic counters, or pushes the rate logic to Cloudflare's native rate limiting API.
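
The race is easier to see with the interleaving written out by hand. This sketch uses an in-memory `Map` in place of KV and a plain class in place of a Durable Object. Both are stand-ins chosen for illustration, not Cloudflare APIs.

```typescript
const LIMIT = 1;
const BUCKET = "rl:203.0.113.7";

// Racy version: read and write are separate steps, as with a KV get followed by a put.
function racyRun(): { allowed: boolean[]; stored: number } {
  const kv = new Map<string, number>();
  // Both requests read before either one writes.
  const readA = kv.get(BUCKET) ?? 0;
  const readB = kv.get(BUCKET) ?? 0;
  const allowedA = readA < LIMIT;
  const allowedB = readB < LIMIT;
  if (allowedA) kv.set(BUCKET, readA + 1);
  if (allowedB) kv.set(BUCKET, readB + 1);
  return { allowed: [allowedA, allowedB], stored: kv.get(BUCKET) ?? 0 };
}

// Atomic version: check and increment happen in one uninterrupted step,
// which is the property a single-owner counter is meant to provide.
class AtomicCounter {
  private counts = new Map<string, number>();

  tryAcquire(key: string, limit: number): boolean {
    const current = this.counts.get(key) ?? 0;
    if (current >= limit) return false;
    this.counts.set(key, current + 1);
    return true;
  }
}

const racyOutcome = racyRun(); // allowed: [true, true], stored: 1
const counter = new AtomicCounter();
const atomicOutcome = [counter.tryAcquire(BUCKET, LIMIT), counter.tryAcquire(BUCKET, LIMIT)]; // [true, false]
```

In the racy run both requests get through and the stored count still says 1, so the limiter is bypassed and also under-counts.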

Finding 4: body read before header check (remediation recommended — security)

The code reads rawBody = await request.text() before checking whether the stripe-signature header even exists. That means any request without a signature — a scanner, a bot, a misconfigured service — forces the worker to consume and buffer the full request body before being rejected.

For most requests that's noise. For a large payload flood it's a real DoS surface. The fix is to check the header first.
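
The reordering can be shown with a counter on the body read. `IncomingWebhook` below is a simplified, synchronous stand-in for a Worker request (the real `request.text()` is async), and `intake` is a hypothetical name.

```typescript
// Simplified, synchronous stand-in for a Worker Request (hypothetical shape).
interface IncomingWebhook {
  getHeader(name: string): string | null;
  readBody(): string; // the expensive step: buffers the whole payload
}

type Intake =
  | { ok: false; status: number; error: string }
  | { ok: true; signature: string; rawBody: string };

// Check the cheap precondition (header present) before paying for the body read.
function intake(req: IncomingWebhook): Intake {
  const signature = req.getHeader("stripe-signature");
  if (!signature) {
    return { ok: false, status: 400, error: "Missing stripe-signature header" };
  }
  return { ok: true, signature, rawBody: req.readBody() };
}

// Usage: a headerless request is rejected and its body is never read.
let bodyReads = 0;
const unsignedRequest: IncomingWebhook = {
  getHeader: () => null,
  readBody: () => {
    bodyReads += 1;
    return "x".repeat(1_000_000);
  },
};
const rejectedIntake = intake(unsignedRequest); // ok: false, status 400; bodyReads stays 0
```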

Finding 5: timing-unsafe signature compare (advisory — security)

The computed HMAC is compared to the expected hash with ===. JavaScript string comparison short-circuits on the first mismatched character, which leaks timing information an attacker can use to recover the expected hash byte-by-byte.

The fix is crypto.subtle.timingSafeEqual on the raw byte arrays before hex-encoding:

const computedBytes = new Uint8Array(mac);
const expectedBytes = hexToBytes(expectedHash);
return computedBytes.length === expectedBytes.length &&
  crypto.subtle.timingSafeEqual(computedBytes, expectedBytes);

This is a real CVE class. Qodo ranked it advisory.
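
The snippet above relies on a `hexToBytes` helper that isn't shown. Here is one possible version, plus a portable XOR-accumulate comparison for environments without a built-in timing-safe compare. Both are illustrative assumptions, and JavaScript engines give no hard constant-time guarantee, so prefer the runtime's own primitive where it exists.

```typescript
// Decode a hex string into bytes; returns null for odd length or non-hex characters.
function hexToBytes(hex: string): Uint8Array | null {
  if (hex.length % 2 !== 0 || !/^[0-9a-fA-F]*$/.test(hex)) return null;
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes;
}

// Compare two byte arrays without exiting early on the first mismatch.
function constantTimeEqual(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a[i] ^ b[i]; // accumulates any differing bit across the whole array
  }
  return diff === 0;
}
```

The length check does return early, but HMAC-SHA256 digests have a fixed, public length, so it reveals nothing about the secret.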

Finding 6: shared 'unknown' IP bucket (advisory — reliability)

When CF-Connecting-IP is null, the fallback is the string 'unknown'. Every request that arrives without the header — health checks, misconfigured proxies, certain load balancer configurations — shares the same rate limit bucket. One noisy service can lock out all other headerless traffic.

I did not put this one in the code intentionally. Qodo caught it by reading the fallback on line 15, cross-referencing the rate limiter, and reasoning about what happens at runtime with a null header. That's not a pattern matcher. That's contextual analysis.

what surprised me

Two things.

The prioritization surprised me more than the findings did. Timing-unsafe comparison — the CVE-class security bug — ranked advisory. The architectural reliability issue ranked first. That's a judgment call, not a checklist. Qodo's reasoning: if the timing attack succeeds, an attacker can forge requests. But if the ack-before-processing architecture silently drops fulfilled orders, that's production-down-right-now. I don't entirely agree with the weighting, but I understand the reasoning and it's defensible.

Two findings weren't bugs I planted. Finding 4 (body before header) and Finding 6 ('unknown' IP bucket) — both required the review to understand what the code does, not just what it says. The 'unknown' bucket catch in particular required multi-line reasoning — fallback value on line 15, rate limiter logic in a separate function, runtime behavior with a missing header. That's what Qodo calls the Context Engine: the codebase is indexed so reviews understand architecture, not just the diff.

What it missed is worth naming. The KV namespace is reused for two semantically different key types: rl:* keys with a 60-second TTL and order:* keys with no TTL. If you ever add a TTL policy to the namespace globally, order records start expiring. Qodo didn't catch this — it would require knowing the intent of the two key types, not just observing they share a namespace. That's a fair miss. It's also exactly the kind of thing that bites you six months later when someone touches the KV config.

the generation / review distinction

Claude Code generated this. It generated it well — the code is structured, typed, readable, and handles the happy path correctly. That's what generation tools are for.

Qodo reviewed it. It found six bugs, two of them action-required, without knowing I'd planted any of them. It surfaced findings I didn't anticipate. It prioritized by real-world impact, not severity labels.

These are different jobs. Cursor and Claude are good at one. Qodo is built for the other. The reason this matters specifically for AI-generated code: AI tools write confidently. They don't flag their own assumptions. They don't know what they don't know about your production environment. The code looks reviewed because it looks clean.

Qodo is an AI code review platform. It runs as parallel agents on each PR — separate agents for critical issues, duplicated logic, breaking changes, ticket compliance, and rule enforcement, each running independently. The Context Engine indexes your codebase so it can reason about cross-file implications and architectural consistency, not just the lines in the diff. What came back on this PR wasn't a list of style nits. It was a structural critique of how the handler handles failure.

That's the gap between generating and reviewing. The PR looked fine. It wasn't.

takeaway

Run the code review. Not because you don't trust the tool that generated it. Because the tool that generated it isn't the right tool for the job.

Six bugs in 173 lines. Two of them action-required. Two I hadn't planted. That's not a failure of the generator; it's an argument for the review step.

If you're shipping AI-generated PRs without a structured review pass, you're not moving faster. You're just moving the incident to later.

The full code is at github.com/dannwaneri/stripe-webhook-worker. Qodo runs on the free tier for public repos — qodo.ai.

If you want to go deeper on AI code review, Qodo's AI Code Review Academy has a few useful reads:

What is AI code review — how it works and what it catches
Reviewing AI-generated code — common patterns and pitfalls
AI code review tools comparison — side-by-side feature breakdown

Sponsored by Qodo.

This article was written with AI assistance for research and editing. All arguments, examples, and opinions are my own.

I Got Flagged by Sloan. Sloan Is a Guy I Know.

Daniel Nwaneri — Tue, 16 Jun 2026 13:45:03 +0000

Two weeks ago I published a piece explaining exactly why AI detectors are unreliable. Then Sloan flagged me.

My argument was simple: AI detectors are probabilistic classifiers trained on distributional differences between human and AI writing. Dense, structured prose trips them constantly. The detector doesn't read. It pattern-matches statistical features.

I knew this. I wrote about it. I published it.

Then Sloan flagged two of my essays on the same day.

I published an essay arguing that AI agent loops burn money because nobody defines exit conditions before deploying. A developer left a five-exchange comment thread that built a complete production architecture on top of it. An AI podcast tool turned the unpublished draft into a full episode before it even went live.

The founder of DEV.to liked the piece.

An hour later, Sloan flagged it as AI-generated.

A second essay got flagged the same day. Same message. Same pattern.

60+ articles on DEV.to. Never flagged once. The two pieces that got flagged generated more technical discussion than anything else I've published here.

I added the disclaimers. Moved on. Then I asked, in public, if this had happened to anyone else.

It had. And the answer was more interesting than I expected.

Sloan isn't a bot running quietly in the background. Someone is sending those messages. A community member — someone I've known on the platform for months — had been reading articles and running them through GPTZero to inform his flagging decisions. Mine included.

He posted about it. Openly. Tagged the founders. Explained his reasoning. No hiding, no anonymous report — just a person who'd decided this needed doing and said so publicly.

I don't think he was wrong to care. The platform's guidelines are reasonable and disclosure matters. But it reframed everything. I'd spent days thinking Sloan was a blunt automated tool failing to read carefully. What actually happened was a thoughtful person, reading articles and running them through a third-party detector, reaching the same conclusion a blunt tool would have reached — without finishing the essay.

That's the part I keep coming back to.

Not "the algorithm is dumb." Algorithms are supposed to be blunt. That's the deal you make for scale. The harder problem is that a careful human, using a purpose-built tool, scanning specifically for AI-shaped writing on this platform, landed on the same two pieces a generic classifier would have flagged.

Short punchy paragraphs. Named data points. Rhetorical questions. Em dashes doing work. Those are also just good writing. The features that make an argument land are the same features that read as "AI-shaped" to anyone — human or model — calibrated to notice them.

Write worse, look more human. Write well, get flagged. A better classifier doesn't fix that. A more careful human doesn't fix that either, if what they're trained to notice is surface texture.

A few things happened in the thread that I didn't expect.

Someone pointed out that the policy creates an honesty penalty. If two pieces of AI-assisted writing are equally good and equally indistinguishable from human writing, the one with a disclosure gets flagged. The one without doesn't, because without the disclosure there's nothing to catch. The system penalizes transparency, not AI use. Nobody in the thread had a clean answer to that.

Then Marco posted something that cut deeper than any of it.

He's Italian. He's been working in tech for years, struggling to communicate in a language that isn't his first. He uses AI to express ideas that are genuinely his — translating from Italian, bridging the language barrier, getting thoughts out in a form the industry can read. He'd get the same Sloan message I got. Same classifier output. Same flag. For something that has nothing to do with what the policy was designed to catch.

Three goals, everyone conflating them: stopping bot-generated content, verifying there's a human behind a text, evaluating whether ideas came from a brain or an algorithm. Those are different problems. The same Sloan message gets sent for all three.

I spent hours on those essays making sure they sounded like me. Both got flagged by someone using a tool built specifically to catch writing like mine.

The uncomfortable part isn't that I got flagged. It's that the flag was technically defensible — I did use AI assistance for research, fact-checking, and editing. What no classifier, human-built or otherwise, can know is whether the arguments are mine.

They are. The ideas came from years of building production systems, watching the same failures repeat, and writing about them. The comment threads proved it — readers extended the arguments across dozens of exchanges, a five-exchange thread turned into a working open-source repo, because the thinking was real.

You can't fake that with a prompt. You also can't detect it with one.

The same week both essays got flagged, the freeCodeCamp editorial team reviewed the tutorial built on the same thinking.

Abbey's response: "No fixes needed from you."

Same ideas. Same writing process. Same AI assistance in the workflow. One platform flagged it. The other published it with zero editorial changes.

That's not a contradiction to resolve. That's just where we are.

"AI-generated" and "human-written" used to be a useful distinction. It's becoming less useful — for tools and for people. My writing in 2026 is neither. It's a collaboration — human judgment, human experience, human argument, assisted by tools.

That collaboration doesn't have a classifier.

It has a human who can be asked: did you know what you were writing about? Do you stand behind it?

I do. Every time, including this one.

The edges are where the interesting writing lives. Turns out the edges are getting crowded.

My Bookmark Engine Returned Chunks. I Added One Endpoint to Make It Answer.

Daniel Nwaneri — Mon, 15 Jun 2026 20:40:26 +0000

Search returns things you have to read. An answer engine reads them for you.

I built a search engine on top of 50k saved tweets. Ask it something and it returns the five most relevant chunks — found through hybrid retrieval (BM25 keyword search plus vector search) and reranked by a cross-encoder. A Gemma 4 MoE layer already runs in the background too, writing its own reflections on how saved documents connect to each other. You get the chunks back, ranked. Then you read them and synthesise.

That last step bothered me. The model already synthesises when generating reflections. The retrieval already works. The only missing piece was wiring them together at query time.

So I added POST /search?mode=answer.

What it does

Same retrieval pipeline. Top 5 chunks, reranked. Then instead of returning them raw, Gemma 4 MoE reads them and produces a direct answer grounded in what you saved.

const prompt =
  `Answer the question below using only the sources provided. ` +
  `If the sources don't contain the answer, say so directly.\n\n` +
  `Question: "${query}"\n\n` +
  `Sources:\n${context}\n\n` +
  `Write a direct answer in 2–4 sentences. No preamble. No bullets.\n` +
  `Answer:`;

max_tokens: 512 returns an empty answer. Gemma 4 is a thinking model: it burns the token budget on internal reasoning before producing output. max_tokens: 2048 fixes it. The reflection engine hit the same wall. Same fix.
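
The `context` variable in that prompt isn't shown. A hedged sketch of one way to assemble it from reranked chunks follows. The `RetrievedChunk` shape, the numbering scheme, and both function names are assumptions for illustration.

```typescript
// Hypothetical chunk shape; the real index may store more fields.
interface RetrievedChunk {
  id: string;
  text: string;
  score: number; // reranker score, higher is better
}

// Number each source so an answer can be traced back to a specific chunk.
function buildContext(chunks: RetrievedChunk[], topK = 5): string {
  return [...chunks]
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((c, i) => `[${i + 1}] ${c.text.trim()}`)
    .join("\n\n");
}

// Same wording as the prompt above, with the context filled in.
function buildAnswerPrompt(query: string, chunks: RetrievedChunk[]): string {
  const context = buildContext(chunks);
  return (
    `Answer the question below using only the sources provided. ` +
    `If the sources don't contain the answer, say so directly.\n\n` +
    `Question: "${query}"\n\n` +
    `Sources:\n${context}\n\n` +
    `Write a direct answer in 2–4 sentences. No preamble. No bullets.\n` +
    `Answer:`
  );
}
```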

What came back

Three queries against the live index.

"What do people say about consistency and showing up every day?"

"Consistency wins. Some people say all the right things but never quite measure up to their words, while others do everything right without saying a word."

Two tweets, cleanly synthesised. Nothing hallucinated from outside the index.

"What do people say about money and wealth building?"

"Discussions include frustrations over the devaluation of the Naira, declining purchasing power, and debates regarding financial expertise and social media income. Some suggest only a certain regular amount is needed to sustain a lifestyle, as anything beyond that is superfluous. Regarding wealth building, it is suggested that one should provide value and be the supply in a supply and demand relationship."

This one surfaced a reflection document alongside raw tweets: an entry the engine had already generated from CBEX-related tweets and stored back into the index. The answer pulled from both layers, raw saved content and a previously generated insight. That's the system compounding on itself.

"What are the best ways to learn programming?"

"Search Google, find a video, watch while coding, repeat."

Thin. The retrieval matched surface-level tweets: "Did you learn programming all by yourself?", "Tips on learning how to code 🧵". Not substantive content. The model answered honestly from what it got. The answer is technically grounded. It's just not useful.

The honest read

It works well on topics with substantive saved content. It returns thin answers on surface-level matches, like the programming question above. The synthesis did its job. The index just doesn't have the depth yet.

The retrieval scores on most queries are low (in the 0.006–0.013 range). That's down to the embedding model the index was built on: bge-small, 384 dimensions, the old default, built for speed over precision. Embeddings are how the engine turns text into numbers it can compare. More dimensions means more room to capture shades of meaning. The index can't switch models without re-ingesting all 50k tweets, because vectors from different models aren't comparable. When I eventually migrate to qwen3-0.6b (1024 dimensions), retrieval precision should improve first, and answer quality should follow from that.
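
For readers new to embeddings, the comparison step can be sketched as cosine similarity between two vectors. This is an illustration with toy 3-dimensional vectors, not the engine's scoring code; the index's actual scores come from its own retrieval pipeline and won't match these numbers.

```typescript
// Cosine similarity: near 1 means same direction, 0 means orthogonal (unrelated).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    // Vectors of different sizes (e.g. 384 vs 1024 dimensions) can't be compared,
    // which is one reason a model switch means re-embedding the whole index.
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const queryVec = [1, 0, 1];
const closeVec = [2, 0, 2]; // same direction as the query, different length
const farVec = [0, 1, 0]; // orthogonal to the query

console.log(cosineSimilarity(queryVec, closeVec)); // approximately 1
console.log(cosineSimilarity(queryVec, farVec)); // 0
```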

For now: the endpoint works. Strong on topics the index has depth on, honest about topics it doesn't.

What it isn't

It's not an answer to take on faith. The sources come back with every answer, so you can verify the model: check the scores, read the original chunks. And it's not search over the open web. Every answer traces back to something you chose to save. The prompt gives the model nothing outside the index to work from and tells it to say so when the sources fall short. The model can still lean on what it learned in training, which is why the sources ship with every answer, but the grounding is structural, not just instructed.

What's next

Retrieval quality is the ceiling on answer quality right now. The next piece is gap detection: a weekly pass that surfaces the three most persistent unanswered questions in the index, showing where the index has depth and where it doesn't. This endpoint makes those gaps visible in real time, one query at a time. Gap detection will map them systematically, every week.
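
Gap detection isn't built yet, so the following is only one possible shape for it, assuming queries are logged with a flag for whether the answer came back empty-handed. The `QueryLog` shape, the `unanswered` flag, and `topGaps` are all hypothetical. It counts how often each unanswered question recurs and returns the most frequent three.

```typescript
// Hypothetical log entry; nothing like this is described in the post.
interface QueryLog {
  query: string;
  unanswered: boolean; // e.g. the model said the sources lacked an answer
}

function topGaps(
  logs: QueryLog[],
  limit = 3
): { query: string; count: number }[] {
  const counts = new Map<string, number>();
  for (const log of logs) {
    if (!log.unanswered) continue;
    // Normalise case and whitespace so repeats of the same question group together.
    const key = log.query.trim().toLowerCase();
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .map(([query, count]) => ({ query, count }))
    .sort((a, b) => b.count - a.count)
    .slice(0, limit);
}
```

Exact-match grouping is the simplest option and will miss rephrasings of the same question; clustering by embedding similarity would be a natural refinement, at the cost of more moving parts.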

The endpoint is live. Query it:

POST /search?mode=answer
{ "query": "your question here" }

Source chunks come back alongside the answer. The model used to synthesise it: @cf/google/gemma-4-26b-a4b-it. Same Worker, same $5/month.
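
A sketch of building that request from TypeScript. The base URL is a placeholder and the response field names (`answer`, `sources`, `score`) are assumptions for illustration; the post only states that source chunks come back alongside the answer.

```typescript
// Assumed response shape; the field names are illustrative, not documented.
interface AnswerResponse {
  answer: string;
  sources: { text: string; score: number }[];
}

interface AnswerRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the request as plain data so it can be handed to any HTTP client.
function buildAnswerRequest(baseUrl: string, query: string): AnswerRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/search?mode=answer`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query }),
  };
}

// Narrowing check before trusting a parsed response body.
function isAnswerResponse(value: unknown): value is AnswerResponse {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as { answer?: unknown; sources?: unknown };
  return (
    typeof candidate.answer === "string" && Array.isArray(candidate.sources)
  );
}
```

Pass `url` and the remaining fields to `fetch` or another HTTP client, then run the parsed JSON through `isAnswerResponse` before reading the sources.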

The index has 50k saved tweets going back to 2016. What you get back is bounded by that. Google searches the internet. This searches what you decided was worth keeping.

Has Sloan Flagged Your Article Lately?

Daniel Nwaneri — Mon, 15 Jun 2026 10:32:43 +0000

Mine got flagged twice in one day.

Both essays generated more engagement than anything else I've published here. One had a five-exchange technical comment thread building a production architecture on top of it. The other had an AI podcast episode generated from it before it was even published.

The founder of DEV.to liked one of them before the bot flagged it.

I'm not here to argue about the policy. The guidelines are reasonable. AI disclosure is fair. I added the disclaimers and moved on.

But I want to know if this is just me.

If Sloan has flagged you recently, drop a comment. Three things I'm curious about:

What kind of article was it — tutorial, opinion, essay?
Did you use AI assistance, and if so how much?
Did you think the flag was fair?

I want to know if there's a pattern or if I got unlucky twice in one day.