Where AI Helps and Hurts Writing: Developer Content in the LLM Era

#meta #blogging #webdev

I have spent the past year writing developer content with AI tools sitting next to me the entire time. Not watching from a distance — actively involved in research, in drafting, in fact-checking. After roughly eighty articles and a few hundred thousand words produced this way, the pattern is clear: AI helps in ways that are concrete and repeatable, and it hurts in ways that are subtle and cumulative. This article is my attempt to map that territory honestly.

I am not arguing for or against AI in writing. The question is not whether to use it — the question is where. If you put AI in the parts of writing where judgment matters most, you degrade the result. If you put it in the parts where raw throughput matters most, you free yourself to do better work. The skill is knowing which parts are which.

The Parts AI Is Genuinely Good At

There are tasks in the writing process where AI is not just faster than a human — it is better. Not more creative, not more insightful, but more thorough. These tasks share a common property: they involve processing large volumes of information and identifying patterns, and the cost of missing something is higher than the cost of a false positive.

Research aggregation. When I test a tool, I accumulate notes, screenshots, GitHub issues, documentation pages, competitor pages, and community discussions across four or five platforms. Manually organizing this into a coherent map of what matters takes hours. Claude does it in seconds, and it notices things I miss — a pricing clause buried in a changelog, a GitHub issue from six months ago that exactly describes the bug I encountered, a Reddit thread where three different users report the same undocumented limitation. The AI is not smarter than me. It has more working memory. For research synthesis, that is the relevant advantage.

First-draft scaffolding. There is a specific moment in writing where progress stalls. You know the structure, you know the evidence, but translating an outline into prose feels like pushing a boulder uphill. AI turns this into a five-minute operation: feed it the outline, the research notes, and a style guide, and it produces a draft that is mediocre but complete. The draft is not the article. The draft is a carpet you lay down so you have something to walk on. I rewrite roughly seventy percent of every AI-generated first draft, but the thirty percent I keep — transitions, structural framing, data paragraphs — saves me the two hardest hours of writing every time.

Factual consistency checking. The most useful thing AI does in my workflow is also the least visible to readers: I feed it the near-final draft and my original testing notes and ask it to flag every claim in the draft that the notes do not support. This catches pricing errors where I misremembered a tier, feature claims that drifted in editing, and version numbers I typed wrong. The AI does not fix these. It flags them. I verify and correct each one. This is mechanical work that a human editor should do but rarely has time to do thoroughly. Automating it means every article gets this check, not just the occasional one.
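A minimal sketch of the "flag, do not fix" idea, assuming the claims of interest are dollar prices and dotted version numbers. This is not the LLM-based check described above. It is a deterministic pre-pass that illustrates the same contract, and the pattern and function names are hypothetical.

```python
import re

# Hypothetical pattern: dollar prices ($19, $19.00) or dotted versions (2.4, v2.4.0).
CLAIM_PATTERN = re.compile(r"\$\d+(?:\.\d{2})?|\bv?\d+\.\d+(?:\.\d+)?\b")


def extract_claims(text: str) -> set[str]:
    """Return price and version strings found in the text, with any leading 'v' dropped."""
    return {match.lstrip("v") for match in CLAIM_PATTERN.findall(text)}


def flag_unverified(draft: str, notes: str) -> list[str]:
    """List claims that appear in the draft but never in the notes. Flags only; fixes nothing."""
    return sorted(extract_claims(draft) - extract_claims(notes))


draft = "The Pro tier costs $19 per month and the CLI is at v2.4.0."
notes = "Pricing page: Pro $29/mo. CLI release v2.4.0 tested."
print(flag_unverified(draft, notes))  # ['$19']
```

The function returns a list for a human to review. It does not rewrite the draft, which keeps the correction step with the writer.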

Grammar and style mechanics. Subject-verb agreement, inconsistent capitalization, run-on sentences, repeated phrase starts — these are not writing problems. They are typing problems. AI handles them faster than a human copy editor and, for the narrow band of mechanical issues, more reliably. Using AI for this lets me spend my editing time on sentence rhythm, argument clarity, and whether the conclusion actually follows from the evidence.
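Repeated phrase starts are mechanical enough that they do not even need a model. A rough sketch, assuming sentences end in ".", "!" or "?" followed by whitespace. The function name and the threshold of three are illustrative choices, not part of any particular tool.

```python
import re


def repeated_sentence_starts(text: str, min_run: int = 3) -> list[tuple[str, int]]:
    """Return (opening_word, run_length) for runs of consecutive sentences that open with the same word."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    runs: list[tuple[str, int]] = []
    prev, count = None, 0
    for sentence in sentences:
        first = sentence.split()[0].lower().strip("\"'(")
        if first == prev:
            count += 1
            continue
        # The run ended; record it if it was long enough.
        if prev is not None and count >= min_run:
            runs.append((prev, count))
        prev, count = first, 1
    if prev is not None and count >= min_run:
        runs.append((prev, count))
    return runs


sample = "The tool is fast. The docs are thin. The pricing is fair. It works."
print(repeated_sentence_starts(sample))  # [('the', 3)]
```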

The pattern across all four uses is the same: AI does the work where completeness and consistency matter more than judgment. It casts a wide net so you can decide what matters. Use it for aggregation, not articulation. You decide what the article says. The AI helps make sure you have not accidentally said something false along the way.

The Parts AI Cannot Fake

For every task where AI is genuinely useful, there is a corresponding task where it is actively harmful — not because it produces errors, but because it produces text that feels off in ways readers detect immediately, even if they cannot name why. I have learned to recognize these failure modes quickly, but most writers using AI for the first time do not see them until a reader points them out.

Genuine insight. AI can summarize the conventional wisdom about any topic. What it cannot do is notice that the conventional wisdom is wrong, or incomplete, or applies differently in edge cases that only emerge from experience. The most valuable sentence in any article I write is not the one that restates what everyone knows. It is the one where I say "the documentation claims this works, but here is what happened when I tried it on a Tuesday with real data." AI cannot write that sentence because AI did not try it. The sentence depends on having been there.

Personal experience. AI can mimic the form of a personal anecdote — "when I first started using Kubernetes, I found the learning curve steep" — but it cannot produce an anecdote that contains specific, verifiable detail that did not exist on the internet before the writing session. Readers can tell the difference. A real anecdote contains friction. It mentions the exact error message, the time of day, the thing you were trying to do when the tool broke. A synthetic anecdote is smooth and frictionless because it describes no actual event. This is why AI-written "personal stories" read like LinkedIn posts: plausible but hollow.

Nuanced tradeoffs. Ask an AI to compare two tools and it will produce a balanced assessment where both tools are "excellent choices depending on your needs." Ask a human who has used both tools and they will say something like "Option B has better documentation but the query builder is so sluggish at scale that I cannot recommend it unless your team is under five people." The difference is not information. The AI has the information: it has read both sets of documentation. The difference is discrimination. The human knows which weaknesses matter in practice and which are theoretically interesting but irrelevant. The AI treats all features and all flaws as equally weighted. Real writing requires saying "this problem matters and that one does not," and AI is structurally incapable of making that call.

Authentic voice. This is the hardest one to describe but the easiest one to feel as a reader. AI prose has a texture. It is relentless in its optimism: tools are "seamless," integrations are "robust," experiences are "empowering." It hedges by reflex: "while no solution is perfect, this tool offers a compelling value proposition for teams seeking to..." It structures every paragraph as claim-evidence-implication regardless of whether the material calls for that structure. None of this is wrong. It is just not how humans write, and after you have read enough AI-generated text, you develop the same instinct for it that people develop for spotting photoshopped images. Something is off. You may not be able to name it in the moment. You just know.

The risk is not that AI will write something factually incorrect — you can catch that. The risk is that it will write something technically correct that no human would ever choose to say. The sentence is grammatically flawless, factually accurate, and reads like the output of a machine that has read a thousand articles about this topic and understood none of them. That is the uncanny valley of AI writing. You cannot edit your way out of it. You have to throw the sentence away and write it yourself, from scratch, in your own voice, using your own reasons.

The Before and After

Here is what this looks like in practice. The following is a paragraph from a draft of a comparison article I was working on. The first version is what Claude produced from my outline and research notes. The second is what I published after rewriting it.

AI draft:

Both platforms offer compelling solutions for teams seeking to streamline their deployment workflows. While Platform A provides a more robust feature set with comprehensive CI/CD integration and advanced monitoring capabilities, Platform B excels in ease of use with its intuitive interface and simplified configuration process. Ultimately, the choice depends on your team's specific requirements and existing infrastructure stack, making both platforms excellent options for modern development teams.

Published version:

Platform A ships more features. Platform B ships fewer features that actually work as documented. After two weeks with each, here is what I mean by that. Platform A's CI/CD integration supports fifteen providers on paper. I tested five of them. Two worked reliably. Two had authentication errors that the documentation did not mention, and one deployed to the wrong environment with no warning. Platform B supports four providers and all four worked on the first try. If your team values breadth of marketing claims, pick A. If you value deployment pipelines that do not wake you up at 3 AM, pick B.

The AI draft contained no errors. Every claim it made was technically defensible. It was also worthless — it gave the reader no reason to care about either tool and no basis for choosing between them. The published version takes a position, explains why, and gives the reader something the AI draft could not produce: the experience of someone who actually used both products.

This is not about writing skill. The AI draft was better-written in a technical sense — more balanced, more diplomatic, more polished. The published version was better reporting, and reporting is what readers come for. No amount of prompt engineering will make an AI tell you which tool's CI/CD integration breaks silently, because the AI did not configure the integration and watch it fail. You cannot prompt your way past the absence of lived experience.

What This Means for the Web

The economics of AI content creation create an incentive structure that is straightforward and corrosive. An AI-generated article costs roughly two cents to produce. A human-written article built on real testing costs between fifty and five hundred dollars. Any publishing model optimized for volume and SEO will saturate search results with the two-cent version.
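The gap is easier to feel as a ratio. A small sketch that uses only the rough figures quoted above, held as integer cents to avoid floating-point rounding. The constant and function names are illustrative.

```python
# Rough figures from the paragraph above, in cents.
AI_ARTICLE_CENTS = 2
HUMAN_ARTICLE_CENTS = (5_000, 50_000)  # fifty to five hundred dollars


def ai_articles_per_human_article(human_cents: int, ai_cents: int = AI_ARTICLE_CENTS) -> int:
    """How many AI-generated articles one human-written article's budget would buy."""
    return human_cents // ai_cents


low, high = (ai_articles_per_human_article(c) for c in HUMAN_ARTICLE_CENTS)
print(f"One human article costs as much as {low:,} to {high:,} AI articles.")
# One human article costs as much as 2,500 to 25,000 AI articles.
```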

This is already happening. Search for "best developer tools" in 2026 and the top results are articles written by people who have never used the tools they recommend. The content is not wrong in a way you can easily debunk — no single sentence is false. But the aggregate effect is a kind of information erosion. Each new AI-generated article draws on the previous generation of AI-generated articles, and each generation drifts slightly further from anything grounded in actual use. The recommendations converge on consensus without ever touching reality.

I do not think this trend reverses. The cost advantage is too large, and Google's ability to distinguish synthetic from experiential content is too limited. What changes instead is reader behavior. Developers who have been burned by a recommendation that turned out to be synthetic will start looking for signals of authenticity — a named author with a GitHub profile, specific error messages in the review text, recommendations that are awkwardly specific rather than smoothly generic. The publications that survive the LLM flood will be the ones that make those signals unambiguous and prominent.

For Pickuma, that means the AI assistance is visible and the human judgment is unmistakable. I use AI to do the parts of writing that are about throughput. I do the parts that are about judgment myself. And I try to write in a way that makes the distinction obvious — not by declaring it in a disclosure badge, but by producing sentences that an AI could not have written because an AI did not do the thing the sentence describes.

Originally published at pickuma.com. Subscribe to the RSS or follow @pickuma.bsky.social for new reviews.