<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: ur-grue</title>
    <description>The latest articles on DEV Community by ur-grue (@urgrue).</description>
    <link>https://dev.to/urgrue</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3958214%2Fb53e014a-da9c-4a56-9935-677277e2285a.jpeg</url>
      <title>DEV Community: ur-grue</title>
      <link>https://dev.to/urgrue</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/urgrue"/>
    <language>en</language>
    <item>
      <title>FOIA letters are a format, not a vibe — so I made Claude write them properly</title>
      <dc:creator>ur-grue</dc:creator>
      <pubDate>Fri, 29 May 2026 11:00:22 +0000</pubDate>
      <link>https://dev.to/urgrue/foia-letters-are-a-format-not-a-vibe-so-i-made-claude-write-them-properly-5gi3</link>
      <guid>https://dev.to/urgrue/foia-letters-are-a-format-not-a-vibe-so-i-made-claude-write-them-properly-5gi3</guid>
      <description>&lt;p&gt;Ask a general-purpose chatbot for a public-records request and you get something that looks like a letter and works like a liability. It misses the statute. It asks for "any and all documents" — the phrasing agencies love to reject. It promises a fee waiver you didn't qualify for. It sounds confident and gets you nothing.&lt;/p&gt;

&lt;p&gt;A records request is a format with rules. Cite the right law. Describe the records narrowly enough to be findable and broadly enough to catch what you want. State your fee-waiver basis correctly. Set the response clock. Miss any of those and you've burned weeks waiting for a "no."&lt;/p&gt;

&lt;p&gt;I maintain a free, MIT-licensed library of Claude skills for media work, and the FOIA/records skills are some of the most-used. Here's what "properly" means.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the skill actually does
&lt;/h2&gt;

&lt;p&gt;You give it: the records you want, the agency that holds them, and your jurisdiction. It gives back a filing-ready letter that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cites the correct statute&lt;/strong&gt; — US federal FOIA (5 U.S.C. § 552), a named state public-records act, or the EU/UK equivalent — not a generic "freedom of information" gesture.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scopes the request&lt;/strong&gt; so it's specific (date ranges, record types, named offices) instead of the "any and all" phrasing that invites a denial.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;States the fee-waiver basis&lt;/strong&gt; in the language the statute uses — public interest, news-media requester status — only when it actually applies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sets the response deadline&lt;/strong&gt; the law requires, so you know when silence becomes appealable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Leaves a clean appeal trail&lt;/strong&gt; — structured so that if you're denied, your next letter writes itself.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why this beats "write me a FOIA request"
&lt;/h2&gt;

&lt;p&gt;The difference isn't eloquence. It's that the skill encodes the procedural knowledge a beat reporter accumulates over years: which exemptions agencies hide behind, how to pre-empt the "too broad" rejection, when a fee waiver is real. A blank-slate prompt doesn't carry any of that — so it produces a letter that reads well and fails procedurally.&lt;/p&gt;

&lt;h2&gt;
  
  
  The honest limits
&lt;/h2&gt;

&lt;p&gt;It is not legal advice, and it does not know your specific agency's quirks. Statutes change; local rules vary. The skill gives you a strong, correctly-structured draft — you still confirm the citation for your jurisdiction and adjust for the body you're filing against. It will also tell you when your request is too vague to file, instead of inventing specifics.&lt;/p&gt;

&lt;p&gt;The whole library is free. If you file records requests, start with the FOIA skill and the document-analysis skill that pairs with it — feed in what comes back, get a structured read of what's in it and what's missing.&lt;/p&gt;

&lt;p&gt;→ &lt;a href="https://github.com/ur-grue/autopunk-media-skills" rel="noopener noreferrer"&gt;github.com/ur-grue/autopunk-media-skills&lt;/a&gt;&lt;/p&gt;

</description>
      <category>journalism</category>
      <category>ai</category>
      <category>foia</category>
      <category>opensource</category>
    </item>
    <item>
      <title>How do you eval LLM output that isn't code?</title>
      <dc:creator>ur-grue</dc:creator>
      <pubDate>Fri, 29 May 2026 10:44:27 +0000</pubDate>
      <link>https://dev.to/urgrue/how-do-you-eval-llm-output-that-isnt-code-1gp8</link>
      <guid>https://dev.to/urgrue/how-do-you-eval-llm-output-that-isnt-code-1gp8</guid>
      <description>&lt;p&gt;Code has a luxury: it either runs or it doesn't. You write an assertion, you run it, you get a green check or a red one. Most LLM eval frameworks lean on exactly this — &lt;code&gt;assert output contains X&lt;/code&gt;, &lt;code&gt;assert valid JSON&lt;/code&gt;, &lt;code&gt;assert no error&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Editorial output has no such luxury. A pitch treatment, a show-notes draft, a lede — there's no test that returns true for "a working producer would send this." So how do you evaluate it without a human reading every output, every time?&lt;/p&gt;

&lt;p&gt;I had to answer this concretely, because I maintain a library of ~400 Claude skills for media work, and "trust me, it's good" is not a quality bar. Here's the approach.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two stages: binary first, judgment second
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Stage one is the cheap filter — binary assertions.&lt;/strong&gt; Even for prose, a lot of failure is mechanical and testable: did it produce the required sections? Is the lede one sentence? Did it refuse to fabricate a quote when given no source? These catch the obvious breaks fast, run blind across many inputs, and cost nothing. The library runs thousands of these. The interesting result: the few "failures" were skills correctly &lt;em&gt;refusing&lt;/em&gt; to invent content on deliberately thin inputs — the desired behaviour, not a bug.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage two is the part that matters for prose — graded judgment.&lt;/strong&gt; A model scores each output 1–5 across seven dimensions: coherence, relevance, accuracy, completeness, usefulness, format-fit, and one more that does the real work.&lt;/p&gt;

&lt;h2&gt;
  
  
  The dimension that does the work: Editorial Naturalness
&lt;/h2&gt;

&lt;p&gt;Six of the seven dimensions are standard. The seventh is a hard floor: does this read like a person who knows the medium, or like a model?&lt;/p&gt;

&lt;p&gt;This is scored against observable tells, not vibes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lexical&lt;/strong&gt; — the AI vocabulary (delve, leverage, robust, seamless, tapestry).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structural&lt;/strong&gt; — the false pivot ("not just X, but Y"), throat-clearing openers, rule-of-three on every line.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tonal&lt;/strong&gt; — manufactured enthusiasm, hedging stacks, the apology spiral.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Genre&lt;/strong&gt; — does it honour the conventions of the format it claims to serve?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A skill can score 5/5 on the other six and still fail. If Editorial Naturalness is below the floor, it doesn't ship. That single constraint is what stops the library drifting into competent-sounding slop.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why a hard floor and not an average
&lt;/h2&gt;

&lt;p&gt;Averages hide the thing you care about. A draft that's accurate, complete, well-structured — and unmistakably machine-written — would pass an averaged score comfortably. For media work that draft is useless: the audience clocks it in a sentence. The floor forces the failure to surface instead of being averaged away.&lt;/p&gt;

&lt;h2&gt;
  
  
  The honest limitation
&lt;/h2&gt;

&lt;p&gt;A model grading prose is generous — it tends to like fluent text, including fluent AI text. So the scores are treated as a filter, not a verdict: they catch the clear failures and rank candidates, but the bar for "stable" is deliberately set high (≥ 4.0 with the naturalness floor), and the rubric is anchored on &lt;em&gt;observable&lt;/em&gt; tells rather than taste, so two runs roughly agree. It's not perfect. It's a lot better than shipping on feel.&lt;/p&gt;

&lt;p&gt;The whole framework — dimensions, thresholds, the banned-phrase list — is open source. If you're evaluating non-code LLM output, take it apart and tell me where it's too soft.&lt;/p&gt;

&lt;p&gt;→ &lt;a href="https://github.com/ur-grue/autopunk-media-skills" rel="noopener noreferrer"&gt;github.com/ur-grue/autopunk-media-skills&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>evaluation</category>
      <category>writing</category>
    </item>
    <item>
      <title>How to tell if AI wrote it — and how to make it stop</title>
      <dc:creator>ur-grue</dc:creator>
      <pubDate>Fri, 29 May 2026 10:10:25 +0000</pubDate>
      <link>https://dev.to/urgrue/how-to-tell-if-ai-wrote-it-and-how-to-make-it-stop-4c9m</link>
      <guid>https://dev.to/urgrue/how-to-tell-if-ai-wrote-it-and-how-to-make-it-stop-4c9m</guid>
      <description>&lt;p&gt;You can spot it in a sentence. "In today's fast-paced landscape, we're thrilled to delve into game-changing solutions that empower teams to unlock their potential."&lt;/p&gt;

&lt;p&gt;Grammatically fine. Recognisably synthetic. And for anyone whose readers notice — journalists, producers, newsletter writers — that register is a liability.&lt;/p&gt;

&lt;p&gt;I build a free, MIT-licensed library of Claude skills for media producers. The whole project rests on one idea: AI output for media has to read like a person who knows the format, not like a chatbot. So I had to make "sounds like a human" something you can actually measure, not just feel.&lt;/p&gt;

&lt;p&gt;Here is how.&lt;/p&gt;

&lt;h2&gt;
  
  
  Name the tells
&lt;/h2&gt;

&lt;p&gt;Most AI prose fails in four observable ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lexical&lt;/strong&gt; — a vocabulary of words real writers rarely reach for: delve, leverage, robust, seamless, cutting-edge, tapestry, realm, navigate, unlock, empower.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structural&lt;/strong&gt; — the false pivot ("not just X, but Y"), the throat-clearing opener ("In this article, we will"), the rule-of-three rhythm on every sentence.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tonal&lt;/strong&gt; — manufactured enthusiasm, hedging stacks ("may potentially"), the customer-service apology spiral.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Genre&lt;/strong&gt; — copy that ignores the conventions of the medium it claims to serve. A show-notes draft that reads like a press release. A lede that buries the news.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once you can name the tell, you can cut it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Make it a quality gate, not a vibe
&lt;/h2&gt;

&lt;p&gt;In the skill library, every skill is scored on a seven-dimension rubric before it ships. Six dimensions are the usual suspects — coherence, relevance, accuracy, and so on. The seventh is a hard floor: &lt;strong&gt;Editorial Naturalness&lt;/strong&gt;. A skill can score well everywhere else and still fail if its output reads as machine-made. Below the floor, it does not ship.&lt;/p&gt;

&lt;p&gt;That one constraint changes how the skills are written. They stop optimising for "complete and correct" and start optimising for "a working professional would send this without rewriting it."&lt;/p&gt;

&lt;h2&gt;
  
  
  The detox pass
&lt;/h2&gt;

&lt;p&gt;There is also a skill whose only job is to strip the tells: feed it AI-flavoured copy, get back a publishable rewrite plus a before/after table so you learn the pattern. It carries the canonical banned-phrase list — the same list a runtime check uses to flag drafts before they go out.&lt;/p&gt;

&lt;p&gt;Worked example. Before:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In today's fast-paced media landscape, our cutting-edge AI-powered solution seamlessly empowers newsrooms to unlock unprecedented efficiencies.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;After:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Three regional dailies now use it for pre-publication checks. None of them call the change transformative. All three call it removing a class of small, repetitive edits from the subbing pass.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Shorter. Concrete. Verifiable. No tells.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters for media work
&lt;/h2&gt;

&lt;p&gt;Readers in media are trained to detect manufactured language — it is the job. Copy that pattern-matches to "AI wrote this" costs you credibility before the argument even lands. The fix is not "write a better prompt." It is treating naturalness as a measurable bar and refusing to ship below it.&lt;/p&gt;

&lt;p&gt;The library is free and open source. Browse it, copy a skill into Claude, and watch what changes when the AI tells are gone.&lt;/p&gt;

&lt;p&gt;→ &lt;a href="https://github.com/ur-grue/autopunk-media-skills" rel="noopener noreferrer"&gt;github.com/ur-grue/autopunk-media-skills&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>writing</category>
      <category>media</category>
      <category>journalism</category>
    </item>
  </channel>
</rss>
