<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: CopperSunDev</title>
    <description>The latest articles on DEV Community by CopperSunDev (@coppersundev).</description>
    <link>https://dev.to/coppersundev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3659025%2F67b7af33-5040-4848-9b99-f2b9ccf2e6c3.png</url>
      <title>DEV Community: CopperSunDev</title>
      <link>https://dev.to/coppersundev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/coppersundev"/>
    <language>en</language>
    <item>
      <title>GEO: What the Research Says About AI Search</title>
      <dc:creator>CopperSunDev</dc:creator>
      <pubDate>Thu, 25 Jun 2026 23:40:33 +0000</pubDate>
      <link>https://dev.to/coppersundev/geo-what-the-research-says-about-ai-search-3f9a</link>
      <guid>https://dev.to/coppersundev/geo-what-the-research-says-about-ai-search-3f9a</guid>
      <description>&lt;p&gt;Three independent research streams — academic, practitioner, and platform — have studied how AI search systems choose which websites to cite. Their findings overlap more than they disagree, and the conclusions challenge several popular assumptions about how to optimize for AI.&lt;/p&gt;

&lt;p&gt;This post synthesizes the full research landscape. If you want to skip straight to implementation, see the &lt;a href="https://brass-seo.com/generative-engine-optimization" rel="noopener noreferrer"&gt;GEO guide&lt;/a&gt; for the complete framework. If you want the deep dive on one specific finding, &lt;a href="https://brass-seo.com/blog/answer-capsules-content-trait-llms-cite-most" rel="noopener noreferrer"&gt;Part 1 of our LLM Visibility series&lt;/a&gt; covers answer capsules in detail.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Navigation
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;What GEO Research Exists Today&lt;/li&gt;
&lt;li&gt;The Princeton GEO Study&lt;/li&gt;
&lt;li&gt;The Answer Capsule Research&lt;/li&gt;
&lt;li&gt;Platform-Specific Citation Behavior&lt;/li&gt;
&lt;li&gt;What the Research Agrees On&lt;/li&gt;
&lt;li&gt;What the Research Disagrees On&lt;/li&gt;
&lt;li&gt;What This Means for Small Businesses&lt;/li&gt;
&lt;li&gt;GEO Glossary&lt;/li&gt;
&lt;li&gt;Frequently Asked Questions&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What GEO Research Exists Today
&lt;/h2&gt;

&lt;p&gt;GEO research falls into three streams, each approaching the problem from a different angle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Academic research&lt;/strong&gt; treats GEO as a formal optimization problem. Researchers run controlled experiments where they modify content using specific strategies and measure whether AI systems cite it more or less often. The Princeton GEO study (Aggarwal et al., 2024) is the most cited example.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Practitioner research&lt;/strong&gt; studies what already works in the wild. Instead of experimenting with modifications, researchers analyze pages that AI systems are already citing and look for shared structural traits. Adam Gnuse's answer capsule research (published via Search Engine Land, 2025) is the primary example, analyzing 15 domains and 7,500 ChatGPT referral sessions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform research&lt;/strong&gt; examines how different AI systems behave when selecting sources. This includes tracking studies like Paul DeMott's work on measuring LLM visibility (also published via Search Engine Land) and Authoritas's data on citation patterns across platforms.&lt;/p&gt;

&lt;p&gt;None of these streams alone tells the complete story. Together, they form a surprisingly consistent picture.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Princeton GEO Study
&lt;/h2&gt;

&lt;p&gt;The Princeton study (Aggarwal et al., 2024) was the first academic paper to define "Generative Engine Optimization" as a discipline. The researchers tested nine content optimization strategies and measured their effect on AI citation rates.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Nine Strategies Tested
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;What They Did&lt;/th&gt;
&lt;th&gt;Citation Impact&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cite Sources&lt;/td&gt;
&lt;td&gt;Added authoritative citations&lt;/td&gt;
&lt;td&gt;+30-40%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Add Statistics&lt;/td&gt;
&lt;td&gt;Included specific data points&lt;/td&gt;
&lt;td&gt;+30-40%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Expert Quotes&lt;/td&gt;
&lt;td&gt;Added quotations from experts&lt;/td&gt;
&lt;td&gt;+41%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fluency Optimization&lt;/td&gt;
&lt;td&gt;Improved writing quality alone&lt;/td&gt;
&lt;td&gt;Negligible&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Technical Terms&lt;/td&gt;
&lt;td&gt;Added domain-specific terminology&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Authoritative Tone&lt;/td&gt;
&lt;td&gt;Rewrote in authoritative voice&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Unique Wording&lt;/td&gt;
&lt;td&gt;Used distinctive phrasing&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Easy to Understand&lt;/td&gt;
&lt;td&gt;Simplified language&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Keyword Stuffing&lt;/td&gt;
&lt;td&gt;Added extra keywords&lt;/td&gt;
&lt;td&gt;Negative&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Key Findings
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Expert quotes had the single highest impact&lt;/strong&gt; at +41% citation improvement. This was the most surprising result — simply adding a relevant expert quotation to content made AI systems significantly more likely to cite it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Citations and statistics tied for second&lt;/strong&gt; at +30-40%. Content that included specific data points or referenced authoritative sources saw consistent citation improvements across AI platforms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fluency alone did not help.&lt;/strong&gt; Improving writing quality without changing substance had negligible impact. This is a critical finding because many GEO guides recommend "writing better" as a primary strategy. The Princeton data suggests that better writing on its own is not enough and needs to be paired with substantive additions such as data, quotes, and citations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keyword stuffing actively hurt.&lt;/strong&gt; Adding extra keywords to content reduced citation rates, suggesting AI systems can detect and penalize low-quality optimization attempts.&lt;/p&gt;

&lt;h3&gt;
  
  
  What This Means
&lt;/h3&gt;

&lt;p&gt;The Princeton research establishes that GEO is real and measurable. Content modifications can meaningfully change whether AI systems cite you. But the modifications that work are substantive (adding data, quotes, citations) rather than cosmetic (rewriting for fluency, adding keywords).&lt;/p&gt;




&lt;h2&gt;
  
  
  The Answer Capsule Research
&lt;/h2&gt;

&lt;p&gt;Adam Gnuse's practitioner research (Search Engine Land, 2025) analyzed pages that ChatGPT was already citing and identified structural patterns that the cited pages shared.&lt;/p&gt;

&lt;p&gt;The core findings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;72.4%&lt;/strong&gt; of blog posts cited by ChatGPT contained answer capsules — concise, self-contained explanations placed immediately after headings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;91%&lt;/strong&gt; of cited passages contained no outbound links within the capsule text&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;52.2%&lt;/strong&gt; of cited posts contained original data (proprietary statistics, original research, first-party case studies)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;38%&lt;/strong&gt; of pages ranking on Google's first page received zero AI referral traffic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a full breakdown of answer capsules — what they are, how to write them, and examples — see &lt;a href="https://brass-seo.com/blog/answer-capsules-content-trait-llms-cite-most" rel="noopener noreferrer"&gt;Part 1: Answer Capsules&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  How This Connects to Princeton
&lt;/h3&gt;

&lt;p&gt;The Gnuse findings and Princeton findings are complementary. Princeton tested &lt;em&gt;modifications&lt;/em&gt; and found that adding statistics and expert quotes improved citations. Gnuse studied &lt;em&gt;existing cited content&lt;/em&gt; and found that cited pages already contained original data and self-contained answer passages.&lt;/p&gt;

&lt;p&gt;Both point to the same conclusion: AI systems prefer content that provides complete, extractable answers with supporting evidence.&lt;/p&gt;




&lt;h2&gt;
  
  
  Platform-Specific Citation Behavior
&lt;/h2&gt;

&lt;p&gt;Different AI platforms choose sources differently. Understanding these differences matters because a page optimized for one platform may not perform identically on another.&lt;/p&gt;

&lt;h3&gt;
  
  
  ChatGPT
&lt;/h3&gt;

&lt;p&gt;ChatGPT draws from its training data and, when browsing is enabled, from real-time web searches. Its citation behavior tends to favor:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pages with clear, extractable passages&lt;/li&gt;
&lt;li&gt;Content with original data or unique framing&lt;/li&gt;
&lt;li&gt;Established domains — but domain authority is not the only factor&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Perplexity
&lt;/h3&gt;

&lt;p&gt;Perplexity searches the web in real time for every query and cites sources prominently. It tends to cite:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple sources per response (typically 5-10 citations)&lt;/li&gt;
&lt;li&gt;Pages that rank well in traditional search&lt;/li&gt;
&lt;li&gt;Content with specific, quotable statements&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Google AI Overviews
&lt;/h3&gt;

&lt;p&gt;Google AI Overviews draw primarily from Google's existing search index. Pages that rank well in traditional search results are more likely to be cited in AI Overviews. This creates the strongest overlap between traditional SEO and GEO — &lt;a href="https://www.authoritas.com/blog/ai-overviews-impact-seo" rel="noopener noreferrer"&gt;Authoritas research found 62% overlap&lt;/a&gt; between organic rankings and AI Overview citations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude
&lt;/h3&gt;

&lt;p&gt;Claude uses search-augmented generation when connected to web search tools. Its citation patterns are similar to ChatGPT's browsing mode — structured content with clear answers tends to be cited more frequently.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Practical Implication
&lt;/h3&gt;

&lt;p&gt;There is no single "AI search algorithm" to optimize for. Each platform has different citation behaviors. But the structural principles (clear answers, original data, extractable passages) work across all of them.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the Research Agrees On
&lt;/h2&gt;

&lt;p&gt;Despite different methodologies and data sources, the three research streams converge on several points.&lt;/p&gt;

&lt;h3&gt;
  
  
  Structure Matters More Than Authority
&lt;/h3&gt;

&lt;p&gt;Princeton found citation improvements of +30-40% regardless of site authority. Gnuse found that cited content shared structural traits (answer capsules, link-free formatting) independent of domain strength. This does not mean authority is irrelevant — it means formatting and substance can overcome authority gaps.&lt;/p&gt;

&lt;h3&gt;
  
  
  Self-Contained Answers Win
&lt;/h3&gt;

&lt;p&gt;AI systems need to extract a passage and present it as part of a response. Content that provides complete, self-contained answers within a few sentences is easier for AI to cite than content that spreads an answer across multiple paragraphs or requires clicking through to understand.&lt;/p&gt;

&lt;h3&gt;
  
  
  Original Data Is a Moat
&lt;/h3&gt;

&lt;p&gt;Both Princeton (+30-40% for statistics) and Gnuse (52.2% of cited posts had original data) found that original data significantly increases citation likelihood. This makes intuitive sense: proprietary data has no other source, so an AI system that uses it has to attribute it to you.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cosmetic Changes Are Not Enough
&lt;/h3&gt;

&lt;p&gt;Princeton's finding that fluency optimization alone had negligible impact is important. GEO is not about "writing better" in a general sense. It is about structuring content so AI can extract, attribute, and present it.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the Research Disagrees On
&lt;/h2&gt;

&lt;p&gt;The research is not unanimous on everything.&lt;/p&gt;

&lt;h3&gt;
  
  
  How Much Authority Matters
&lt;/h3&gt;

&lt;p&gt;Princeton suggests structure can partially overcome authority gaps. Authoritas data shows that pages already ranking well in traditional search get cited more in AI Overviews (62% overlap). The truth is likely both — authority helps, but it is not the only factor, and structural optimization can meaningfully improve citation rates for lower-authority sites.&lt;/p&gt;

&lt;h3&gt;
  
  
  Whether llms.txt Files Help
&lt;/h3&gt;

&lt;p&gt;The llms.txt specification allows websites to provide machine-readable summaries for AI systems. No published research has measured whether having an llms.txt file directly increases citations. Some practitioners recommend it as part of a GEO strategy; others consider it unproven. We include it in the &lt;a href="https://brass-seo.com/blog/ai-citability-how-brass-seo-helps" rel="noopener noreferrer"&gt;9-factor audit&lt;/a&gt; but weight it as the lowest-priority factor.&lt;/p&gt;

&lt;h3&gt;
  
  
  How Quickly GEO Changes Take Effect
&lt;/h3&gt;

&lt;p&gt;AI platforms update their indexes on different schedules. ChatGPT's training data updates periodically (months). Perplexity searches live (immediately). Google AI Overviews reflect index changes (days to weeks). No research has established a reliable timeline for when content modifications translate into citation changes.&lt;/p&gt;




&lt;h2&gt;
  
  
  What This Means for Small Businesses
&lt;/h2&gt;

&lt;p&gt;The research consistently shows that GEO favors substance and structure over raw authority. This is good news for small businesses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Format and substance can overcome backlink profiles.&lt;/strong&gt; A small business with a well-structured page containing original data can get cited over a large competitor with a poorly structured page. Princeton measured this directly — citation improvements were driven by content modifications, not site authority.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The investment is content, not tools.&lt;/strong&gt; The most effective GEO strategies (answer capsules, original data, expert quotes) require content work, not expensive software. You need to know &lt;em&gt;what&lt;/em&gt; to fix, and then you need to fix it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Traditional SEO still matters.&lt;/strong&gt; GEO is additive, not a replacement. Pages that rank well in traditional search are more likely to be cited in AI Overviews (62% overlap). The best strategy is solid traditional SEO plus GEO-specific structural optimization.&lt;/p&gt;

&lt;p&gt;For the complete GEO implementation framework, see the &lt;a href="https://brass-seo.com/generative-engine-optimization" rel="noopener noreferrer"&gt;Generative Engine Optimization guide&lt;/a&gt;. To audit your own pages, see &lt;a href="https://brass-seo.com/blog/ai-citability-how-brass-seo-helps" rel="noopener noreferrer"&gt;Part 3: How to Audit Any Page for AI Citability&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  GEO Glossary
&lt;/h2&gt;

&lt;p&gt;The industry has not settled on a single term for optimizing content for AI search. Here are the key terms and how they relate to each other.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Term&lt;/th&gt;
&lt;th&gt;Full Name&lt;/th&gt;
&lt;th&gt;What It Means&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GEO&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Generative Engine Optimization&lt;/td&gt;
&lt;td&gt;Optimizing content so generative AI systems (ChatGPT, Perplexity, Claude) cite it. Coined by Princeton researchers (Aggarwal et al., 2024).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AEO&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Answer Engine Optimization&lt;/td&gt;
&lt;td&gt;Broader term that includes GEO plus traditional answer features like featured snippets and Google AI Overviews.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LLM Visibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Large Language Model Visibility&lt;/td&gt;
&lt;td&gt;Whether your content appears in LLM-generated responses. Used interchangeably with GEO in practitioner contexts.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AI Citability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;How likely a page is to be cited by AI systems, based on structural and content factors. What Brass-SEO audits measure.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AI Overviews&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Google AI Overviews&lt;/td&gt;
&lt;td&gt;AI-generated answer summaries shown at the top of Google search results. They draw from Google's existing search index.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Answer Capsule&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;A self-contained explanation (1-2 sentences) placed immediately after a heading. The most common structural trait of AI-cited content (72.4%, Gnuse research).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AI Search&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Umbrella term for any search experience powered by AI, including ChatGPT, Perplexity, Google AI Overviews, and Claude.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Zero-Click Search&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;A search where the user gets their answer directly in the results without clicking through to a website. AI Overviews are a type of zero-click search.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Where can I read the original Princeton GEO study?
&lt;/h3&gt;

&lt;p&gt;The paper is "GEO: Generative Engine Optimization" by Aggarwal et al. (2024). It was written by researchers at Princeton, Georgia Tech, and collaborating institutions and is available on arXiv. Search for "GEO Generative Engine Optimization Aggarwal" to find the full paper.&lt;/p&gt;

&lt;h3&gt;
  
  
  How reliable is the answer capsule research?
&lt;/h3&gt;

&lt;p&gt;The Gnuse research analyzed 15 domains and 7,500 ChatGPT referral sessions. It was published via Search Engine Land, a major industry publication. The sample size is meaningful for practitioner research, though it focused specifically on ChatGPT citations and may not perfectly generalize to other platforms.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does this research apply to all industries?
&lt;/h3&gt;

&lt;p&gt;The Princeton study tested across multiple topic categories. The Gnuse research covered 15 different domains. While no research is perfectly universal, the structural principles (clear answers, original data, expert quotes) apply broadly because they address how AI systems work, not industry-specific content preferences.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is GEO research still evolving?
&lt;/h3&gt;

&lt;p&gt;Yes. GEO as a discipline is less than two years old. The research base is growing but still limited compared to traditional SEO research, which spans decades. The findings summarized here represent the best available evidence, but new research may refine or update these conclusions.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does Brass-SEO use this research?
&lt;/h3&gt;

&lt;p&gt;Brass-SEO's &lt;a href="https://brass-seo.com/blog/ai-citability-how-brass-seo-helps" rel="noopener noreferrer"&gt;AI Citability audit&lt;/a&gt; is built on the 9-factor framework derived from this research. The audit checks each page for answer capsules, original data, expert quotes, and the other factors identified by Princeton and Gnuse. See the &lt;a href="https://brass-seo.com/generative-engine-optimization" rel="noopener noreferrer"&gt;GEO guide&lt;/a&gt; for the full framework.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>seo</category>
    </item>
    <item>
      <title>We Benchmarked BrassCoders Against a Frontier Model</title>
      <dc:creator>CopperSunDev</dc:creator>
      <pubDate>Thu, 25 Jun 2026 07:00:00 +0000</pubDate>
      <link>https://dev.to/coppersundev/we-benchmarked-brasscoders-against-a-frontier-model-en</link>
      <guid>https://dev.to/coppersundev/we-benchmarked-brasscoders-against-a-frontier-model-en</guid>
      <description>&lt;p&gt;A frontier model asked to review AI-generated Python caught 12 of 12 planted bugs. BrassCoders, the scanner that catches what AI assistants structurally miss, caught 11. Bandit caught 6. Pylint caught 1. These are real numbers from a committed, reproducible benchmark run on June 13, 2026 against BrassCoders 2.0.8.&lt;/p&gt;

&lt;p&gt;The number that matters isn't 11 against 12. It's the category breakdown, and the four performance bugs where BrassCoders is the only static analyzer that caught anything.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Benchmark Setup
&lt;/h2&gt;

&lt;p&gt;BrassCoders's AI-coder-bug benchmark is an open dataset of 12 AI-generated Python files, each carrying one planted bug, plus two clean controls. Every file ships a &lt;code&gt;PROVENANCE&lt;/code&gt; docstring naming the prompt that produced it. The corpus, manifest, and runner live in the OSS repo at &lt;code&gt;github.com/CopperSunDev/brasscoders&lt;/code&gt; under &lt;code&gt;cli/docs/benchmarks/ai-coder-bugs/&lt;/code&gt;, Apache 2.0 licensed and reproducible by anyone.&lt;/p&gt;

&lt;p&gt;Scoring is per-category catch-rate. A tool catches a bug if it emits any finding for that file that maps to the bug's category through pre-registered signal patterns in the runner. Those patterns were committed before any run took place, so they weren't tuned to produce a flattering result. The runner is 600 lines. All of it open.&lt;/p&gt;

&lt;p&gt;Four tools ran: BrassCoders 2.0.8, Bandit 1.8.3 (the version BrassCoders bundles internally), Pylint, and Claude sonnet-4-6, the last reached through the Anthropic API with a realistic pre-merge review prompt.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Result That Matters: The Wedge Is Real
&lt;/h2&gt;

&lt;p&gt;BrassCoders caught all four AI-coder performance anti-patterns, the category every other static analyzer in the run scored zero on. Bandit caught none. Pylint caught none.&lt;/p&gt;

&lt;p&gt;The four bugs in the wedge category:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;O(N²) string concatenation&lt;/strong&gt;: a &lt;code&gt;csv +=&lt;/code&gt; loop that rebuilds the whole string on every row. Bandit 0, Pylint 0, ask-model 1, BrassCoders 1.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;list.insert(0)&lt;/code&gt; in a loop&lt;/strong&gt;: prepend-by-shifting the whole list on every insert. Bandit 0, Pylint 0, ask-model 1, BrassCoders 1.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Triple-nested loop for a join&lt;/strong&gt;: three nested &lt;code&gt;for&lt;/code&gt; loops over customers × orders × items where a dict-lookup would be O(N). Bandit 0, Pylint 0, ask-model 1, BrassCoders 1.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unbounded &lt;code&gt;while True&lt;/code&gt;&lt;/strong&gt;: a socket-drain loop with no break, no timeout, no size cap. Bandit 0, Pylint 0, ask-model 1, BrassCoders 1.&lt;/li&gt;
&lt;/ul&gt;
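
&lt;p&gt;To make the first two bullets concrete, here is a minimal sketch of each anti-pattern next to a linear-time alternative. The function names are hypothetical and the code is not taken from the benchmark corpus; it only illustrates the shape of the problem.&lt;/p&gt;

```python
from collections import deque


def rows_to_csv_slow(rows):
    """Anti-pattern: += on a str in a loop may copy the whole string on each row."""
    csv = ""
    for row in rows:
        csv += ",".join(str(cell) for cell in row) + "\n"
    return csv


def rows_to_csv_fast(rows):
    """Build each line once and join at the end, linear in the output size."""
    lines = [",".join(str(cell) for cell in row) for row in rows]
    return "".join(line + "\n" for line in lines)


def newest_first_slow(events):
    """Anti-pattern: list.insert(0, x) shifts every existing element per insert."""
    feed = []
    for event in events:
        feed.insert(0, event)
    return feed


def newest_first_fast(events):
    """deque.appendleft is O(1) per item, so the whole loop stays linear."""
    feed = deque()
    for event in events:
        feed.appendleft(event)
    return list(feed)
```

&lt;p&gt;Both versions of each function return the same result on small inputs, which is why a small unit test passes either way. The difference shows up only as the input grows.&lt;/p&gt;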

&lt;p&gt;The pattern holds. These are the bugs AI coding assistants introduce because the prompt described the happy path. The model wrote idiomatic-looking code that passes a small unit test. The bug surfaces only at volume. No generic security rule fires. No type error exists. A linter built around security rules and style checks has no mechanism to flag them.&lt;/p&gt;

&lt;p&gt;BrassCoders carries four AST-level rules that match these anti-patterns directly: string concatenation in a loop, &lt;code&gt;list.insert(0)&lt;/code&gt; in a loop, nesting deeper than a threshold, and &lt;code&gt;while True&lt;/code&gt; with no exit. The rules are dumb. The detection is reliable.&lt;/p&gt;
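
&lt;p&gt;As an illustration of what an AST-level rule of this kind can look like, here is a minimal sketch built on Python's standard &lt;code&gt;ast&lt;/code&gt; module. It covers only two of the four patterns, and the class name, the &lt;code&gt;scan&lt;/code&gt; helper, and the messages are all hypothetical. This is not BrassCoders's actual implementation, just one plausible way to write such a check.&lt;/p&gt;

```python
import ast


class LoopAntiPatternVisitor(ast.NodeVisitor):
    """Hypothetical rule sketch: insert(0, ...) in a loop, and while True with no exit."""

    def __init__(self):
        self.findings = []  # list of (line number, message)
        self._loop_depth = 0

    def _visit_loop(self, node):
        # Track nesting so visit_Call knows whether it is inside a loop.
        self._loop_depth += 1
        self.generic_visit(node)
        self._loop_depth -= 1

    def visit_For(self, node):
        self._visit_loop(node)

    def visit_While(self, node):
        is_true = isinstance(node.test, ast.Constant) and node.test.value is True
        # Simplification: any break/return/raise anywhere below counts as an exit,
        # even one that belongs to a nested loop or nested function.
        has_exit = any(
            isinstance(child, (ast.Break, ast.Return, ast.Raise))
            for child in ast.walk(node)
        )
        if is_true and not has_exit:
            self.findings.append((node.lineno, "while True with no exit"))
        self._visit_loop(node)

    def visit_Call(self, node):
        func = node.func
        if (
            self._loop_depth > 0
            and isinstance(func, ast.Attribute)
            and func.attr == "insert"
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and type(node.args[0].value) is int
            and node.args[0].value == 0
        ):
            self.findings.append((node.lineno, "insert(0, ...) inside a loop"))
        self.generic_visit(node)


def scan(source):
    """Parse source text and return the findings in visit order."""
    visitor = LoopAntiPatternVisitor()
    visitor.visit(ast.parse(source))
    return visitor.findings
```

&lt;p&gt;A rule like this never runs the code and never samples anything, so the same file produces the same findings on every run. The trade-off is that it only sees the syntactic shape it was written for.&lt;/p&gt;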

&lt;h2&gt;
  
  
  What the Model Wins On
&lt;/h2&gt;

&lt;p&gt;BrassCoders missed one bug, and Claude sonnet-4-6 caught it: &lt;code&gt;sum(readings) / len(readings)&lt;/code&gt; with no empty-list guard, a ZeroDivisionError on &lt;code&gt;[]&lt;/code&gt; that leaves no structural marker for a rule to match.&lt;/p&gt;
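
&lt;p&gt;For reference, a minimal sketch of the missed bug and one possible guard. The function names are hypothetical, and returning &lt;code&gt;None&lt;/code&gt; for an empty input is an assumption; raising a &lt;code&gt;ValueError&lt;/code&gt; would be an equally reasonable choice.&lt;/p&gt;

```python
def mean_reading_unguarded(readings):
    # Raises ZeroDivisionError when readings is empty, because len([]) == 0.
    return sum(readings) / len(readings)


def mean_reading(readings):
    # Guard the empty case explicitly instead of dividing by zero.
    # Returning None here is an illustrative choice, not the benchmark's fix.
    if not readings:
        return None
    return sum(readings) / len(readings)
```

&lt;p&gt;The two functions are identical apart from the guard, which shows why there is nothing structural for a rule to match: the buggy line is ordinary, valid arithmetic.&lt;/p&gt;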

&lt;p&gt;The model also caught every security category. SQL injection through an f-string. Command injection through &lt;code&gt;subprocess shell=True&lt;/code&gt;. XSS through &lt;code&gt;autoescape=False&lt;/code&gt;. Unsafe pickle deserialization. BrassCoders caught these too, but the model would have caught them without any rules, because it reasons about intent. That's the difference.&lt;/p&gt;
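
&lt;p&gt;The first of those categories, SQL injection through an f-string, can be sketched with the standard-library &lt;code&gt;sqlite3&lt;/code&gt; module. The table, the function names, and the payload are hypothetical and are not taken from the benchmark corpus.&lt;/p&gt;

```python
import sqlite3


def find_user_unsafe(conn, name):
    # Anti-pattern: the f-string splices untrusted input into the SQL text.
    query = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # The ? placeholder binds the value, so it is never parsed as SQL.
    query = "SELECT id, name FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


def demo_connection():
    # In-memory database with two rows, used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("ada",), ("grace",)])
    return conn
```

&lt;p&gt;With a payload such as &lt;code&gt;x' OR '1'='1&lt;/code&gt;, the unsafe version returns every row in the table, while the parameterized version treats the payload as a literal name and returns nothing.&lt;/p&gt;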

&lt;p&gt;The overall table, from real tool output on the committed corpus:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Caught&lt;/th&gt;
&lt;th&gt;Catch Rate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;BrassCoders 2.0.8&lt;/td&gt;
&lt;td&gt;11 / 12&lt;/td&gt;
&lt;td&gt;92%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude sonnet-4-6&lt;/td&gt;
&lt;td&gt;12 / 12&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bandit 1.8.3&lt;/td&gt;
&lt;td&gt;6 / 12&lt;/td&gt;
&lt;td&gt;50%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pylint&lt;/td&gt;
&lt;td&gt;1 / 12&lt;/td&gt;
&lt;td&gt;8%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Zero false positives on either clean control, across every tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Generation-Mode Finding
&lt;/h2&gt;

&lt;p&gt;BrassCoders also ran a generation-mode probe: six neutral coding tasks with no mention of bugs or performance, asking the model to write each from scratch, then scanning the output and asking the model to self-review.&lt;/p&gt;

&lt;p&gt;The model wrote clean code on four of five wedge tasks. Claude sonnet-4-6 in June 2026 reaches for &lt;code&gt;io.StringIO&lt;/code&gt; instead of string concatenation, guards the empty-list case on its own, and merges dicts with &lt;code&gt;{**a, **b}&lt;/code&gt;. The model has improved since the first benchmark version ran in early June.&lt;/p&gt;

&lt;p&gt;On the one task where the model did introduce a bug, an &lt;code&gt;insert(0)&lt;/code&gt; loop on the recent-feed task, both BrassCoders and the model's self-review caught it.&lt;/p&gt;

&lt;p&gt;The more telling number: the model issued &lt;strong&gt;zero proactive warnings&lt;/strong&gt; while writing the code. It wrote the &lt;code&gt;insert(0)&lt;/code&gt; loop and moved on. It flagged the problem only when asked to review. A gate that needs an explicit invocation isn't a gate.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Axes BrassCoders Wins That Don't Show in a Catch-Rate Table
&lt;/h2&gt;

&lt;p&gt;BrassCoders ran the same 12 files three times and produced identical results every time. That repeatability is what makes it a gate, and it's the axis a catch-rate table never measures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Determinism.&lt;/strong&gt; The model's review of the same 12 files returns different output run to run, because LLMs sample non-deterministically. A CI gate that returns different verdicts on the same commit is noise, not signal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost per run.&lt;/strong&gt; The BrassCoders OSS core costs nothing to run. No API calls, no token charges. At 50 commits per developer per month and a 500-line average diff, a model review adds $0.15 to $0.25 per developer per month at current API pricing. The dollar figure isn't the real point. A $0 scan is a scan nobody has to approve.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bytes sent off-machine.&lt;/strong&gt; Running &lt;code&gt;brasscoders --offline scan&lt;/code&gt; sends zero bytes off the machine. A model review sends the full source to an external API. For a codebase whose data-handling rules forbid sending source to a third party, only the offline scan can run on every commit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CI automatability.&lt;/strong&gt; BrassCoders runs as a standard command in a GitHub Actions step. The model review runs when a developer remembers to ask. Both help. Only one runs automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Reproduce
&lt;/h2&gt;

&lt;p&gt;BrassCoders ships the full benchmark corpus in its OSS repo, so the static-tool numbers reproduce exactly. Clone the repo, install the tools, and run the runner with &lt;code&gt;--no-ask-model&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/CopperSunDev/brasscoders
&lt;span class="nb"&gt;cd &lt;/span&gt;brasscoders
pip &lt;span class="nb"&gt;install &lt;/span&gt;brasscoders bandit pylint pyyaml
python cli/docs/benchmarks/ai-coder-bugs/run_benchmark.py &lt;span class="nt"&gt;--no-ask-model&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output should match the committed &lt;code&gt;results/RESULTS.md&lt;/code&gt;, apart from any differences introduced by a BrassCoders version other than the benchmarked 2.0.8. To reproduce the ask-model column, set &lt;code&gt;ANTHROPIC_API_KEY&lt;/code&gt; and drop &lt;code&gt;--no-ask-model&lt;/code&gt;. Model results vary a little because LLM sampling is non-deterministic, but the overall catch rate has held steady across runs.&lt;/p&gt;

&lt;p&gt;The benchmark is the dataset. The runner is the scoring logic. Both are open, Apache 2.0 licensed, and built to extend. Want a new bug category? Add the pattern to the manifest and commit its signal patterns to &lt;code&gt;CATEGORY_SIGNALS&lt;/code&gt; before you run.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Numbers Say
&lt;/h2&gt;

&lt;p&gt;BrassCoders isn't the smarter reviewer. Claude sonnet-4-6 is smarter, and it caught the one bug that needs reasoning instead of rules. BrassCoders is the deterministic gate: the same scan and the same findings on every commit, in CI, for free, with nothing sent off-machine.&lt;/p&gt;

&lt;p&gt;Those are the categories it wins. The benchmark shows both sides.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;brasscoders
brasscoders &lt;span class="nt"&gt;--offline&lt;/span&gt; scan /path/to/your/project
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>benchmarks</category>
      <category>opensource</category>
      <category>codereview</category>
    </item>
    <item>
      <title>I built 90+ AI prompts because raw transcripts are useless</title>
      <dc:creator>CopperSunDev</dc:creator>
      <pubDate>Thu, 18 Dec 2025 00:29:25 +0000</pubDate>
      <link>https://dev.to/coppersundev/i-built-90-ai-prompts-because-raw-transcripts-are-useless-4mbh</link>
      <guid>https://dev.to/coppersundev/i-built-90-ai-prompts-because-raw-transcripts-are-useless-4mbh</guid>
      <description>&lt;p&gt;A few weeks ago I posted about building a transcription tool. The responses were helpful. A few people asked about what I do with the transcripts after.&lt;/p&gt;

&lt;p&gt;Honest answer: for a while, not much.&lt;br&gt;
I'd get this wall of text with speaker labels and timestamps, and then... stare at it. The transcription part worked. But a raw transcript is like having all the ingredients dumped on your counter. You still have to cook.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem I kept running into
&lt;/h2&gt;

&lt;p&gt;I do interviews for work. Marketing stuff mostly. The goal is usually to turn a 45-minute conversation into something publishable—a blog post, social clips, whatever.&lt;/p&gt;

&lt;p&gt;So I'd paste the transcript into Claude or ChatGPT and say something like "turn this into a blog post."&lt;/p&gt;

&lt;p&gt;The output was... fine? Generic. It would summarize instead of pulling actual quotes. It'd lose the person's voice. I'd spend an hour fixing it and think "I could've just written this myself."&lt;/p&gt;

&lt;p&gt;Same thing with meeting notes. "Summarize this meeting" gets you a summary. But what I actually needed was: what did we decide, who's doing what, and what's the follow-up. Different problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  So I started building prompts
&lt;/h2&gt;

&lt;p&gt;Not because I planned to. I just kept tweaking the same prompts over and over until they actually worked.&lt;/p&gt;

&lt;p&gt;The blog post one took the longest. I needed it to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keep the interviewee's actual voice (not sanitize everything into corporate speak)&lt;/li&gt;
&lt;li&gt;Pull real quotes, not paraphrase everything&lt;/li&gt;
&lt;li&gt;Structure it like a real article, not a book report&lt;/li&gt;
&lt;li&gt;Lead with something interesting, not "In this interview, we discussed..."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That one prompt went through probably 15 versions before it stopped annoying me.&lt;/p&gt;

&lt;p&gt;Then I built one for meeting summaries that extracts decisions and action items separately. One for turning podcasts into social posts. One for cleaning up the speaker labels in raw transcripts.&lt;/p&gt;

&lt;p&gt;At some point I looked up and had 90+ of them.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I learned about prompting
&lt;/h2&gt;

&lt;p&gt;Most of my early prompts were too vague. "Summarize this" doesn't tell the model what you actually care about.&lt;/p&gt;

&lt;p&gt;The ones that work best are almost annoyingly specific:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What's the exact output format?&lt;/li&gt;
&lt;li&gt;What should it include vs. ignore?&lt;/li&gt;
&lt;li&gt;What tone? What length?&lt;/li&gt;
&lt;li&gt;What questions should it ask me before it starts?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last one was a breakthrough. The best prompts don't just run—they clarify first. "Before I process this, tell me: how many speakers, what are their names, what's the context?"&lt;/p&gt;

&lt;p&gt;Turns out you get way better output when the model understands what it's working with.&lt;/p&gt;
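
&lt;p&gt;Here's a minimal sketch of that clarify-first pattern in Python. The function name and the exact wording are illustrative assumptions, not the prompts from the repo. It just shows the shape: state the task, then tell the model to ask its questions and wait.&lt;/p&gt;

```python
# Hypothetical helper: wraps any task in a "ask before you start" preamble.
CLARIFYING_QUESTIONS = [
    "How many speakers are in this transcript?",
    "What are their names?",
    "What is the context of the conversation?",
]


def build_clarify_first_prompt(task: str, questions: list[str]) -> str:
    """Return a prompt that makes the model ask questions before doing the task."""
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(questions, start=1))
    return (
        f"Task: {task}\n\n"
        "Before you process the transcript, ask me the questions below "
        "and wait for my answers. Do not start the task until I reply.\n\n"
        f"{numbered}\n"
    )


prompt = build_clarify_first_prompt(
    "Turn this interview into a blog post.", CLARIFYING_QUESTIONS
)
print(prompt)
```

&lt;p&gt;You'd paste the result in as the first message, answer the questions, and only then paste the transcript.&lt;/p&gt;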

&lt;h2&gt;
  
  
  Some that ended up being useful
&lt;/h2&gt;

&lt;p&gt;A few I keep coming back to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Transcript cleaner — Takes raw output with "Speaker 0" and "Speaker 1" labels and turns it into something readable with real names and proper formatting. Sounds trivial but it's the one I use most.&lt;/li&gt;
&lt;li&gt;Interview → blog post — Extracts the interesting parts of a conversation and structures them into an actual article. Keeps quotes intact. Writes transitions that don't sound like AI wrote them (usually).&lt;/li&gt;
&lt;li&gt;Meeting action items — Pulls out decisions, tasks, and owners from a meeting transcript. Ignores the 40 minutes of small talk to find the 5 things that actually matter.&lt;/li&gt;
&lt;li&gt;Podcast social package — Generates a batch of social posts from an episode transcript. Quote cards, discussion questions, that kind of thing.&lt;/li&gt;
&lt;/ul&gt;
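
&lt;p&gt;The transcript cleaner itself is a prompt, but the label-swapping part of the job can be sketched in plain Python. This assumes labels look exactly like "Speaker 0" and that you already know who is who. The names below are made up for the example.&lt;/p&gt;

```python
import re


def rename_speakers(transcript: str, names: dict[int, str]) -> str:
    """Replace 'Speaker N' labels with real names; unknown speakers stay as-is."""

    def swap(match: re.Match) -> str:
        index = int(match.group(1))
        # Fall back to the original label when no name is known for this index.
        return names.get(index, match.group(0))

    return re.sub(r"\bSpeaker (\d+)\b", swap, transcript)


raw = (
    "Speaker 0: Thanks for joining.\n"
    "Speaker 1: Happy to be here.\n"
    "Speaker 2: Same."
)
cleaned = rename_speakers(raw, {0: "Dana", 1: "Lee"})
print(cleaned)
```

&lt;p&gt;The prompt version earns its keep on the parts a regex can't do, like fixing formatting and working out names from context.&lt;/p&gt;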

&lt;p&gt;I also built some weird specific ones for legal transcripts (deposition analysis, contradiction detection) that I'm not sure anyone else needs. But they exist.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where they live now
&lt;/h2&gt;

&lt;p&gt;I put them on GitHub and linked them from the transcription site:&lt;br&gt;
&lt;a href="https://brasstranscripts.com/ai-prompt-guide" rel="noopener noreferrer"&gt;https://brasstranscripts.com/ai-prompt-guide&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;They're organized by use case. Some have full write-ups explaining how to use them, others are just the prompt.&lt;/p&gt;

&lt;p&gt;They work with Claude, ChatGPT, Gemini—whatever. The transcript format matters more than which model you use.&lt;/p&gt;

&lt;h2&gt;
  
  
  Still iterating
&lt;/h2&gt;

&lt;p&gt;Some of these are solid. Others I'm still not happy with. The social media ones especially—getting an LLM to write something that doesn't sound like an LLM wrote it is its own challenge.&lt;/p&gt;

&lt;p&gt;If you've built prompts for processing transcripts (or any structured text really), curious what approaches have worked for you. The "ask clarifying questions first" pattern has been the biggest improvement for me, but I'm sure there are techniques I haven't tried.&lt;/p&gt;

</description>
      <category>productivity</category>
      <category>whisper</category>
      <category>promptengineering</category>
    </item>
    <item>
      <title>I ended up building a transcription tool</title>
      <dc:creator>CopperSunDev</dc:creator>
      <pubDate>Fri, 12 Dec 2025 20:03:59 +0000</pubDate>
      <link>https://dev.to/coppersundev/i-ended-up-building-a-transcription-tool-5920</link>
      <guid>https://dev.to/coppersundev/i-ended-up-building-a-transcription-tool-5920</guid>
      <description>&lt;p&gt;I do a lot of interviews with subject-matter experts for work. Usually it’s over Teams or Zoom. Sometimes the built-in transcript is missing, locked, or just unusable.&lt;/p&gt;

&lt;p&gt;For a while I tried the usual options. Some required subscriptions I didn’t want. Others had weird formatting that meant I spent as much time cleaning up the output as I would have just typing it myself. A few couldn’t handle multiple speakers without turning it into a mess.&lt;/p&gt;

&lt;p&gt;At some point I was messing around with Claude Code and thought: why not just build something myself?&lt;/p&gt;

&lt;p&gt;That turned into a lot of hours, a bunch of blind alleys, and more tweaking than I expected.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What actually worked&lt;/strong&gt;&lt;br&gt;
Speech recognition has gotten surprisingly good in the last few years. The open source options are solid now. Getting accurate text from clear audio isn’t the hard part anymore.&lt;/p&gt;

&lt;p&gt;Speaker diarization was trickier. Figuring out who said what in a conversation is a different problem than just converting speech to text. Getting those two pieces to work together cleanly took more debugging than I’d like to admit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The stuff I underestimated&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Audio quality variation.&lt;/strong&gt;&lt;br&gt;
A clean studio recording and a laptop mic in a conference room are completely different problems. I spent a lot of time on preprocessing that I didn’t plan for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Output formats.&lt;/strong&gt;&lt;br&gt;
I originally just wanted plain text. Then I needed SRT for a video project. Then JSON for piping into other tools. Scope creep is real, even on your own projects.&lt;/p&gt;
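
&lt;p&gt;For a sense of why each format is its own chunk of work, here's a rough Python sketch that turns the same segments into SRT and JSON. The segment shape (speaker, start, end, text) is an assumption for illustration, not the tool's actual internal format.&lt;/p&gt;

```python
import json


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[dict]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for number, seg in enumerate(segments, start=1):
        times = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{number}\n{times}\n{seg['speaker']}: {seg['text']}")
    return "\n\n".join(blocks) + "\n"


def to_json(segments: list[dict]) -> str:
    """Render the same segments as pretty-printed JSON."""
    return json.dumps(segments, indent=2)


# Hypothetical segments, in seconds from the start of the recording.
segments = [
    {"speaker": "Dana", "start": 0.0, "end": 1.5, "text": "Thanks for joining."},
    {"speaker": "Lee", "start": 1.5, "end": 62.25, "text": "Happy to be here."},
]

print(to_srt(segments))
print(to_json(segments))
```

&lt;p&gt;Keeping one list of segments and writing a small renderer per format means a new format is a new function, not a rewrite.&lt;/p&gt;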

&lt;p&gt;&lt;strong&gt;Edge cases with speaker detection.&lt;/strong&gt;&lt;br&gt;
Two people with similar voices. Someone who talks over someone else. Long pauses where the model isn’t sure if it’s a new speaker or the same person thinking. These are harder than they sound.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it’s at now&lt;/strong&gt;&lt;br&gt;
Eventually I had something that worked well enough for my own use, so I turned it into a small platform:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://brasstranscripts.com/" rel="noopener noreferrer"&gt;https://brasstranscripts.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It takes audio or video and produces transcripts that are actually usable. Mostly I cared about speaker separation and output that didn’t need a lot of cleanup afterward.&lt;/p&gt;

&lt;p&gt;No subscription — just pay per file. I built it that way because that’s what I wanted as a user. I transcribe maybe five to ten recordings a month. Paying $20/month for that felt wrong.&lt;/p&gt;

&lt;p&gt;I’m sure there are edge cases I haven’t hit yet. I’m still adjusting things as I run into them. The diarization in particular is something I keep tweaking.&lt;/p&gt;

&lt;p&gt;Posting here mostly in case it’s useful to anyone else who runs into the same problem. Or if you’ve built something similar and have thoughts on approaches I should try.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>saas</category>
      <category>tooling</category>
    </item>
  </channel>
</rss>
