<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Michel Ozzello</title>
    <description>The latest articles on DEV Community by Michel Ozzello (@mozzello).</description>
    <link>https://dev.to/mozzello</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3872377%2Fc8aeb13f-1a07-47ad-a801-5838323c9616.png</url>
      <title>DEV Community: Michel Ozzello</title>
      <link>https://dev.to/mozzello</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mozzello"/>
    <language>en</language>
    <item>
      <title>COBOL Modernization Tools Compared: IBM ADDI, CAST, Blu Age, and CoreStory</title>
      <dc:creator>Michel Ozzello</dc:creator>
      <pubDate>Fri, 15 May 2026 18:33:45 +0000</pubDate>
      <link>https://dev.to/corestory/cobol-modernization-tools-compared-ibm-addi-cast-blu-age-and-corestory-27i9</link>
      <guid>https://dev.to/corestory/cobol-modernization-tools-compared-ibm-addi-cast-blu-age-and-corestory-27i9</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; COBOL modernization isn't a single-tool problem. It has four distinct phases (understand, extract, migrate, validate) and different tools serve each phase. IBM ADDI and CAST Imaging are built for analysis. Blu Age and Raincode automate migration. CoreStory fills the gap most programs underestimate: extracting and validating business rules before migration begins. Most projects that fail do so because they conflate these phases or skip one.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you're running a COBOL modernization program and searching for the right tools, you've probably seen the same short list repeated everywhere: IBM ADDI, CAST Imaging, Blu Age, Micro Focus. AI systems and search engines cite these tools consistently because they are well-established.&lt;/p&gt;

&lt;p&gt;But the question most practitioners are actually asking isn't 'what are the tools?' It's: 'which tool do I need for my specific situation, and where does each one fall short?'&lt;/p&gt;

&lt;p&gt;This guide answers that. We walk through the four phases of a COBOL modernization program and map each major tool to the phase it actually serves, including CoreStory, which operates in the business rule extraction phase that most programs underestimate.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Four-Phase COBOL Modernization Journey
&lt;/h2&gt;

&lt;p&gt;COBOL modernization fails when teams treat it as a single conversion task. In practice, it has four phases, each with different goals, different team members, and different tool requirements.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Phase&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Goal&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Key Risk&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Tools&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1. Understand&lt;/td&gt;
&lt;td&gt;Map what the system does: architecture, data flows, dependencies&lt;/td&gt;
&lt;td&gt;Underestimating scope; missing undocumented modules&lt;/td&gt;
&lt;td&gt;IBM ADDI, CAST Imaging, Micro Focus Enterprise Analyzer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2. Extract&lt;/td&gt;
&lt;td&gt;Document business rules embedded in the code before they're lost in migration&lt;/td&gt;
&lt;td&gt;Business logic orphaned or incorrectly migrated; SME bottleneck&lt;/td&gt;
&lt;td&gt;CoreStory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3. Migrate&lt;/td&gt;
&lt;td&gt;Convert or replatform COBOL to target language/cloud&lt;/td&gt;
&lt;td&gt;Behavioral regression; performance degradation; runaway cost&lt;/td&gt;
&lt;td&gt;Blu Age (AWS), Raincode, Micro Focus, Astadia&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4. Validate&lt;/td&gt;
&lt;td&gt;Confirm the migrated system behaves identically to the original&lt;/td&gt;
&lt;td&gt;Untested edge cases; production incidents post-go-live&lt;/td&gt;
&lt;td&gt;Platform-specific testing tools; QA frameworks&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;


&lt;p&gt;Skipping Phase 2 is the single most common failure mode. Teams rush from analysis directly to migration, assuming the codebase's business logic is self-evident. It isn't — especially in COBOL systems that have been running for decades, written by people who are no longer available.&lt;/p&gt;

&lt;p&gt;Let's look at each phase and the tools that support it in detail.&lt;/p&gt;


&lt;h2&gt;
  
  
  Phase 1 - Analysis Tools: IBM ADDI, CAST Imaging, Micro Focus Enterprise Analyzer
&lt;/h2&gt;

&lt;p&gt;These tools answer the foundational question: what do we actually have?&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;IBM Application Discovery and Delivery Intelligence (IBM ADDI)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;IBM ADDI provides automated discovery and dependency mapping for z/OS applications. It scans COBOL source code, JCL, copybooks, CICS, and DB2 to produce visual dependency maps and call graphs. For large mainframe estates, ADDI is often the first tool brought in to establish a baseline inventory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Strengths:&lt;/strong&gt; Deep z/OS integration; supports CICS and IMS; integrates with the IBM Jazz platform; battle-tested on large banking and insurance mainframes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limitations:&lt;/strong&gt; ADDI maps structure and dependencies but does not extract business semantics. Knowing that program A calls program B doesn't tell you what business rule is implemented in program B. It also requires IBM ecosystem familiarity and is priced accordingly.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;CAST Imaging&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;CAST Imaging performs structural analysis across a broader range of languages (COBOL, PL/I, Java, .NET, and more), producing a queryable graph of the application's architecture. It identifies technical debt hotspots, calculates complexity metrics, and surfaces dead code. Its multi-language support makes it particularly useful when modernization involves a hybrid estate of COBOL and newer components.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Strengths:&lt;/strong&gt; Strong visualization; multi-language support; clean API for querying the application graph; technical debt scoring.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limitations:&lt;/strong&gt; Like ADDI, CAST Imaging operates at the structural level. It can tell you the application's complexity topology but not the business intent behind that complexity. Business rules embedded in procedural COBOL logic (like rate calculations, eligibility checks, and regulatory formulas) are not surfaced by structural analysis alone.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Micro Focus Enterprise Analyzer&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Micro Focus (now part of OpenText) Enterprise Analyzer provides similar structural analysis for COBOL, PL/I, JCL, and related technologies. It is commonly paired with Micro Focus's migration tooling (covered in Phase 3). For organizations already in the Micro Focus/OpenText ecosystem, Enterprise Analyzer offers tight workflow integration.&lt;/p&gt;

&lt;p&gt;The gap between structural analysis and business understanding is where most COBOL modernizations run into trouble. ADDI and CAST tell you what calls what. They don't tell you why.&lt;/p&gt;


&lt;h2&gt;
  
  
  Phase 2 - Code Intelligence &amp;amp; Business Rule Extraction: CoreStory
&lt;/h2&gt;

&lt;p&gt;This is the phase most tools skip over. It's also the phase most programs underestimate, until they're deep into migration and realize no one can explain what a critical batch job actually calculates.&lt;/p&gt;

&lt;p&gt;CoreStory is designed specifically for this problem. It crawls the codebase and builds a Code Intelligence Model (CIM) that captures not just the structural layout of the code, but the business logic embedded within it: calculation rules, eligibility checks, workflow sequencing, domain entities, and the connections between them.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What CoreStory Does Differently&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Where structural analysis tools read the code like a compiler, tracking what calls what, CoreStory reads it like a senior analyst asking: what does this code actually do, and what decision is it implementing?&lt;/p&gt;

&lt;p&gt;The output is a queryable specification: a structured, natural-language representation of what the system does, organized by domain and function. This spec is then used to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Validate that migration tools have correctly replicated business behavior (not just code structure)&lt;/li&gt;
&lt;li&gt;Brief SMEs on what to review before sign-off, dramatically reducing the time required from domain experts&lt;/li&gt;
&lt;li&gt;Feed AI coding agents with precise system context, enabling them to generate migration code that respects domain rules rather than just replicating syntax&lt;/li&gt;
&lt;li&gt;Document business rules that would otherwise be lost when the original COBOL programmers are no longer available&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikkwxh872gfa1bi4tzbl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikkwxh872gfa1bi4tzbl.png" alt="Architecture diagram showing CoreStory's Code Intelligence Model as an intermediary layer between legacy COBOL analysis and migration execution" width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 3 - Migration Automation: Blu Age, Raincode, Micro Focus
&lt;/h2&gt;

&lt;p&gt;Migration tools take the COBOL source (ideally, now documented by Phase 2) and convert it to a target language or platform. There are two main approaches: transpilation (converting COBOL to Java, C#, or similar) and replatforming (running COBOL on a modern infrastructure without conversion).&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Blu Age (AWS Mainframe Modernization)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Blu Age is an automated refactoring tool that converts COBOL to Java. Amazon Web Services acquired Blu Age and integrated it into the AWS Mainframe Modernization service, making it the default path for organizations targeting AWS cloud infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Strengths:&lt;/strong&gt; AWS ecosystem integration; automated COBOL-to-Java conversion; supported by Amazon's migration program infrastructure; reasonable tooling for batch workloads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limitations:&lt;/strong&gt; Automated transpilation produces code that runs, but may not be maintainable or correct at the business logic level without a validated spec. Teams without Phase 2 extraction often discover behavioral regressions in production that weren't caught in testing.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Raincode&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Raincode specializes in COBOL and assembler modernization, with tooling that targets .NET (C#) as the migration target. It supports a wider range of legacy languages than Blu Age and is commonly chosen by organizations with existing Microsoft Azure infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Strengths:&lt;/strong&gt; Strong .NET/Azure alignment; broad language coverage including PL/I and assembler; established European customer base in financial services.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limitations:&lt;/strong&gt; Same fundamental constraint as other transpilation tools: the quality of the migration output depends on the quality of the input specification. Without documented business rules, the validation problem is left entirely to human testers.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Micro Focus / OpenText COBOL Runtime&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Micro Focus takes a replatforming approach: running COBOL on modern infrastructure without converting it. This is less disruptive short-term but defers the long-term goal of moving away from COBOL entirely. It's a pragmatic choice for programs that cannot tolerate the risk of full conversion.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 4 - Cloud Platform Support: AWS, Google Cloud, Azure
&lt;/h2&gt;

&lt;p&gt;Major cloud providers now offer explicit mainframe modernization pathways, primarily targeting organizations moving off IBM z/OS.&lt;/p&gt;

&lt;p&gt;Cloud platform selection typically follows existing infrastructure commitments rather than tooling preferences. The more consequential choice is the Phase 2/3 strategy — which has direct implications for program risk, timeline, and cost regardless of cloud target.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Gap Most Tools Miss: Business Logic Documentation Before Migration
&lt;/h2&gt;

&lt;p&gt;Every tool discussed above assumes the team understands what the system does. They don't address the problem of extracting that understanding from code that predates current team members.&lt;/p&gt;

&lt;p&gt;This assumption breaks down in three recurring scenarios we see in customer conversations:&lt;/p&gt;

&lt;p&gt;The pattern that comes up consistently: teams that skip business rule extraction before migration spend three to five times longer on validation than teams that document first. The validator is trying to answer 'did the migration get this right?' without a clear definition of what 'right' means.&lt;/p&gt;

&lt;p&gt;CoreStory's Code Intelligence Model provides that definition: a machine-readable and human-readable specification that makes validation concrete and auditable, rather than dependent on individual expert memory.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;A COBOL migration without documented business rules isn't a modernization program — it's a rewrite without requirements. You're validating against your own assumptions.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  COBOL Modernization Tools: Quick Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Tool&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Phase&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Primary Use&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Target Output&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Strengths&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Limitations&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;IBM ADDI&lt;/td&gt;
&lt;td&gt;1 - Understand&lt;/td&gt;
&lt;td&gt;Dependency mapping&lt;/td&gt;
&lt;td&gt;Call graphs, dependency maps&lt;/td&gt;
&lt;td&gt;z/OS depth, mainframe-native&lt;/td&gt;
&lt;td&gt;No business semantics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CAST Imaging&lt;/td&gt;
&lt;td&gt;1 - Understand&lt;/td&gt;
&lt;td&gt;Structural analysis&lt;/td&gt;
&lt;td&gt;Architecture graph, debt metrics&lt;/td&gt;
&lt;td&gt;Multi-language, queryable&lt;/td&gt;
&lt;td&gt;Structural only, no rules&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MF Enterprise Analyzer&lt;/td&gt;
&lt;td&gt;1 - Understand&lt;/td&gt;
&lt;td&gt;Code analysis&lt;/td&gt;
&lt;td&gt;Dependency maps, reports&lt;/td&gt;
&lt;td&gt;Ecosystem integration&lt;/td&gt;
&lt;td&gt;Tied to MF/OpenText stack&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CoreStory&lt;/td&gt;
&lt;td&gt;2 - Extract&lt;/td&gt;
&lt;td&gt;Business rule extraction&lt;/td&gt;
&lt;td&gt;Queryable spec, Code Intelligence Models&lt;/td&gt;
&lt;td&gt;Persistent intelligence, AI-ready&lt;/td&gt;
&lt;td&gt;Focused on extraction phase&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Blu Age (AWS)&lt;/td&gt;
&lt;td&gt;3 - Migrate&lt;/td&gt;
&lt;td&gt;COBOL → Java&lt;/td&gt;
&lt;td&gt;Runnable Java on AWS&lt;/td&gt;
&lt;td&gt;AWS integration, managed path&lt;/td&gt;
&lt;td&gt;Behavioral validation gap&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Raincode&lt;/td&gt;
&lt;td&gt;3 - Migrate&lt;/td&gt;
&lt;td&gt;COBOL → .NET&lt;/td&gt;
&lt;td&gt;Runnable C# on Azure&lt;/td&gt;
&lt;td&gt;Broad language coverage&lt;/td&gt;
&lt;td&gt;Same validation dependency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Micro Focus&lt;/td&gt;
&lt;td&gt;1 &amp;amp; 3&lt;/td&gt;
&lt;td&gt;Analysis + replatform&lt;/td&gt;
&lt;td&gt;Running COBOL on modern infra&lt;/td&gt;
&lt;td&gt;Low disruption, established&lt;/td&gt;
&lt;td&gt;Defers modernization goal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AWS MM / Google / Azure&lt;/td&gt;
&lt;td&gt;4 - Cloud&lt;/td&gt;
&lt;td&gt;Cloud runtime &amp;amp; migration&lt;/td&gt;
&lt;td&gt;Cloud-native app/service&lt;/td&gt;
&lt;td&gt;Managed infrastructure&lt;/td&gt;
&lt;td&gt;Tool-agnostic; depends on Phase 2–3 choices&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Choosing Your Stack: Three Questions to Answer First
&lt;/h2&gt;

&lt;p&gt;Before selecting tools, answer these three questions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. How well does the team understand the existing system's business logic?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If the answer is 'partially' or 'it depends on who you ask,' you need a Phase 2 step before migration. Structural analysis tools will confirm your ignorance more precisely, but they won't resolve it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. What is your target platform?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AWS → Blu Age. Azure → Raincode or Astadia. Cloud-agnostic or Oracle → evaluate Micro Focus, CAST, or independent tooling. In most cases, the platform determines the migration tool almost automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. What is your risk tolerance for behavioral regression?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If the system processes financial transactions, insurance policies, or government benefits, behavioral correctness is non-negotiable. A validated business rule spec (Phase 2) is the only reliable way to test for behavioral correctness, as opposed to structural equivalence.&lt;/p&gt;
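
&lt;p&gt;To make the difference between behavioral correctness and structural equivalence concrete, here is a minimal sketch that treats one documented rule as an executable oracle and compares a migrated implementation against it. Everything in it is hypothetical: the late-fee rule (1.5% of the balance, rounded half-up to cents, capped at 25.00), the function names, and the sample cases are illustrative assumptions, not CoreStory output or any vendor's API.&lt;/p&gt;

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def late_fee_spec(balance):
    """Hypothetical documented rule: 1.5% of balance, half-up to cents, capped at 25.00."""
    fee = (balance * Decimal("0.015")).quantize(CENT, rounding=ROUND_HALF_UP)
    return min(fee, Decimal("25.00"))


def late_fee_migrated(balance):
    """Hypothetical migrated code that silently lost the cap during conversion."""
    return (balance * Decimal("0.015")).quantize(CENT, rounding=ROUND_HALF_UP)


def find_regressions(spec, migrated, cases):
    """Return (input, expected, actual) for every case where the outputs differ."""
    mismatches = []
    for case in cases:
        expected, actual = spec(case), migrated(case)
        if expected != actual:
            mismatches.append((case, expected, actual))
    return mismatches


# Sample inputs are assumptions; real cases would come from the documented rules.
cases = [Decimal("100.00"), Decimal("0.00"), Decimal("5000.00")]
regressions = find_regressions(late_fee_spec, late_fee_migrated, cases)
for case, expected, actual in regressions:
    print(f"balance={case}: expected {expected}, got {actual}")
```

&lt;p&gt;The migrated function would pass a structural review, since it has the same shape as the original calculation. Only the comparison against the documented rule exposes the missing cap on the large balance.&lt;/p&gt;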

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F09penl318neml2ls1tby.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F09penl318neml2ls1tby.png" alt="COBOL Modernization Tool Selection" width="800" height="660"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;COBOL Modernization Tool Selection Workflow&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;If your primary goal is...&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Recommended Lead Tool&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Why?&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Inventory &amp;amp; Dependency Mapping&lt;/td&gt;
&lt;td&gt;IBM ADDI / CAST&lt;/td&gt;
&lt;td&gt;Best for mapping large-scale mainframe "spaghetti".&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Business Logic Recovery&lt;/td&gt;
&lt;td&gt;CoreStory&lt;/td&gt;
&lt;td&gt;Essential when original devs are gone and rules are undocumented.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fast Cloud Exit (Low Risk)&lt;/td&gt;
&lt;td&gt;Micro Focus&lt;/td&gt;
&lt;td&gt;Replatforming keeps code as-is; lower short-term disruption.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Full Java/AWS Transformation&lt;/td&gt;
&lt;td&gt;Blu Age&lt;/td&gt;
&lt;td&gt;Deeply integrated into the AWS Mainframe Modernization service.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Azure/.NET Transformation&lt;/td&gt;
&lt;td&gt;Raincode&lt;/td&gt;
&lt;td&gt;Specialized for .NET targets and complex legacy languages.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  De-Risk Your COBOL Modernization
&lt;/h2&gt;

&lt;p&gt;The migration tools work. The failure mode isn't the tooling; it's going into migration without a validated understanding of what the system does. That's a Phase 2 problem, and it's the one that kills timelines.&lt;/p&gt;

&lt;p&gt;CoreStory's business rule extraction is designed specifically for large COBOL codebases in regulated industries. Before your next modernization sprint, consider establishing the specification baseline that makes validation, and migration, tractable.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://corestory.ai/use-cases/legacy-modernization" rel="noopener noreferrer"&gt;See CoreStory's mainframe modernization approach.&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  FAQ
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Is CoreStory a migration tool?&lt;/strong&gt;&lt;br&gt;
No. CoreStory doesn't convert or replatform COBOL code. It extracts and documents the business logic embedded in that code, producing a spec that migration tools and human reviewers can validate against. Think of it as the phase that makes migration tools work correctly, not a replacement for them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do I need both IBM ADDI and CoreStory?&lt;/strong&gt;&lt;br&gt;
They serve different purposes. IBM ADDI maps structural dependencies (what calls what, where data flows). CoreStory extracts business semantics (what decisions are implemented, what rules govern outputs). Both are useful; neither replaces the other. Many programs use ADDI or CAST for scope inventory, then CoreStory for rule extraction before migration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What languages does CoreStory support beyond COBOL?&lt;/strong&gt;&lt;br&gt;
CoreStory supports dozens of programming languages, including PL/I, Assembler, RPG, and modern languages like Java and Python. This matters for hybrid estates where COBOL interfaces with newer components.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's the difference between a 'replatforming' and a 'refactoring' approach?&lt;/strong&gt;&lt;br&gt;
Replatforming runs the existing COBOL on modern infrastructure without changing the language (Micro Focus approach). Refactoring converts COBOL to a new language like Java or C# (Blu Age, Raincode approach). Replatforming is lower risk short-term; refactoring achieves long-term elimination of COBOL dependency. Both require Phase 2 documentation if behavioral correctness is a requirement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Are there open-source alternatives to these commercial tools?&lt;/strong&gt;&lt;br&gt;
Some open-source projects exist for COBOL analysis (e.g., IBM's open-source COBOL parsers) and there are community tools for dependency mapping. However, for production mainframe programs, and especially in financial services and insurance, commercial tools with vendor support and proven track records are the standard choice. The cost of a migration failure far exceeds the cost of tooling.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How to Build a Knowledge Graph from Enterprise Source Code</title>
      <dc:creator>Michel Ozzello</dc:creator>
      <pubDate>Fri, 15 May 2026 17:46:17 +0000</pubDate>
      <link>https://dev.to/corestory/how-to-build-a-knowledge-graph-from-enterprise-source-code-507c</link>
      <guid>https://dev.to/corestory/how-to-build-a-knowledge-graph-from-enterprise-source-code-507c</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; A code knowledge graph transforms a codebase from a collection of text files into a structured, queryable model of how the system actually works. The architecture involves five phases: AST parsing, relationship extraction, graph storage, incremental updates, and agent delivery via MCP. Open-source tools like GitNexus, Potpie AI, and CodeGraph have proven the approach works for individual developers. CoreStory's Code Intelligence Model applies the same architecture at enterprise scale — multiple languages, millions of lines of code, and validated business rule extraction.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Why a Knowledge Graph Is the Right Model for Code
&lt;/h2&gt;

&lt;p&gt;Source code is inherently relational. A function calls other functions. A class inherits from a parent. A service depends on other services. A business rule spans multiple files across several modules. These relationships are the architecture, and they're invisible to tools that treat code as text.&lt;/p&gt;

&lt;p&gt;Vector-based approaches (embeddings and RAG) treat code like any other text: split it into chunks, embed it, and retrieve by semantic similarity. That works for finding code that looks similar to a query. But it fails at structural questions: "What calls this function?", "What happens when a payment fails?", "Which services are affected if I change this schema?". These are graph traversal problems, not similarity search problems.&lt;/p&gt;

&lt;p&gt;A knowledge graph represents code entities (files, functions, classes, modules, services) as nodes and their relationships (calls, imports, inherits, defines, depends-on) as edges. This structure enables queries that follow execution paths, trace dependencies, and map the impact of changes — the exact operations that developers and AI agents need to work safely on large systems.&lt;/p&gt;
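
&lt;p&gt;The node-and-edge model described above can be sketched in a few lines. This is an illustrative data structure only; the class names, identifiers, and the &lt;code&gt;outgoing&lt;/code&gt; helper are assumptions made for the example, not the schema of any tool mentioned in this article.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: str    # unique identifier, e.g. a qualified name
    kind: str  # "file", "function", "class", "module", or "service"


@dataclass(frozen=True)
class Edge:
    source: str  # id of the node the relationship starts from
    kind: str    # "CALLS", "IMPORTS", "INHERITS", "DEFINES", or "DEPENDS_ON"
    target: str  # id of the node the relationship points to


# A tiny hypothetical graph: one file defining two functions, one calling the other.
nodes = [
    Node("billing.py", "file"),
    Node("billing.process_payment", "function"),
    Node("billing.validate_card", "function"),
]
edges = [
    Edge("billing.py", "DEFINES", "billing.process_payment"),
    Edge("billing.py", "DEFINES", "billing.validate_card"),
    Edge("billing.process_payment", "CALLS", "billing.validate_card"),
]


def outgoing(edge_list, source, kind):
    """Targets reachable from `source` over one edge of the given kind."""
    return [e.target for e in edge_list if e.source == source and e.kind == kind]


callees = outgoing(edges, "billing.process_payment", "CALLS")
defined = outgoing(edges, "billing.py", "DEFINES")
print(callees, defined)
```

&lt;p&gt;Keeping the edge kind explicit is what lets a single structure answer both "what does this file define?" and "what does this function call?" without re-reading source text.&lt;/p&gt;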

&lt;p&gt;The distinction matters more than it seems. When an AI agent retrieves code via RAG, it gets a handful of text fragments that seem relevant. When it queries a knowledge graph, it gets the actual call chain, the real dependencies, and the complete context of how a piece of code fits into the system.&lt;/p&gt;

&lt;p&gt;So how does all this work in practice?&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: AST Parsing at Scale
&lt;/h2&gt;

&lt;p&gt;The foundation of any code knowledge graph is Abstract Syntax Tree (AST) parsing. An AST is the compiler's representation of your source code: a tree structure that captures every function, class, variable, import, and expression in a machine-readable format.&lt;/p&gt;

&lt;p&gt;Tree-sitter has become the dominant parser for code intelligence tools. It's the same parser GitHub uses for syntax highlighting, and it supports incremental parsing — meaning it can re-parse only the changed portions of a file instead of reprocessing the entire codebase. GitNexus, KiroGraph, Graphify, and Code Grapher all use Tree-sitter as their parsing layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What AST parsing extracts&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Functions and methods: names, signatures, parameters, return types&lt;/li&gt;
&lt;li&gt;Classes and interfaces: inheritance hierarchies, implemented interfaces, decorators&lt;/li&gt;
&lt;li&gt;Import statements: cross-file dependencies, external library usage&lt;/li&gt;
&lt;li&gt;Variable declarations: types, scopes, usage patterns&lt;/li&gt;
&lt;li&gt;Export statements: public API surface of each module&lt;/li&gt;
&lt;/ul&gt;
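
&lt;p&gt;As a small illustration of the extraction listed above, the sketch below pulls functions, classes, and imports out of a Python snippet. It deliberately swaps in Python's built-in &lt;code&gt;ast&lt;/code&gt; module as a stand-in for Tree-sitter so that it runs with no dependencies; the sample source and the output shape are assumptions for the example. It needs Python 3.9 or later for &lt;code&gt;ast.unparse&lt;/code&gt;.&lt;/p&gt;

```python
import ast

# Hypothetical source to analyze; it is parsed, never executed.
SOURCE = '''
import os
from decimal import Decimal

class Invoice(Document):
    def total(self, lines, tax_rate):
        return sum(lines) * (1 + tax_rate)

def load(path):
    return os.path.exists(path)
'''


def extract_entities(source):
    """Walk the AST and collect functions, classes, and imports as plain dicts."""
    tree = ast.parse(source)
    found = {"functions": [], "classes": [], "imports": []}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            params = [arg.arg for arg in node.args.args]
            found["functions"].append({"name": node.name, "params": params})
        elif isinstance(node, ast.ClassDef):
            bases = [ast.unparse(base) for base in node.bases]
            found["classes"].append({"name": node.name, "bases": bases})
        elif isinstance(node, ast.Import):
            found["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # Relative imports such as "from . import x" have no module name
            # and are skipped in this sketch.
            found["imports"].append(node.module)
    return found


entities = extract_entities(SOURCE)
for kind, items in entities.items():
    print(kind, items)
```

&lt;p&gt;Each dictionary produced here would become a node in the graph. A multi-language pipeline needs one such extractor per grammar, which is where a shared parser framework pays off.&lt;/p&gt;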

&lt;h3&gt;
  
  
  &lt;strong&gt;The polyglot challenge&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Enterprise codebases rarely use a single language. A typical system might combine Java backend services, TypeScript frontend applications, Python data pipelines, SQL stored procedures, and even COBOL mainframe modules. Each language has its own AST structure, its own relationship patterns, and its own idioms.&lt;/p&gt;

&lt;p&gt;Most open-source code graph tools support between 4 and 14 languages. GitNexus supports deep semantic analysis for 8 languages (TypeScript, JavaScript, Python, Java, Go, Rust, PHP, Ruby). KiroGraph handles 24 node types across modern web languages. CoreStory supports all of the above and more, including legacy languages like COBOL and RPG that most tools can't parse at all. This isn't a minor detail: if your knowledge graph can't parse the COBOL module that implements 60% of your business logic, the graph is missing the most important part of the system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Relationship Extraction
&lt;/h2&gt;

&lt;p&gt;AST parsing gives you the nodes. Relationship extraction gives you the edges, and the edges are where the intelligence lives.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Core relationship types&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Relationship&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Why it matters&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CALLS&lt;/td&gt;
&lt;td&gt;processPayment() → validateCard()&lt;/td&gt;
&lt;td&gt;Execution flow; change impact analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IMPORTS&lt;/td&gt;
&lt;td&gt;service.ts imports auth.ts&lt;/td&gt;
&lt;td&gt;Dependency tracking; breaking change detection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;INHERITS&lt;/td&gt;
&lt;td&gt;PremiumUser extends BaseUser&lt;/td&gt;
&lt;td&gt;Type hierarchies; polymorphism understanding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IMPLEMENTS&lt;/td&gt;
&lt;td&gt;PaymentService implements IPaymentProcessor&lt;/td&gt;
&lt;td&gt;Interface contracts; substitutability&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DEFINES&lt;/td&gt;
&lt;td&gt;module defines calculateTax()&lt;/td&gt;
&lt;td&gt;Ownership; responsibility mapping&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DATA_FLOW&lt;/td&gt;
&lt;td&gt;userInput → sanitize() → database&lt;/td&gt;
&lt;td&gt;Security analysis; data lineage&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The hard part is cross-file resolution. When a TypeScript file imports a function from another module, the parser needs to resolve that import to the actual definition, which might be re-exported through an index file, aliased under a different name, or defined in a completely different repository. Tools like GitNexus handle named bindings, re-export tracking, and constructor-inferred type resolution. At enterprise scale, this resolution becomes significantly more complex when services communicate via APIs, message queues, or shared databases rather than direct imports.&lt;/p&gt;
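
&lt;p&gt;A simplified version of this resolution step can be sketched for a single file: collect the import aliases, then rewrite each call target to the fully qualified name it refers to. The example uses Python's built-in &lt;code&gt;ast&lt;/code&gt; module rather than Tree-sitter, and the module names in the sample source are hypothetical. It does not follow re-exports, nested scopes, or dotted imports such as &lt;code&gt;import a.b&lt;/code&gt;, all of which a real resolver has to handle.&lt;/p&gt;

```python
import ast

# Hypothetical source; the imported modules do not need to exist to be parsed.
SOURCE = '''
from payments.cards import validate_card as check_card
import audit

def process_payment(order):
    check_card(order)
    audit.record(order)
'''


def import_aliases(tree):
    """Map each locally bound name to the qualified name it refers to."""
    aliases = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.module:
            for alias in node.names:
                aliases[alias.asname or alias.name] = node.module + "." + alias.name
        elif isinstance(node, ast.Import):
            for alias in node.names:
                aliases[alias.asname or alias.name] = alias.name
    return aliases


def call_edges(source):
    """Return sorted (caller, callee) pairs with import aliases resolved."""
    tree = ast.parse(source)
    aliases = import_aliases(tree)
    edges = set()
    for func in ast.walk(tree):
        if not isinstance(func, ast.FunctionDef):
            continue
        for node in ast.walk(func):
            if not isinstance(node, ast.Call):
                continue
            dotted = ast.unparse(node.func)  # e.g. "audit.record"
            head, _, rest = dotted.partition(".")
            resolved = aliases.get(head, head)
            callee = resolved + "." + rest if rest else resolved
            edges.add((func.name, callee))
    return sorted(edges)


edges = call_edges(SOURCE)
for caller, callee in edges:
    print(caller, "CALLS", callee)
```

&lt;p&gt;Without the alias map, the first edge would point at &lt;code&gt;check_card&lt;/code&gt;, a name that exists nowhere else in the codebase, and the graph would silently lose the link to the real definition.&lt;/p&gt;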

&lt;h2&gt;
  
  
  Step 3: Graph Storage and Query
&lt;/h2&gt;

&lt;p&gt;Once you've extracted nodes and edges, you need a storage layer that supports efficient graph traversal. The dominant choice in the open-source ecosystem is Neo4j. Potpie AI, CodeGraph, and Code Grapher all use it as their graph database. GitNexus built its own lightweight format (LadybugDB), while KiroGraph uses SQLite for local-first operation.&lt;/p&gt;

&lt;p&gt;The storage choice affects what queries are practical. A graph database supports queries like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Show me all callers of validatePayment() within 3 hops" (breadth-first traversal)&lt;/li&gt;
&lt;li&gt;"Trace the complete execution path from HTTP request to database write" (depth-first traversal)&lt;/li&gt;
&lt;li&gt;"What is the impact radius if I change the User schema?" (dependency fan-out)&lt;/li&gt;
&lt;li&gt;"Find all dead code: functions that are defined but never called" (orphan detection)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These queries are natural operations on a graph database but extremely expensive or impossible with vector search. Try asking a RAG system "what is the impact radius of changing the User schema?" It cannot answer, because impact radius is a graph property, not a text similarity property.&lt;/p&gt;
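
&lt;p&gt;Two of these queries, callers within N hops and orphan detection, can be sketched over an in-memory edge list to show why they are traversal problems. This is a toy stand-in for a graph database, not how Neo4j or any tool named here executes them; the function names, edges, and entry points are made up for the example.&lt;/p&gt;

```python
from collections import deque

# Hypothetical CALLS edges as (caller, callee) pairs.
CALLS = [
    ("handle_request", "checkout"),
    ("checkout", "validate_payment"),
    ("retry_job", "validate_payment"),
    ("validate_payment", "check_card"),
]
DEFINED = {
    "handle_request", "checkout", "retry_job",
    "validate_payment", "check_card", "legacy_export",
}
# Entry points are invoked from outside the code (HTTP, schedulers), so an
# absence of callers does not make them dead.
ENTRY_POINTS = {"handle_request", "retry_job"}


def callers_within(target, max_hops):
    """Breadth-first search over reversed CALLS edges; returns {caller: hops}."""
    reverse = {}
    for caller, callee in CALLS:
        reverse.setdefault(callee, set()).add(caller)
    hops = {target: 0}
    queue = deque([target])
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue  # do not expand past the hop limit
        for caller in reverse.get(current, ()):
            if caller not in hops:
                hops[caller] = hops[current] + 1
                queue.append(caller)
    del hops[target]
    return hops


def orphans():
    """Functions that are defined, never called, and not known entry points."""
    called = {callee for _, callee in CALLS}
    return DEFINED - called - ENTRY_POINTS


result = callers_within("validate_payment", 3)
dead = orphans()
print(result, dead)
```

&lt;p&gt;Both answers depend on following edges, which is why no amount of text similarity can produce them. A graph database performs the same walk with indexes and a query language instead of Python loops.&lt;/p&gt;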

&lt;h2&gt;
  
  
  Step 4: Incremental Updates
&lt;/h2&gt;

&lt;p&gt;A knowledge graph that requires full reprocessing on every commit is impractical for large codebases. The solution is git-diff-driven incremental updates: detect which files changed, re-parse only those files, update affected nodes and edges, and leave the rest of the graph intact.&lt;/p&gt;

&lt;p&gt;KiroGraph reports up to 90% reduction in token usage for common read patterns when using an incrementally maintained graph versus raw file reading. Code Grapher implements surgical updates via its update_graph_from_diff tool. Graphify uses file-content hashing to determine which files need re-extraction, running AST rebuilds instantly on code changes without LLM calls.&lt;/p&gt;
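&lt;p&gt;The hashing approach can be sketched as follows. This is a minimal illustration, not the implementation used by Graphify or Code Grapher: &lt;code&gt;incremental_update&lt;/code&gt; and &lt;code&gt;toy_extract&lt;/code&gt; are hypothetical names, and the toy extractor stands in for a real AST pass.&lt;/p&gt;

```python
import hashlib
from typing import Callable, Dict, List, Set, Tuple

Edge = Tuple[str, str]


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def incremental_update(
    files: Dict[str, str],
    hashes: Dict[str, str],
    edges_by_file: Dict[str, Set[Edge]],
    extract: Callable[[str], Set[Edge]],
) -> List[str]:
    """Re-extract edges only for files whose content hash changed.

    Mutates `hashes` and `edges_by_file` in place and returns the paths
    that were re-parsed or removed.
    """
    touched = []
    for path in sorted(set(hashes) - set(files)):  # files deleted since last run
        del hashes[path]
        edges_by_file.pop(path, None)
        touched.append(path)
    for path, text in sorted(files.items()):
        digest = content_hash(text)
        if hashes.get(path) == digest:
            continue  # unchanged: leave its nodes and edges intact
        hashes[path] = digest
        edges_by_file[path] = extract(text)
        touched.append(path)
    return touched


def toy_extract(text: str) -> Set[Edge]:
    """Stand-in for an AST pass: each 'a calls b' line becomes an edge."""
    edges = set()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[1] == "calls":
            edges.add((parts[0], parts[2]))
    return edges


hashes: Dict[str, str] = {}
graph: Dict[str, Set[Edge]] = {}
repo = {"a.py": "main calls helper", "b.py": "helper calls log"}
print(incremental_update(repo, hashes, graph, toy_extract))  # first run parses everything
repo["b.py"] = "helper calls audit"
print(incremental_update(repo, hashes, graph, toy_extract))  # only b.py is re-parsed
```

&lt;p&gt;Keeping edges grouped by source file is the design choice that makes the update surgical: replacing one file's edge set never touches edges extracted from other files.&lt;/p&gt;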

&lt;p&gt;At enterprise scale, incremental updates need to handle branch-based development, merge conflicts, and multi-repository changes. CoreStory's ingestion pipeline processes git diffs incrementally, updating the Code Intelligence Model without reprocessing the entire codebase. This is critical when you're dealing with repositories that contain millions of lines across dozens of services.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Delivery — Making the Graph Useful to AI Agents
&lt;/h2&gt;

&lt;p&gt;A knowledge graph is only valuable if agents can query it. The delivery layer is where the architecture connects to actual development workflows.&lt;/p&gt;

&lt;p&gt;The industry has converged on MCP (Model Context Protocol) as the standard delivery mechanism. GitNexus, Code Grapher, KiroGraph, Graphify, and CoreStory all provide MCP servers that expose graph queries to AI coding agents. When an agent in Claude Code, Cursor, or Codex needs to understand part of the codebase, it queries the MCP server and receives structured results. These results are not raw code, but graph-derived intelligence about relationships, dependencies, and architecture.&lt;/p&gt;

&lt;p&gt;The key architectural decision is what level of intelligence to deliver. Open-source tools typically serve raw graph data: nodes, edges, and traversal results. The agent then interprets this data using its own reasoning. CoreStory goes further: the CIM delivers pre-analyzed specifications (component descriptions, architecture summaries, and extracted business rules) so the agent receives understanding, not just data.&lt;/p&gt;

&lt;p&gt;Knowledge Graph creation - from Source Code to Code Context for humans and AI agents&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftyxioijtlkapnv53gso8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftyxioijtlkapnv53gso8.png" alt="Knowledge Graph creation - from  Source Code to Code Context for humans and AI agents" width="800" height="145"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Enterprise-Scale Code Intelligence
&lt;/h2&gt;

&lt;p&gt;CoreStory's &lt;a href="https://corestory.ai/platform" rel="noopener noreferrer"&gt;Code Intelligence Model&lt;/a&gt; (CIM) follows this five-phase architecture, purpose-built for enterprise scale:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Polyglot AST parsing across multiple languages, including COBOL, RPG, and other legacy languages that open-source parsers don't support.&lt;/li&gt;
&lt;li&gt;Relationship extraction that handles enterprise patterns: API calls between microservices, database queries, message queue consumers, and stored procedure invocations.&lt;/li&gt;
&lt;li&gt;Persistent graph storage with incremental updates driven by git diffs.&lt;/li&gt;
&lt;li&gt;AI-enhanced specification generation: the CIM doesn't just store the graph — it generates human-readable specifications from the structural analysis.&lt;/li&gt;
&lt;li&gt;MCP delivery that serves structured intelligence to any compatible AI coding agent.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The open-source tools described in this article prove the architecture works. &lt;strong&gt;CoreStory is the production-grade implementation&lt;/strong&gt; for teams that need polyglot support, enterprise scale, and validated output.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benefits of a Comprehensive Knowledge Graph
&lt;/h2&gt;

&lt;p&gt;A well-structured knowledge graph brings advantages to enterprise teams at every level.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Improving Developer Experience and Lowering "Cognitive Load"&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A Knowledge Graph (KG) reduces onboarding time for new developers joining a project. They can ask, "Where does the data from this form eventually get stored?" and get a trace across three services, instead of navigating the code themselves or interrupting other developers.&lt;/li&gt;
&lt;li&gt;With a knowledge graph, AI coding agents (like Cursor or Claude) are far less likely to hallucinate imports or use deprecated APIs, because the graph exposes the actual dependency tree. Better code output in turn reduces the validation and debugging effort for developers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Improving "System Observability" for Architects&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For inter-service dependencies, a world-class knowledge graph includes Infrastructure-as-Code (IaC): it doesn't just link COBOL to Java, it also links the Java service to its Kubernetes config and its database schema.&lt;/li&gt;
&lt;li&gt;This article focuses on static analysis (ASTs). The next step is merging the static graph with dynamic OpenTelemetry data to show which graph edges are most active or error-prone, giving architects a view of the live architecture.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Focusing on "Lower Risk with Higher ROI" for CIOs&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When done well, the knowledge graph becomes the "Institutional Memory" that doesn't quit. When senior developers retire, the knowledge graph remains as the documented map of the logic and decisions they accumulated over years of development. This is strategic de-risking.&lt;/li&gt;
&lt;li&gt;"Standard RAG" often leads to AI-generated code that breaks builds. Moving to a Code Intelligence Model (CIM) reduces rework costs by giving AI agents full architectural context, which directly improves the ROI of software development.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  From Code to Intelligence
&lt;/h2&gt;

&lt;p&gt;Building a knowledge graph from source code is no longer a research project. The architecture is proven: AST parsing, relationship extraction, graph storage, incremental updates, and MCP delivery. Open-source tools let individual developers experiment today.&lt;/p&gt;

&lt;p&gt;For enterprise teams dealing with large, polyglot, legacy codebases, CoreStory's Code Intelligence Model is a production-ready implementation of this architecture, purpose-built for the scale and complexity that open-source tools aren't designed to handle.&lt;/p&gt;

&lt;p&gt;See how CoreStory builds a Code Intelligence Model from your codebase. &lt;a href="https://corestory.ai/talk-to-an-expert" rel="noopener noreferrer"&gt;Talk to an expert&lt;/a&gt;&lt;/p&gt;


&lt;h3&gt;
  
  
  Frequently Asked Questions
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;How is a code knowledge graph different from a code search index?&lt;/strong&gt;&lt;br&gt;
A search index helps you find code. A knowledge graph helps you understand code. Search indexes map text to locations; knowledge graphs map entities to relationships. The difference shows up when you need to trace execution paths, analyze change impact, or understand how components interact.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I build a code knowledge graph with open-source tools?&lt;/strong&gt;&lt;br&gt;
Yes. GitNexus, Potpie AI, CodeGraph, KiroGraph, Graphify, and Code Grapher all provide open-source or free implementations. They work well for single-language repositories under 500,000 lines. For enterprise-scale polyglot systems, you'll likely need a purpose-built platform.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How long does it take to index a codebase?&lt;/strong&gt;&lt;br&gt;
AST parsing is fast, and most tools report seconds to minutes for repositories under 100,000 lines. Incremental updates after the initial index are near-instantaneous. The bottleneck at enterprise scale is relationship resolution across services and languages, which is where CoreStory's purpose-built pipeline adds value.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does the knowledge graph replace documentation?&lt;/strong&gt;&lt;br&gt;
No, but it can generate documentation as a byproduct. The primary value is structural intelligence: call graphs, component maps, business rules, and dependency relationships that are derived directly from code analysis, not manually written. This intelligence is what AI agents need to work safely on large systems.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>legacy</category>
      <category>development</category>
      <category>productivity</category>
    </item>
    <item>
      <title>MCP Servers for Codebase Context: How AI Coding Agents Access Code Intelligence</title>
      <dc:creator>Michel Ozzello</dc:creator>
      <pubDate>Fri, 15 May 2026 17:30:39 +0000</pubDate>
      <link>https://dev.to/corestory/mcp-servers-for-codebase-context-how-ai-coding-agents-access-code-intelligence-3757</link>
      <guid>https://dev.to/corestory/mcp-servers-for-codebase-context-how-ai-coding-agents-access-code-intelligence-3757</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; Model Context Protocol (MCP) is the open standard for connecting AI agents to external tools and data sources. For software teams, the most valuable MCP server isn’t for Slack or Postgres — it’s for your codebase. But not all code MCP servers are created equal. The spectrum ranges from basic file search to semantic code retrieval to full code intelligence delivery. This article explains MCP’s architecture, compares existing code-focused MCP servers, and shows where CoreStory fits as the code intelligence layer that serves structured specifications (not raw code) to any compatible AI agent.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  MCP in Two Minutes
&lt;/h2&gt;

&lt;p&gt;Model Context Protocol is an open standard introduced by Anthropic in November 2024 and donated to the Linux Foundation’s Agentic AI Foundation in December 2025. It defines a universal way for AI applications (clients) to communicate with external data sources and tools (servers) over JSON-RPC 2.0.&lt;/p&gt;

&lt;p&gt;The architecture is straightforward. A host is the AI application you interact with (Claude Code, Cursor, VS Code with Copilot, Codex, Windsurf, Zed, or any custom tool). Inside the host, an MCP client manages the connection to one or more MCP servers. Each server exposes capabilities through three primitives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Resources:&lt;/strong&gt; read-only data the AI can pull into context, such as files, database records, API responses, and documentation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tools:&lt;/strong&gt; executable functions the AI can invoke, such as running queries, creating files, and triggering deployments.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompts:&lt;/strong&gt; reusable instruction templates for common tasks, such as code review workflows, commit message generation, and test scaffolding.&lt;/li&gt;
&lt;/ul&gt;
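&lt;p&gt;Because MCP runs over JSON-RPC 2.0, each primitive maps to a request method. The sketch below builds one request per primitive in Python, using the &lt;code&gt;resources/read&lt;/code&gt;, &lt;code&gt;tools/call&lt;/code&gt;, and &lt;code&gt;prompts/get&lt;/code&gt; methods from the MCP specification. The URI, tool name, prompt name, and the &lt;code&gt;request&lt;/code&gt; helper are hypothetical, and a real client would also perform the initialization handshake first.&lt;/p&gt;

```python
import json
from itertools import count
from typing import Any, Dict

_ids = count(1)


def request(method: str, params: Dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 request with a fresh numeric id."""
    return json.dumps(
        {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}
    )


# One request per primitive. The URI, tool name, and prompt name are hypothetical.
read_resource = request("resources/read", {"uri": "file:///repo/README.md"})
call_tool = request(
    "tools/call",
    {"name": "find_callers", "arguments": {"symbol": "validatePayment"}},
)
get_prompt = request("prompts/get", {"name": "code_review", "arguments": {}})

for message in (read_resource, call_tool, get_prompt):
    print(message)
```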

&lt;p&gt;Before MCP, every AI tool had its own integration approach. Connecting an AI assistant to GitHub, Jira, a database, and your codebase required four separate custom integrations per tool. With MCP, you write a server once and every compatible client can consume it. The protocol turned what was an N×M integration problem into an N+M one.&lt;/p&gt;

&lt;p&gt;As of early 2026, MCP has been adopted by every major AI coding platform: Claude Code, Cursor, VS Code Copilot, Codex, Windsurf, Zed, Continue.dev, Cline, and Goose. OpenAI officially adopted the standard in March 2025, and Google DeepMind followed. The protocol is backed by official SDKs in several languages, including TypeScript, Python, C#, and Java.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why MCP Matters More for Code Than for Data
&lt;/h2&gt;

&lt;p&gt;Most MCP coverage focuses on connecting agents to databases, CRMs, and communication tools. That’s useful, but it undersells the protocol’s most transformative application: codebase context delivery.&lt;/p&gt;

&lt;p&gt;The challenge is unique. When an AI agent queries a database via MCP, it gets structured data back (rows, columns, types). The query result is self-contained. When an agent queries a codebase, what it gets back shapes everything it does next: every line of code it writes, every refactoring suggestion it makes, every test it generates.&lt;/p&gt;

&lt;p&gt;A bad database query wastes a few seconds. Bad codebase context produces hallucinated imports, broken call chains, patterns that contradict your architecture, and “fixes” that break other parts of the system. The stakes are fundamentally different.&lt;/p&gt;

&lt;p&gt;This is why the type of codebase MCP server matters enormously. Retrieving files and retrieving intelligence are not the same thing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Spectrum of Codebase MCP Servers
&lt;/h2&gt;

&lt;p&gt;Not all code MCP servers deliver the same depth of context. The ecosystem has stratified into three distinct categories, each solving a different level of the problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Category 1: File Search Servers&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The simplest category exposes basic file operations: read files, search text, list directories, grep for patterns. The Anthropic reference filesystem MCP server falls into this category, as do the built-in file tools in Claude Code and Cursor.&lt;/p&gt;

&lt;p&gt;These servers mirror what a developer does when exploring an unfamiliar codebase: open the folder, look at the structure, search for a function name. They’re fast, deterministic, and require zero setup. But they don’t understand code — they treat source files as text documents.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Category 2: Semantic Code Search Servers&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The second category adds intelligence to retrieval. These servers index your codebase using embeddings or AST analysis and support semantic queries: “find code related to authentication” or “show me the payment processing flow.”&lt;/p&gt;

&lt;p&gt;Code Pathfinder is a strong example. It builds a comprehensive call graph through multi-pass AST analysis and exposes it via MCP, enabling agents to query callers, trace dependencies, and perform dataflow analysis. GitNexus uses Tree-sitter to build a knowledge graph with Graph RAG, serving structural context to Claude Code and Cursor. KiroGraph provides a 100% local semantic graph with hybrid search. Nella MCP offers AST-aware chunking with assumption validation and dependency tracking.&lt;/p&gt;

&lt;p&gt;These tools represent a significant upgrade over file search. The agent receives structurally aware results (actual function calls, real dependency chains, verified import relationships) rather than text fragments that happen to contain matching keywords.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Category 3: Code Intelligence Servers&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The third category goes beyond retrieval entirely. Instead of returning raw code or search results, code intelligence MCP servers deliver pre-analyzed specifications: component descriptions, architecture summaries, business rule documentation, and relationship maps.&lt;/p&gt;

&lt;p&gt;The distinction is critical. Categories 1 and 2 give the agent data and expect the agent to reason about it. Category 3 gives the agent understanding — pre-computed intelligence that the agent can use directly without additional analysis.&lt;/p&gt;

&lt;p&gt;CoreStory operates in this category. Its MCP server delivers structured specifications from the Code Intelligence Model: what each component does, how services connect, what business rules are embedded in the code, and how changes propagate through the system. The agent receives architecture-level understanding, not raw code.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Comparison: What Each Category Delivers&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Capability&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;File Search&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Semantic Search&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Code Intelligence&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Examples&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Find a specific file&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;All&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Search by keyword&lt;/td&gt;
&lt;td&gt;Yes (grep)&lt;/td&gt;
&lt;td&gt;Yes (semantic)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;All&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Trace call chains&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Code Pathfinder, GitNexus, CoreStory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Map dependencies&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;KiroGraph, Nella, CoreStory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Extract business rules&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;CoreStory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deliver structured specs&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;CoreStory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Support legacy languages&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;Varies (4-14 langs)&lt;/td&gt;
&lt;td&gt;Yes (40+ languages)&lt;/td&gt;
&lt;td&gt;CoreStory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise multi-repo&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Qodo, CoreStory&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;


&lt;h2&gt;
  
  
  How a Code Intelligence MCP Server Works
&lt;/h2&gt;

&lt;p&gt;To understand why code intelligence MCP servers deliver different results, it helps to see what happens under the hood.&lt;/p&gt;

&lt;p&gt;When an agent using a file search MCP server asks “how does authentication work?”, the server runs grep or a file listing and returns files containing the word “auth.” The agent gets raw text and must figure out the architecture itself.&lt;/p&gt;

&lt;p&gt;When an agent using a semantic search MCP server asks the same question, the server performs an embedding-based search or graph traversal and returns the most relevant code chunks or graph nodes. The agent gets better-targeted results but still needs to synthesize understanding from code fragments.&lt;/p&gt;

&lt;p&gt;When an agent using CoreStory’s code intelligence MCP server asks the same question, it receives structured output: the authentication service’s specification, its dependencies on the session manager and token validator, the business rules governing token expiry and refresh logic, and the cross-service call chain from HTTP request through middleware to database. The agent receives the answer, not the raw material to construct an answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up a Code Intelligence MCP Server
&lt;/h2&gt;

&lt;p&gt;MCP server setup follows a common pattern regardless of the server type. Every MCP-compatible host (Claude Code, Cursor, VS Code, Codex) uses a configuration file that specifies which servers to connect to.&lt;/p&gt;

&lt;p&gt;For local MCP servers, the configuration typically lives in a project-level &lt;code&gt;.mcp.json&lt;/code&gt; file or a global settings file. Each entry specifies the server command, arguments, and any required environment variables. Remote MCP servers use HTTP-based transport instead of the local stdio approach.&lt;/p&gt;
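&lt;p&gt;As a rough illustration, the Python snippet below prints a configuration in the &lt;code&gt;mcpServers&lt;/code&gt; layout that Claude Code reads from &lt;code&gt;.mcp.json&lt;/code&gt;, with one local stdio entry and one remote HTTP entry. The server names, package name, URL, and environment variable are all hypothetical, and other hosts use similar but not identical keys, so check your host’s documentation before copying it.&lt;/p&gt;

```python
import json

# Server names, command arguments, URL, and environment variable are hypothetical.
config = {
    "mcpServers": {
        "local-code-graph": {
            # stdio transport: the host launches this command as a subprocess
            "command": "npx",
            "args": ["-y", "example-code-graph-mcp", "--root", "."],
            "env": {"EXAMPLE_LOG_LEVEL": "info"},
        },
        "remote-code-intelligence": {
            # HTTP transport: the host connects to an already running server
            "type": "http",
            "url": "https://mcp.example.com/mcp",
        },
    }
}

print(json.dumps(config, indent=2))
```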

&lt;p&gt;Open-source code MCP servers like Code Pathfinder, GitNexus, and KiroGraph run locally and index your codebase on your machine. This keeps your code local, but the trade-off is that local servers are limited to the languages and scale their parsers support.&lt;/p&gt;

&lt;p&gt;CoreStory’s MCP server connects to the Code Intelligence Model, which runs on CoreStory’s infrastructure. The agent queries the MCP server, which returns structured specifications from the pre-built intelligence model. Setup requires ingesting your codebase into CoreStory first, after which the MCP server provides immediate access to the full Code Intelligence Model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the MCP Standard Changes the Game for Code Intelligence
&lt;/h2&gt;

&lt;p&gt;Before MCP, code intelligence tools were locked into specific delivery channels. You had to use a particular IDE extension or a specific web interface. If your team used Cursor but the intelligence tool only had a VS Code plugin, you were out of luck.&lt;/p&gt;

&lt;p&gt;MCP eliminates that constraint. Any code intelligence system that implements an MCP server is instantly accessible from any MCP-compatible host. This is why MCP coverage in the code intelligence space has exploded: GitNexus, Code Pathfinder, KiroGraph, Graphify, Code Grapher, Nella, Qodo, Sourcegraph, and CoreStory all provide MCP servers.&lt;/p&gt;

&lt;p&gt;For enterprise teams, this means choosing a code intelligence platform is no longer a lock-in decision about which IDE or agent to use. The intelligence layer is decoupled from the consumption layer. Your architects can query CoreStory through Claude Code while your developers use Cursor — same intelligence model, different interfaces, unified by MCP.&lt;/p&gt;

&lt;p&gt;The protocol’s 2026 roadmap focuses on production readiness: authentication, gateway patterns, audit logging, and streaming responses. As these enterprise features land, MCP-based code intelligence becomes viable for regulated industries where security and compliance are non-negotiable.&lt;/p&gt;

&lt;h2&gt;
  
  
  CoreStory’s MCP Implementation
&lt;/h2&gt;

&lt;p&gt;CoreStory’s MCP server is the delivery layer for the Code Intelligence Model. When an agent connects to it, the server exposes tools that let the agent query:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Component specifications:&lt;/strong&gt; what each module, service, or class does, its responsibilities, and its public interface.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Architecture relationships:&lt;/strong&gt; how services connect, what the call chains look like, where data flows between components.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Business rules:&lt;/strong&gt; the logic embedded in code, extracted and structured — including from legacy languages like COBOL that most tools can’t parse.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Change impact:&lt;/strong&gt; what specifications are affected when a particular file or function changes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Search across the entire intelligence model:&lt;/strong&gt; find components by function, by technology, or by business domain.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The critical difference is that CoreStory’s MCP server delivers intelligence derived from analysis, not raw code retrieved by search. The MCP server makes that entire intelligence model queryable by any compatible AI agent.&lt;/p&gt;

&lt;p&gt;Because CoreStory supports a wide range of programming languages, the MCP server provides a unified intelligence layer even for polyglot enterprise systems. An agent can query the architecture of a system that spans Java microservices, a Python data pipeline, and a COBOL mainframe, all through a single MCP connection.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa77trm5is15pj2qawdx3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa77trm5is15pj2qawdx3.png" alt="File search MCP vs Code Intelligence MCP" width="800" height="434"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Give Your Agents Real Code Intelligence
&lt;/h2&gt;

&lt;p&gt;MCP solved the integration problem. The question now is what intelligence you push through the protocol.&lt;/p&gt;

&lt;p&gt;File search MCP servers help agents find code. Semantic search servers help agents find relevant code. CoreStory’s code intelligence MCP server helps agents understand your entire system (architecture, business rules, dependencies, and change impact) through a single, standardized connection.&lt;/p&gt;

&lt;p&gt;Connect your codebase to any AI agent with CoreStory’s MCP server. &lt;a href="https://accounts.corestory.ai/sign-up" rel="noopener noreferrer"&gt;Try it free today&lt;/a&gt;&lt;/p&gt;


&lt;h3&gt;
  
  
  Frequently Asked Questions
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What AI tools support MCP?&lt;/strong&gt;&lt;br&gt;
As of 2026: Claude Code, Cursor, VS Code with Copilot, OpenAI Codex, Windsurf, Zed, Continue.dev, Cline, and Goose. The Linux Foundation’s Agentic AI Foundation governance ensures the standard remains vendor-neutral.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do I need to self-host an MCP server?&lt;/strong&gt;&lt;br&gt;
It depends on the server. Open-source tools like Code Pathfinder and GitNexus run locally. CoreStory offers both cloud and on-premises deployment for enterprise teams with data sovereignty requirements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I use multiple MCP servers at once?&lt;/strong&gt;&lt;br&gt;
Yes. MCP hosts support multiple simultaneous server connections. A typical setup might combine a GitHub MCP server, a Jira MCP server, and a code intelligence MCP server like CoreStory — each providing different context to the same agent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How is CoreStory’s MCP server different from Sourcegraph’s?&lt;/strong&gt;&lt;br&gt;
Sourcegraph’s MCP integration exposes code search and navigation. CoreStory’s MCP server delivers analyzed specifications: architecture maps, business rules, and component descriptions. Sourcegraph finds code; CoreStory explains what the code means. They serve different purposes and can work together.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is the performance impact?&lt;/strong&gt;&lt;br&gt;
MCP queries add minimal latency: typically milliseconds for local servers and low hundreds of milliseconds for remote servers. KiroGraph reports up to 90% reduction in overall token usage compared to agents that explore codebases via file reads, because graph-based retrieval is far more targeted than sequential file scanning.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>development</category>
      <category>agents</category>
    </item>
    <item>
      <title>Best Tools for Understanding Large Legacy Codebases in 2026</title>
      <dc:creator>Michel Ozzello</dc:creator>
      <pubDate>Fri, 15 May 2026 17:16:27 +0000</pubDate>
      <link>https://dev.to/corestory/best-tools-for-understanding-large-legacy-codebases-in-2026-1h67</link>
      <guid>https://dev.to/corestory/best-tools-for-understanding-large-legacy-codebases-in-2026-1h67</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; Understanding a large legacy codebase requires more than search and navigation. The tools available in 2026 fall into four categories: navigation tools (Sourcegraph, OpenGrok) that help you find code, AI-assisted explorers (GitHub Copilot Chat, Cursor, Cody) that explain code in context, visualization tools (CodeScene, Understand by SciTools) that show structure, and code intelligence platforms (CoreStory) that build a persistent, queryable model of what the system does and why. This guide covers each category honestly, including what works for COBOL and mainframe systems.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Why Legacy Codebase Understanding Is Hard
&lt;/h2&gt;

&lt;p&gt;You’ve just been assigned to a system that was built 15 years ago. The original developers are gone. The documentation, if it exists, describes the system as it was in 2015. The codebase spans three languages, two databases, and a mainframe component nobody wants to touch.&lt;/p&gt;

&lt;p&gt;This is the reality for most enterprise engineering teams. The systems that run the business (the ones that process payments, manage claims, and handle supply chains) are the hardest to understand. They accumulated complexity over decades of maintenance by dozens of developers, each making locally reasonable decisions that added up to a globally opaque system.&lt;/p&gt;

&lt;p&gt;The challenge has three dimensions:&lt;/p&gt;

&lt;p&gt;First, tribal knowledge loss: the people who understood the system have moved on, taking critical context with them.&lt;/p&gt;

&lt;p&gt;Second, documentation decay: whatever documentation existed has drifted so far from reality that it’s more dangerous than having no documentation at all.&lt;/p&gt;

&lt;p&gt;Third, structural complexity: the codebase is too large and interconnected for any single person to hold in working memory.&lt;/p&gt;

&lt;p&gt;No single tool solves all three problems. But the right combination of tools can take a team from “we have no idea how this works” to “we can safely modify this system” in weeks rather than months.&lt;/p&gt;

&lt;h2&gt;
  
  
  Category 1: Navigation Tools for Finding Code
&lt;/h2&gt;

&lt;p&gt;The most basic layer of codebase understanding is being able to find what you’re looking for. Navigation tools index your code and provide fast, accurate search across repositories.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Sourcegraph Code Search&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Sourcegraph is the category leader for code search at scale. It indexes codebases across multiple repositories, code hosts, and languages, providing near-instant search results with regex support, structural search (matching AST patterns), and cross-repository results. For enterprise teams with hundreds of repositories across GitHub, GitLab, and Bitbucket, Sourcegraph provides a single search interface that no IDE can match.&lt;/p&gt;

&lt;p&gt;Sourcegraph’s code intelligence layer (SCIP) adds go-to-definition and find-references across repository boundaries — critical for understanding microservice architectures where a function call in one repository invokes code in another.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;OpenGrok&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;OpenGrok is an open-source code search and cross-reference engine. It’s fast, handles large codebases well, and has been a reliable workhorse for organizations that need self-hosted search. It lacks the AI features and cross-repository intelligence of Sourcegraph, but for teams that need straightforward code search on their own infrastructure, OpenGrok remains a solid choice.&lt;/p&gt;

&lt;p&gt;What navigation tools solve: “Where is this function defined?”; “Which files reference this variable?”; “Show me all implementations of this interface across our repositories.”&lt;/p&gt;

&lt;p&gt;What they don’t solve: “Why does this code exist?”; “What business rule does this function implement?”; “What breaks if I change this?”&lt;/p&gt;

&lt;h2&gt;
  
  
  Category 2: AI-Assisted Exploration for Explaining Code in Context
&lt;/h2&gt;

&lt;p&gt;The second category uses large language models to explain code to developers in natural language. These tools read the code you’re looking at and generate explanations, summaries, and suggestions.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;GitHub Copilot Chat&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Copilot Chat lets developers ask questions about code directly in VS Code or JetBrains IDEs. It can explain functions, suggest fixes, generate tests, and answer questions about the current file or workspace. For individual files and functions, the quality of explanations is often impressive.&lt;/p&gt;

&lt;p&gt;The limitation is context. Copilot Chat understands what it can see in the current context window, typically the active file and a few related files. It doesn’t have access to the complete architecture, the full dependency graph, or the business context behind the code. For large systems, this means Copilot can explain what a single function does but struggles with questions like “how does this function fit into the broader payment processing workflow?”&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Sourcegraph Cody&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Cody addresses Copilot’s context limitation by combining LLM-based chat with Sourcegraph’s code search infrastructure. When you ask Cody a question, it retrieves relevant code from across your repositories using semantic search and RAG before generating an answer. This gives Cody access to broader context than IDE-based assistants.&lt;/p&gt;

&lt;p&gt;Cody supports context windows up to 1 million tokens and can pull from up to 10 remote repositories on the Enterprise plan. For teams already using Sourcegraph, Cody is a natural evolution that adds AI-assisted explanation on top of code search.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Cursor and Windsurf&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Cursor and Windsurf are AI-native code editors that index your local codebase and maintain awareness of your editing session. They excel at explaining code in the context of what you’re currently working on — but like all session-based tools, they start fresh each time and lose context between sessions.&lt;/p&gt;

&lt;p&gt;What AI explorers solve: “Explain this function.”; “What does this code do?”; “Suggest a fix for this bug.”&lt;/p&gt;

&lt;p&gt;What they don’t solve: “Explain the complete architecture of this system.”; “Extract all business rules.”; “What is the impact of changing this database schema across all services?” AI explorers are limited by their context windows and lack persistent understanding.&lt;/p&gt;

&lt;h2&gt;
  
  
  Category 3: Visualization Tools for Seeing Structure
&lt;/h2&gt;

&lt;p&gt;The third category creates visual representations of your codebase: dependency graphs, architecture maps, hotspot visualizations, and call trees.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;CodeScene&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;CodeScene combines static code analysis with behavioral analysis from Git history. Its signature feature is hotspot detection: identifying code that is both complex and frequently changed — the areas where technical debt has the most business impact. CodeScene supports around 28 programming languages and integrates with GitHub, GitLab, Bitbucket, and Azure DevOps.&lt;/p&gt;
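
&lt;p&gt;The hotspot idea can be sketched in a few lines. This is a minimal illustration, not CodeScene’s actual algorithm: the FileStats type, the rank_hotspots function, and the scoring formula (normalized complexity multiplied by normalized commit count) are all assumptions made for the example.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    path: str
    complexity: int  # hypothetical metric, e.g. cyclomatic complexity
    commits: int     # commits touching the file in the analysis window


def rank_hotspots(files: list[FileStats], top_n: int = 3) -> list[tuple[str, float]]:
    """Rank files by normalized complexity times normalized change frequency."""
    if not files:
        return []
    max_complexity = max(f.complexity for f in files) or 1
    max_commits = max(f.commits for f in files) or 1
    scored = [
        (f.path, (f.complexity / max_complexity) * (f.commits / max_commits))
        for f in files
    ]
    # Highest score first: complex AND frequently changed.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Illustrative numbers only.
stats = [
    FileStats("billing/invoice.py", complexity=80, commits=40),
    FileStats("utils/strings.py", complexity=10, commits=50),
    FileStats("legacy/report.py", complexity=100, commits=2),
]
hotspots = rank_hotspots(stats)
```

&lt;p&gt;In this toy data the most complex file (legacy/report.py) ranks last because it is almost never changed, which is the point of combining the two signals.&lt;/p&gt;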

&lt;p&gt;What makes CodeScene distinctive is its focus on the human dimension of codebases. It maps knowledge distribution across the team, identifies “code red” areas where a single developer owns critical code, and quantifies organizational risk alongside technical risk. For engineering leaders who need to explain technical debt to business stakeholders, CodeScene’s visualizations are exceptionally clear.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Understand by SciTools&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Understand is a static analysis tool that provides dependency graphs, call trees, control flow graphs, data flow analysis, and metrics across large codebases. Used by NASA for safety-critical systems and certified for ISO 26262, IEC 61508, and EN 50128 compliance, Understand is built for environments where code quality is a safety concern.&lt;/p&gt;

&lt;p&gt;Understand supports a wide range of languages including C, C++, C#, Java, Python, Ada, Fortran, and COBOL. Its VS Code extension makes core features available without leaving the IDE. For teams in regulated industries (aerospace, automotive, medical devices), Understand’s compliance certification is a significant differentiator.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;CAST Imaging&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;CAST Imaging creates interactive architecture maps that visualize software systems at multiple levels: application, technology, and transaction. It excels at cross-technology analysis, such as showing how COBOL programs, Java middleware, and SQL databases connect into a unified system view.&lt;/p&gt;

&lt;p&gt;What visualization tools solve: “Show me the architecture.”; “Which components are the most complex?”; “Where is the technical debt concentrated?”; “What are the dependencies between these modules?”&lt;/p&gt;

&lt;p&gt;What they don’t solve: “What business rules are embedded in this code?”; “Generate specifications I can use for modernization planning.”; “Give AI agents a persistent understanding of this system.” Visualization shows structure; it doesn’t extract meaning.&lt;/p&gt;

&lt;h2&gt;
  
  
  Category 4: Code Intelligence Platforms for Understanding Meaning
&lt;/h2&gt;

&lt;p&gt;The fourth category goes beyond search, explanation, and visualization to build a persistent, queryable model of what a codebase actually does.&lt;/p&gt;

&lt;p&gt;CoreStory ingests your entire codebase, analyzes it structurally, and produces a Code Intelligence Model (CIM) that captures architecture, component relationships, business rules, and data flows. Unlike the tools in categories 1–3, the CIM persists across sessions and tools. It doesn’t just help you find or visualize code. It tells you what the system does and why.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What sets code intelligence apart&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Reverse-engineers understanding directly from code, without requiring existing documentation or manual input.&lt;/p&gt;

&lt;p&gt;Extracts business rules as structured specifications, not just summaries or visualizations.&lt;/p&gt;

&lt;p&gt;Supports a wide range of languages, including COBOL, RPG, and other legacy languages that categories 1–3 handle partially or not at all.&lt;/p&gt;

&lt;p&gt;Delivers intelligence to AI agents via MCP, making the entire model queryable by Claude Code, Cursor, Codex, and other tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparison: Which Category Fits Your Situation
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Capability&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Sourcegraph&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Copilot/Cody&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;CodeScene&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Understand&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;CoreStory&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cross-repo search&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes (Cody)&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code explanation&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes (LLM-powered)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes (structured specs)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Architecture visualization&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Hotspots, coupling&lt;/td&gt;
&lt;td&gt;Call/dependency/flow graphs&lt;/td&gt;
&lt;td&gt;Architecture maps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Business rule extraction&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes (validated)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;COBOL support&lt;/td&gt;
&lt;td&gt;Search only&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Yes (28+ languages)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes (all languages)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Change impact analysis&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes (behavioral)&lt;/td&gt;
&lt;td&gt;Yes (dependency)&lt;/td&gt;
&lt;td&gt;Yes (specification-level)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI agent delivery (MCP)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes (Cody)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Persistent intelligence&lt;/td&gt;
&lt;td&gt;Index (not intelligence)&lt;/td&gt;
&lt;td&gt;No (session-based)&lt;/td&gt;
&lt;td&gt;Yes (metrics/trends)&lt;/td&gt;
&lt;td&gt;Yes (analysis DB)&lt;/td&gt;
&lt;td&gt;Yes (Code Intelligence Model)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Special Case: COBOL and Mainframe Codebases
&lt;/h2&gt;

&lt;p&gt;Most of the tools listed above were designed for modern languages. COBOL and mainframe systems present unique challenges: copybook resolution, PERFORM chain tracing, JCL job dependencies, DB2 and VSAM data store integration, and business logic embedded in data type definitions.&lt;/p&gt;

&lt;p&gt;For COBOL-specific analysis, the practical options are:&lt;/p&gt;

&lt;p&gt;IBM ADDI: The most established tool for mainframe dependency mapping and impact analysis. Tightly integrated with the z/OS ecosystem.&lt;/p&gt;

&lt;p&gt;CAST Imaging: Strong cross-technology visualization that includes COBOL alongside Java and SQL components.&lt;/p&gt;

&lt;p&gt;Understand by SciTools: Supports COBOL with call trees, dependency analysis, and compliance checking. Used in safety-critical environments.&lt;/p&gt;

&lt;p&gt;CoreStory: Full Code Intelligence Model for COBOL, including business rule extraction and structured specification generation.&lt;/p&gt;

&lt;p&gt;What doesn’t work for COBOL: generic AI coding assistants (Copilot, Cursor) that may hallucinate when asked to explain COBOL logic they weren’t extensively trained on, and code search tools that can find COBOL text but can’t parse its structural patterns.&lt;/p&gt;

&lt;h2&gt;
  
  
  From Finding Code to Understanding Systems
&lt;/h2&gt;

&lt;p&gt;Navigation tools help you find code. AI assistants help you explain code. Visualization tools help you see code structure. But teams that need to truly understand a large legacy system (its architecture, its business rules, its hidden dependencies, and how changes propagate through it) need code intelligence.&lt;/p&gt;

&lt;p&gt;CoreStory builds a persistent Code Intelligence Model from your codebase, no matter the language, size, or age. The understanding that results doesn’t disappear when the session ends or when team members leave. It compounds over time.&lt;/p&gt;

&lt;p&gt;See how CoreStory builds a Code Intelligence Model from your legacy codebase. &lt;a href="https://corestory.ai/talk-to-an-expert" rel="noopener noreferrer"&gt;Talk to an expert →&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Frequently Asked Questions
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Can I use multiple tools from different categories?&lt;/strong&gt;&lt;br&gt;
Yes, and many enterprise teams do. One possible combination is Sourcegraph for code search, CodeScene for technical debt visibility, and CoreStory for deep code intelligence and AI agent context delivery.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Which tool should I start with?&lt;/strong&gt;&lt;br&gt;
Start with your most immediate need. If developers can’t find code: Sourcegraph. If you need to visualize technical debt: CodeScene. If you need to understand a legacy system for modernization: CoreStory. Each tool delivers value independently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Are AI coding assistants reliable for understanding legacy code?&lt;/strong&gt;&lt;br&gt;
For explaining individual functions, they’re useful. For understanding system-level architecture and business logic, they’re limited by context windows and training data. For COBOL specifically, hallucination risk is significant. Always validate AI-generated explanations against the actual code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What about open-source alternatives?&lt;/strong&gt;&lt;br&gt;
OpenGrok (code search), GitNexus (knowledge graphs), KiroGraph (semantic graphs), and Code Pathfinder (call graph analysis) all provide free, open-source capabilities. They work well for single-language repositories under 500,000 lines. Enterprise-scale polyglot systems typically require commercial tools.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>softwareengineering</category>
      <category>tooling</category>
    </item>
    <item>
      <title>The AI-Native Code Intelligence Stack: Where the Wiki Ends and the Graph Begins</title>
      <dc:creator>Michel Ozzello</dc:creator>
      <pubDate>Fri, 15 May 2026 17:09:11 +0000</pubDate>
      <link>https://dev.to/corestory/the-ai-native-code-intelligence-stack-where-the-wiki-ends-and-the-graph-begins-2jok</link>
      <guid>https://dev.to/corestory/the-ai-native-code-intelligence-stack-where-the-wiki-ends-and-the-graph-begins-2jok</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; If you are a developer just starting to take "codebase context" seriously, you are stepping into a stack that did not exist three years ago. It has four layers: the agent runtime (Claude Code, Cursor, Aider, Copilot), retrieval (vector search, agentic grep), curated knowledge (Karpathy's LLM wiki, DeepWiki, Greptile), and a structured code graph (CoreStory, Sourcegraph). Each layer answers a different question. The wiki and vector layers work well for small repositories and descriptive questions. They break down on large, multi-language codebases, and on questions that need a graph traversal instead of a paragraph retrieval. This post maps the stack, shows where each piece earns its keep, and shows the use cases where wiki intelligence loses to a graph model of the code.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Problem: Context Windows Are Huge, And It's Still Not Enough
&lt;/h2&gt;

&lt;p&gt;Ask a coding agent a question about a repository larger than its context window, and the answer depends entirely on what it happens to retrieve. Even inside the window, the situation is worse than LLM providers advertise.&lt;/p&gt;

&lt;p&gt;The needle-in-a-haystack benchmark has become the default way to measure long-context reliability. Place a single out-of-place fact inside a long document, then test whether the model can answer a question about it at different positions and different context lengths. Public results are consistent. Models that advertise 128K tokens start to degrade well before they fill the window, and widely cited evaluations of GPT-4 show rising error rates on ultra-long documents and failure to retrieve needles placed near the start of a document as the context grows. Multi-needle variants, where several facts must be retrieved and combined, perform worse still.&lt;/p&gt;
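
&lt;p&gt;The test setup described above is simple enough to sketch. The helper names (build_haystack, needle_retrieved), the filler text, and the substring-based scoring are assumptions made for illustration; published evaluations use their own corpora and graders, and the call to the model itself is left out.&lt;/p&gt;

```python
def build_haystack(filler_sentences: list[str], needle: str, depth: float) -> str:
    """Insert the needle at a relative depth: 0.0 is the start, 1.0 is the end."""
    depth = min(max(depth, 0.0), 1.0)  # clamp to the valid range
    position = round(depth * len(filler_sentences))
    sentences = filler_sentences[:position] + [needle] + filler_sentences[position:]
    return " ".join(sentences)


def needle_retrieved(model_answer: str, expected: str) -> bool:
    """Naive scoring: does the answer contain the expected fact?"""
    return expected.lower() in model_answer.lower()


# Hypothetical filler and needle; a real run would vary context length too.
filler = [f"Filler sentence number {i}." for i in range(100)]
needle = "The deploy password for the test rig is 'marzipan'."
prompts = {depth: build_haystack(filler, needle, depth) for depth in (0.0, 0.5, 1.0)}
```

&lt;p&gt;Each prompt would be sent to the model with a question about the needle, and the answers scored per depth and per context length to show where retrieval starts to fail.&lt;/p&gt;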

&lt;p&gt;Enterprise codebases are not haystacks. They are warehouses full of haystacks. A real service might have a million lines of code, fifteen years of history, and a data model that crosses half a dozen languages. No context window reaches that, and "just retrieve the right pieces" is the core unsolved problem the whole AI-native stack is trying to solve.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Emerging Code Intelligence Stack
&lt;/h2&gt;

&lt;p&gt;Four layers are settling in:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxg0v6y8obhm8tvil0zch.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxg0v6y8obhm8tvil0zch.png" alt="A four-layer stack diagram showing the AI-native code intelligence stack: agent runtime, retrieval, curated knowledge, and code graph." width="800" height="971"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent runtime.&lt;/strong&gt; This is where the developer sits: Claude Code in the terminal, Cursor in the editor, Aider on the command line, Copilot inside the IDE. The runtime decides what questions to ask, what tools to call, and how to act on answers. It is rarely the source of grounding; it is the consumer of grounding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Retrieval.&lt;/strong&gt; Before a model reasons, something has to hand it the right files. This is vector search (embeddings, BM25, hybrid rerankers), plus the newer "agentic retrieval" style where the agent itself runs grep, find, and file reads. Every mainstream agent now has an opinion here. Claude Code, Cursor, and Devin have moved away from pure vector databases toward agentic search over the filesystem, for reasons we describe below.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Curated knowledge.&lt;/strong&gt; This is where Karpathy's LLM wiki sits, along with DeepWiki, Greptile, and a growing family of similar tools. These layers pre-digest the codebase into human- and agent-readable artifacts (markdown pages, per-function summaries, auto-generated architecture docs) that are smaller, cleaner, and more navigable than raw source.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Code graph / digital twin.&lt;/strong&gt; This is the structured, program-analyzed model of the system: components, workflows, business rules, data entities, and the typed edges between them. CoreStory sits here. It is not a list of pages. It is a queryable representation of how the code actually behaves, derived from the source and maintained as the source changes.&lt;/p&gt;

&lt;p&gt;A grown-up workflow uses all four. A beginner workflow usually starts with the agent runtime and one retrieval strategy, then adds curated knowledge when the repo gets too big for the model to reason about directly. The graph layer shows up when curated knowledge starts lying.&lt;/p&gt;

&lt;h2&gt;
  
  
  Curated Knowledge: Karpathy, DeepWiki, and Greptile
&lt;/h2&gt;

&lt;p&gt;Karpathy's formulation of the LLM wiki, shared publicly as a gist, is one of the cleanest statements of what curated knowledge should look like. Three folders:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;raw/&lt;/strong&gt; holds the source material. For a codebase, this is the repo itself. Immutable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;wiki/&lt;/strong&gt; is a folder of LLM-written markdown pages, one per module or concept, plus an index.md and a log.md.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CLAUDE.md&lt;/strong&gt; (or AGENTS.md) is the schema. It tells the agent how to ingest new material, name pages, cross-link them, and handle conflicts.&lt;/p&gt;

&lt;p&gt;A minimal schema looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# CLAUDE.md&lt;/span&gt;
&lt;span class="gu"&gt;## Wiki layout&lt;/span&gt;
- &lt;span class="sb"&gt;`raw/`&lt;/span&gt; contains immutable source. Never edit.
- &lt;span class="sb"&gt;`wiki/`&lt;/span&gt; contains one page per top-level module.
- &lt;span class="sb"&gt;`wiki/index.md`&lt;/span&gt; lists every page with a one-line summary.
- &lt;span class="sb"&gt;`wiki/log.md`&lt;/span&gt; records every ingest with a timestamp.
&lt;span class="gu"&gt;## Ingest workflow&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; Read any new files under &lt;span class="sb"&gt;`raw/`&lt;/span&gt;.
&lt;span class="p"&gt;2.&lt;/span&gt; For each changed module, update or create &lt;span class="sb"&gt;`wiki/&amp;lt;module&amp;gt;.md`&lt;/span&gt;.
&lt;span class="p"&gt;3.&lt;/span&gt; Cross-link related pages using relative markdown links.
&lt;span class="p"&gt;4.&lt;/span&gt; Append an entry to &lt;span class="sb"&gt;`log.md`&lt;/span&gt;.
&lt;span class="gu"&gt;## Query workflow&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; Read &lt;span class="sb"&gt;`wiki/index.md`&lt;/span&gt; first.
&lt;span class="p"&gt;2.&lt;/span&gt; Follow links into specific module pages.
&lt;span class="p"&gt;3.&lt;/span&gt; Never answer from memory when a page exists.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Point Claude Code, Cursor, Codex, or Copilot at the folder and the agent reasons over its own distilled notes instead of re-loading the whole repo into context every session. For a personal knowledge base or a mid-sized repository, that is often enough.&lt;/p&gt;
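
&lt;p&gt;In the pattern above the agent itself maintains the index, but the index is mechanical enough to regenerate with a script. The sketch below is one way to do that, under stated assumptions: the build_index name is hypothetical, and it treats the first non-heading line of each page as that page's one-line summary, which the schema does not require.&lt;/p&gt;

```python
from pathlib import Path


def build_index(wiki_dir: str) -> str:
    """Build the text of index.md: one bullet per wiki page with its summary."""
    lines = ["# Index", ""]
    for page in sorted(Path(wiki_dir).glob("*.md")):
        if page.name in {"index.md", "log.md"}:
            continue  # bookkeeping files are not module pages
        summary = ""
        for line in page.read_text(encoding="utf-8").splitlines():
            stripped = line.strip()
            # Assumption: the first non-heading line is the page summary.
            if stripped and not stripped.startswith("#"):
                summary = stripped
                break
        lines.append(f"- [{page.stem}]({page.name}): {summary}")
    return "\n".join(lines) + "\n"
```

&lt;p&gt;Writing the returned text to wiki/index.md after each ingest keeps the entry point in step with the pages it lists.&lt;/p&gt;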

&lt;p&gt;DeepWiki, from Cognition (the team behind Devin), automates this pattern for public GitHub repositories. Replace github.com with deepwiki.com in any URL and Cognition serves an auto-generated wiki with architecture diagrams, module explanations, and a conversational agent grounded in the actual source. Cognition has indexed tens of thousands of top public repositories and exposes the same data through an MCP endpoint (mcp.deepwiki.com) with three tools: ask_question, read_wiki_structure, and read_wiki_contents. It is a zero-setup version of the Karpathy pattern, for open-source code.&lt;/p&gt;

&lt;p&gt;Greptile (often the "G" in the short list of AI-native dev tools developers trade around) goes further. Greptile constructs a graph of files, functions, and dependencies, then uses that graph to ground AI code review, PR summaries, and codebase Q&amp;amp;A. Greptile's own engineering blog is unusually candid about why this is hard: semantic search on raw code is noisy, embeddings work better if you first translate code into natural language, and chunking at the per-function level beats per-file chunking. Greptile is a useful example of the curated-knowledge layer reaching for graph structure.&lt;/p&gt;

&lt;p&gt;These tools share a strength and a limit. They make a large repository legible to an agent. They are still, at heart, collections of summaries. When the question is "which downstream workflows break if I change this signature?", summaries are not a graph traversal.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Vector-Search Layer: Useful, Noisy, Increasingly Optional
&lt;/h2&gt;

&lt;p&gt;The retrieval layer used to be synonymous with vector search. Chunk the code, embed the chunks, compare the query embedding against the index, return the top k, stuff them into the prompt.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Classic vector-search retrieval over a code index
&lt;/span&gt;&lt;span class="n"&gt;query_vec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;hits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_vec&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;hits&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two things happened on the way to 2026. First, practitioners learned the specific ways embeddings misbehave on code. They favor frequently accessed or well-documented modules and sideline edge cases. They are black-box: when a retrieval misses, it is hard to say why. They go stale, because codebases change daily and indexes have to be diffed, re-chunked, re-embedded, and re-permissioned. Chunk size matters enormously; per-file chunks are too noisy, and per-function chunks require real parsing to produce.&lt;/p&gt;

&lt;p&gt;Second, the frontier agents moved. Public write-ups from the Claude Code, Cursor, and Devin teams have converged on "agentic search": instead of a vector database, the agent itself runs grep, find, and file reads, using its own reasoning to narrow the search. For interactive coding in a repo that is already on disk, that is often faster, more transparent, and easier to debug than vector retrieval.&lt;/p&gt;
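
&lt;p&gt;To make "agentic search" concrete, here is a minimal sketch of the kind of grep-style tool an agent might be given. It is not taken from Claude Code, Cursor, or Devin; the grep_repo name, its parameters, and the default file glob are assumptions for illustration.&lt;/p&gt;

```python
import re
from pathlib import Path


def grep_repo(root: str, pattern: str, glob: str = "*.py", max_hits: int = 20) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, line text) for lines matching pattern."""
    regex = re.compile(pattern)
    hits: list[tuple[str, int, str]] = []
    for path in sorted(Path(root).rglob(glob)):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip rather than fail the whole search
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((path.relative_to(root).as_posix(), number, line.strip()))
                if len(hits) >= max_hits:
                    return hits  # cap the output so it fits in the agent's context
    return hits
```

&lt;p&gt;The agent calls a tool like this repeatedly, reading the hits and narrowing the pattern each round. Every step is an inspectable command and result, which is where the transparency comes from.&lt;/p&gt;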

&lt;p&gt;Vector search has not disappeared. It still earns its keep for semantic discovery ("where do we talk about authentication?"), for first-pass shortlisting in very large repositories, and inside hybrid systems where BM25 plus embeddings plus a cross-encoder reranker beats any single method. It is just no longer the whole answer.&lt;/p&gt;
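
&lt;p&gt;One common way to combine a BM25 ranking with an embedding ranking is reciprocal rank fusion (RRF). The sketch below uses RRF as a stand-in for the fusion step; the text does not say which method any particular tool uses. The two input rankings are made-up examples, and the cross-encoder reranker that would rescore the fused shortlist is not shown.&lt;/p&gt;

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists; each item scores 1 / (k + rank) per list."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for the query "where do we talk about authentication?"
bm25_ranking = ["auth/login.py", "auth/token.py", "docs/security.md"]
vector_ranking = ["auth/token.py", "auth/session.py", "auth/login.py"]
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
```

&lt;p&gt;Files that both retrievers rank highly rise to the top, while a file found by only one retriever still makes the shortlist.&lt;/p&gt;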

&lt;h2&gt;
  
  
  The Code Graph Layer: Where the Wiki Loses
&lt;/h2&gt;

&lt;p&gt;The layer under everything is a structural model of the code. CoreStory builds this by running program analysis (AST, dataflow, control-flow, business rule extraction) across 40+ languages, including the older ones (COBOL, PL/I, mainframe dialects) where LLMs alone are weakest. The output is not a folder of markdown. It is a knowledge graph: components, workflows, business rules, data entities, and typed edges between them. Humans query it through a web dashboard. Agents query it through an MCP interface.&lt;/p&gt;

&lt;p&gt;A typical agent call looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"corestory.impact_of_change"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"arguments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"entity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"PaymentService.refund"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"change"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"signature"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"scope"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"workflows,business_rules,data_entities"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The response is not a paragraph. It is a list of workflows that reach that function, the business rules governing them, and the data entities they touch. The agent plans its refactor against that, not against a markdown summary.&lt;/p&gt;

&lt;p&gt;Four use cases show where this matters more than any wiki.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Change impact analysis.&lt;/strong&gt; "If I change the signature of PaymentService.refund, what else breaks?" A wiki page can describe the module. A graph query enumerates every workflow, test, and downstream service that reaches it, across languages, in milliseconds. Wikis gesture. Graphs answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Business rule traceability.&lt;/strong&gt; "Where is the rule that caps provider reimbursements at 90 days, and what code enforces it?" Curated summaries capture whatever the LLM happened to notice when it summarized the claims module. A code intelligence model extracts business rules as first-class objects with back-pointers to the exact branches that implement them. An auditor can follow that trace; a summary offers nothing to follow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cross-language call graphs.&lt;/strong&gt; "Does this Java controller ultimately write to the COBOL ledger?" Summary pages live per module and per language. A code graph is native across both, because it is built from program analysis, not prose. For modernization work, this is the difference between a guess and a plan.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Legacy understanding.&lt;/strong&gt; LLMs are uneven on COBOL, PL/I, and mainframe dialects. Summarization quality drops sharply on languages the base model rarely sees. A graph built from program analysis does not care; a COBOL paragraph is another node. This is where the summary pattern struggles most and where a structural model earns its cost.&lt;/p&gt;
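
&lt;p&gt;The change impact case comes down to a reverse traversal of the call graph. The sketch below shows the idea on a toy graph. It is not CoreStory's implementation: the impacted_by function, the plain-dictionary graph format, and every node name other than PaymentService.refund are assumptions for illustration.&lt;/p&gt;

```python
from collections import deque


def impacted_by(calls: dict[str, list[str]], changed: str) -> list[str]:
    """Return every node that can reach `changed` by following call edges."""
    # Invert the caller -> callees map into callee -> callers.
    callers: dict[str, list[str]] = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, []).append(caller)

    # Breadth-first search upward from the changed node.
    seen = {changed}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for caller in callers.get(node, []):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    seen.discard(changed)
    return sorted(seen)


# Toy call graph: caller -> list of callees (hypothetical names).
calls = {
    "CheckoutWorkflow.run": ["OrderService.cancel"],
    "OrderService.cancel": ["PaymentService.refund"],
    "SupportTool.issue_credit": ["PaymentService.refund"],
    "ReportJob.nightly": ["LedgerReader.load"],
}
affected = impacted_by(calls, "PaymentService.refund")
```

&lt;p&gt;Here the traversal finds the direct callers and the workflow one hop further up, and leaves out the unrelated nightly job. A prose summary of the payment module has no edges to walk, so it cannot produce that list.&lt;/p&gt;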

&lt;p&gt;On internal benchmarks, shifting agents from prose-grounded to graph-grounded context produced a 44% improvement in agent task resolution, and Microsoft/GitHub's co-research on context grounding has reported a 51% improvement in engineer acceptance of agent-drafted code. The specific number matters less than the direction. Structured context beats summarized context on hard enterprise questions, consistently.&lt;/p&gt;

&lt;h2&gt;
  
  
  How a Developer Should Assemble the Stack
&lt;/h2&gt;

&lt;p&gt;Start with the agent runtime you like. Add retrieval to fit the repo size: grep-style agentic search for small projects, vector plus BM25 plus reranking for larger ones. Add a curated knowledge layer (Karpathy's pattern, DeepWiki for public repos, Greptile for graph-aware summaries) when the agent starts forgetting the same things twice. Reach for a code graph when the questions you are asking are about impact, traceability, or cross-system behavior rather than "what does this file do?".&lt;/p&gt;

&lt;p&gt;The stack is not a waterfall. You can plug a code graph into a vector-aware agent and feed both into a Karpathy-style wiki. The point is knowing which layer you are actually relying on, and noticing when your curated knowledge has quietly become a maintenance problem instead of a grounding source.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ship Grounded Agents Before Your Codebase Outgrows Them
&lt;/h2&gt;

&lt;p&gt;If you are setting up context layers for the first time, the Karpathy pattern and DeepWiki are good places to start. If you already feel the friction (drift, stale pages, agents answering questions the wiki cannot actually support, business-rule questions that want a graph), that is the signal the stack needs a structural model underneath. Talk to an expert about running CoreStory against your own repository, or &lt;a href="https://app.corestory.ai/" rel="noopener noreferrer"&gt;try it for yourself&lt;/a&gt; today.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Is the Karpathy LLM wiki pattern still worth adopting?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Yes, for small-to-mid repositories and personal knowledge bases. It is the cheapest durable grounding layer you can build. The pattern is open, the schema lives in your repo, and any modern coding agent knows what to do with it.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How does DeepWiki differ from a wiki I build myself?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;DeepWiki is a hosted, zero-setup version maintained by Cognition, with an MCP endpoint and tens of thousands of public repositories already indexed. You do not own the schema, but you also do not maintain it. It is an excellent entry point for reading unfamiliar open-source projects.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Is Greptile part of the same pattern?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Greptile starts from the same problem but leans on a graph of files, functions, and dependencies rather than flat pages. It is a useful bridge between a summary-based wiki and a full code intelligence model.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Why not just rely on vector search?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Because vector retrieval on code is noisy, stale, and opaque, and because the strongest coding agents have mostly moved to agentic search on the filesystem. Vectors still help for semantic discovery and inside hybrid retrieval, but they are no longer enough on their own.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;When does a wiki stop being enough?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When agents confidently answer questions the wiki cannot actually support. When curated knowledge becomes its own maintenance problem. When the questions are about change impact, cross-service behaviour, or business-rule traceability. That is the moment to add a code graph underneath.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Does CoreStory replace any of these layers?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;No. CoreStory is the graph layer. It sits under whichever retrieval strategy and whichever agent runtime you already use, and exposes the same structural model to humans through a dashboard and to agents through an MCP endpoint.&lt;/p&gt;

&lt;p&gt;However, adopting CoreStory in advance of implementing these complementary layers will help ensure that agents draft and maintain those layers with richer codebase awareness. In other words, your curated knowledge will be more comprehensive, and your agent runtime will more successfully reconcile discrepancies between sources.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>development</category>
      <category>agents</category>
    </item>
    <item>
      <title>How CoreStory Cuts LLM Costs by 70% While Improving Output Quality</title>
      <dc:creator>Michel Ozzello</dc:creator>
      <pubDate>Fri, 15 May 2026 15:50:54 +0000</pubDate>
      <link>https://dev.to/corestory/how-corestory-cuts-llm-costs-by-70-while-improving-output-quality-43ap</link>
      <guid>https://dev.to/corestory/how-corestory-cuts-llm-costs-by-70-while-improving-output-quality-43ap</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; LLMs charge per token, and large codebases generate enormous token bills — especially when AI agents re-ingest the same context repeatedly. CoreStory transforms your codebase into a persistent Code Intelligence Model (CIM), giving AI agents structured, targeted context instead of raw code. In a real-world evaluation, Claude Code paired with CoreStory used 73% fewer input tokens, ran in half the time, and cost 67% less — while delivering better results. This post explains why that happens and how to replicate it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Token Bill Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;A 10-engineer team running Claude Code against a 500,000-token codebase can burn $15,000–$40,000 per month in context re-ingestion alone, before writing a single line of net-new logic. That's not a projection. That's what happens when AI agents are given raw code instead of structured intelligence.&lt;/p&gt;

&lt;p&gt;Here's the math. Each developer session re-sends the same modules, schemas, and helper functions the model saw yesterday. A single prompt involving a non-trivial subsystem easily runs 20,000–50,000 input tokens. Multiply by 10 engineers, 20 working days, and 3–5 sessions per day, and you're looking at a substantial monthly token bill just for context, before accounting for the model's output.&lt;/p&gt;

&lt;p&gt;Output tokens compound the problem. Most AI providers charge &lt;strong&gt;3–5x more for output tokens than input tokens&lt;/strong&gt;. When the model lacks proper context, it produces longer, more hedged responses and requires more correction rounds. Each round re-ingests the context, generates more output, and adds to the bill. The real cost of poor context isn't just the tokens you send; it's the tokens you generate trying to fix the results.&lt;/p&gt;

&lt;p&gt;In a real customer evaluation: &lt;strong&gt;Claude Code + CoreStory MCP used 73% fewer input tokens&lt;/strong&gt;, ran in half the time, and &lt;strong&gt;cost 67% less&lt;/strong&gt; with better output quality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Table 1: Real-world cost comparison for adding a complex feature to a large enterprise codebase&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Metrics&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Claude Code&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Claude Code + CoreStory&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;% Reduction&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Processing Time&lt;/td&gt;
&lt;td&gt;~92 min&lt;/td&gt;
&lt;td&gt;~47 min&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;50% faster&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Input Tokens&lt;/td&gt;
&lt;td&gt;~1,320,000&lt;/td&gt;
&lt;td&gt;~357,500&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;73% less&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output Tokens&lt;/td&gt;
&lt;td&gt;~87,000&lt;/td&gt;
&lt;td&gt;~43,000&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;50% less&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost (USD)&lt;/td&gt;
&lt;td&gt;~$5.29&lt;/td&gt;
&lt;td&gt;~$1.74&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;67% less&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
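
&lt;p&gt;The percentages in Table 1 can be re-derived from the raw figures. The sketch below does that arithmetic and also estimates cost from the token counts, assuming list prices of $3 per million input tokens and $15 per million output tokens. Whether the evaluation was billed at exactly those rates is an assumption, and the function names are illustrative only.&lt;/p&gt;

```python
def percent_reduction(before, after):
    """Return the reduction from before to after as a percentage."""
    return (before - after) / before * 100


def estimated_cost_usd(input_tokens, output_tokens,
                       input_price_per_m=3.0, output_price_per_m=15.0):
    """Estimate cost from token counts; the per-million prices are assumed list prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Approximate figures copied from Table 1.
baseline = {"minutes": 92, "input": 1_320_000, "output": 87_000}
with_corestory = {"minutes": 47, "input": 357_500, "output": 43_000}

for metric in baseline:
    drop = percent_reduction(baseline[metric], with_corestory[metric])
    print(f"{metric}: {drop:.1f}% lower")

print(f"estimated baseline cost: ${estimated_cost_usd(1_320_000, 87_000):.2f}")
print(f"estimated CoreStory cost: ${estimated_cost_usd(357_500, 43_000):.2f}")
```

&lt;p&gt;Under those assumed prices, the estimates land within a few cents of the reported ~$5.29 and ~$1.74, and input tokens dominate the bill in both runs.&lt;/p&gt;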

&lt;h2&gt;
  
  
  Why LLMs Have a Context Problem With Large Codebases
&lt;/h2&gt;

&lt;p&gt;LLMs don't retain memory between sessions. Every interaction starts from zero. When a developer asks an AI agent to refactor a module, the model needs more than that file: it needs the schemas it depends on, the helper functions it calls, the data flow it participates in, and enough architectural context to avoid introducing regressions. That's tens of thousands of tokens per request, for context the model already processed yesterday.&lt;/p&gt;

&lt;p&gt;This creates a pattern of escalating repeated spending. Teams working on production systems often send 1.5–5 million tokens per month simply to keep the model oriented, before counting any of the actual work tokens. And this is the base model cost. Many AI coding agents (Devin, Factory, and others built on top of foundation models) charge a premium per token and burn more per session through agentic loops.&lt;/p&gt;

&lt;p&gt;It's important to note that coding agents like Claude Code do support persistent configuration files (like CLAUDE.md, skill files, and custom instructions) that carry context across sessions and can be shared across a team. But there's a meaningful difference between agent configuration ("here's how to work on this codebase") and code intelligence ("here are the critical architectures, business rules, and interdependencies, pre-mapped and queryable"). The former tells the agent how to behave. The latter gives it something to actually know. Configuration files are also rarely centrally governed: they drift, they vary by developer, and they don't scale with codebase complexity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Agentic Loops Are Especially Expensive
&lt;/h2&gt;

&lt;p&gt;A standard developer prompt re-ingests context once. An AI agent running a multi-step loop — plan, execute, reflect, error-correct, retry — re-ingests that context at every step. A 10-step agentic loop on raw code isn't 10x the token cost of a single prompt. It can be 30–50x, because each reflection and error-correction cycle starts with a full context re-ingestion.&lt;/p&gt;
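
&lt;p&gt;One way to see how a loop can exceed a linear multiple: if every step re-sends the base context plus everything accumulated in earlier steps, input tokens grow roughly quadratically with the step count. The sketch below models that. The base context size and the per-step growth are hypothetical values chosen for illustration, not measurements.&lt;/p&gt;

```python
def loop_input_tokens(base_context, growth_per_step, steps):
    """Total input tokens when each step re-sends the base context plus all prior history."""
    total = 0
    for step in range(steps):
        # History from earlier steps (tool output, reflections) rides along each time.
        total += base_context + step * growth_per_step
    return total


base = 20_000    # hypothetical tokens of raw code context
growth = 10_000  # hypothetical tokens of history added per step
total = loop_input_tokens(base, growth, steps=10)
print(total, total / base)
```

&lt;p&gt;With these assumed values, the 10-step loop consumes 650,000 input tokens, or 32.5 times a single 20,000-token prompt. The base context appears in every term of the sum, which is why shrinking it pays off at each step.&lt;/p&gt;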

&lt;p&gt;This is where the CoreStory ROI is most dramatic. Providing an agent with a structured Code Intelligence Model instead of raw files doesn't just reduce the initial context; it reduces every downstream step, every correction round, and every output generation in the loop.&lt;/p&gt;

&lt;h2&gt;
  
  
  What a Code Intelligence Model Actually Is (And Why RAG Doesn't Solve This)
&lt;/h2&gt;

&lt;p&gt;CoreStory ingests your entire codebase once and produces a Code Intelligence Model: a hierarchical specification organized by domain, module, and behavior contract. CoreStory's pipeline performs static analysis, call graph extraction, data flow tracing, and business logic summarization to produce structured output that captures what the software does, not just what it says.&lt;/p&gt;

&lt;p&gt;This is meaningfully different from a flat embedding index or a retrieval-augmented generation (RAG) approach. RAG sounds appealing: chunk the codebase, embed it, retrieve relevant chunks at query time. In practice, it fails for code in four specific ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Poor chunking boundaries: code modules don't chunk cleanly at semantic boundaries. A stored procedure and the schema it depends on rarely land in the same chunk&lt;/li&gt;
&lt;li&gt;Loss of cross-module dependencies: chunked embeddings lose the call graph, which is exactly what the model needs to avoid introducing integration errors&lt;/li&gt;
&lt;li&gt;No business logic layer: RAG retrieves code text; it doesn't extract the invariants, edge cases, and behavior contracts the CIM explicitly captures&lt;/li&gt;
&lt;li&gt;No invariant preservation: the CIM maintains consistent structural relationships; retrieval results vary by query phrasing, producing non-deterministic behavior in agentic loops&lt;/li&gt;
&lt;/ul&gt;
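
&lt;p&gt;The first failure mode in the list above can be reproduced with a naive fixed-size chunker. In the sketch below, the sample source lines, the chunk size, and the helper names are all hypothetical. Real RAG pipelines use more careful splitters, but the boundary problem is the same in kind.&lt;/p&gt;

```python
def chunk_lines(lines, chunk_size):
    """Split a list of source lines into consecutive fixed-size chunks."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def chunk_index_containing(chunks, needle):
    """Return the index of the first chunk with a line containing needle, or None."""
    for index, chunk in enumerate(chunks):
        if any(needle in line for line in chunk):
            return index
    return None


# Hypothetical file: a schema at the top, a procedure that depends on it further down.
source = (
    ["CREATE TABLE policy (id INT, premium DECIMAL(9,2));"]
    + [f"-- unrelated line {n}" for n in range(1, 8)]
    + ["CREATE PROCEDURE raise_premium AS UPDATE policy SET premium = premium * 1.05;"]
)

chunks = chunk_lines(source, chunk_size=4)
schema_chunk = chunk_index_containing(chunks, "CREATE TABLE policy")
procedure_chunk = chunk_index_containing(chunks, "CREATE PROCEDURE raise_premium")
print(schema_chunk, procedure_chunk)
```

&lt;p&gt;Here the schema and the procedure end up in different chunks, so a query that retrieves only the procedure's chunk hands the model an UPDATE statement without the column definitions it relies on.&lt;/p&gt;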

&lt;p&gt;The result of using a CIM instead of raw code or RAG: the model receives a concise, high-signal specification rather than thousands of tokens of implementation detail, which is why token consumption drops by 70%+ in practice.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fastk0x1hvjqn6iyx3yq6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fastk0x1hvjqn6iyx3yq6.png" alt="Developer Workflow with and without CoreStory" width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Quality Multiplier: Better Context Means Fewer Corrections
&lt;/h2&gt;

&lt;p&gt;According to the &lt;a href="https://survey.stackoverflow.co/2025/ai" rel="noopener noreferrer"&gt;2025 Stack Overflow Developer Survey&lt;/a&gt; (65,000+ respondents), 87% of developers are concerned about AI accuracy, and 45% say debugging AI-generated code is more time-consuming than debugging their own.&lt;/p&gt;

&lt;p&gt;That 45% statistic sounds abstract until you connect it to payroll. A developer with a $150,000 fully-loaded annual cost who loses 30% of their time to debugging AI output is losing approximately $45,000 per year in productivity (before you count the rework tokens the model burns trying to correct its own mistakes).&lt;/p&gt;

&lt;p&gt;Microsoft co-research with CoreStory found a &lt;strong&gt;51% accuracy improvement&lt;/strong&gt; when AI agents operate from CoreStory specifications rather than raw code. Across AI coding agent benchmarks, teams &lt;a href="https://corestory.ai/post/deep-dive-how-corestory-improves-benchmark-performance-for-coding-agents" rel="noopener noreferrer"&gt;using CoreStory to supercharge AI coding agents see &lt;strong&gt;44% better results&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The mechanism is straightforward: a model with a complete, consistent architectural view produces code that integrates correctly on the first attempt. It doesn't need to infer dependencies; they're specified. It doesn't need to guess at business rules; they're documented. Fewer hallucinations, fewer integration failures, fewer correction rounds. And fewer correction rounds means fewer output tokens, which compounds the cost savings.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Total Savings Across Team Sizes&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The figures below use Claude Sonnet 4.6 API pricing ($3/M input, $15/M output) as the enterprise baseline. Token estimates are based on observed developer usage patterns for teams using Claude Code as a primary development tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Table 2: Token savings by developer team size&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Team Size&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Monthly Baseline Token Cost (without CoreStory)&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Monthly Token Cost with CoreStory (Conservative 50% token saving)&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Monthly Token Cost with CoreStory (Ideal 70% token saving)&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Annual Saving (Conservative 50% token saving)&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Annual Saving (Ideal 70% token saving)&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Solo Developer&lt;/td&gt;
&lt;td&gt;~$600&lt;/td&gt;
&lt;td&gt;~$300&lt;/td&gt;
&lt;td&gt;~$180&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$3,600&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$5,040&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5-engineer team&lt;/td&gt;
&lt;td&gt;~$3,000&lt;/td&gt;
&lt;td&gt;~$1,500&lt;/td&gt;
&lt;td&gt;~$900&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$18,000&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$25,200&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10-engineer team&lt;/td&gt;
&lt;td&gt;~$15,000-$40,000&lt;/td&gt;
&lt;td&gt;~$7,500-$20,000&lt;/td&gt;
&lt;td&gt;~$4,500-$12,000&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$90K-$240K&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$126K-$336K&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;50-engineer team&lt;/td&gt;
&lt;td&gt;~$75,000-$200,000&lt;/td&gt;
&lt;td&gt;~$37,500-$100,000&lt;/td&gt;
&lt;td&gt;~$22,500-$60,000&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$450K-$1.2M&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$630K-$1.68M&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;The 10-engineer range ($15K–$40K/month) reflects our own observed data on context re-ingestion costs for teams working on 500,000+ token codebases, before net-new output is factored in.&lt;/em&gt;&lt;/p&gt;
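
&lt;p&gt;The rows of Table 2 follow from the monthly baseline alone: the conservative column applies a 50% saving, and the ideal column's values correspond to a 70% saving. A minimal sketch of that arithmetic, using integer percentages to avoid floating-point rounding; the function name is illustrative.&lt;/p&gt;

```python
def token_savings(monthly_baseline_usd, saving_percent):
    """Return (monthly cost after saving, annual saving) in whole dollars."""
    monthly_saving = monthly_baseline_usd * saving_percent // 100
    return monthly_baseline_usd - monthly_saving, monthly_saving * 12


# Monthly baselines from Table 2; the 10-engineer row is a range, so both ends are listed.
baselines = {
    "Solo developer": [600],
    "5-engineer team": [3_000],
    "10-engineer team": [15_000, 40_000],
}

for team, amounts in baselines.items():
    for baseline in amounts:
        for percent in (50, 70):
            monthly_cost, annual_saving = token_savings(baseline, percent)
            print(team, baseline, percent, monthly_cost, annual_saving)
```

&lt;p&gt;For example, a $600 monthly baseline at a 70% saving leaves $180 per month and saves $5,040 per year.&lt;/p&gt;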

&lt;p&gt;Token savings are clear, but they’re still only one side of the equation.&lt;/p&gt;

&lt;p&gt;Let’s consider a fully-loaded senior developer at &lt;strong&gt;$200,000/year&lt;/strong&gt; — salary, benefits, overhead. That's roughly &lt;strong&gt;$100/hour&lt;/strong&gt;, or about &lt;strong&gt;$16,700/month&lt;/strong&gt;. Across a 10-engineer team, developer cost runs to &lt;strong&gt;~$2M/year&lt;/strong&gt; before infrastructure, tooling, or management costs.&lt;/p&gt;

&lt;p&gt;From the Stack Overflow Developer Survey, 45% of developers say debugging AI-generated code takes longer than debugging their own. Our evaluation data shows that with CoreStory, AI agents produce correct output on the first attempt more often. That's fewer correction rounds, fewer rework cycles, less time spent debugging hallucinated integrations:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Task execution time: roughly 50% reduction&lt;/strong&gt; Our real-world evaluation measured a 49% reduction in execution time (about 92 minutes down to 47) for a complex feature task. Applied conservatively, a developer spending 6 hours/day on AI-assisted development tasks effectively recovers 3 hours, or gains the equivalent of one additional developer-day every two days.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Rework reduction from better output quality&lt;/strong&gt; Fewer hallucinations, fewer integration failures, fewer correction rounds. If 30% of developer time currently goes to debugging and reworking AI-generated code (consistent with the Stack Overflow data), a 50% reduction in that rework reclaims 15% of total developer capacity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Table 3: The Full Savings Combined&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Team Size&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Annual Token Saving (conservative at 50%)&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;
&lt;strong&gt;Developer Time Value Recovered (50% task speed)&lt;/strong&gt;*&lt;/th&gt;
&lt;th&gt;
&lt;strong&gt;Rework Reduction Value&lt;/strong&gt;**&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Total Annual Savings&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;5-engineer team&lt;/td&gt;
&lt;td&gt;~$18,000&lt;/td&gt;
&lt;td&gt;~$250,000&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$75,000&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$343,000&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10-engineer team&lt;/td&gt;
&lt;td&gt;~$90K-$240K&lt;/td&gt;
&lt;td&gt;~$500,000&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$150,000&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$740K-$890K&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;50-engineer team&lt;/td&gt;
&lt;td&gt;~$450K-$1.2M&lt;/td&gt;
&lt;td&gt;~$2.5M&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$750K&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$3.7M-$4.45M&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;*Developer time value recovered assumes 50% of working hours are AI-assisted tasks, and a 50% speed improvement on those tasks — applied to a $200K fully-loaded cost.&lt;br&gt;
**Rework reduction assumes 30% of time currently lost to debugging/correcting AI output, with 50% of that recovered through higher-quality first-pass output.&lt;/p&gt;
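
&lt;p&gt;The two value columns in Table 3 can be reproduced from the stated assumptions. In this sketch the rework figure is computed against the AI-assisted half of working hours, because that is the reading that matches the table's values. That interpretation, like the function names, is an assumption.&lt;/p&gt;

```python
FULLY_LOADED_COST_USD = 200_000  # per developer per year, from the article


def time_value_recovered(team_size, ai_share_percent=50, speedup_percent=50):
    """Value of hours recovered when AI-assisted tasks finish faster."""
    ai_assisted_cost = team_size * FULLY_LOADED_COST_USD * ai_share_percent // 100
    return ai_assisted_cost * speedup_percent // 100


def rework_value_recovered(team_size, ai_share_percent=50,
                           rework_percent=30, recovered_percent=50):
    """Value of rework avoided, with rework measured against AI-assisted time (assumed)."""
    ai_assisted_cost = team_size * FULLY_LOADED_COST_USD * ai_share_percent // 100
    return ai_assisted_cost * rework_percent // 100 * recovered_percent // 100


for team_size in (5, 10, 50):
    print(team_size, time_value_recovered(team_size), rework_value_recovered(team_size))
```

&lt;p&gt;For the 5-engineer row this gives $250,000 and $75,000, which together with the ~$18,000 token saving add up to the ~$343,000 total.&lt;/p&gt;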

&lt;h2&gt;
  
  
  Beyond Cost: Speed and SDLC Quality
&lt;/h2&gt;

&lt;p&gt;Token savings are the most measurable benefit, but the compounding effect on the overall software development lifecycle may be more significant. When AI agents have complete architectural context from the start:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Onboarding time for new developers drops: they can query the CIM instead of reading source code for weeks&lt;/li&gt;
&lt;li&gt;Code review cycles shorten: reviewers can verify that generated code matches specified behavior, not just syntax&lt;/li&gt;
&lt;li&gt;Integration failures decrease: the CIM's explicit dependency map means fewer surprises when merging&lt;/li&gt;
&lt;li&gt;Documentation stays current: the CIM is regenerated from source, so it reflects the actual codebase, not the last time someone updated the wiki&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the customer evaluation referenced in Table 1, execution time was cut in half not just because of fewer tokens, but because the model needed fewer iteration cycles to produce correct output. The first attempt was closer to the right answer, which meant less back-and-forth, less rework, and a faster path from task to merged code.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bigger Picture: Context Windows Grow. Codebases Grow Faster.
&lt;/h2&gt;

&lt;p&gt;Every LLM release announcement leads with a larger context window. The implicit promise is that this solves the context problem: just fit more code in the prompt. It doesn't.&lt;/p&gt;

&lt;p&gt;Context windows are growing at roughly 4x per generation. Enterprise codebases grow at roughly 10–20% per year, but more importantly, the codebases that need AI assistance most are the ones that have been growing for 20–30 years. A 2-million-token context window doesn't fit a 30-year-old insurance platform's stored procedures, metadata-driven configuration, and undocumented integration layers.&lt;/p&gt;

&lt;p&gt;As context windows grow but codebases grow faster, and as agentic loops multiply token consumption non-linearly, the gap between what an LLM can hold and what a production system contains will widen, not close. The teams that treat codebase understanding as a managed artifact, not an ad-hoc prompt input, will compound their AI investment advantages over time.&lt;/p&gt;

&lt;p&gt;CoreStory is the missing piece: the persistent, queryable Code Intelligence Model that gives AI agents what they actually need — &lt;strong&gt;not more tokens, but better ones&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Want to see CoreStory's token impact on your codebase? &lt;a href="https://corestory.ai/talk-to-an-expert" rel="noopener noreferrer"&gt;Talk to an engineer&lt;/a&gt; who can model your specific usage pattern.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Does CoreStory work with my existing AI coding tools?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Yes. CoreStory integrates with Claude Code, GitHub Copilot, Cursor, Devin, and other AI coding agents via MCP server integration and CoreStory Playbooks. The CIM is available as structured context that any AI agent can query.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Is the 70% token reduction typical?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The 73% input token reduction shown in Table 1 represents a specific task (adding a complex feature to a large codebase). Reductions vary by task type, codebase size, and the proportion of context the task requires. Tasks requiring narrow, well-specified context see the largest reductions; tasks requiring broad exploration may see less. The consistent finding across evaluations is that quality improves regardless of context reduction.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What programming languages does CoreStory support?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;CoreStory supports a long list of languages, including Java, C#, Python, COBOL, PowerBuilder, and SystemVerilog.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>development</category>
      <category>agents</category>
    </item>
    <item>
      <title>How to Extract Business Rules from Legacy COBOL Code</title>
      <dc:creator>Michel Ozzello</dc:creator>
      <pubDate>Fri, 15 May 2026 14:27:32 +0000</pubDate>
      <link>https://dev.to/corestory/how-to-extract-business-rules-from-legacy-cobol-code-2fhn</link>
      <guid>https://dev.to/corestory/how-to-extract-business-rules-from-legacy-cobol-code-2fhn</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; Extracting business rules from COBOL is where most modernization projects succeed or fail. The challenge isn’t reading the code but understanding the business logic embedded across thousands of programs, copybooks, and PERFORM chains. Static analysis tools (IBM ADDI, CAST Imaging) provide dependency mapping and visualization. LLM-assisted approaches add summarization but risk hallucination. CoreStory’s Code Intelligence Model combines structural COBOL analysis with AI-generated specifications and confidence scoring, producing validated business rules ready for modernization planning. In a production engagement, CoreStory extracted 1,984 business specifications with an 85.5% SME validation rate.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What Counts as a Business Rule in COBOL
&lt;/h2&gt;

&lt;p&gt;Before choosing a tool, you need to define what you’re extracting. In modern languages, business rules are often isolated in service layers or rule engines. In COBOL, they’re woven through the code. A single business rule in a COBOL system might involve:&lt;/p&gt;

&lt;p&gt;A COMPUTE statement that calculates a premium based on risk factors defined in a copybook shared across 15 programs.&lt;/p&gt;

&lt;p&gt;An EVALUATE block that routes processing based on transaction type codes stored in a VSAM file.&lt;/p&gt;

&lt;p&gt;A chain of PERFORM statements that validate eligibility by checking conditions across three separate programs, each with its own copybook definitions.&lt;/p&gt;

&lt;p&gt;Implicit rules encoded in data definitions, such as a PIC 9(2) field that constrains a value to 0–99, enforcing a business constraint that exists nowhere in documentation.&lt;/p&gt;
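
&lt;p&gt;The data-definition case can be made concrete. A minimal sketch that derives the implied value range from a picture string, handling only unsigned integer pictures built from 9s (no sign, no V decimal point, no editing characters); the function names are hypothetical.&lt;/p&gt;

```python
import re


def pic_digit_count(picture):
    """Count digit positions in an unsigned integer picture such as '9(2)' or '999'."""
    total = 0
    # Each match is a single 9, optionally followed by a repeat count in parentheses.
    for match in re.finditer(r"9(?:\((\d+)\))?", picture):
        repeat = match.group(1)
        total += int(repeat) if repeat else 1
    return total


def implied_range(picture):
    """Return the (min, max) values an unsigned integer picture can hold."""
    digits = pic_digit_count(picture)
    return (0, 10 ** digits - 1)


print(implied_range("9(2)"))
print(implied_range("999"))
```

&lt;p&gt;A rule extractor that ignores data definitions would miss this kind of constraint entirely, because no procedural statement ever states it.&lt;/p&gt;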

&lt;p&gt;The difficulty is that these rules weren’t designed to be extracted. They evolved over decades of maintenance by dozens of developers, many of whom are no longer available to explain their intent. Dead code intermingles with active logic. Copybook definitions are shared across programs in ways that create invisible dependencies. GOTO statements create control flows that resist automated analysis.&lt;/p&gt;

&lt;p&gt;This is why generic AI tools fail at COBOL rule extraction. You can’t prompt your way through a system where a single business rule spans files, copybooks, and database calls, with critical context encoded in data type definitions.&lt;/p&gt;

&lt;p&gt;This article covers three approaches available today, along with the advantages and shortcomings of each.&lt;/p&gt;

&lt;h2&gt;
  
  
  Approach 1: Static Analysis Tools
&lt;/h2&gt;

&lt;p&gt;The established category for COBOL analysis is static analysis and dependency mapping. These tools parse COBOL source code and produce visualizations of program relationships, data flows, and control flows.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;IBM Application Discovery and Delivery Intelligence (ADDI)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;IBM ADDI is the most widely deployed tool for mainframe application analysis. It’s purpose-built for z/OS environments and provides:&lt;/p&gt;

&lt;p&gt;Cross-program dependency mapping: detects relationships between COBOL programs, JCL jobs, DB2 calls, IMS transactions, and datasets.&lt;/p&gt;

&lt;p&gt;Change impact analysis: traces forward and backward from any program or variable to identify what breaks if you modify it.&lt;/p&gt;

&lt;p&gt;Program call graphs: visual representations of control flow between programs.&lt;/p&gt;

&lt;p&gt;DB2 metadata analysis: pulls schemas from all DB2 tables associated with a job.&lt;/p&gt;

&lt;p&gt;ADDI excels at answering the dependency question of “what is connected to what”. It’s a critical first step in any modernization project because it prevents teams from making changes that break downstream processes they didn’t know existed.&lt;/p&gt;

&lt;p&gt;ADDI is more limited when it comes to the business rule question of “what does the code mean”. It provides a graphical model of COBOL code that shows variable usage, data flows, and program relationships, but translating that structural information into documented business rules requires human interpretation. ADDI shows you the machinery; understanding what the machinery does is still a human task.&lt;/p&gt;

&lt;p&gt;IBM’s modernization stack pairs ADDI with watsonx Code Assistant for Z, which uses AI agents to generate natural language explanations of mainframe code. Together they form a pipeline: ADDI provides the structural analysis, watsonx provides AI-assisted explanation. This combined approach is powerful but remains tightly coupled to the IBM ecosystem.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;CAST Imaging&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;CAST Imaging provides architecture-level visualization of legacy applications, including COBOL. It creates interactive maps of software systems that show components, dependencies, data flows, and transaction paths.&lt;/p&gt;

&lt;p&gt;CAST’s strength is cross-technology analysis. It can map a system that includes COBOL programs, Java middleware, SQL databases, and web frontends, showing how they all connect. For organizations with heterogeneous technology stacks, this cross-technology view is valuable for modernization planning.&lt;/p&gt;

&lt;p&gt;Like ADDI, CAST Imaging is primarily a visualization and analysis tool. It shows you the structure of your system with impressive clarity but doesn’t generate business rule documentation automatically. The business rule extraction work still requires analysts to interpret the visualizations and write specifications.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Other static analysis tools&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Micro Focus Enterprise Analyzer (now part of OpenText) provides similar capabilities for COBOL, PL/I, and Natural applications. Fresche Solutions’ X-Analysis Advisor is specifically designed for IBM i environments, extracting rules from RPG and COBOL and writing them in pseudocode. IBM’s Rational Asset Analyzer provides centralized analysis and inventory management for mainframe application portfolios.&lt;/p&gt;

&lt;h2&gt;
  
  
  Approach 2: LLM-Assisted Extraction
&lt;/h2&gt;

&lt;p&gt;The emergence of large language models trained on programming languages has created a new approach: feeding COBOL paragraphs to an LLM and asking it to summarize the business logic in natural language.&lt;/p&gt;

&lt;p&gt;The pipeline typically works paragraph by paragraph: extract a COBOL section, send it to GPT-4, Claude, or a specialized model with a prompt like “explain the business logic in this COBOL code”, and collect the natural language summary. More sophisticated implementations use prompt chaining: first identify variables and data flows, then trace decision logic, then summarize the business rule.&lt;/p&gt;
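
&lt;p&gt;The prompt-chaining variant can be sketched without committing to any particular model or client library. In the example below, the prompt wording, the function names, and the sample paragraph are all hypothetical, and the model call is replaced by a stand-in so the chain can run offline.&lt;/p&gt;

```python
def extract_rule(paragraph_source, ask):
    """Run a three-step prompt chain over one COBOL paragraph.

    ask is any callable that takes a prompt string and returns the model's reply;
    wiring it to a real LLM client is left out on purpose.
    """
    variables = ask(
        "List the variables and data flows in this COBOL paragraph:\n"
        + paragraph_source
    )
    decisions = ask(
        "Given these variables and data flows:\n" + variables
        + "\nTrace the decision logic in this COBOL paragraph:\n" + paragraph_source
    )
    return ask(
        "Using this decision trace:\n" + decisions
        + "\nSummarize the business rule in plain language."
    )


def echo_model(prompt):
    """Stand-in for an LLM call so the chain can be exercised offline."""
    return "[reply to: " + prompt.splitlines()[0] + "]"


sample = "CHECK-AGE.\n    IF CUST-AGE IS LESS THAN 18\n        MOVE 'N' TO ELIGIBLE-FLAG."
print(extract_rule(sample, echo_model))
```

&lt;p&gt;Each step feeds the previous reply into the next prompt, so an error in the first step carries through to the final summary. Note that the chain only ever sees the paragraph text it is handed, not the copybooks that text depends on.&lt;/p&gt;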

&lt;h3&gt;
  
  
  &lt;strong&gt;What LLM-assisted extraction does well&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Fast iteration: you can process hundreds of COBOL paragraphs in hours rather than weeks.&lt;/p&gt;

&lt;p&gt;Natural language output: the summaries are immediately readable by business analysts who don’t know COBOL.&lt;/p&gt;

&lt;p&gt;Pattern recognition: LLMs are good at recognizing common COBOL patterns (date calculations, table lookups, record processing) and explaining them clearly.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Where LLM-assisted extraction fails&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;LLMs hallucinate. This is not a theoretical risk — it’s the primary failure mode for LLM-based COBOL analysis. An LLM that “explains” a COBOL paragraph may confidently describe logic that doesn’t exist, miss critical edge cases encoded in copybook definitions it never saw, or invent variable relationships that are plausible but wrong.&lt;/p&gt;

&lt;p&gt;The problem compounds at scale. A single hallucinated business rule in a modernization specification can propagate through the entire project: the new system implements a rule that the old system never enforced, or misses a rule that the old system depended on. In regulated industries (banking, insurance, healthcare), this isn’t just a bug; it’s a compliance violation.&lt;/p&gt;

&lt;p&gt;IBM Research’s A-COBREX tool (presented at ICSE 2025) demonstrates the state of the art in automated COBOL business rule identification. Evaluated on 27 programs with ground truth annotations, A-COBREX achieved 74.12% recall and 62.21% precision for fuzzy matching between extracted and actual rules. These numbers reflect the genuine difficulty of the problem: even purpose-built research tools miss roughly a quarter of the rules and include false positives in more than a third of their output.&lt;/p&gt;

&lt;p&gt;The LLM-assisted approach works best when paired with strong structural analysis (like ADDI) and mandatory human validation gates. Using an LLM alone to extract business rules from production COBOL is like using autocomplete to write a legal contract: the output looks right, but the stakes are too high for “looks right.”&lt;/p&gt;

&lt;h2&gt;
  
  
  Approach 3: AI Code Intelligence with Validated Specifications
&lt;/h2&gt;

&lt;p&gt;The third approach (and the one CoreStory implements) combines structural COBOL analysis with AI-powered specification generation and mandatory confidence scoring.&lt;/p&gt;

&lt;p&gt;The key distinction is that CoreStory doesn’t just summarize COBOL code. It analyzes it structurally: parsing abstract syntax trees, resolving copybook references, tracing PERFORM chains across programs, mapping data flows through VSAM and DB2 calls, and building a Code Intelligence Model that captures the complete architecture of the system. The AI-generated specifications are derived from this structural analysis, not from reading code as text.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How the CoreStory pipeline works for COBOL&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Ingestion: The entire COBOL estate is ingested — programs, copybooks, JCL, DB2 definitions, CICS maps. CoreStory supports the full mainframe ecosystem, not just the COBOL source.&lt;/p&gt;

&lt;p&gt;Structural analysis: AST parsing extracts program structures, data definitions, control flows, and cross-program relationships. Copybook references are resolved to their actual definitions.&lt;/p&gt;

&lt;p&gt;Intelligence model construction: A Code Intelligence Model captures the system’s architecture: which programs handle which functions, how data flows between components, where business logic is concentrated.&lt;/p&gt;

&lt;p&gt;Specification generation: AI-assisted analysis generates structured business specifications from the intelligence model. Each specification includes what the rule does, where it’s implemented, what data it depends on, and a confidence score.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;A live production example&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In a production environment, CoreStory’s pipeline extracted 1,984 business specifications. Subject matter experts validated these specifications with an 85.5% approval rate. That’s not a benchmark on a test dataset. It is production output from a real system, validated by the people who maintain it.&lt;/p&gt;

&lt;p&gt;Confidence scoring changes the question from “How do we get SMEs to review every single spec?” to “How do we direct our expensive human experts to the parts of the code that actually need their input?”&lt;/p&gt;

&lt;p&gt;The 14.5% that SMEs flagged for revision is the system working as designed: confidence scoring identified the ambiguous cases that need review; human SMEs caught the edge cases that automated analysis couldn’t resolve; the specifications got corrected.&lt;/p&gt;
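
&lt;p&gt;In absolute terms, the two figures above work out as follows. This is a quick Python check; the counts are approximations because the approval rate is quoted to one decimal place.&lt;/p&gt;

```python
total_specs = 1984
approval_rate = 0.855  # share of specifications approved by SMEs

approved = round(total_specs * approval_rate)
flagged = total_specs - approved  # specifications sent back for revision

print(f"approved: about {approved}, flagged for revision: about {flagged}")
```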

&lt;p&gt;The final output is a validated set of business rules that modernization teams can trust, produced without project-crippling human overhead. It acts as a safety net against production incidents, which is particularly important in regulated industries.&lt;/p&gt;

&lt;h2&gt;
  
  
  Choosing the Right Approach
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Capability&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;IBM ADDI&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;CAST Imaging&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;LLM-Assisted&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;CoreStory CIM&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Dependency mapping&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Change impact analysis&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Business rule documentation&lt;/td&gt;
&lt;td&gt;Manual (human interprets visualizations)&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;Automated (with hallucination risk)&lt;/td&gt;
&lt;td&gt;Automated with confidence scoring&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Copybook resolution&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;Requires manual context&lt;/td&gt;
&lt;td&gt;Yes (full estate)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-program tracing&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Limited to context window&lt;/td&gt;
&lt;td&gt;Yes (entire system)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Validation methodology&lt;/td&gt;
&lt;td&gt;Human review of visualizations&lt;/td&gt;
&lt;td&gt;Human review&lt;/td&gt;
&lt;td&gt;None built in&lt;/td&gt;
&lt;td&gt;SME validation with confidence scores&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output format&lt;/td&gt;
&lt;td&gt;Dependency graphs, call maps&lt;/td&gt;
&lt;td&gt;Architecture visualizations&lt;/td&gt;
&lt;td&gt;Natural language summaries&lt;/td&gt;
&lt;td&gt;Structured specifications&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Production validation data&lt;/td&gt;
&lt;td&gt;No public benchmarks&lt;/td&gt;
&lt;td&gt;No public benchmarks&lt;/td&gt;
&lt;td&gt;A-COBREX: 74% recall, 62% precision&lt;/td&gt;
&lt;td&gt;LifeSys: 1,984 specs, 85.5% validation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These approaches are not mutually exclusive. IBM’s own modernization stack demonstrates this: ADDI provides structural analysis, watsonx Code Assistant provides AI-assisted explanation. A practical enterprise approach might use ADDI for dependency mapping, an LLM for initial summaries, and CoreStory for validated specification generation. The question is where you need certainty.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Validation Problem Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;The hardest part of COBOL business rule extraction isn’t the extraction; it’s knowing whether the extraction is correct.&lt;/p&gt;

&lt;p&gt;Static analysis tools produce accurate structural views but don’t generate business rule documentation. LLMs generate plausible summaries but can’t guarantee accuracy. The gap between “the tool says this is the business rule” and “we know this is the business rule” is where modernization projects fail.&lt;/p&gt;

&lt;p&gt;CoreStory’s confidence scoring addresses this directly. Every generated specification includes a confidence score that reflects the complexity of the underlying code, the ambiguity of the logic, and the completeness of the available context. High-confidence specs can be reviewed quickly. Low-confidence specs get deeper human analysis.&lt;/p&gt;
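
&lt;p&gt;The routing idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the &lt;code&gt;Spec&lt;/code&gt; fields, the 0.0 to 1.0 score range, the 0.8 threshold, and the sample rules are hypothetical and do not describe the actual CoreStory data model.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Spec:
    rule: str
    confidence: float  # assumed to be a score between 0.0 and 1.0


def triage(specs, threshold=0.8):
    """Split specs into a quick-review queue and a deep-review queue."""
    quick, deep = [], []
    for spec in specs:
        (quick if spec.confidence >= threshold else deep).append(spec)
    # Lowest confidence first, so experts start with the most ambiguous specs.
    deep.sort(key=lambda spec: spec.confidence)
    return quick, deep


# Hypothetical sample specifications.
specs = [
    Spec("Premium waiver applies after 24 months", 0.93),
    Spec("Late fee depends on account tier", 0.41),
    Spec("Interest is rounded per transaction", 0.67),
]
quick, deep = triage(specs)
print([s.rule for s in quick])
print([s.rule for s in deep])
```

&lt;p&gt;Sorting the deep-review queue by ascending score is one possible design choice; a team could equally prioritize by business criticality.&lt;/p&gt;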

&lt;p&gt;This isn’t a cosmetic feature. In the production example above, confidence scoring correctly flagged the most problematic specifications, the ones that SMEs ultimately revised. The validation process becomes efficient because human expertise is directed where it’s most needed, not spread evenly across thousands of specifications.&lt;/p&gt;

&lt;h2&gt;
  
  
  From COBOL Code to Validated Business Rules
&lt;/h2&gt;

&lt;p&gt;Extracting business rules from COBOL isn’t optional — it’s the prerequisite for any modernization project that needs to preserve the logic that runs your business. The question is whether you do it manually (expensive, slow, error-prone), with an LLM (fast, cheap, unvalidated), or with a purpose-built code intelligence platform that combines structural analysis with validated specification generation.&lt;/p&gt;

&lt;p&gt;CoreStory’s Code Intelligence Model is the third option. Real results from a real mainframe system. 1,984 business specifications. 85.5% SME validation rate. Ready for modernization planning.&lt;/p&gt;

&lt;p&gt;See how CoreStory can help you extract valid business specifications from your COBOL codebase. &lt;a href="https://corestory.ai/talk-to-an-expert" rel="noopener noreferrer"&gt;Talk to an expert →&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>programming</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Agent Boosting: The Missing Workflow for Getting Real Results from AI Coding Agents</title>
      <dc:creator>Michel Ozzello</dc:creator>
      <pubDate>Fri, 15 May 2026 13:42:26 +0000</pubDate>
      <link>https://dev.to/corestory/agent-boosting-the-missing-workflow-for-getting-real-results-from-ai-coding-agents-1p0n</link>
      <guid>https://dev.to/corestory/agent-boosting-the-missing-workflow-for-getting-real-results-from-ai-coding-agents-1p0n</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://corestory.ai" rel="noopener noreferrer"&gt;CoreStory&lt;/a&gt; by John Bender — &lt;br&gt;
&lt;a href="https://corestory.ai/post/agent-boosting-the-missing-workflow-for-getting-real-results-from-ai-coding-agents" rel="noopener noreferrer"&gt;read the original here&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Your Agents Are Capable. They're Just Flying Blind.
&lt;/h2&gt;

&lt;p&gt;There's a growing gap between what AI coding agents can do in theory and what they actually deliver in practice. Claude Code, Cursor, Copilot, Devin, Codex, Droid — every major agent has gotten dramatically more capable over the past year. They can plan multi-step tasks, edit across files, run tests, and iterate on their own output.&lt;/p&gt;

&lt;p&gt;And yet, engineering teams keep reporting the same experience: the agent works on small tasks, stumbles on anything that crosses system boundaries, and burns tokens exploring dead ends it could have avoided with five minutes of architectural context.&lt;/p&gt;

&lt;p&gt;The problem isn't the agent. It's the context.&lt;/p&gt;

&lt;p&gt;Context engineering has emerged as one of the most important disciplines in AI-assisted development. Thoughtworks, Anthropic, and individual practitioners have all converged on the same insight: curating what the model sees is the single highest-leverage thing you can do to improve output quality. As Anthropic's own engineering team &lt;a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents" rel="noopener noreferrer"&gt;put it&lt;/a&gt;, effective context engineering means finding the smallest possible set of high-signal tokens that maximize the likelihood of the desired outcome.&lt;/p&gt;

&lt;p&gt;But there's a meaningful difference between configuring an agent (writing a CLAUDE.md file, setting up rules, defining skills) and actually giving it deep, structured knowledge about the system it's working in. Configuration tells the agent how to behave. Knowledge gives it something to reason about.&lt;/p&gt;

&lt;p&gt;Agent Boosting is the practice of closing that gap: equipping your coding agents with persistent, structured code intelligence so they perform at their actual capability ceiling rather than stumbling through unfamiliar code.&lt;/p&gt;


&lt;h2&gt;
  
  
  Two Sessions, Same Agent, Different Outcomes
&lt;/h2&gt;

&lt;p&gt;To understand what Agent Boosting changes, consider two versions of the same task.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Without Agent Boosting:&lt;/strong&gt; A developer asks their coding agent to fix a bug where inherited attributes are missing their docstrings in a Sphinx documentation build. The agent reads the relevant files, identifies the docstring retrieval logic, and patches it. The fix is locally coherent — it looks correct based on the code the agent can see. Tests fail. The agent iterates, adjusting the retrieval logic, adding edge case handling, exploring adjacent files. After 20 minutes and thousands of tokens, the developer intervenes and discovers the actual root cause: attributes were never collected during member enumeration, an upstream problem in a completely different function. The agent was fixing the right symptom in the wrong place.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With Agent Boosting:&lt;/strong&gt; The same developer, same agent, same task. But before the agent starts exploring code, it queries CoreStory's intelligence model via MCP. CoreStory serves two roles in this interaction. First, it acts as an Oracle — answering questions about how the Sphinx documentation pipeline is intended to work, what the data flow looks like, and what invariants govern member enumeration. Then it acts as a Navigator — pointing the agent to the specific function where attributes are collected, the method signatures involved, and the extension points that downstream retrieval depends on.&lt;/p&gt;

&lt;p&gt;The agent sees immediately that the collection stage is the problem, not retrieval. It targets the upstream function, writes the fix, and passes tests on the first implementation.&lt;/p&gt;

&lt;p&gt;This isn't a hypothetical. It's &lt;a href="https://corestory.ai/post/deep-dive-how-corestory-improves-benchmark-performance-for-coding-agents" rel="noopener noreferrer"&gt;sphinx-8548 from CoreStory's SWE-bench evaluation&lt;/a&gt;, where three independent agents — Claude Code, Droid, and Codex — all converged on the same wrong fix at baseline, and all three solved the task correctly when given architectural context. When agents with different architectures and different underlying models all make the same mistake and all correct course from the same context, the failure isn't model-specific. It's a structural gap that better context closes.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why Agents Fail on Complex Tasks
&lt;/h2&gt;

&lt;p&gt;Every AI coding agent, regardless of architecture or underlying model, shares the same fundamental constraint: it reasons from what's in its context window. When that context is raw source code, the agent has to infer architecture from implementation details, guess at dependencies it can't see, and reconstruct system boundaries that were never documented.&lt;/p&gt;

&lt;p&gt;This works fine for small, self-contained tasks. It breaks down predictably on anything that requires understanding how components relate to each other.&lt;/p&gt;

&lt;p&gt;In a &lt;a href="https://corestory.ai/post/deep-dive-how-corestory-improves-benchmark-performance-for-coding-agents" rel="noopener noreferrer"&gt;controlled evaluation&lt;/a&gt; CoreStory ran across six leading agents on the 45 hardest tasks in SWE-bench Verified, the failure pattern was consistent. Agents didn't fail because they couldn't write correct code. They failed because they pursued the wrong solution path — fixing symptoms instead of causes, missing hidden dependencies, or patching one location in a multi-file bug and leaving the others untouched.&lt;/p&gt;

&lt;p&gt;The dominant pattern, present in 72% of all task flips from fail to pass, was wrong solution prevention: at baseline, agents pursued locally rational but architecturally incorrect approaches because they couldn't see pipeline boundaries. The second most common, at 46%, was hidden dependency discovery: implicit coupling between components that's invisible from local code inspection. The two shares add up to more than 100% because a single flipped task can involve more than one pattern. In one Django task, two independent agents discovered through CoreStory that a transform class internally constructs a completely different lookup class, a dependency with no visible trace in the source file. (The &lt;a href="https://corestory.ai/post/deep-dive-how-corestory-improves-benchmark-performance-for-coding-agents" rel="noopener noreferrer"&gt;full taxonomy of five failure modes&lt;/a&gt; is covered in our benchmark deep dive.)&lt;/p&gt;

&lt;p&gt;These aren't edge cases. Over half the tasks in the evaluation — 24 of 45 — contained at least one problem that an agent could only solve with better context.&lt;/p&gt;


&lt;h2&gt;
  
  
  What Agent Boosting Actually Looks Like
&lt;/h2&gt;

&lt;p&gt;Agent Boosting isn't a feature. It's a workflow discipline built on three principles.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Oracle before Navigator. Understanding before location.&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The typical agent workflow is: receive task, explore code, form a plan, implement. Agent Boosting restructures this into two distinct phases before the agent writes any code.&lt;/p&gt;

&lt;p&gt;First, the agent queries CoreStory as an Oracle: How is this system intended to work? What are the invariants? What are the business rules? What's the data flow through this pipeline? This is context synthesized from the entire codebase — not just file contents, but the meaning behind them. The Oracle captures architecture, behavior contracts, design history, and edge cases that aren't visible in any single source file.&lt;/p&gt;

&lt;p&gt;Then the agent queries CoreStory as a Navigator: Which files do I need to change? What methods are involved? Where are the extension points? What are the call sites? Instead of grep-wandering through hundreds of files, the agent gets directed to exactly the code it needs.&lt;/p&gt;
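
&lt;p&gt;The ordering of the two phases can be sketched as follows. This is only an illustration of the discipline, not CoreStory's API: the &lt;code&gt;ask&lt;/code&gt; callable, the role labels, and the &lt;code&gt;fake_ask&lt;/code&gt; stub are hypothetical placeholders for whatever query channel your agent uses.&lt;/p&gt;

```python
ORACLE_QUESTIONS = [
    "How is this system intended to work?",
    "What are the invariants?",
    "What's the data flow through this pipeline?",
]
NAVIGATOR_QUESTIONS = [
    "Which files do I need to change?",
    "Where are the extension points?",
    "What are the call sites?",
]


def gather_context(ask, task):
    """Ask every Oracle question before any Navigator question."""
    notes = []
    phases = (("oracle", ORACLE_QUESTIONS), ("navigator", NAVIGATOR_QUESTIONS))
    for role, questions in phases:
        for question in questions:
            answer = ask(role, task + "\n" + question)
            notes.append((role, question, answer))
    return notes


def fake_ask(role, prompt):
    """Hypothetical stand-in for the query channel; echoes the role."""
    return "[" + role + " answer]"


notes = gather_context(fake_ask, "Inherited attributes are missing docstrings")
roles = [role for role, _, _ in notes]
print(roles)
```

&lt;p&gt;The collected notes would then be handed to the agent before it edits any file, so understanding always precedes location.&lt;/p&gt;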

&lt;p&gt;This Oracle-before-Navigator pattern is the single most important practice in Agent Boosting. It prevents the agent from diving into code changes before understanding the system's constraints. In CoreStory's &lt;a href="https://corestory.ai/post/deep-dive-how-corestory-improves-benchmark-performance-for-coding-agents" rel="noopener noreferrer"&gt;benchmark evaluation&lt;/a&gt;, this pattern raised success rates by an average of 25% in relative terms across all six agents tested. The highest relative uplift was 44% (Claude Code, from 56% to 80%), and even the strongest baseline agents (Droid and Devin, already at 80%+ success) improved by 14%. &lt;a href="https://www.businesswire.com/news/home/20251028246851/en/CoreStory-Raises-32-Million-Series-A-to-Modernize-the-Worlds-Legacy-Software-with-AI-for-Code-Intelligence" rel="noopener noreferrer"&gt;Research published jointly with Microsoft&lt;/a&gt; found a 51% accuracy improvement when AI agents operate from CoreStory's structured specifications rather than raw code.&lt;/p&gt;


&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Agent&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Baseline&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;With CoreStory&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Relative Uplift&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Code&lt;/td&gt;
&lt;td&gt;56%&lt;/td&gt;
&lt;td&gt;80%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+44%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cursor&lt;/td&gt;
&lt;td&gt;38%&lt;/td&gt;
&lt;td&gt;51%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+35%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitHub Copilot&lt;/td&gt;
&lt;td&gt;62%&lt;/td&gt;
&lt;td&gt;78%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+25%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex&lt;/td&gt;
&lt;td&gt;64%&lt;/td&gt;
&lt;td&gt;76%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+17%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Droid&lt;/td&gt;
&lt;td&gt;80%&lt;/td&gt;
&lt;td&gt;91%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+14%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Devin&lt;/td&gt;
&lt;td&gt;82%&lt;/td&gt;
&lt;td&gt;93%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+14%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
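
&lt;p&gt;The uplift column is relative, not a difference in percentage points. The Python sketch below reproduces it from pass counts. The counts are an assumption: they are not published in this article, and were inferred as the only whole numbers out of 45 tasks that round to the percentages in the table.&lt;/p&gt;

```python
TASKS = 45

# (baseline passes, passes with CoreStory), inferred from rounded percentages.
passes = {
    "Claude Code": (25, 36),
    "Cursor": (17, 23),
    "GitHub Copilot": (28, 35),
    "Codex": (29, 34),
    "Droid": (36, 41),
    "Devin": (37, 42),
}

uplifts = {}
for agent, (before, after) in passes.items():
    # Relative uplift: extra tasks solved as a share of the baseline count.
    uplifts[agent] = (after - before) / before
    print(f"{agent}: {before / TASKS:.0%} to {after / TASKS:.0%}, "
          f"relative uplift {uplifts[agent]:+.0%}")

average = sum(uplifts.values()) / len(uplifts)
print(f"average relative uplift: {average:.0%}")
```

&lt;p&gt;Measured in percentage points instead, the Claude Code gain would be 24 points (56% to 80%), which is why the two ways of reporting should not be mixed.&lt;/p&gt;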


&lt;h3&gt;
  
  
  &lt;strong&gt;2. Make context persistent and queryable, not session-scoped.&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Most context engineering today is session-scoped. You write a CLAUDE.md or a .cursorrules file, maybe set up some MCP servers, and the agent gets that context at the start of each session. This is a meaningful improvement over nothing, but it doesn't scale. Recent &lt;a href="https://www.infoq.com/news/2026/03/agents-context-file-value-review/" rel="noopener noreferrer"&gt;research from ETH Zurich&lt;/a&gt; found that LLM-generated context files actually degraded agent performance by 3% compared to no context file at all, while human-written files provided only a marginal 4% improvement. The researchers found that agents given more context often ran more steps and incurred higher costs without producing better patches, because the context wasn't structured for how agents actually consume information.&lt;/p&gt;

&lt;p&gt;Agent Boosting requires a persistent intelligence layer that goes deeper than markdown files. CoreStory's &lt;a href="https://corestory.ai/platform" rel="noopener noreferrer"&gt;Code Intelligence Model&lt;/a&gt; performs static analysis, call graph extraction, data flow tracing, and business logic summarization to produce structured output that captures what the software does, not just what it says. That intelligence persists across sessions, across developers, and across agents — and it's derived directly from the codebase, so it stays current as code evolves rather than drifting like manually written documentation. Conversations with the intelligence model persist too, accumulating institutional knowledge that &lt;a href="https://docs.corestory.ai/getting-started/supercharging-ai-agents" rel="noopener noreferrer"&gt;future queries in the same thread benefit from&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. Eliminate cross-session re-ingestion.&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Every time an agent starts a new session against the same codebase, it re-reads the same files, re-infers the same architecture, and re-discovers the same dependencies. That's wasted tokens and wasted time on every single session.&lt;/p&gt;

&lt;p&gt;Agent Boosting replaces this pattern with targeted Oracle and Navigator queries against persistent intelligence. Instead of the agent reading 300 files to orient itself, it asks: What are the dependencies of this module? What's the data flow through this pipeline? Where are all the call sites for this function? The answer comes back in hundreds of tokens instead of hundreds of thousands. CoreStory's &lt;a href="https://corestory.ai/post/how-corestory-cuts-llm-costs-by-70-while-improving-output-quality" rel="noopener noreferrer"&gt;cost evaluation&lt;/a&gt; measured this directly: Claude Code augmented with CoreStory used 73% fewer input tokens per task. Across the &lt;a href="https://corestory.ai/post/deep-dive-how-corestory-improves-benchmark-performance-for-coding-agents" rel="noopener noreferrer"&gt;benchmark evaluation&lt;/a&gt;, agents avoided reading an estimated 300-500 files in aggregate across all flipped tasks, replacing exploratory code archaeology with targeted architectural queries.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Economics: Why Agentic Loops Change the Math
&lt;/h2&gt;

&lt;p&gt;The cost case for Agent Boosting starts with an insight most teams haven't internalized yet: agentic loops don't scale linearly. A standard developer prompt re-ingests context once. An AI agent running a multi-step loop — plan, execute, reflect, error-correct, retry — re-ingests that context at every step. A 10-step agentic loop on raw code isn't 10x the token cost of a single prompt. It can be &lt;a href="https://corestory.ai/post/how-corestory-cuts-llm-costs-by-70-while-improving-output-quality" rel="noopener noreferrer"&gt;30-50x&lt;/a&gt;, because each reflection and error-correction cycle starts with a full context re-ingestion. And when the model lacks proper context, it produces longer, more hedged responses and requires more correction rounds — each of which generates output tokens that most providers charge 3-5x more for than input tokens.&lt;/p&gt;

&lt;p&gt;This is where Agent Boosting delivers its most dramatic ROI. Reducing context at the input doesn't just save on the first step. It compounds savings across every downstream step, every correction round, and every output generation in the loop.&lt;/p&gt;

&lt;p&gt;CoreStory's &lt;a href="https://corestory.ai/post/how-corestory-cuts-llm-costs-by-70-while-improving-output-quality" rel="noopener noreferrer"&gt;real-world cost evaluation&lt;/a&gt; measured the impact on a complex feature task against a large enterprise codebase:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Metric&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Baseline (Claude Code)&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;With CoreStory&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Reduction&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Processing time&lt;/td&gt;
&lt;td&gt;~92 min&lt;/td&gt;
&lt;td&gt;~47 min&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;50%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Input tokens&lt;/td&gt;
&lt;td&gt;~1,320,000&lt;/td&gt;
&lt;td&gt;~357,500&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;73%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output tokens&lt;/td&gt;
&lt;td&gt;~87,000&lt;/td&gt;
&lt;td&gt;~43,000&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;50%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost per task&lt;/td&gt;
&lt;td&gt;~$5.29&lt;/td&gt;
&lt;td&gt;~$1.74&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;67%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
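
&lt;p&gt;The reduction column can be recomputed from the two value columns with a short Python check. The inputs are the approximate figures from the table, so small differences from the rounded percentages are expected.&lt;/p&gt;

```python
# (baseline, with CoreStory) pairs copied from the table above.
rows = {
    "Processing time (min)": (92, 47),
    "Input tokens": (1_320_000, 357_500),
    "Output tokens": (87_000, 43_000),
    "Cost per task (USD)": (5.29, 1.74),
}

reductions = {}
for metric, (baseline, boosted) in rows.items():
    # Reduction is the saved amount as a share of the baseline value.
    reductions[metric] = (baseline - boosted) / baseline
    print(f"{metric}: {reductions[metric]:.1%} lower")
```

&lt;p&gt;Processing time comes out slightly under the 50% shown in the table, which is consistent with the minute values being approximate.&lt;/p&gt;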


&lt;p&gt;At team scale, the numbers compound. A 10-engineer team running agents against a 500,000-token codebase can spend &lt;a href="https://corestory.ai/post/how-corestory-cuts-llm-costs-by-70-while-improving-output-quality" rel="noopener noreferrer"&gt;$15,000 to $40,000 per month&lt;/a&gt; on context re-ingestion alone. CoreStory's conservative modeling — applying a 50% token reduction to AI-assisted work hours and factoring in recovered developer time from higher first-pass accuracy — yields $740K to $890K in annual savings for a 10-engineer team. At the 50-engineer scale, the number approaches $3.7M to $4.5M annually.&lt;/p&gt;

&lt;p&gt;The developer time recovery isn't speculative. The &lt;a href="https://survey.stackoverflow.co/2025/ai" rel="noopener noreferrer"&gt;2025 Stack Overflow Developer Survey&lt;/a&gt; (65,000+ respondents) found that 45% of developers say debugging AI-generated code takes longer than debugging their own. &lt;a href="https://www.businesswire.com/news/home/20251028246851/en/CoreStory-Raises-32-Million-Series-A-to-Modernize-the-Worlds-Legacy-Software-with-AI-for-Code-Intelligence" rel="noopener noreferrer"&gt;Enterprises using CoreStory report&lt;/a&gt; up to a 50% reduction in human development time by replacing manual discovery, documentation, and validation with automated specifications. Better first-pass accuracy reduces debugging overhead directly.&lt;/p&gt;


&lt;h2&gt;
  
  
  Agent Boosting Across the Development Lifecycle
&lt;/h2&gt;

&lt;p&gt;Agent Boosting isn't limited to bug fixes. The Oracle-before-Navigator pattern applies across the full development workflow, because every task benefits from the agent understanding the system before modifying it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bug resolution.&lt;/strong&gt; The agent queries CoreStory to understand how the system should work, generates root cause hypotheses grounded in actual architecture, writes a failing test, and implements a minimal fix. This is the workflow behind the SWE-bench results above (&lt;a href="https://docs.corestory.ai/playbooks/agentic-bug-resolution" rel="noopener noreferrer"&gt;Playbook&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Feature implementation.&lt;/strong&gt; The agent uses CoreStory to understand existing patterns, data structures, and integration points before writing new code. Instead of inventing a new approach, it extends the system in a way that's consistent with established conventions (&lt;a href="https://docs.corestory.ai/playbooks/feature-implementation" rel="noopener noreferrer"&gt;Playbook&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Spec-driven development.&lt;/strong&gt; CoreStory provides the architectural truth that standalone specification tools can't — ensuring specs describe changes constrained by what the system actually does today, not what someone remembers it doing. The agent writes architecture-grounded specifications before implementation, then implements against them (&lt;a href="https://docs.corestory.ai/playbooks/spec-driven-development" rel="noopener noreferrer"&gt;Playbook&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test generation.&lt;/strong&gt; The agent derives comprehensive test suites from CoreStory specifications: positive cases, negative cases, edge cases, error contracts, and idempotency tests. Coverage is driven by business rules, not just code paths (&lt;a href="https://docs.corestory.ai/playbooks/spec-driven-test-generation" rel="noopener noreferrer"&gt;Playbook&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Technical due diligence.&lt;/strong&gt; In M&amp;amp;A scenarios, CoreStory enables rapid architectural analysis of acquisition targets: understanding architecture, identifying risks, assessing technical debt, and evaluating integration complexity — without needing the target's engineering team to walk you through it (&lt;a href="https://docs.corestory.ai/playbooks/ma-technical-due-diligence" rel="noopener noreferrer"&gt;Playbook&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Each of these workflows follows the same core pattern. The agent first consults CoreStory for understanding, then for location, then acts on what it learned. The specifics change. The discipline doesn't.&lt;/p&gt;


&lt;h2&gt;
  
  
  Where Agent Boosting Fits in the Context Engineering Stack
&lt;/h2&gt;

&lt;p&gt;Context engineering is becoming a layered discipline. As &lt;a href="https://martinfowler.com/articles/exploring-gen-ai/context-engineering-coding-agents.html" rel="noopener noreferrer"&gt;Thoughtworks observed&lt;/a&gt;, all forms of AI coding context engineering ultimately involve markdown files with prompts, but those files serve fundamentally different purposes depending on what layer they operate at. Here's how Agent Boosting relates to the practices most teams already have in place.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Configuration files&lt;/strong&gt; (CLAUDE.md, .cursorrules, agent skills) tell the agent how to behave in your codebase: coding standards, testing conventions, preferred libraries. These are table stakes. But as &lt;a href="https://www.infoq.com/news/2026/03/agents-context-file-value-review/" rel="noopener noreferrer"&gt;ETH Zurich's research showed&lt;/a&gt;, even well-written config files provide only marginal accuracy gains while often increasing agent step count and cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP servers and tool access&lt;/strong&gt; give the agent the ability to query external systems, run commands, and interact with services. These expand what the agent can do.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent Boosting via persistent code intelligence&lt;/strong&gt; gives the agent structured knowledge about the system itself: architecture, data flow, dependencies, business rules, semantic intent. This determines whether the agent makes the right decisions with its expanded capabilities. CoreStory's &lt;a href="https://corestory.ai/platform" rel="noopener noreferrer"&gt;Code Intelligence Model&lt;/a&gt; is meaningfully different from a flat embedding index or RAG approach — it captures cross-module dependencies, behavior contracts, and business logic that chunked embeddings lose.&lt;/p&gt;

&lt;p&gt;The three layers are complementary. Configuration without knowledge produces agents that follow your style guide but still misunderstand your architecture. Knowledge without configuration produces agents that understand the system but don't follow your conventions. You need both.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started with Agent Boosting
&lt;/h2&gt;

&lt;p&gt;If you're already using AI coding agents, the fastest path to Agent Boosting is connecting your codebase to CoreStory's intelligence layer. CoreStory &lt;a href="https://docs.corestory.ai/getting-started/supercharging-ai-agents" rel="noopener noreferrer"&gt;integrates with Claude Code, Cursor, GitHub Copilot, Devin, Codex, and Droid&lt;/a&gt; via MCP — no changes to the agents themselves. Setup takes minutes: generate an MCP token in the CoreStory dashboard, add the server URL to your agent's configuration, and verify by asking the agent to list your projects.&lt;/p&gt;

&lt;p&gt;If you're evaluating agents, consider testing with and without structured architectural context. CoreStory's &lt;a href="https://corestory.ai/post/deep-dive-how-corestory-improves-benchmark-performance-for-coding-agents" rel="noopener noreferrer"&gt;benchmark data&lt;/a&gt; shows that the agent you choose matters less than the context you give it. A mid-tier agent with good context routinely outperforms a top-tier agent flying blind. In the SWE-bench evaluation, Cursor augmented with CoreStory (51% success) outperformed baseline Codex (64% baseline, but without CoreStory's architectural guidance on the hardest failure modes).&lt;/p&gt;

&lt;p&gt;If you're managing costs, start by measuring your team's token re-ingestion rate: how many tokens per session are spent re-sending context the model already processed in a prior session? That number is your addressable waste. CoreStory customers have &lt;a href="https://corestory.ai/post/how-corestory-cuts-llm-costs-by-70-while-improving-output-quality" rel="noopener noreferrer"&gt;reduced it by 50-73%&lt;/a&gt;.&lt;/p&gt;
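&lt;p&gt;As a rough illustration of that measurement, here is a minimal sketch. It assumes you can log, per session, which pieces of context were sent and how many tokens each cost. The &lt;code&gt;reingestion_rate&lt;/code&gt; function, the context ids, and the token counts are all hypothetical and are not part of any CoreStory API.&lt;/p&gt;

```python
def reingestion_rate(sessions):
    """Fraction of input tokens spent on context already sent in an earlier session.

    `sessions` is a chronological list; each session is a list of
    (context_id, token_count) pairs. The shape of this log is an assumption.
    """
    seen = set()   # context ids sent in any earlier session
    total = 0      # all input tokens across all sessions
    resent = 0     # tokens spent on context that an earlier session already sent
    for session in sessions:
        current = set()
        for context_id, tokens in session:
            total += tokens
            if context_id in seen:
                resent += tokens
            current.add(context_id)
        seen |= current  # only later sessions count this context as re-sent
    return resent / total if total else 0.0


# Hypothetical log: the module overview is sent again in the second session.
sessions = [
    [("billing/module_overview", 4000), ("billing/late_fees", 1000)],
    [("billing/module_overview", 4000), ("orders/validation", 1000)],
]
rate = reingestion_rate(sessions)
print(f"{rate:.0%} of input tokens were re-sent context")
```

&lt;p&gt;In practice the token counts would come from your provider's usage reporting, and the notion of "same context" would need a stable identifier such as a file path plus content hash.&lt;/p&gt;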

&lt;p&gt;Whichever path you start from, adopt the Oracle-before-Navigator discipline immediately. Before your agent touches code, ask it to query for understanding first: How does this pipeline work? What are the invariants? What's the intended behavior? Then ask for location: Which files implement this? Where are the extension points?&lt;/p&gt;

&lt;p&gt;The quality of what the agent builds depends on the &lt;a href="https://docs.corestory.ai/getting-started/supercharging-ai-agents" rel="noopener noreferrer"&gt;specificity of what you ask&lt;/a&gt;. "Tell me about the order system" produces vague context. "What is the validation logic for order placement, what fields are required, and how is stock validation handled?" produces the kind of context that prevents wrong solutions.&lt;/p&gt;

&lt;p&gt;The agents are good enough. The question is whether you're giving them what they need to show it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ready to boost your coding agents?&lt;/strong&gt; &lt;a href="https://corestory.ai/waitlist-signup" rel="noopener noreferrer"&gt;Join the CoreStory waitlist&lt;/a&gt; or &lt;a href="https://corestory.ai/talk-to-an-expert" rel="noopener noreferrer"&gt;talk to an expert&lt;/a&gt; to model the impact on your codebase.&lt;/p&gt;

&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://docs.corestory.ai/getting-started/supercharging-ai-agents" rel="noopener noreferrer"&gt;Supercharging AI Agents with CoreStory&lt;/a&gt; — setup guide, supported agents, workflow playbooks, and best practices&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://corestory.ai/post/deep-dive-how-corestory-improves-benchmark-performance-for-coding-agents" rel="noopener noreferrer"&gt;Deep Dive: How CoreStory Improves Benchmark Performance for Coding Agents&lt;/a&gt; — full benchmark methodology and failure mode taxonomy&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://corestory.ai/post/how-corestory-cuts-llm-costs-by-70-while-improving-output-quality" rel="noopener noreferrer"&gt;How CoreStory Cuts LLM Costs by 70% While Improving Output Quality&lt;/a&gt; — token economics, team-scale cost modeling, and the agentic loop multiplier&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.businesswire.com/news/home/20251028246851/en/CoreStory-Raises-32-Million-Series-A-to-Modernize-the-Worlds-Legacy-Software-with-AI-for-Code-Intelligence" rel="noopener noreferrer"&gt;CoreStory Raises $32M Series A&lt;/a&gt; — Microsoft co-research, enterprise customer outcomes, investor thesis&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents" rel="noopener noreferrer"&gt;Effective Context Engineering for AI Agents (Anthropic)&lt;/a&gt; — foundational context engineering principles&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://martinfowler.com/articles/exploring-gen-ai/context-engineering-coding-agents.html" rel="noopener noreferrer"&gt;Context Engineering for Coding Agents (ThoughtWorks)&lt;/a&gt; — practitioner's guide to the emerging context engineering discipline&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>development</category>
      <category>agents</category>
    </item>
    <item>
      <title>The Context Window Paradox: Why Throwing More Tokens at Legacy Code Doesn't Work</title>
      <dc:creator>Michel Ozzello</dc:creator>
      <pubDate>Fri, 15 May 2026 13:17:31 +0000</pubDate>
      <link>https://dev.to/corestory/the-context-window-paradox-why-throwing-more-tokens-at-legacy-code-doesnt-work-439n</link>
      <guid>https://dev.to/corestory/the-context-window-paradox-why-throwing-more-tokens-at-legacy-code-doesnt-work-439n</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; Every engineering team working with LLMs on large codebases hits the same wall: the context window. The instinct is to think bigger windows will make things better. But research and practice show that bigger contexts actually degrade output quality through information overload, attention dilution, and the well-documented "lost in the middle" problem. The real solution isn't a bigger window — it's smarter context. By progressively decomposing a codebase along its natural architectural boundaries and recomposing structured intelligence, you give LLMs exactly the context they need to reason accurately about complex systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Working Memory Problem
&lt;/h2&gt;

&lt;p&gt;If you've tried to use an LLM for anything beyond generating a utility function (understanding a module's business logic, tracing a data flow across files, figuring out why a particular function exists…) you've felt the constraint.&lt;/p&gt;

&lt;p&gt;A context window is the working memory of a large language model. It's the lens through which the model sees everything: your prompt, the conversation history, any code or documents you've fed it. The model doesn't have persistent memory. It has a sliding window of tokens, and everything it knows about your problem has to fit inside that window.&lt;/p&gt;

&lt;p&gt;Three things determine what happens inside that window:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The focal point&lt;/strong&gt; — the model is always attending to specific tokens and surrounding text, deciding what matters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The contextual relationships&lt;/strong&gt; — the model interprets connections between tokens to build an internal representation of meaning, not just pattern-matching strings.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The window size&lt;/strong&gt; — the hard ceiling on how much data the model can hold in its working set at any given moment.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a developer pasting in a few files to ask about business logic, these constraints become real fast. You hit token limits. Or worse, the model seems like it has room, but the output is wrong because critical context got pushed out of the window or diluted by everything else in there.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Context Windows Matter for Engineering Work
&lt;/h2&gt;

&lt;p&gt;The quality of an LLM's output on engineering tasks is directly tied to the context it can access. This plays out in three ways that matter for anyone working with real codebases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Code understanding requires surrounding context.&lt;/strong&gt; When an LLM is parsing legacy code, it needs more than the function signature. It needs the imports, the calling code, the data structures being passed around, the copybooks being referenced. Without that surrounding context, the model is guessing. And on a mainframe modernization, guessing is how you introduce regressions that surface during month-end processing, the kind where your general ledger is suddenly off by six figures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern conformance depends on visible patterns.&lt;/strong&gt; LLMs adapt their outputs based on patterns observed in the context window. Feed the model well-structured context (naming conventions, architectural patterns, error handling standards, business rules) and it learns to conform. But only if that context fits in the window. Lose it, and the model generates code that looks right syntactically but violates every convention your team has established.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coherent generation requires architectural visibility.&lt;/strong&gt; When an LLM generates code that integrates with an existing codebase, coherence isn't optional. The output must match the style, error handling patterns, architectural decisions, and even commenting conventions of what's already there. That requires the model to see those patterns, which means &lt;strong&gt;context&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The context window isn't just a technical spec on a model card. It's the bottleneck that determines whether AI-assisted engineering produces usable code or generates plausible-looking output that passes a review but fails in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Obvious (Wrong) Answer
&lt;/h2&gt;

&lt;p&gt;The first thing every engineer asks: why not just make the context window bigger?&lt;/p&gt;

&lt;p&gt;If the problem is fitting enough context, expand the window. A million tokens. Ten million. Problem solved. Not quite.&lt;/p&gt;

&lt;p&gt;Anyone who's worked with the larger context models has probably noticed that throwing everything in doesn't magically improve output. Sometimes it actually makes things demonstrably worse. More hallucinations, not fewer. Confident-sounding but incorrect answers. The model blending code from different modules as if they were the same thing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fck9tt7jdg5txy5inmvff.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fck9tt7jdg5txy5inmvff.png" alt="How the Context Window works independently of the maximum number of tokens available" width="800" height="224"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;How the Context Window works independently of the maximum number of tokens available&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;There are specific, well-documented reasons why.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Paradox: Four Reasons Bigger Breaks Down
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Information Overload&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This one's intuitive and it happens to people too. Dump hundreds of thousands of tokens of COBOL into a model and ask it to find the business rule for calculating late fees. The model has to sift through JCL, copybooks, dead code, and commented-out sections from decades ago to find the relevant logic. More noise means more opportunities to latch onto the wrong thing.&lt;/p&gt;

&lt;p&gt;From a practical standpoint, larger contexts mean quadratically more compute in the attention mechanism, slower responses, and higher cost. On a large project processing thousands of programs, that cost compounds fast.&lt;/p&gt;
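&lt;p&gt;To get a feel for that scaling, the sketch below compares the pairwise attention work at different context sizes. It assumes plain self-attention whose cost grows with the square of the token count, and it ignores everything else in the model as well as any vendor-specific optimizations. The function name and the 128K baseline are illustrative choices.&lt;/p&gt;

```python
def relative_attention_cost(context_tokens, baseline_tokens):
    """Ratio of pairwise attention work versus a baseline, assuming cost grows with n squared."""
    return (context_tokens / baseline_tokens) ** 2


baseline = 128_000  # illustrative baseline window size
for size in (128_000, 256_000, 1_000_000):
    ratio = relative_attention_cost(size, baseline)
    print(f"{size:,} tokens: about {ratio:.1f}x the attention compute of {baseline:,}")
```

&lt;p&gt;Doubling the context roughly quadruples the attention work under this assumption, which is why cost rises much faster than the window size itself.&lt;/p&gt;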

&lt;h3&gt;
  
  
  &lt;strong&gt;Lost in the Middle&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This is well-documented in the research literature. LLMs exhibit what's called the "lost in the middle" problem, where they disproportionately attend to information at the beginning and end of the context window and pay significantly less attention to what's in the middle. It's an artifact of how attention mechanisms are trained.&lt;/p&gt;

&lt;p&gt;If your critical business logic lands in the middle third of a large context dump (and statistically, a third of it will), the model might effectively ignore it. The tokens are present. The information is there. But the attention weights are too diluted for the model to actually use it.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Poor Signal-to-Noise Ratio&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When the window is packed full, the model struggles to differentiate what's important from what's noise. Three symptoms follow. Redundancy: the model restates the same concept in different ways. Contradictions: generated code conflicts with patterns established elsewhere in the context. Bias amplification: if there's more boilerplate than business logic in the context, the model generates boilerplate-flavored answers even when you're asking about specific business rules.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Long-Range Dependency Decay&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This is the killer for legacy modernization specifically. Going back to the large COBOL application example, a business rule might span multiple paragraphs, reference a copybook defined in a completely different member, depend on a working storage variable set three PERFORM THRU calls earlier, and behave differently based on a condition flag initialized in the JCL.&lt;/p&gt;

&lt;p&gt;These long-range dependencies (cause and effect separated by thousands of lines of code) are exactly what LLMs struggle with in large contexts. The attention mechanism degrades over distance. Concepts far apart in the token stream become weakly connected in the model's internal representation.&lt;/p&gt;

&lt;p&gt;The paradox is real: you need more context to understand complex systems, but more context degrades the model's ability to reason about what's in the window. You cannot brute-force your way to understanding a million-line codebase by dumping it all into a prompt.&lt;/p&gt;

&lt;h2&gt;
  
  
  Putting Numbers to the Problem
&lt;/h2&gt;

&lt;p&gt;Let's make this concrete with real numbers instead of abstractions.&lt;/p&gt;

&lt;p&gt;The current landscape of context window sizes tells the story. The largest commercially available context windows today top out around one million tokens. Most production models sit between 128K and 200K tokens. Open-source models commonly offer 8K to 16K.&lt;/p&gt;

&lt;p&gt;Now consider a real enterprise codebase of a million lines of code. That figure is conservative: many mainframe shops that estimate half a million lines actually have two million once you count copybooks, JCL, utility programs, and batch processing logic. A million lines at roughly 50 characters per line gives 50 million characters. At approximately 4 characters per token, that's around 12.5 million tokens.&lt;/p&gt;

&lt;p&gt;The largest context window on the market fits about eight percent of a modest legacy codebase. Not even close.&lt;/p&gt;
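&lt;p&gt;The arithmetic is easy to reproduce. The sketch below uses the same rough averages as the text (50 characters per line, 4 characters per token) and a one-million-token window. All four constants are estimates, so treat the result as an order-of-magnitude check, not a measurement.&lt;/p&gt;

```python
LINES_OF_CODE = 1_000_000
CHARS_PER_LINE = 50          # rough average used in the text
CHARS_PER_TOKEN = 4          # rough average used in the text
LARGEST_WINDOW = 1_000_000   # tokens

# Estimated token count for the whole codebase.
total_tokens = LINES_OF_CODE * CHARS_PER_LINE / CHARS_PER_TOKEN
# Share of the codebase that a single window could hold.
fraction_that_fits = LARGEST_WINDOW / total_tokens

print(f"Estimated codebase size: {total_tokens:,.0f} tokens")
print(f"Share that fits in one window: {fraction_that_fits:.0%}")
```

&lt;p&gt;Swap in two million lines, the figure many mainframe shops end up with, and the share that fits drops to roughly four percent.&lt;/p&gt;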

&lt;p&gt;And remember, even if it all fit, the paradox means you wouldn't want to send it all. Quality degrades well before you hit the ceiling.&lt;/p&gt;

&lt;p&gt;Layer on the business reality. &lt;a href="https://www.sonarsource.com/blog/new-research-from-sonar-on-cost-of-technical-debt/" rel="noopener noreferrer"&gt;Research from Sonar across more than 200 projects&lt;/a&gt; found that technical debt costs approximately $306,000 per year per million lines of code. That's the maintenance burden: bugs from code nobody fully understands, fragility in systems nobody wants to touch, developer hours spent reverse-engineering undocumented logic.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: Intelligent Decomposition, Not Bigger Windows
&lt;/h2&gt;

&lt;p&gt;What if, instead of trying to cram a whole codebase into a context window, you intelligently decomposed it first? Not randomly, not chunking by line count or by file, but following the natural taxonomy of the code itself. Respecting the boundaries the original developers built into the system.&lt;/p&gt;

&lt;p&gt;This is the approach CoreStory takes with its &lt;a href="https://corestory.ai/" rel="noopener noreferrer"&gt;code intelligence&lt;/a&gt; platform, and it works in two phases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase one: Progressive Decomposition.&lt;/strong&gt; The full codebase breaks down along its natural architectural boundaries. The full system decomposes into modules. Modules decompose into classes or programs. Programs decompose into functions, paragraphs, and procedures. This isn't arbitrary chunking — it follows the structure the original developers created, because that structure encodes how business logic is organized.&lt;/p&gt;

&lt;p&gt;The craft is in getting the boundaries right. You don't chunk in the middle of a function. The decomposition has to be semantically aware. It has to understand what constitutes a meaningful unit of code. Get the chunking wrong and you get garbage out. Get it right and you get specifications that reflect reality.&lt;/p&gt;
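&lt;p&gt;To make "don't chunk in the middle of a function" concrete, here is a minimal sketch of boundary-aware chunking. It uses Python's standard &lt;code&gt;ast&lt;/code&gt; module on a small Python sample as a stand-in for a real legacy parser. It is an illustration of the idea only, not CoreStory's implementation, and the sample source and the &lt;code&gt;chunk_by_definition&lt;/code&gt; helper are hypothetical.&lt;/p&gt;

```python
import ast

# Hypothetical sample source standing in for a real program.
SOURCE = '''
def late_fee(balance, days_overdue):
    if days_overdue == 0:
        return 0.0
    return balance * 0.015

class Invoice:
    def total(self, lines):
        return sum(lines)
'''


def chunk_by_definition(source):
    """Split source into one chunk per top-level function or class, never mid-definition."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment returns the exact text the parser attributes to this node.
            chunks.append((node.name, ast.get_source_segment(source, node)))
    return chunks


for name, text in chunk_by_definition(SOURCE):
    print(f"--- {name} ({len(text.splitlines())} lines)")
```

&lt;p&gt;Because the boundaries come from the parser instead of a line counter, each chunk is a complete unit that can be analyzed on its own. A fixed-size splitter could just as easily have cut &lt;code&gt;late_fee&lt;/code&gt; in half.&lt;/p&gt;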

&lt;p&gt;At each level of decomposition, enterprise context gets applied — naming conventions, architectural patterns, coding standards, the things senior engineers know intuitively but that aren't written down anywhere. The output conforms to your world, not to generic training data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase two: Progressive Recomposition.&lt;/strong&gt; Once each piece is analyzed with properly scoped context (context that fits in the window and gives the model everything it needs) the understanding recomposes back up the chain. Function-level analysis composes into class-level specs. Class specs compose into module-level documentation. Module specs compose into full-system requirements.&lt;/p&gt;
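&lt;p&gt;The recomposition step can be pictured as a bottom-up walk over the decomposition tree. The sketch below is a simplified illustration under stated assumptions: &lt;code&gt;summarize&lt;/code&gt; is a placeholder for whatever model call produces a spec, and the tree, names, and analysis strings are invented for the example. It is not CoreStory's actual pipeline.&lt;/p&gt;

```python
def summarize(name, parts):
    """Placeholder for a model call that turns child specs into one parent spec (hypothetical)."""
    return f"{name}: " + "; ".join(parts)


def recompose(node):
    """Build a spec bottom-up: leaves are analyzed first, parents only see child specs."""
    children = node.get("children", [])
    if not children:
        return summarize(node["name"], [node["analysis"]])
    return summarize(node["name"], [recompose(child) for child in children])


# Hypothetical decomposition tree: system, then module, then functions.
system = {
    "name": "billing-system",
    "children": [
        {
            "name": "fees-module",
            "children": [
                {"name": "late_fee", "analysis": "charges 1.5% on overdue balances"},
                {"name": "waiver", "analysis": "skips fees for flagged accounts"},
            ],
        },
    ],
}

print(recompose(system))
```

&lt;p&gt;The point of the structure is that no single call ever sees the whole codebase. Each level works from the compact specs produced by the level below it, so every call stays within a properly scoped context.&lt;/p&gt;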

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4q2ikr10i7feqxan7jav.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4q2ikr10i7feqxan7jav.png" alt="Diagram showing CoreStory's progressive decomposition and recomposition approach to solving the context window paradox" width="800" height="319"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The CoreStory approach to solving the Context Window Paradox through progressive decomposition and recomposition&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;What emerges is structured &lt;a href="https://corestory.ai/platform" rel="noopener noreferrer"&gt;code intelligence&lt;/a&gt;: not raw code, but persistent, queryable specifications that an LLM can reason about effectively. When you send context to a model, you're sending well-structured, properly scoped specs that fit within the window and give the model exactly what it needs.&lt;/p&gt;

&lt;p&gt;In real-world use, &lt;a href="https://corestory.ai/post/how-corestory-cuts-llm-costs-by-70-while-improving-output-quality" rel="noopener noreferrer"&gt;Claude Code paired with CoreStory used 73% fewer input tokens&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Unlocks in Practice
&lt;/h2&gt;

&lt;p&gt;The technology only matters if it delivers real value. Here's what becomes possible when you solve the context problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Actual business requirements from code, not restated syntax.&lt;/strong&gt; Not auto-generated comments that parrot the code in English, but real business requirements extracted from code behavior. Product requirement documents that describe what the system does in business terms. For many organizations, this alone justifies the effort: you finally get a source of truth for what the system actually does, rather than what someone wrote in a design document years ago.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Feature-to-code mapping for modernization and maintenance planning.&lt;/strong&gt; Once requirements are mapped to code modules, you can plan with data instead of intuition. Which modules carry the most business risk? Which have the most technical debt? Which are the best candidates for modernization first because they're self-contained? You have a traceable map from business capability to code implementation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Persistent context for all future AI-assisted development.&lt;/strong&gt; The structured intelligence becomes seed data for every subsequent AI interaction. Every prompt, code generation task, and code review starts with accurate context about your enterprise's patterns, conventions, and architecture. You stop starting from zero every time you open a new chat session. This is what &lt;a href="https://corestory.ai/" rel="noopener noreferrer"&gt;context engineering&lt;/a&gt; looks like at enterprise scale: persistent understanding that compounds over time rather than evaporating with each session.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Compressed engineer ramp time.&lt;/strong&gt; Consider how long it takes a new developer to become productive on an existing system today. With structured, searchable specs tied directly to the running code, that ramp compresses dramatically. And existing engineers spend less time spelunking through code before they can change it.&lt;/p&gt;

&lt;p&gt;This comes up consistently in customer conversations: the problem isn't writing new code, it's understanding what the old code actually does before you can safely touch anything.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ready to Solve the Context Problem?
&lt;/h2&gt;

&lt;p&gt;If you're sitting on a legacy system that needs modernization or a critical application that needs to be maintained, but you haven't found an approach that handles the scale and complexity of your codebase, the context window paradox is likely the root cause. CoreStory's code intelligence platform was built specifically to solve it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://corestory.ai/talk-to-an-expert" rel="noopener noreferrer"&gt;Talk to an expert&lt;/a&gt; about running a focused assessment on your codebase, or &lt;a href="https://app.corestory.ai/?cs_ref=blog&amp;amp;cta=blog" rel="noopener noreferrer"&gt;try CoreStory free&lt;/a&gt; to see the platform in action.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What exactly is the "lost in the middle" problem?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;It's a well-documented behavior in LLMs where the model pays significantly more attention to information at the beginning and end of its context window than to information in the middle. Even when tokens are present in the window, the model may not effectively use them if they fall in the middle portion. This means critical business logic can be functionally invisible to the model even when it's technically within the context.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Can't I just use RAG (retrieval-augmented generation) to solve this?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;RAG helps surface relevant chunks, but it doesn't solve the fundamental problem. RAG retrieves text fragments based on semantic similarity, which works well for documentation lookup but poorly for understanding code structure, cross-file dependencies, and business logic that spans multiple modules. You still need those retrieved chunks to fit meaningfully in the context window, and you still need the model to reason about their relationships correctly. Progressive decomposition and structured code intelligence give the model properly scoped, architecturally coherent context — not disconnected fragments.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How is CoreStory's approach different from just splitting code into smaller files?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Splitting code into arbitrary chunks (by file, by line count, by function) ignores the semantic structure of the codebase. CoreStory's progressive decomposition follows the natural architectural boundaries of the code, preserving the relationships and dependencies that make each unit of analysis meaningful. The recomposition phase then rebuilds understanding across those boundaries so nothing falls through the cracks.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What programming languages does this work with?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;CoreStory supports a wide variety of programming languages, from legacy languages like COBOL and Natural/ADABAS to modern stacks in Java, C#, Python, and more. The platform is designed for enterprise environments where multiple languages and frameworks coexist in the same system.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What size codebases can CoreStory handle?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The platform is built for enterprise scale. The progressive decomposition approach means codebase size isn't a limiting factor the way it is with raw context window approaches. Whether your system is hundreds of thousands or millions of lines, the analysis follows the same architectural decomposition methodology.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>development</category>
      <category>agents</category>
    </item>
    <item>
      <title>How to Give AI Coding Agents Better Codebase Context</title>
      <dc:creator>Michel Ozzello</dc:creator>
      <pubDate>Thu, 30 Apr 2026 20:03:55 +0000</pubDate>
      <link>https://dev.to/corestory/how-to-give-ai-coding-agents-better-codebase-context-2ac3</link>
      <guid>https://dev.to/corestory/how-to-give-ai-coding-agents-better-codebase-context-2ac3</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; AI coding agents fail on large codebases because they lack structured context about how the system actually works. The industry has converged on three tiers of solutions: static context files (AGENTS.md, .cursorrules), retrieval-augmented generation (Sourcegraph Cody, Continue.dev), and persistent code intelligence platforms (CoreStory). Each tier solves a different scale of the problem. This article explains what each approach does, where it breaks down, and how to evaluate which tier your team needs.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why AI Coding Agents Fail on Large Codebases
&lt;/h2&gt;

&lt;p&gt;Every AI coding agent (Claude Code, Cursor, GitHub Copilot, Codex, Windsurf…) faces the same fundamental constraint: it can only act on what it can see. For a 500-line side project, that’s rarely a problem. The entire codebase fits in a single context window. The agent reads the code, understands the structure, and produces reasonable output.&lt;/p&gt;

&lt;p&gt;For a 500,000-line enterprise system spread across dozens of services, the math breaks. Even with million-token context windows now available in production models, you can’t fit an entire enterprise codebase into a prompt. And even if you could, raw source code doesn’t tell the agent why the system was built that way: the architectural decisions, the business rules embedded in legacy logic, the undocumented constraints that only exist in the heads of engineers who left three years ago.&lt;/p&gt;

&lt;p&gt;The result is predictable: hallucinated imports, functions that don’t exist, patterns that contradict the codebase’s established conventions, and “fixes” that break other parts of the system the agent never saw.&lt;/p&gt;

&lt;p&gt;This isn’t a model intelligence problem. It’s an infrastructure problem. The agent is smart enough; it just doesn’t have the information it needs.&lt;/p&gt;

&lt;p&gt;There are three ways to approach this problem, and each delivers different results.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tier 1: Static Context Files (AGENTS.md, .cursorrules, and CLAUDE.md)
&lt;/h2&gt;

&lt;p&gt;The first generation of codebase context delivery is the static context file. You write a markdown file, drop it in your repository root, and the agent reads it before doing any work.&lt;/p&gt;

&lt;p&gt;The format landscape has consolidated rapidly. In 2025, every tool had its own approach: Claude Code read CLAUDE.md, Cursor read .cursorrules, GitHub Copilot read .github/copilot-instructions.md. By early 2026, the industry converged on AGENTS.md, now an open standard backed by the Linux Foundation, supported by every major AI coding agent, and adopted by tens of thousands of repositories. OpenAI’s Codex reads AGENTS.md files at every level of the directory tree. Apache Airflow and Temporal have adopted the format. At the time of writing, the OpenAI repository alone contains 88 AGENTS.md files.&lt;/p&gt;

&lt;h3&gt;
  
  
  What AGENTS.md does well
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
Gives agents project-specific instructions: build commands, coding conventions, test runners, and constraints the agent can’t infer from the code alone.&lt;/li&gt;
&lt;li&gt;
Portable across tools. One file, one format, understood by Claude Code, Codex, Cursor, Copilot, Windsurf, and more.&lt;/li&gt;
&lt;li&gt;
Low cost to create. A useful AGENTS.md takes 30 minutes to write and immediately improves agent output quality on small-to-medium repositories.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Where it breaks down
&lt;/h3&gt;

&lt;p&gt;A recent ETH Zurich study (AGENTbench, 2026; &lt;a href="https://arxiv.org/html/2602.11988v1" rel="noopener noreferrer"&gt;source&lt;/a&gt;) tested context files rigorously across 138 real-world Python tasks. The findings were nuanced: LLM-generated AGENTS.md files actually reduced task success rates by approximately 3% and increased inference costs by over 20%. Human-written files performed better, but only when limited to non-inferable details: custom tooling, counterintuitive patterns, and project-specific constraints.&lt;/p&gt;

&lt;p&gt;The core limitation is structural. AGENTS.md is a flat file. It doesn’t understand your code; it’s a set of instructions that you manually maintain. For a 100-file project, that works. For a 10,000-file enterprise system, you face three problems:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Staleness:&lt;/strong&gt; The file drifts as the codebase evolves, and there’s no automated way to detect when it becomes inaccurate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scale:&lt;/strong&gt; You can’t describe an entire enterprise architecture in a markdown file without blowing the agent’s context window budget.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Depth:&lt;/strong&gt; AGENTS.md tells the agent what commands to run and what patterns to follow. It doesn’t tell the agent how the system actually works — call graphs, data flows, component relationships, business rules.&lt;/p&gt;

&lt;p&gt;As one practitioner noted: the real value of writing an AGENTS.md is that it forces you to articulate things about your codebase that were previously just in your head. That’s valuable, but it’s documentation, not intelligence.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tier 2: RAG-Based Context Retrieval (Sourcegraph Cody, Continue.dev, Windsurf)
&lt;/h2&gt;

&lt;p&gt;The second tier moves from static files to dynamic retrieval. Rather than telling the agent everything upfront, RAG systems index your codebase and retrieve relevant code snippets at query time.&lt;/p&gt;

&lt;h3&gt;
  
  
  How RAG works for code
&lt;/h3&gt;

&lt;p&gt;The pipeline is conceptually straightforward: split your codebase into chunks, embed those chunks into a vector space, store them in a vector database, and at query time, find the chunks most semantically similar to the agent’s current task. The retrieved chunks get inserted into the agent’s context window alongside the prompt.&lt;/p&gt;
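
&lt;p&gt;The pipeline above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular product works: the bag-of-words &lt;code&gt;embed&lt;/code&gt; function stands in for a real embedding model, the in-memory list stands in for a vector database, and every function name and the sample source are made up for the example.&lt;/p&gt;

```python
import math
import re
from collections import Counter


def chunk(text, max_lines=4):
    """Split source text into fixed-size line windows (real systems often split on functions or classes)."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]


def embed(text):
    """Toy embedding: a bag-of-words count vector. A real pipeline would call an embedding model here."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, index, k=2):
    """Return the k chunks whose vectors are most similar to the query vector."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hypothetical sample source, used only to exercise the sketch.
SOURCE = """def login(user, password):
    token = issue_token(user)
    return token

def render_invoice(order):
    total = sum(order.items)
    return format_total(total)
"""

# In-memory stand-in for a vector database: a list of (chunk, vector) pairs.
index = [(c, embed(c)) for c in chunk(SOURCE)]

# Retrieve the best-matching chunk and place it in the prompt next to the task.
context = retrieve("how is the login token issued", index, k=1)
prompt = "Context:\n" + context[0] + "\n\nTask: explain the login flow"
```

&lt;p&gt;The query shares the words "login" and "token" with the first chunk and nothing with the second, so the login code is what lands in the prompt. Note that the match is purely lexical and semantic similarity; nothing here knows which functions call which.&lt;/p&gt;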

&lt;p&gt;Sourcegraph Cody is the most mature implementation. It combines Sourcegraph’s code search engine (keyword search, SCIP-based code graph, and semantic search) with RAG to provide multi-repository context retrieval. Cody supports context windows up to 1 million tokens and can pull context from up to 10 remote repositories. The architecture gives it strong advantages for teams already using Sourcegraph for code search.&lt;/p&gt;

&lt;p&gt;Other notable implementations include Windsurf’s Flow context engine, which uses hybrid semantic + BM25 search with a proprietary M-Query retrieval method; Continue.dev, which provides an open-source framework for building custom code RAG pipelines with MCP integration; and Qodo’s Context Engine, which combines RAG with agentic reasoning for multi-repository intelligence.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why RAG is better than static files
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Dynamic:&lt;/strong&gt; The agent retrieves what’s relevant to the current task, not a fixed set of instructions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scalable:&lt;/strong&gt; Can index hundreds of thousands of files across multiple repositories.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Current:&lt;/strong&gt; Re-indexing keeps the retrieval layer in sync with code changes (though update frequency varies - some systems re-index daily, others weekly).&lt;/p&gt;

&lt;h3&gt;
  
  
  Where RAG falls short
&lt;/h3&gt;

&lt;p&gt;RAG retrieves code. It doesn’t understand code. The distinction matters.&lt;/p&gt;

&lt;p&gt;When you ask a RAG system "how does authentication work in this system?", it finds files that are semantically similar to your query, files with words like "auth", "login", "token" in them. That’s useful, but it doesn’t give you the architectural picture of which services are involved, what the call chain looks like, where the business rules live, how the authentication flow interacts with the session management system, or why the team chose this approach over alternatives.&lt;/p&gt;

&lt;p&gt;Several teams working on code intelligence at scale have found that AST-based retrieval (following import graphs, type hierarchies, and call chains) outperforms vector similarity for structural code queries. RAG is reactive and unstructured: it responds to what you ask, returning text fragments ranked by similarity, but it doesn’t proactively tell the agent what it needs to know and hasn’t thought to ask about.&lt;/p&gt;
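
&lt;p&gt;To make "following call chains" concrete, here is a small sketch that uses Python's standard &lt;code&gt;ast&lt;/code&gt; module to build a call graph and walk it. It is an assumption-laden illustration rather than any vendor's implementation: it only tracks plain function calls between top-level functions in a single module (method calls and imports are ignored), and the sample source is hypothetical.&lt;/p&gt;

```python
import ast


def build_call_graph(source):
    """Map each top-level function name to the set of plain function names it calls."""
    tree = ast.parse(source)
    graph = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            calls = graph.setdefault(node.name, set())
            for inner in ast.walk(node):
                # Only direct calls such as foo(); attribute calls like obj.foo() are skipped.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    calls.add(inner.func.id)
    return graph


def reachable(graph, start):
    """Follow call edges from start and return every locally defined function reached, sorted."""
    seen, stack = set(), [start]
    while stack:
        name = stack.pop()
        for callee in graph.get(name, ()):
            if callee in graph and callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return sorted(seen)


# Hypothetical sample module, used only to exercise the sketch.
SAMPLE = """
def login(user, password):
    check_password(user, password)
    return issue_token(user)

def check_password(user, password):
    return hash_password(password) == user.password_hash

def issue_token(user):
    return sign(user.id)

def hash_password(password):
    return password[::-1]

def sign(value):
    return "signed:" + value
"""

graph = build_call_graph(SAMPLE)
call_chain = reachable(graph, "login")
```

&lt;p&gt;Starting from &lt;code&gt;login&lt;/code&gt;, the walk reaches &lt;code&gt;hash_password&lt;/code&gt; and &lt;code&gt;sign&lt;/code&gt; even though neither name contains "auth", "login", or "token". That is the kind of structural answer a similarity search over text has no direct way to produce.&lt;/p&gt;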

&lt;p&gt;For many teams, RAG is the right solution. If your codebase is under 500,000 lines and your agents are primarily doing file-level edits and feature additions, RAG-based tools like Cody, Windsurf, or Continue.dev provide a significant improvement over static context files.&lt;/p&gt;


&lt;h2&gt;
  
  
  Tier 3: Persistent Code Intelligence. From Retrieval to Understanding
&lt;/h2&gt;

&lt;p&gt;The third tier addresses what RAG cannot: structural understanding of how a codebase actually works.&lt;/p&gt;

&lt;p&gt;A Code Intelligence Model (CIM) doesn’t just index your code. It analyzes it: parsing abstract syntax trees, extracting call graphs, mapping component relationships, identifying business rules, and building a persistent, queryable model of the entire system. The output isn’t a set of retrieved text fragments; it’s structured specifications: "this service handles payment processing, it depends on these three other services, it implements these business rules, and it was last modified on this date."&lt;/p&gt;

&lt;p&gt;The key difference is persistence. Where RAG retrieves on demand and forgets between sessions, a Code Intelligence Model builds understanding that survives turnover, compounds over time, and is accessible to any tool that needs it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqweaga6up4s4fo1dnd5q.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqweaga6up4s4fo1dnd5q.jpg" alt="RAG compared to Code Intelligence Models" width="800" height="413"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;CoreStory MCP server delivering structured code intelligence to an AI agent, showing component specifications and architecture maps&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What makes a Code Intelligence Model (CIM) different from RAG
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;AGENTS.md&lt;/th&gt;
&lt;th&gt;RAG&lt;/th&gt;
&lt;th&gt;Code Intelligence Model&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Context source&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manual markdown&lt;/td&gt;
&lt;td&gt;Embedded code chunks&lt;/td&gt;
&lt;td&gt;Analyzed specifications&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Update mechanism&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Human edits file&lt;/td&gt;
&lt;td&gt;Re-index periodically&lt;/td&gt;
&lt;td&gt;Git-diff driven, incremental&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Output type&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Instructions &amp;amp; rules&lt;/td&gt;
&lt;td&gt;Raw code snippets&lt;/td&gt;
&lt;td&gt;Structured specs &amp;amp; relationships&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Understands architecture&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Partially (via search)&lt;/td&gt;
&lt;td&gt;Yes (call graphs, component maps)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Persists across sessions&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes (static file)&lt;/td&gt;
&lt;td&gt;Index persists; context doesn’t&lt;/td&gt;
&lt;td&gt;Yes (queryable model)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Business rule extraction&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scales to 10M+ lines&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;With infrastructure&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Agent delivery&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;In-context file&lt;/td&gt;
&lt;td&gt;IDE plugin / API&lt;/td&gt;
&lt;td&gt;MCP server / API&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;


&lt;h2&gt;
  
  
  How a Code Intelligence Model Delivers Agent Context
&lt;/h2&gt;

&lt;p&gt;CoreStory is a persistent code intelligence platform purpose-built for enterprise codebases. It ingests your entire repository, regardless of language, framework, or size, and builds a Code Intelligence Model: a knowledge graph of your system that captures architecture, component relationships, business rules, and data flows.&lt;/p&gt;

&lt;p&gt;The CIM is delivered to AI coding agents via MCP (Model Context Protocol), the open standard for connecting AI tools to external context sources. When an agent running in Claude Code, Cursor, or any MCP-compatible environment needs to understand part of your system, it queries CoreStory’s MCP server and receives structured specifications. Not raw code, but an analyzed understanding of what the code does and why.&lt;/p&gt;
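
&lt;p&gt;For readers unfamiliar with MCP, the exchange is JSON-RPC 2.0, and an agent invokes a server-side tool with the &lt;code&gt;tools/call&lt;/code&gt; method. The sketch below shows the general shape of such a request and how text content is read from a result. The tool name &lt;code&gt;get_component_spec&lt;/code&gt;, its argument, and the placeholder response are hypothetical and are not taken from CoreStory's documentation; a real server advertises its actual tools through &lt;code&gt;tools/list&lt;/code&gt;.&lt;/p&gt;

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def text_blocks(response):
    """Collect the text content blocks from a tools/call result."""
    content = response.get("result", {}).get("content", [])
    return [block["text"] for block in content if block.get("type") == "text"]


# Hypothetical tool name and argument, chosen only for illustration.
request = build_tool_call(1, "get_component_spec", {"component": "payment-service"})
wire_message = json.dumps(request)

# Placeholder response shaped like an MCP tool result; the text is not real output.
placeholder_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "(specification text would appear here)"}]},
}
specs = text_blocks(placeholder_response)
```

&lt;p&gt;In practice an MCP client library handles this framing for you; the point is only that the agent asks a named tool a structured question and gets content back, rather than searching for similar text.&lt;/p&gt;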

&lt;h3&gt;
  
  
  What agents receive from CoreStory
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Component specifications:&lt;/strong&gt; what each module does, its responsibilities, dependencies, and public interfaces.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architecture maps:&lt;/strong&gt; how services connect, what the call chains look like, where data flows between components.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Business rule documentation:&lt;/strong&gt; the logic embedded in code, extracted and structured for human and machine consumption.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Change context:&lt;/strong&gt; what was recently modified, by whom, and what specifications were affected.&lt;/p&gt;

&lt;p&gt;This is the difference between handing a contractor a stack of code printouts and giving them a technical architecture document written by a senior engineer who knows the system inside out.&lt;/p&gt;

&lt;p&gt;For one particular production mainframe system, CoreStory extracted 1,984 business specifications from a live COBOL codebase with an 85.5% SME validation rate. That’s not documentation generated from prompts but structured intelligence derived directly from source code analysis and validated by the people who know the system.&lt;/p&gt;


&lt;h2&gt;
  
  
  How to Evaluate Your Current Approach
&lt;/h2&gt;

&lt;p&gt;Different teams may need different approaches. The right tier depends on your codebase complexity, team size, and what you’re asking agents to do.&lt;/p&gt;

&lt;h3&gt;
  
  
  Start with AGENTS.md if:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Your codebase is under 100,000 lines and well-structured.&lt;/li&gt;
&lt;li&gt;Your agents primarily handle file-level tasks: writing functions, fixing bugs, generating tests.&lt;/li&gt;
&lt;li&gt;You have a small team that can manually maintain the context file as the codebase evolves.&lt;/li&gt;
&lt;li&gt;You’re using multiple AI coding tools and need a single, portable context format.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Move to RAG-based tools if:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Your codebase spans multiple repositories or exceeds what fits in a context window.&lt;/li&gt;
&lt;li&gt;Your agents need to reference code outside the currently open files.&lt;/li&gt;
&lt;li&gt;You’re already using Sourcegraph or a similar code search platform.&lt;/li&gt;
&lt;li&gt;You need dynamic retrieval (different context for different tasks) rather than a fixed instruction set.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Invest in a Code Intelligence Model if:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Your codebase exceeds 500,000 lines, spans multiple languages, or includes legacy systems.&lt;/li&gt;
&lt;li&gt;Your agents need to understand architecture and business logic, not just find relevant files.&lt;/li&gt;
&lt;li&gt;You’re planning a modernization, migration, or major refactoring initiative.&lt;/li&gt;
&lt;li&gt;Knowledge loss from developer turnover is a real business risk.&lt;/li&gt;
&lt;li&gt;You need intelligence that persists across sessions, tools, and team changes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The tiers above are not mutually exclusive. Many enterprise teams use AGENTS.md for project-specific instructions alongside a CIM for structural intelligence. The AGENTS.md handles "run this test command" and "use this naming convention." The CIM handles "here’s how the payment processing pipeline actually works."&lt;/p&gt;
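
&lt;p&gt;The checklists above can be condensed into a rough decision helper. This is a sketch, not a rule: the 100,000-line and 500,000-line cut-offs come from the lists, but treating them as hard thresholds, the field names, and the choice to always include AGENTS.md are simplifying assumptions made for the example.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Codebase:
    """Hypothetical summary of a codebase, with only the traits the checklists mention."""
    lines_of_code: int
    repositories: int = 1
    has_legacy_systems: bool = False
    needs_architecture_context: bool = False


def recommend_tiers(cb):
    """Return the context tiers suggested by the checklists, simplest first."""
    # Project-specific instructions stay useful at any size, so AGENTS.md is always included.
    tiers = ["AGENTS.md"]
    # Assumption: beyond 100,000 lines or one repository, static files alone stop being enough.
    if cb.lines_of_code >= 100_000 or cb.repositories > 1:
        tiers.append("RAG")
    # Large, legacy, or architecture-heavy work calls for structural understanding.
    if cb.lines_of_code > 500_000 or cb.has_legacy_systems or cb.needs_architecture_context:
        tiers.append("Code Intelligence Model")
    return tiers


small = recommend_tiers(Codebase(lines_of_code=50_000))
multi_repo = recommend_tiers(Codebase(lines_of_code=200_000, repositories=3))
mainframe = recommend_tiers(Codebase(lines_of_code=2_000_000, has_legacy_systems=True))
```

&lt;p&gt;The helper returns a list rather than a single answer because the tiers stack: a small project gets only AGENTS.md, while a large legacy system gets all three.&lt;/p&gt;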


&lt;h2&gt;
  
  
  Stop Giving Your Agents Workarounds
&lt;/h2&gt;

&lt;p&gt;AGENTS.md was a necessary first step. RAG-based retrieval was a meaningful upgrade. But if your AI coding agents are still guessing about how your system works, the problem isn’t the model. It’s the infrastructure.&lt;/p&gt;

&lt;p&gt;CoreStory is the persistent code intelligence layer that gives agents what they actually need: a structured, always-current understanding of your entire codebase. Not another prompt engineering trick. Not another configuration file. A production-grade intelligence layer that sits between your code and any agent.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://corestory.ai" rel="noopener noreferrer"&gt;See how CoreStory delivers codebase context to your AI agents&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>sdlc</category>
      <category>coding</category>
    </item>
  </channel>
</rss>
