DEV Community

ElshadHu
How I Use AST Diffing and LLMs to Keep Docs in Sync with Code

Oh Here We Go Again: Documentation

Since I started coding, building projects, and contributing to others, one thing I've realized is that outdated docs waste more time than missing docs. So I built a CLI tool called mark-guard, which detects Go code changes and updates your markdown docs using AST diffing and LLMs. I'm planning to add support for other languages.

Which One Is More Painful: Creating from Scratch or Modifying the Docs

From my point of view, modifying is harder than creating from scratch, and that's the problem my tool tries to solve. Its purpose isn't creating docs from scratch but keeping them up to date as the project changes. Currently, my friends and I are using the tool ourselves. I think it'll really shine once people try it on bigger codebases with lots of legacy docs.


Core Architecture

When I started building this project, my first thought was a full AST tree approach. I had worked on something similar with a friend on wtf-script, but for this tool that would have been over-engineering. I didn't need to understand every node in the tree; I needed to know what the public API looked like before and after a code change.

After researching, I went with Go's built-in go/parser package. Parse the old version of a file (from git) and the new version (from disk), extract all exported symbols from both, compare them, and feed the structured diff to an LLM that updates the docs.

How Symbol Extraction Works

The entry point is ExtractSymbols:


```go
// ExtractSymbols parses Go source code and returns all exported symbols.
func ExtractSymbols(filename, src string) ([]Symbol, error) {
    fset := token.NewFileSet()
    f, err := parser.ParseFile(fset, filename, src,
        parser.SkipObjectResolution|parser.ParseComments)
    if err != nil {
        return nil, fmt.Errorf("parse %s: %w", filename, err)
    }

    var symbols []Symbol
    for _, decl := range f.Decls {
        switch d := decl.(type) {
        case *ast.FuncDecl:
            if sym, ok := extractFunc(fset, d); ok {
                symbols = append(symbols, sym)
            }
        case *ast.GenDecl:
            symbols = append(symbols, extractGenDecl(fset, d)...)
        }
    }
    return symbols, nil
}
```

token.NewFileSet(): Tracks line numbers and byte offsets for every token. Needed when I print AST nodes back as Go source.

parser.ParseFile(fset, filename, src, flags): Takes raw Go source, returns *ast.File with all top-level declarations (functions, types, constants, variables, imports).

parser.SkipObjectResolution: I don't need identifier resolution. I'm only reading declarations, not analyzing usage. Skipping it makes parsing faster.

parser.ParseComments: I want doc comments so I can include them in the diff context.

Walking Declarations

f.Decls contains every top-level declaration. I handle two types:

  • *ast.FuncDecl: Functions and methods. I skip unexported ones. For methods, I grab the receiver type and name them Receiver.Method.

  • *ast.GenDecl: Everything else: structs, interfaces, type aliases, type definitions, consts, vars. We skip imports. Each gets classified by kind and we extract fields, methods, or values accordingly.

Where Comparison Happens

The diffing lives in diff.go. Diff() takes two symbol slices (old and new) and runs three passes:

```
Pass 1: Added    (in new, not in old)
Pass 2: Removed  (in old, not in new)
Pass 3: Modified (in both, but changed)
```

I index both slices by name into maps so lookups are fast. For modified symbols, compareSymbols() checks:

  • Kind change (e.g. struct to interface)
  • Parameters added, removed, or type changed
  • Return values changed
  • Struct fields added or removed
  • Interface methods changed

```go
// Diff compares old and cur symbol slices and returns a list of changes.
func Diff(old, cur []Symbol) []SymbolDiff {
    oldMap := indexByName(old)
    curMap := indexByName(cur)
    var diffs []SymbolDiff

    // Pass 1: added (in cur, not in old)
    for name, sym := range curMap {
        if _, exists := oldMap[name]; !exists {
            diffs = append(diffs, SymbolDiff{
                Name:   name,
                Kind:   ChangeAdded,
                Symbol: sym,
            })
        }
    }

    // Pass 2 and 3 follow the same pattern for removed and modified...

    // Then sorting happens...
    return diffs
}
```
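The snippet elides passes 2 and 3, so here is a self-contained sketch of how they might look. The Symbol and SymbolDiff types below are minimal stand-ins of my own invention (mark-guard's real types carry more fields), and compareSymbols is reduced to a plain signature comparison:

```go
package main

import (
	"fmt"
	"sort"
)

type ChangeKind int

const (
	ChangeAdded ChangeKind = iota
	ChangeRemoved
	ChangeModified
)

// Minimal stand-in types for this sketch.
type Symbol struct {
	Name      string
	Signature string
}

type SymbolDiff struct {
	Name         string
	Kind         ChangeKind
	Symbol       Symbol
	OldSignature string
}

func indexByName(syms []Symbol) map[string]Symbol {
	m := make(map[string]Symbol, len(syms))
	for _, s := range syms {
		m[s.Name] = s
	}
	return m
}

// Diff runs the three passes described in the article; "modified" is
// simplified here to a raw signature comparison.
func Diff(old, cur []Symbol) []SymbolDiff {
	oldMap, curMap := indexByName(old), indexByName(cur)
	var diffs []SymbolDiff
	for name, sym := range curMap { // pass 1: added
		if _, ok := oldMap[name]; !ok {
			diffs = append(diffs, SymbolDiff{Name: name, Kind: ChangeAdded, Symbol: sym})
		}
	}
	for name, sym := range oldMap { // pass 2: removed
		if _, ok := curMap[name]; !ok {
			diffs = append(diffs, SymbolDiff{Name: name, Kind: ChangeRemoved, OldSignature: sym.Signature})
		}
	}
	for name, sym := range curMap { // pass 3: modified
		if oldSym, ok := oldMap[name]; ok && oldSym.Signature != sym.Signature {
			diffs = append(diffs, SymbolDiff{
				Name: name, Kind: ChangeModified,
				Symbol: sym, OldSignature: oldSym.Signature,
			})
		}
	}
	// added first, then removed, then modified; alphabetical within each group
	sort.Slice(diffs, func(i, j int) bool {
		if diffs[i].Kind != diffs[j].Kind {
			return diffs[i].Kind < diffs[j].Kind
		}
		return diffs[i].Name < diffs[j].Name
	})
	return diffs
}

func main() {
	old := []Symbol{{Name: "New", Signature: "func New() *Buffer"}, {Name: "Old", Signature: "func Old()"}}
	cur := []Symbol{{Name: "New", Signature: "func New(size int) *Buffer"}, {Name: "Added", Signature: "func Added()"}}
	for _, d := range Diff(old, cur) {
		fmt.Printf("%d %s\n", d.Kind, d.Name)
	}
}
```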

The result is a sorted []SymbolDiff (added first, then removed, then modified, alphabetical within each group) formatted into either a human-readable summary (FormatDiffSummary) or a compact version for the LLM (FormatDiffSummaryCompact):

```go
// FormatDiffSummaryCompact is a terse version of FormatDiffSummary.
// It omits doc comments and per-field descriptions to reduce LLM input tokens.
func FormatDiffSummaryCompact(diffs []SymbolDiff) string {
    if len(diffs) == 0 {
        return "No changes to exported symbols"
    }
    var sb strings.Builder
    for i := range diffs {
        switch diffs[i].Kind {
        case ChangeAdded:
            fmt.Fprintf(&sb, "+ ADDED   %s: %s\n", diffs[i].Name, diffs[i].Symbol.Signature)
        case ChangeRemoved:
            fmt.Fprintf(&sb, "- REMOVED %s: %s\n", diffs[i].Name, diffs[i].OldSignature)
        case ChangeModified:
            fmt.Fprintf(&sb, "~ CHANGED %s\n  was: %s\n  now: %s\n",
                diffs[i].Name, diffs[i].OldSignature, diffs[i].Symbol.Signature)
        }
    }
    return strings.TrimSpace(sb.String())
}
```

Making the Prompt Work: Restrictions Over Flexibility

Let me tell you one thing about prompts: you can't know what's gonna happen until you test them. When I first wrote the prompt for this project, the LLM rewrote entire documents instead of making small edits. It failed because the prompt lacked detail and I hadn't experimented enough.

Then I found documentation explaining how XML structure makes a prompt more explicit and reduces misinterpretation. So I created sanitize.go to wrap each prompt section in XML tags:

```go
// WrapRole wraps the role in <ROLE>
func WrapRole(role string) string {
    return "<ROLE>\n" + role + "\n</ROLE>"
}

// WrapContext wraps the context in <CONTEXT>
func WrapContext(ctx string) string {
    return "<CONTEXT>\n" + ctx + "\n</CONTEXT>"
}

// same-style wrapping functions exist for the other sections...
```

Then I wrote buildSystemPrompt, which combines all these functions:

```go
func buildSystemPrompt() string {
    return strings.Join([]string{
        WrapRole(roleText),
        WrapContext(contextText),
        WrapScale(scaleText),
        WrapRules(rulesText),
        WrapTone(toneText),
        WrapEdgeCases(edgeCasesText),
        WrapExamples(examplesText),
    }, "\n\n")
}
```

Each section has a specific job:

  • Role: Tell the LLM what it is (a documentation editor, not a creative writer)
  • Context: What's happening (code changed, docs need updating)
  • Scale: How big the project is (so it doesn't over-generate)
  • Rules: What it can and can't do (no deleting sections, no adding features that don't exist)
  • Tone: Match the existing doc style
  • Edge Cases: What to do when nothing needs changing (return empty edits)
  • Examples: Show it what good output looks like
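Since all the wrappers share one shape, they could also be collapsed into a single generic helper. This is just a design sketch of mine, not mark-guard's actual code; wrapTag is a hypothetical name:

```go
package main

import "fmt"

// wrapTag is a hypothetical generic form of WrapRole/WrapContext/etc.:
// it wraps a body in an uppercase XML-style tag on its own lines.
func wrapTag(tag, body string) string {
	return "<" + tag + ">\n" + body + "\n</" + tag + ">"
}

func main() {
	fmt.Println(wrapTag("ROLE", "You are a documentation editor."))
}
```

The trade-off: per-section functions like WrapRole give you a named call site per prompt section, which reads better in buildSystemPrompt; the generic version removes duplication.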

Maybe it's not the best prompt out there, but out of all my experiments this is what worked. What I realized is that giving the LLM restrictions rather than flexibility made the output more predictable.

Bottlenecks I Didn't Think About When Starting

The first version of the pipeline destroyed my README. It asked the LLM to return the whole file, the response got truncated, and WriteUpdate overwrote the original with a partial copy. I moved to edit-based JSON output where the LLM returns small replace/insert_after/delete operations instead of rewriting files.

```go
// The LLM only emits the exact bytes that need to change, never the full file.
type Edit struct {
    File    string `json:"file"`              // relative doc path
    Section string `json:"section,omitempty"` // nearest heading, context only, not used for matching
    Action  string `json:"action"`            // "replace" | "insert_after" | "delete"
    OldText string `json:"old_text"`          // text that must exist in the file
    NewText string `json:"new_text"`          // text to write in its place
}
```
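Applying a "replace" edit safely comes down to one rule: old_text must match exactly once. Here's a sketch of that check (applyReplace is a hypothetical name; mark-guard's actual apply logic may differ):

```go
package main

import (
	"fmt"
	"strings"
)

// applyReplace applies a single "replace" edit to a document in memory.
// The old text must occur exactly once; zero matches means the LLM
// hallucinated the anchor, multiple matches means the edit is ambiguous.
func applyReplace(doc, oldText, newText string) (string, error) {
	switch strings.Count(doc, oldText) {
	case 0:
		return doc, fmt.Errorf("old_text not found in document")
	case 1:
		return strings.Replace(doc, oldText, newText, 1), nil
	default:
		return doc, fmt.Errorf("old_text is ambiguous (multiple matches)")
	}
}

func main() {
	doc := "# mark-guard\n\nDetects Go code changes.\n"
	out, err := applyReplace(doc, "Detects Go code changes.",
		"Detects Go code changes and updates your docs.")
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```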

The second problem was invisible. Parse errors during symbol extraction were silently swallowed, which made every symbol look "added" instead of "modified." The diff was wrong and the LLM was confused. I added warning logs so failures are visible.

The third was trust. The pipeline wrote to disk immediately with no preview. I added --write (default dry-run), content-loss validation that blocks updates losing >50% of a file, and --force for when you know better.
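The content-loss guard is a one-liner once you frame it as a size comparison. A sketch, assuming the threshold is measured in bytes (tooMuchLoss is my name for it, not necessarily mark-guard's):

```go
package main

import "fmt"

// tooMuchLoss reports whether an update would shrink a doc by more
// than 50%, which is the validation threshold described above.
func tooMuchLoss(oldLen, newLen int) bool {
	return newLen*2 < oldLen
}

func main() {
	fmt.Println(tooMuchLoss(1000, 400)) // blocked: more than half the file gone
	fmt.Println(tooMuchLoss(1000, 900)) // allowed
}
```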


Git Integration: No Extra Dependencies

Anyone using this tool already has git installed. That is why I shell out to git directly with os/exec instead of pulling in go-git (which brings transitive dependencies). All git commands are defined as constants:

```go
const (
    gitCmdDiff     = "diff"
    gitCmdLsFiles  = "ls-files"
    gitCmdShow     = "show"
    gitCmdRevParse = "rev-parse"
    // other commands...
)
```
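For example, fetching the committed version of a file (the "old" side of the diff) is a single exec call. This is a sketch of the pattern rather than mark-guard's exact code; gitShow is a hypothetical helper name:

```go
package main

import (
	"fmt"
	"os/exec"
)

const gitCmdShow = "show"

// gitShow reads a file as it exists at the given revision,
// equivalent to running: git show HEAD:README.md
func gitShow(rev, path string) (string, error) {
	out, err := exec.Command("git", gitCmdShow, rev+":"+path).Output()
	if err != nil {
		return "", fmt.Errorf("git %s %s:%s: %w", gitCmdShow, rev, path, err)
	}
	return string(out), nil
}

func main() {
	content, err := gitShow("HEAD", "README.md")
	if err != nil {
		fmt.Println("could not read file at HEAD:", err)
		return
	}
	fmt.Println(len(content), "bytes at HEAD")
}
```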

Configuration

Everything is driven by .markguard.yaml at the repo root. Two sections: llm for the LLM provider and docs for which files to scan.

```yaml
llm:
  base_url: "https://generativelanguage.googleapis.com/v1beta/openai"
  api_key_env: "GEMINI_API_KEY"
  model: "gemini-2.5-flash"
docs:
  paths:
    - "README.md"
  exclude:
    - "docs/roadmap.md"
    - "docs/day*.md"
  mappings:
    - docs: ["docs/api.md"]
      code: ["internal/git/", "internal/config/"]
```

The Go structs map directly to the YAML:

```go
// Config is the top level configuration
type Config struct {
    LLM  LLMConfig  `yaml:"llm"`
    Docs DocsConfig `yaml:"docs"`
}

// LLMConfig holds settings for LLM provider
type LLMConfig struct {
    BaseURL   string `yaml:"base_url"`
    APIKeyEnv string `yaml:"api_key_env"`
    Model     string `yaml:"model"`
}

// DocsConfig holds settings for documentation scanning
type DocsConfig struct {
    Paths    []string           `yaml:"paths"`
    Exclude  []string           `yaml:"exclude"`
    Mappings []model.DocMapping `yaml:"mappings"`
}
```

If the file doesn't exist, Load() just returns defaults: Gemini as the provider, README.md and docs/ as the paths. You don't need to create a config file to get started. I used yaml.v3 directly instead of Viper because Viper drags in a ton of transitive dependencies for something I can do in 5 lines with yaml.Unmarshal.
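The defaults-first behavior can be sketched like this. The struct shape is simplified and the yaml.Unmarshal step is elided to keep the sketch dependency-free, so treat this as an illustration rather than mark-guard's real Load:

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

type LLMConfig struct {
	BaseURL   string
	APIKeyEnv string
	Model     string
}

type Config struct {
	LLM   LLMConfig
	Paths []string
}

func defaultConfig() Config {
	return Config{
		LLM: LLMConfig{
			BaseURL:   "https://generativelanguage.googleapis.com/v1beta/openai",
			APIKeyEnv: "GEMINI_API_KEY",
			Model:     "gemini-2.5-flash",
		},
		Paths: []string{"README.md", "docs/"},
	}
}

// Load treats a missing config file as "use the defaults", not an error.
func Load(path string) (Config, error) {
	data, err := os.ReadFile(path)
	if errors.Is(err, os.ErrNotExist) {
		return defaultConfig(), nil
	}
	if err != nil {
		return Config{}, err
	}
	// in mark-guard this is where yaml.Unmarshal(data, &cfg) happens;
	// elided here to avoid the third-party dependency
	_ = data
	return Config{}, nil
}

func main() {
	cfg, _ := Load("no-such-file.markguard.yaml")
	fmt.Println(cfg.LLM.Model, cfg.Paths)
}
```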

The mappings field is what controls token usage. Without it, every doc gets sent to the LLM on every run. With mappings, only docs mapped to the changed code paths get sent. I tested it on the cobra project by changing the overall structure to see how it would handle the docs, and it worked pretty well, only using around 15K tokens even for architectural changes.


What's Next

Right now mark-guard only works with Go, but the architecture is language-agnostic. The parser is isolated in one package. If you wanted to add Python, Rust, or TypeScript support, you'd write a new extractor and plug it in. The rest of the pipeline stays the same. Where I need help right now:

  • Prompt tuning: The --debug flag prints the full prompt and raw LLM response. I've been running it on projects to see where the output breaks. If you run it on your project and the edits are wrong, that debug output helps me figure out why.
  • More test coverage: There are open issues for unit tests on edit parsing, doc scanning edge cases, and pipeline integration tests.

If you have ideas, questions, or want to work on something, open an issue. You don't need a full plan; an idea is enough, and we can work on it together.

Top comments (2)

1magamahmudov:
This is a fascinating approach. Relying purely on LLMs for docs can sometimes lead to hallucinations, but grounding the prompt with precise AST diffs is a really smart way to ensure accuracy. Awesome work Elshad!

ElshadHu:
Thanks for the feedback 🙏 @1magamahmudov