
Laurent Charignon

Originally published at blog.laurentcharignon.com

Building with LLMs at Scale: Part 4 - Experiments and Works-in-Progress

In Part 1, Part 2, and Part 3, I covered pain points and solutions that work reliably. This article is different—it's about experiments, works-in-progress, and lessons from things that didn't quite pan out.

Not every tool needs to be polished. Some are scaffolding for better ideas. Some solve problems that disappear with faster models. And some teach valuable lessons even when they fail.

The Project Explorer: Solving Yesterday's Problem

The Original Problem

Before Sonnet 4.5, exploring a codebase with Claude was slow. Reading 20 files meant 20 sequential API calls, token limits to manage, and 10+ minutes of setup time.

Workarounds emerged: naming key files with an @ prefix (@README.md, @main.go) so they'd appear first in directory listings, making them easier for Claude to discover. Some users created special "guide" files that aggregated important context.

I built project-ingest (inspired by gitingest.com) to solve this. The tool would output a single markdown document with the project structure, key file contents, and dependency graph. Claude could ingest this in one shot instead of reading files incrementally.
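
For a sense of what that looked like, here's a rough Python sketch of the idea (not the actual tool, and it skips the dependency-graph part): walk the tree, print the structure, and inline a handful of key files into a single markdown document.

#!/usr/bin/env python3
"""Rough sketch of the project-ingest idea: one markdown doc with the tree and key files."""
import sys
from pathlib import Path

SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv"}
KEY_FILES = {"README.md", "main.go", "pyproject.toml"}  # illustrative priorities

def ingest(root: Path) -> str:
    out = [f"# Project: {root.name}", "", "## Structure", ""]
    key_paths = []
    for path in sorted(root.rglob("*")):
        if any(part in SKIP_DIRS for part in path.parts):
            continue
        rel = path.relative_to(root)
        out.append("  " * (len(rel.parts) - 1) + f"- {rel.name}")
        if path.is_file() and path.name in KEY_FILES:
            key_paths.append(path)
    out += ["", "## Key files"]
    for path in key_paths:
        out += ["", f"### {path.relative_to(root)}", ""]
        # Indent file contents so they read as a code block in the output.
        out += ["    " + line for line in path.read_text(errors="replace").splitlines()]
    return "\n".join(out)

if __name__ == "__main__":
    print(ingest(Path(sys.argv[1] if len(sys.argv) > 1 else ".").resolve()))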

What Changed

Sonnet 4.5 changed the game, though I'm not entirely sure how. Is it just faster at reading files? Does it batch requests differently? Does it handle context more efficiently? Whatever the implementation, the result is clear: reading files directly is now fast enough that the ingestion step adds overhead instead of saving time.

Before (Sonnet 3.5):

  • Run project-ingest → 15 seconds
  • Claude reads summary → 5 seconds
  • Total: 20 seconds

After (Sonnet 4.5):

  • Claude reads 20 files directly → 8 seconds
  • Total: 8 seconds

The ingester became slower than the problem it solved.

When It's Still Useful

I haven't deleted project-ingest because it remains valuable for:

  1. Very large codebases (100+ files): Still faster to get a high-level view
  2. Project snapshots: Capturing codebase state at a point in time
  3. Documentation generation: Creating an overview for human readers
  4. Cross-project analysis: Comparing architecture across multiple projects

But for everyday "help me understand this project" tasks? Obsolete.

The Lesson

Build for today's constraints, not tomorrow's. The tool was perfect for its time, but model improvements made it obsolete. That's okay. The investment taught me patterns I applied elsewhere (like how to efficiently traverse project structures).

When a tool becomes unnecessary because the problem disappeared, that's a success, not a failure.

Code Review in Emacs: Closing the Loop

The Review Problem

I'm browsing through a codebase—maybe one I wrote months ago, maybe one Claude just generated, maybe something I'm casually exploring. I spot issues: a function that could be clearer, error handling that's too generic, a repeated pattern that should be abstracted.

The problem: I'm in discovery mode, not fix mode. I don't want to stop and fix each issue immediately. I want to:

  1. Mark the issue at the exact line while I'm looking at it
  2. Keep browsing without losing flow
  3. Later, batch all issues together and have an LLM fix them all at once

This is where the Code Review Logger comes in. It decouples discovery from fixing.

The Emacs Integration

I built an Emacs mode (code-review-logger.el) that tracks review comments in an org-mode file:

;; While reviewing code in Emacs:
;; SPC r c - Log comment at current line
;; SPC r r - Log comment for selected region
;; SPC r o - Open review log

(defun code-review-log-comment (comment)
  "Log a review COMMENT for the current file and line."
  (interactive "sReview comment: ")
  (let ((file (buffer-file-name))
        (line (line-number-at-pos)))
    (code-review-format-entry comment file line "TODO")))

This creates entries in ~/code_review.org:

** TODO [[file:~/repos/memento/src/cli.py::127][cli.py:127]]
   :PROPERTIES:
   :PROJECT: memento
   :TIMESTAMP: [2025-09-30 Mon 14:23]
   :END:
   This error handling is too generic - catch specific exceptions

** TODO [[file:~/repos/memento/src/search.py::89][search.py:89]]
   :PROPERTIES:
   :PROJECT: memento
   :TIMESTAMP: [2025-09-30 Mon 14:25]
   :END:
   Add caching here - search is called repeatedly with same query

The Workflow

  1. Review code in Emacs (with syntax highlighting, jump-to-def, all IDE features)
  2. Mark issues as I find them (SPC r c for quick comment)
  3. Trigger the automated fix process:
   Read code-review-llm-prompt-template and follow it
  4. Claude automatically:
    • Reads ~/code_review.org for all TODO items
    • Fixes each issue in the actual code
    • Runs make test after every change
    • Marks items as DONE only when tests pass
    • Provides a summary of what was fixed

The entire workflow is encoded in a memento note (code-review-llm-prompt-template) that Claude reads. This note contains:

  • The review format specification
  • Priority order (correctness → architecture → security → performance)
  • Testing requirements (always run make test, never leave tests failing)
  • Guidelines for what makes a good vs. bad review
  • The complete fix-and-verify process

Why This Works

Batch processing is more efficient than interactive fixes:

  • Claude sees all issues at once and can plan holistically
  • No back-and-forth during fixing
  • Tests run after every change (not just at the end)
  • Clear audit trail of what was fixed

Emacs integration solves the "review without IDE" problem:

  • I'm in my editor with all my tools
  • Jump to definitions, search references, check blame
  • Clicking org links takes me directly to the code

Structured format means Claude gets precise instructions:

  • Exact file paths (clickable org-mode links)
  • Exact line numbers
  • Context about the issue
  • Project name for multi-repo workflows

Current State: Automated Fix Process

The system is fully automated for the fix workflow. When I have pending reviews, I simply say:

Read code-review-llm-prompt-template and follow it

Claude then:

  • Reads the standardized prompt from memento
  • Processes all TODO items from ~/code_review.org
  • Fixes issues, runs tests, marks items DONE
  • Never leaves the codebase with failing tests

The key insight: encoding the entire workflow in a memento note makes it repeatable and consistent. I don't need to remember the exact prompt or process—it's all documented and ready to execute.

Future improvements:

  1. Auto-trigger on commit: Git hook that checks for pending reviews before allowing commits (see the sketch after this list)
  2. Proactive review suggestions: Claude analyzing code during normal sessions and adding items to the review log
  3. Review metrics: Track what types of issues are most common to improve coding patterns
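
To make the first item concrete, a pre-commit hook along these lines would work against the org format shown earlier. This is a hypothetical sketch, not something I ship today:

#!/usr/bin/env python3
"""Hypothetical pre-commit hook: refuse to commit while ~/code_review.org still
has TODO entries tagged with the current repo's project name."""
import re
import subprocess
import sys
from pathlib import Path

REVIEW_LOG = Path.home() / "code_review.org"

def project_name() -> str:
    # Use the repo's top-level directory name as the project tag.
    top = subprocess.run(["git", "rev-parse", "--show-toplevel"],
                         capture_output=True, text=True, check=True).stdout.strip()
    return Path(top).name

def pending_todos(project: str) -> int:
    if not REVIEW_LOG.exists():
        return 0
    # Split on org headings and count TODO entries for this project.
    entries = re.split(r"(?m)^\*\* ", REVIEW_LOG.read_text())[1:]
    return sum(1 for e in entries if e.startswith("TODO") and f":PROJECT: {project}" in e)

if __name__ == "__main__":
    count = pending_todos(project_name())
    if count:
        print(f"{count} pending review item(s) in {REVIEW_LOG}; clear them or commit with --no-verify.")
        sys.exit(1)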

The Diff Workflow: Bringing Changes Back to Emacs

The Problem

Claude makes changes in the terminal. I want to review them in Emacs. How do I bridge that gap?

The Current Solution

Simple but effective:

# Claude generates changes, I run:
git diff > /tmp/review.diff

# In Emacs:
# Open the diff file
# Use Emacs diff-mode for navigation
# Apply/reject hunks interactively

This works but feels clunky. I'm copying diffs manually, opening files, navigating around.

What I Want

A tighter integration:

  1. Claude signals "I made changes"
  2. Emacs automatically shows the diff in a split window
  3. I review with full IDE context
  4. I approve/reject specific changes
  5. Claude sees my feedback and adjusts

This requires:

  • MCP server for Emacs communication
  • Claude code that can signal "review needed"
  • Emacs mode that listens for review requests
  • Two-way communication (Claude → Emacs → Claude)

I've prototyped pieces of this but nothing production-ready yet.

The Barrier

Building reliable two-way communication between Claude and Emacs is hard:

  • Emacs server needs to be always-on
  • Need protocol for structured messages
  • Need to handle failures gracefully
  • Race conditions when multiple Claudes talk to one Emacs

I'm experimenting with using memento as the message bus:

  • Claude writes "review-needed" note
  • Emacs polls memento for new reviews
  • Emacs writes feedback to memento
  • Claude reads feedback

Clunky but doesn't require real-time communication.
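
As a stand-in while that settles, the same mailbox idea can be sketched with a plain directory of JSON files. This is illustrative only and doesn't use memento's real interface:

"""Illustrative file-based mailbox standing in for memento as the message bus."""
import json
import time
import uuid
from pathlib import Path

MAILBOX = Path.home() / ".claude-emacs-mailbox"
MAILBOX.mkdir(exist_ok=True)

def post(kind: str, payload: dict) -> Path:
    """Write a message file, e.g. kind='review-needed' with a diff path in the payload."""
    msg = MAILBOX / f"{time.time():.0f}-{uuid.uuid4().hex[:8]}-{kind}.json"
    msg.write_text(json.dumps(payload))
    return msg

def poll(kind: str) -> list[dict]:
    """Read and consume all pending messages of a given kind."""
    results = []
    for msg in sorted(MAILBOX.glob(f"*-{kind}.json")):
        results.append(json.loads(msg.read_text()))
        msg.unlink()
    return results

# Claude side: announce a change set to review.
post("review-needed", {"repo": "~/repos/memento", "diff": "/tmp/review.diff"})

# Emacs side (e.g. a timer shelling out to this script): pick it up and respond.
for request in poll("review-needed"):
    print("Review requested:", request["diff"])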

What Didn't Work: Session Auto-Resume

The Idea

When I restart my computer, I lose all tmux sessions. What if Claude could auto-resume?

# Before shutdown, save session state:
tmux-save-sessions  # Captures all window/pane layouts

# After restart:
tmux-restore-sessions  # Recreates everything

Each session would:

  • Restore to the correct directory
  • Read the last prompt from history
  • Show a summary: "You were working on memento refactoring"

Why It Failed

Context loss is too severe. Even if I restore the directory and prompt, Claude doesn't remember:

  • What code was already written
  • What decisions were made
  • What tests were run
  • What bugs were found

I'd need to capture and replay the entire conversation, which means:

  • Huge token usage (replaying thousands of tokens)
  • Slow startup (processing all that history)
  • Potential for Claude to make different decisions on replay

The Lesson

Session continuity requires more than just state restoration. You need:

  • Explicit checkpoints (memento notes with "current status")
  • Clear handoff documents ("Session ended here, next steps are...")
  • Project-specific context (not just conversation history)

Instead of auto-resume, I now use explicit handoff notes:

# Session Checkpoint: 2025-09-30 14:30

## What We Did
- Refactored CLI argument parsing to use argparse
- All tests pass
- Committed changes: git log -1

## What's Next
- [ ] Add JSON output support to all commands
- [ ] Update documentation
- [ ] Add integration tests

## Key Decisions
- Using argparse instead of manual parsing for consistency
- All commands must support --json flag

## Files Modified
- src/cli.py (lines 1-89, 127-145)
- src/parser.py (new file)

Next session reads this note and picks up where we left off. Works better than trying to resume the conversation.

Experiments in Progress

1. MCP Coordination Server

Building an MCP server specifically for coordinating parallel LLM sessions:

# Hypothetical API
coordinator.claim_file("src/parser.py", session="A")
coordinator.add_barrier("refactor-complete", required=["A", "B"])
coordinator.wait_for_barrier("refactor-complete")
coordinator.get_session_status("A")  # → "in_progress" | "blocked" | "completed"

This would solve the "stepping on each other" problem when running parallel sessions.
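
To pin down what those calls would mean, here's a minimal in-process sketch of the semantics. The real version would sit behind an MCP server and persist state, and reach_barrier is a helper the hypothetical API above doesn't show:

"""Minimal in-process sketch of the coordinator semantics."""
import threading
from collections import defaultdict

class Coordinator:
    def __init__(self):
        self._lock = threading.Lock()
        self._claims = {}                                   # file path -> owning session
        self._status = defaultdict(lambda: "in_progress")   # session -> status
        self._barriers = {}                                 # name -> (required, reached, event)

    def claim_file(self, path, session):
        """Grant the file to `session` unless another session already holds it."""
        with self._lock:
            owner = self._claims.get(path)
            if owner not in (None, session):
                return False
            self._claims[path] = session
            return True

    def add_barrier(self, name, required):
        with self._lock:
            self._barriers[name] = (set(required), set(), threading.Event())

    def reach_barrier(self, name, session):
        """Called by a session when it finishes its part of the barrier."""
        with self._lock:
            required, reached, event = self._barriers[name]
            reached.add(session)
            if required <= reached:
                event.set()

    def wait_for_barrier(self, name, timeout=None):
        return self._barriers[name][2].wait(timeout)

    def set_session_status(self, session, status):
        self._status[session] = status

    def get_session_status(self, session):
        return self._status[session]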

2. Telemetry Mining

I have months of telemetry data (see Part 2). Now I want to mine it:

# Which prompts lead to longest sessions?
# Which projects have the most rework?
# When do I context-switch most?
# Correlation between session length and memory usage?

The goal: optimize my workflow based on data, not intuition.
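
As a first pass, assuming the telemetry exports to a CSV with hypothetical timestamp, project, and session_id columns, the context-switch question is a few lines of pandas:

"""One telemetry query, assuming a hypothetical CSV export."""
import pandas as pd

df = pd.read_csv("telemetry.csv", parse_dates=["timestamp"]).sort_values("timestamp")

# Count context switches (rows where the active project changes) per hour of day.
switches = df[df["project"] != df["project"].shift()]
print(switches.groupby(switches["timestamp"].dt.hour).size().sort_values(ascending=False))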

3. LLM-Generated Architecture Docs

After a major refactor, can Claude generate architecture documentation automatically?

Read all files in src/. Generate an architecture document explaining:
- Key components and their responsibilities
- Data flow through the system
- API boundaries
- Design decisions and trade-offs

Early experiments are promising. The docs aren't perfect but are good starting points.

Key Learnings

Embrace obsolescence. If a tool becomes unnecessary, that's progress.

Perfect is the enemy of done. The code review logger works even though the planned automations (commit hooks, proactive suggestions) aren't built yet. Ship it.

Tight integration is hard. Two-way communication between tools (Claude ↔ Emacs) requires careful design.

Explicit beats implicit. Session handoff notes work better than trying to auto-resume from history.

Data reveals patterns. Telemetry showed me I context-switch too often—now I batch similar tasks.

What's Next

Part 5 (final article) covers using Claude as a learning tool: generating flashcards, creating annotated worksheets, and building a spaced-repetition system for technical concepts.


The code review logger is in the memento repo. The project ingester is at ~/bin/project-ingest. The tmux session tools are in my dotfiles. All MIT licensed—use freely.
