Rohan Sharma

Posted on Feb 9

Git's Line-Based Merge is Broken for the AI Agent Era

#git #ai #opensource #devtools

Git's merge algorithm was designed in 2005 for a world where humans worked on separate files. It compares lines. That worked fine for two decades. But now we have AI agents making parallel changes to the same codebase, and the cracks are showing.

I ran a benchmark: 31 merge scenarios where two branches make independent, non-conflicting changes to the same file. Git produced false conflicts on 16 of them. That's a 48% clean merge rate on changes that should all resolve automatically.

The fix turned out to be simple: parse the code with tree-sitter, merge at the function/class level instead of the line level.

How Git's 3-way merge actually works

Git merge uses the diff3 algorithm. Given three versions of a file (base, ours, theirs), it:

Computes a line-by-line diff from base to ours
Computes a line-by-line diff from base to theirs
Tries to apply both diffs to the base
If both diffs touch the same line range, it declares a conflict

The key word is line range. Git doesn't know what a function is, what a class is, or where one logical unit ends and another begins. It sees a flat sequence of text lines.

Where this breaks

Consider this base file:

export function validateToken(token: string): boolean {
    return token.length > 0;
}

Branch A adds a function below:

export function validateToken(token: string): boolean {
    return token.length > 0;
}

export function formatDate(date: Date): string {
    return date.toISOString().split('T')[0];
}

Branch B also adds a function below:

export function validateToken(token: string): boolean {
    return token.length > 0;
}

export function hashPassword(password: string): string {
    return crypto.createHash('sha256').update(password).digest('hex');
}

These changes are completely independent. formatDate and hashPassword have nothing to do with each other. But both branches inserted new lines at the same position (after line 3), so Git sees overlapping line ranges and produces:

<<<<<<< HEAD
export function formatDate(date: Date): string {
    return date.toISOString().split('T')[0];
}
=======
export function hashPassword(password: string): string {
    return crypto.createHash('sha256').update(password).digest('hex');
}
>>>>>>> feature-branch

A human has to look at this, understand that both functions should exist, and manually combine them. In a real codebase with agents making dozens of parallel changes, these false conflicts pile up fast.

The fix: merge at the entity level

The insight is straightforward: code has structure. Functions, classes, interfaces, JSON keys, YAML mappings. If you parse the code into semantic entities and merge at that granularity, independent changes to different entities can never conflict.

Here's the algorithm:

Step 1: Parse all three versions into entities

Using tree-sitter, extract every top-level entity from each version:

Base: [validateToken]
Ours: [validateToken, formatDate]
Theirs: [validateToken, hashPassword]

Each entity has an identity (name + type + scope) and content (the full source text).

Step 2: Match entities across versions

Build a mapping: which entity in base corresponds to which entity in ours and theirs? Match by identity. New entities (present in ours/theirs but not base) are additions.

validateToken: present in all three, unchanged in both branches
formatDate: added by ours only
hashPassword: added by theirs only

Step 3: Resolve each entity independently

For each entity, apply simple rules:

Base	Ours	Theirs	Result
Same	Same	Same	Keep (no change)
Same	Modified	Same	Take ours
Same	Same	Modified	Take theirs
Same	Modified	Modified	Conflict (if different)
-	Added	-	Include from ours
-	-	Added	Include from theirs
Present	Deleted	Same	Delete

For the example: validateToken is unchanged (keep it), formatDate is added by ours (include it), hashPassword is added by theirs (include it). Clean merge. No conflict.

Step 4: Reconstruct the file

Reassemble from the resolved entities, preserving the ours-side ordering with theirs-only additions inserted after their nearest neighbor:

export function validateToken(token: string): boolean {
    return token.length > 0;
}

export function formatDate(date: Date): string {
    return date.toISOString().split('T')[0];
}

export function hashPassword(password: string): string {
    return crypto.createHash('sha256').update(password).digest('hex');
}

Both functions included, no conflict markers, no human intervention needed.

Handling the hard cases

Same entity modified by both branches

When both branches change the same function, entity-level merge can't always auto-resolve. But it still does better than Git. Instead of "lines 42-47 conflict," you get "function process was modified by both branches." Much more useful context.

You can also attempt an intra-entity merge: run line-level diff3 on just the body of the conflicting function. If the changes are to different lines within the function, this resolves it. If they touch the same lines, you get a conflict scoped to that single function rather than a blob of the whole file.

Imports

Imports sit between entities and need special handling. We treat them as a set: collect all imports from both branches, deduplicate, sort. This handles the common case where both branches add different imports.

Nested entities (classes with methods)

Classes contain methods. We handle this by treating the class as a single entity and doing a recursive inner merge: match methods by name, resolve each independently. Branch A adds methodX and Branch B adds methodY to the same class? Clean merge.

Language support and fallback

We support 15 languages via tree-sitter grammars (TypeScript, JavaScript, Python, Go, Rust, Java, C, C++, Ruby, C#, JSON, YAML, TOML, Markdown, TSX). For unsupported file types or binary files, we fall back to Git's default line-level merge. No worse than before.

The benchmark

31 merge scenarios covering common cases: both sides add different functions, one modifies while the other adds, both modify different functions, JSON key additions, import additions, etc.

Driver	Clean merges	False conflicts
Entity-level (weave)	31/31 (100%)	0
Git (line-based)	15/31 (48%)	16

All 16 false conflicts from Git are cases where independent changes happen to be in the same line range. Entity-level merge resolves every single one.

Why this matters now

This problem existed before AI agents, but it was manageable. Humans naturally avoid editing the same file at the same time, and when conflicts happen, they resolve them during PR review.

With AI agents working in parallel, the math changes. If you have 3 agents making changes to a codebase, and 2 of them touch the same file (adding different functions), you get a false conflict every time. Scale that to 10 agents and it becomes the bottleneck. The agents can write code in seconds, but a human has to stop and resolve conflicts that shouldn't exist.

Entity-level merge is a drop-in fix. Configure it as a Git merge driver, and git merge, git rebase, git pull all use it automatically. No workflow changes, no new commands to learn.

Limitations

Ordering is heuristic. When both branches add new entities, the output ordering is a best guess. For most code this doesn't matter (function order in a file is arbitrary), but for ordered structures like arrays, it could produce unexpected results.

Tree-sitter grammar quality varies. We validate the merged output by re-parsing it and checking that the AST is valid. If validation fails, we fall back to line-level merge.

No cross-file awareness. The merge happens file by file. If Branch A renames a function and Branch B adds a call to the old name, entity-level merge won't catch that. This is the same limitation Git has, just at a different granularity.

Try it

I built this as https://github.com/ataraxy-labs/weave, a Rust merge driver that plugs into Git. MIT licensed.

brew install ataraxy-labs/tap/weave
cd your-repo && weave setup

After that, Git uses it transparently. If you want to preview what it would do: weave preview feature-branch does a dry run.

You can also run the benchmark yourself:

weave bench

I'd love to hear about edge cases that this approach would struggle with. The entity-matching heuristic works well for named entities but has trouble with anonymous closures or unnamed blocks. What scenarios would you throw at it?

Ataraxy-Labs / weave

Entity-level semantic merge driver for Git. Resolves conflicts that git can't by understanding code structure via tree-sitter. 31/31 clean merges vs git's 15/31.

Resolves merge conflicts that Git can't by understanding code structure via tree-sitter

The Problem

Git merges by comparing lines. When two branches both add code to the same file — even to completely different functions — Git sees overlapping line ranges and declares a conflict:

<<<<<<< HEAD
export function validateToken(token: string): boolean {
    return token.length > 0 && token.startsWith("sk-");
}
=======
export function formatDate(date: Date): string {
    return date.toISOString().split('T')[0];
}
>>>>>>> feature-branch

These are completely independent changes. There's no real conflict. But someone has to manually resolve it anyway.

This happens constantly when multiple AI agents work on the same codebase. Agent A adds a function, Agent B adds a different function to the same file, and Git halts everything for a human to intervene.

How Weave Fixes This

Weave replaces Git's line-based merge with entity-level merge. Instead of diffing lines, it:

Parses all three versions (base…

View on GitHub

DEV Community