I built a semantic diff that understands functions, not just lines

#opensource #rust #git #programming

git diff shows you lines. But when you're reviewing code, you think in functions, classes, and methods.

I built sem (https://github.com/Ataraxy-Labs/sem), a CLI that uses tree-sitter to break source code into semantically meaningful chunks and diff them as individual entities.

What does it look like?

On a recent commit that added Dart language support, git diff showed x lines of changes across lock files and source code. sem diff showed this:

crates/sem-core/src/parser/plugins/code/entity_extractor.rs

∆ function find_name_byte_range [modified]
∆ function visit_node [modified]
⊖ function extract_name [deleted]
⊕ function walk_dart_class_member [added]
⊕ function map_class_member_type [added]

5 entities changed: two modified, one deleted, two added. That's what a reviewer actually needs to know.
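To make the entity-level idea concrete, here is a minimal Rust sketch of the comparison step. It is not sem's actual code: it assumes entity extraction (the tree-sitter part) has already produced a map from entity name to source text for each version of a file, and the names `Change` and `diff_entities` are hypothetical, chosen only for illustration.

```rust
use std::collections::BTreeMap;

/// Kind of change detected for a single entity (hypothetical type).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Change {
    Added,
    Deleted,
    Modified,
}

/// Compare two snapshots of a file, each a map from entity name to source text.
/// Returns (name, change) pairs sorted by name; unchanged entities are omitted.
/// Assumption: a real tool would key on more than the bare name (kind, scope).
fn diff_entities(
    before: &BTreeMap<String, String>,
    after: &BTreeMap<String, String>,
) -> Vec<(String, Change)> {
    let mut changes = Vec::new();

    // Entities present before: either deleted, modified, or unchanged.
    for (name, old_body) in before {
        match after.get(name) {
            None => changes.push((name.clone(), Change::Deleted)),
            Some(new_body) if new_body != old_body => {
                changes.push((name.clone(), Change::Modified))
            }
            Some(_) => {} // identical body: nothing to report
        }
    }

    // Entities that only exist in the new snapshot were added.
    for name in after.keys() {
        if !before.contains_key(name) {
            changes.push((name.clone(), Change::Added));
        }
    }

    changes.sort_by(|a, b| a.0.cmp(&b.0));
    changes
}
```

Feeding it a "before" map containing `extract_name` and `visit_node` and an "after" map containing a changed `visit_node` plus `walk_dart_class_member` would report one deletion, one modification, and one addition, which is the shape of the output shown above.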

Impact analysis

The part I find most useful: point it at any function and it shows everything that depends on it, transitively, across the whole repo.

$ sem impact visit_node

→ depends on: 13 functions
← depended on by: extract_entities, extract_ocaml_named_bindings
! 2 entities transitively affected

Before you refactor something, you know exactly what's downstream.
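The "transitively affected" part can be sketched as a breadth-first walk over a reversed dependency graph. The Rust below is an illustration under stated assumptions, not sem's implementation: it assumes the dependency edges (which entity references which) are already known, and `transitive_dependents` is a hypothetical name.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// Collect every entity that depends on `target`, directly or transitively.
/// `deps` maps each entity to the entities it calls or references.
/// Assumption: entities are identified by unique names.
fn transitive_dependents<'a>(
    deps: &HashMap<&'a str, Vec<&'a str>>,
    target: &'a str,
) -> BTreeSet<String> {
    // Invert the graph so we can walk from a callee to its callers.
    let mut reverse: HashMap<&str, Vec<&str>> = HashMap::new();
    for (caller, callees) in deps {
        for callee in callees {
            reverse.entry(*callee).or_default().push(*caller);
        }
    }

    // Breadth-first walk over callers. The `affected` set doubles as the
    // visited set, so cycles in the graph terminate.
    let mut affected = BTreeSet::new();
    let mut queue = VecDeque::from([target]);
    while let Some(current) = queue.pop_front() {
        if let Some(callers) = reverse.get(current) {
            for caller in callers {
                if *caller != target && affected.insert(caller.to_string()) {
                    queue.push_back(*caller);
                }
            }
        }
    }
    affected
}
```

With edges `b -> a` and `c -> b`, asking for the dependents of `a` yields both `b` and `c`: `c` never references `a` directly, but a change to `a` can still reach it through `b`.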

Commands