DEV Community

Rohan Sharma

I built a semantic diff that understands functions, not just lines

git diff shows you lines. But when you're reviewing code, you think in functions, classes, and methods.

I built sem (https://github.com/Ataraxy-Labs/sem), a CLI that uses tree-sitter to parse source code into semantically meaningful entities such as functions, classes, and methods, and then diffs those entities individually instead of diffing raw lines.

What it looks like

On a recent commit that added Dart language support, git diff showed line-level changes spread across lock files and source code. sem diff showed this:

crates/sem-core/src/parser/plugins/code/entity_extractor.rs

∆ function find_name_byte_range [modified]
∆ function visit_node [modified]
⊖ function extract_name [deleted]
⊕ function walk_dart_class_member [added]
⊕ function map_class_member_type [added]

Five entities changed. That's what a reviewer actually needs to know.
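
The classification behind that kind of output can be sketched in a few lines of Rust. This is only an illustration under stated assumptions, not sem's actual implementation: it assumes the entities have already been extracted (for example by a tree-sitter pass) into a map from entity name to source text, and the names `Change` and `diff_entities` are hypothetical.

```rust
use std::collections::BTreeMap;

/// Kind of change detected for a single entity (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Change {
    Added,
    Deleted,
    Modified,
}

/// Compare two snapshots of a file, each mapping an entity name to its
/// source text, and report which entities were added, deleted, or modified.
/// Unchanged entities are omitted from the result.
pub fn diff_entities(
    before: &BTreeMap<String, String>,
    after: &BTreeMap<String, String>,
) -> Vec<(String, Change)> {
    let mut changes = Vec::new();

    // Entities present before: either gone now, changed, or untouched.
    for (name, old_body) in before {
        match after.get(name) {
            None => changes.push((name.clone(), Change::Deleted)),
            Some(new_body) if new_body != old_body => {
                changes.push((name.clone(), Change::Modified))
            }
            Some(_) => {}
        }
    }

    // Entities that exist only in the new snapshot.
    for name in after.keys() {
        if !before.contains_key(name) {
            changes.push((name.clone(), Change::Added));
        }
    }

    // Sort by entity name so the report is stable.
    changes.sort_by(|a, b| a.0.cmp(&b.0));
    changes
}
```

Matching by name alone means a renamed function shows up as one deletion plus one addition in this sketch. A real tool would likely need something smarter, such as comparing bodies to detect renames.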

Impact analysis

The part I find most useful: point it at any function and it shows everything that depends on it, transitively, across the whole repo.

$ sem impact visit_node

→ depends on: 13 functions
← depended on by: extract_entities, extract_ocaml_named_bindings
! 2 entities transitively affected

Before you refactor something, you know exactly what's downstream.
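
Conceptually, a transitive impact query is a walk over the reversed dependency graph. The sketch below shows that idea in Rust. It is an assumption about the general technique, not a description of how sem computes it. The function name `transitive_dependents` and the shape of the input (a map from each entity to the entities it references directly) are hypothetical.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// Return every entity that depends on `target`, directly or indirectly.
/// `deps` maps each entity to the entities it calls or references directly.
pub fn transitive_dependents(
    deps: &HashMap<String, Vec<String>>,
    target: &str,
) -> BTreeSet<String> {
    // Invert the graph: for each entity, record who references it.
    let mut dependents: HashMap<&str, Vec<&str>> = HashMap::new();
    for (caller, callees) in deps {
        for callee in callees {
            dependents
                .entry(callee.as_str())
                .or_default()
                .push(caller.as_str());
        }
    }

    // Breadth-first walk upward from the target through its dependents.
    let mut affected = BTreeSet::new();
    let mut queue: VecDeque<&str> = VecDeque::from([target]);
    while let Some(current) = queue.pop_front() {
        if let Some(callers) = dependents.get(current) {
            for &caller in callers {
                // `insert` returns false for entities already seen,
                // which keeps the walk finite even when the graph has cycles.
                if caller != target && affected.insert(caller.to_string()) {
                    queue.push_back(caller);
                }
            }
        }
    }
    affected
}
```

The size of the returned set corresponds to the "transitively affected" count in the output above: every entity in it can reach the target through some chain of references.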

Commands

  • sem diff - entity-level diff with word-level highlights
  • sem entities - list all entities in a file with line ranges
  • sem impact - show what breaks if an entity changes
  • sem blame - git blame at the entity level
  • sem log - track how an entity evolved over time
  • sem context - token-budgeted context for LLMs

Supports 20+ languages (Rust, Python, TypeScript, Go, Java, C, C++, Ruby, Swift, Kotlin, and more). Written in Rust. Open source.

GitHub: https://github.com/Ataraxy-Labs/sem
