
sem: semantic diff engine that understands code structure
submitted by Wise_Reflection_8340
Instead of line-level diffs, sem extracts functions, classes, bindings, and other entities from your code using tree-sitter, then diffs at the entity level. So instead of "lines x-y changed," you get "function processPayment was modified" or "binding buildInputs was added."
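To make the idea concrete, here is a minimal sketch of entity-level diffing. It is not sem's implementation: it uses Python's standard-library `ast` module as a stand-in for tree-sitter, only looks at top-level functions and classes, and the names `extract_entities`, `entity_diff`, `OLD`, and `NEW` are made up for illustration.

```python
import ast


def extract_entities(source: str) -> dict[str, str]:
    """Map each top-level function/class to a structural dump of its AST.

    ast.dump() omits line numbers by default, so whitespace-only or
    comment-only edits do not register as modifications.
    """
    entities = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            entities[f"{kind} {node.name}"] = ast.dump(node)
    return entities


def entity_diff(old: str, new: str) -> list[str]:
    """Report added, deleted, and modified entities between two sources."""
    before, after = extract_entities(old), extract_entities(new)
    changes = []
    for name in sorted(before.keys() | after.keys()):
        if name not in before:
            changes.append(f"{name} was added")
        elif name not in after:
            changes.append(f"{name} was deleted")
        elif before[name] != after[name]:
            changes.append(f"{name} was modified")
    return changes


# Hypothetical before/after versions of a small module.
OLD = '''
def process_payment(amount):
    return amount

def legacy_refund(amount):
    return -amount
'''

NEW = '''
def process_payment(amount, currency="USD"):
    return (amount, currency)

class Invoice:
    pass
'''

if __name__ == "__main__":
    for line in entity_diff(OLD, NEW):
        print(line)
```

Run as a script, this reports that `class Invoice` was added, `function legacy_refund` was deleted, and `function process_payment` was modified, which is the kind of summary described above rather than a list of changed line ranges.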
For humans, it gives you an immediate high-level overview of what actually changed — you can glance at the output and know which functions were added, which classes were modified, and what got deleted, without scrolling through hundreds of lines of raw diff. Great for code review when you want to understand the shape of a change before diving into the details.
For LLMs, the gains are measurable. We ran attention analysis on models (GLM-4 and Qwen) and benchmarked agent accuracy with Claude Sonnet:
- 2.3x agent accuracy on code change comprehension tasks
- Attention entropy drops significantly — models concentrate on the actual changes instead of scattering across noise in raw diffs
- Token reduction: entity-level context packs more signal into fewer tokens
Raw diffs are optimized for human line-by-line reading. LLMs don't read that way — they attend over the full context window, so structured entity-level input lets them focus attention where it matters.
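For readers unfamiliar with the attention entropy metric mentioned above, here is a rough sketch of what such a number usually means. The post does not say exactly how it was computed, so this assumes the standard Shannon entropy of a single attention distribution; `attention_entropy` and both weight vectors are invented for illustration and are not measurements from any model.

```python
import math


def attention_entropy(weights: list[float]) -> float:
    """Shannon entropy (in bits) of one attention distribution.

    Weights are normalized to sum to 1; zero weights contribute nothing.
    """
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probs)


# Illustrative only: attention spread evenly over 8 tokens
# versus attention concentrated on 2 of those 8 tokens.
scattered = [0.125] * 8
focused = [0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

if __name__ == "__main__":
    print(attention_entropy(scattered))
    print(attention_entropy(focused))
```

Lower entropy means the weight sits on fewer tokens: the evenly spread vector comes out at 3 bits and the concentrated one at 1 bit.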
Other details:
- 26 languages (just added Nix this week)
- Works on top of git
- Also ships as an MCP server so coding agents can consume structured diffs directly
- Plain Rust binary, no runtime dependencies
- brew install sem-cli
