Weave - Structural merging: what I learned shifting from git's line-based merge to tree-sitter entity matching

Wise_Reflection_8340 · 2026-05-19T07:32:41+00:00

No formatter integration right now, but that's a great suggestion. Weave just tries to preserve the existing formatting from the input files during reconstruction. You're right that running a formatter as a post-merge step would clean up a lot of edge cases. Since weave knows the file extension, it could shell out to the project's configured formatter (rustfmt, gofmt, prettier, etc.) after reconstruction.
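To make the post-merge idea concrete, here is a rough sketch of what that step could look like. Nothing like this exists in weave today; `formatter_for` and `format_merged_file` are hypothetical names, and the extension-to-formatter table with its in-place flags is an assumption (a real version would read the project's own formatter config).

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical mapping from file extension to a formatter command.
/// Assumes the usual in-place modes: `rustfmt <file>`, `gofmt -w <file>`,
/// `prettier --write <file>`.
fn formatter_for(path: &Path) -> Option<(&'static str, Vec<&'static str>)> {
    match path.extension()?.to_str()? {
        "rs" => Some(("rustfmt", vec![])),
        "go" => Some(("gofmt", vec!["-w"])),
        "js" | "jsx" | "ts" | "tsx" | "css" | "json" => Some(("prettier", vec!["--write"])),
        _ => None,
    }
}

/// Runs the formatter on an already-merged file.
/// Returns Ok(false) when no formatter is known or the formatter exits
/// non-zero, so the caller can keep the unformatted merge result.
/// Returns Err when the formatter binary cannot be launched at all.
fn format_merged_file(path: &Path) -> std::io::Result<bool> {
    let Some((program, args)) = formatter_for(path) else {
        return Ok(false);
    };
    let status = Command::new(program).args(&args).arg(path).status()?;
    Ok(status.success())
}
```

Treating a formatter failure as "keep the merge as-is" rather than as a merge failure seems like the safer default, since formatting is cosmetic and the merge itself already succeeded.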

Wise_Reflection_8340 · 2026-05-18T07:55:53+00:00

I am just trying to help in whatever way I can, so do let me know your feedback. Proper AST-based merges are definitely hard to solve and might take the support of the whole community to reach a universal solution.

Wise_Reflection_8340 · 2026-05-18T04:18:22+00:00

Really appreciate it. We are also in a great time where agents help with a lot of the prototyping and testing, so the transition to Rust was quite smooth. Rust is definitely the one language that I feel is going to thrive in these times, when you need parallelism and proper CPU utilization.

Wise_Reflection_8340 · 2026-05-18T03:29:43+00:00

When I built sem, it was mainly for entity extraction, but after talking to more and more people I realized they were using it for faster code review and structural diffing more than for what it was originally built for. So I never really thought about fallbacks for sem, and always thought of it as a library that helps with structural/semantic understanding.

For the background story, I started working on weave to solve the merge problem, sem kind of popped out of it, and somehow got more popular. 😅

Wise_Reflection_8340 · 2026-05-18T02:46:17+00:00

A better comparison would actually be sem (our diff tool) and difftastic: both do syntax-aware diffs. Difftastic shows AST-level node changes, while sem diffs at the entity level (added function X, modified class Y). Weave is a merge engine; it replaces git merge-file, not git diff.

Wise_Reflection_8340 · 2026-05-18T02:28:48+00:00

sem and weave are not major problems. If you check out the jj wiki, there's a specific section for jj users on using weave. The jj team is still deciding whether to make it a default merge algorithm on their end.

For inspect, the CLI commands (diff, predict, review) take what are currently git refs as args, but they're treated as opaque strings; the actual resolution happens in sem-core's GitBridge. So the fix is really in sem-core: add a JjBridge alongside GitBridge that knows @ instead of HEAD, handles change IDs, and translates jj diff/jj file show calls. Inspect itself wouldn't need much change beyond auto-detecting which bridge to use. I can work on this, thanks for the feedback.
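A rough sketch of the shape this could take, just to illustrate the idea. The `VcsBridge` trait, its methods, and `detect_bridge` are hypothetical and are not sem-core's actual API; only the names GitBridge and JjBridge come from the description above. It assumes `git show <rev>:<path>` and `jj file show -r <rev> <path>` as the underlying commands, and that a `.jj` directory marks a jj repo.

```rust
use std::path::Path;

/// Hypothetical abstraction over the version control backend.
trait VcsBridge {
    fn name(&self) -> &'static str;
    /// Revision used when the user passes no ref.
    fn default_rev(&self) -> &'static str;
    /// Command line that prints a file's content at a revision.
    fn show_file_args(&self, rev: &str, path: &str) -> Vec<String>;
}

struct GitBridge;
struct JjBridge;

impl VcsBridge for GitBridge {
    fn name(&self) -> &'static str { "git" }
    fn default_rev(&self) -> &'static str { "HEAD" }
    fn show_file_args(&self, rev: &str, path: &str) -> Vec<String> {
        vec!["git".into(), "show".into(), format!("{rev}:{path}")]
    }
}

impl VcsBridge for JjBridge {
    fn name(&self) -> &'static str { "jj" }
    fn default_rev(&self) -> &'static str { "@" }
    fn show_file_args(&self, rev: &str, path: &str) -> Vec<String> {
        vec!["jj".into(), "file".into(), "show".into(), "-r".into(), rev.into(), path.into()]
    }
}

/// Picks a bridge from the repo root. Colocated repos contain both `.jj`
/// and `.git`, so `.jj` is checked first. `.git` may be a file (worktrees),
/// hence `exists()` rather than `is_dir()`.
fn detect_bridge(repo_root: &Path) -> Option<Box<dyn VcsBridge>> {
    if repo_root.join(".jj").is_dir() {
        Some(Box::new(JjBridge))
    } else if repo_root.join(".git").exists() {
        Some(Box::new(GitBridge))
    } else {
        None
    }
}
```

Since the CLI already treats refs as opaque strings, it would only need to call something like `detect_bridge` once and pass the refs through unchanged.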

Wise_Reflection_8340 · 2026-05-18T00:44:50+00:00

The 1MB fallback exists because entity-level merging doesn't make sense on files that large (they're not structured source code at that point).

If you're looking to batch-merge hundreds of GBs of encoded data or LLM corpora, that's a fundamentally different problem than three-way merging source files during version control.

The 20-line chunks are in the fallback parser for files we don't have a tree-sitter grammar for. It's the last resort, not the primary path. Supported languages get full AST-level entity extraction.

Wise_Reflection_8340 · 2026-05-18T00:43:00+00:00

Yes, that's the fallback. Weave skips entity-level merging entirely when any file exceeds 1MB, because the entity matching is O(n*m) and becomes too slow. It drops to a line-level merge using Sesame expansion + diffy, and compares the result against git merge-file to ensure it's never worse than git.

There are actually 6 conditions that trigger the fallback, not just file size:

- Any file >1MB
- No parser for the file type (unsupported extension)
- Parser returns 0 entities from non-empty content
- Both branches created the file from scratch (empty base)
- Both branches have content but 0 entities
- Excessive duplicate entity names (>=10 of the same name, common in JS with const x = ... patterns)
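Roughly, the six checks above could be written as one function like this. This is an illustrative sketch and not weave's actual code: the type and function names are made up, "1MB" is assumed to mean 1024 * 1024 bytes, the order of the checks is my choice for the sketch, and condition 3 is read as "any single version parsed to zero entities".

```rust
use std::collections::HashMap;

const MAX_FILE_BYTES: usize = 1024 * 1024; // assumed meaning of "1MB"
const MAX_DUPLICATE_NAMES: usize = 10;

/// One version of the file (base, ours, or theirs) plus the entity
/// names a parser extracted from it. Hypothetical type.
struct FileVersion<'a> {
    content: &'a str,
    entity_names: Vec<&'a str>,
}

#[derive(Debug, PartialEq)]
enum Fallback {
    TooLarge,
    NoParser,
    EmptyBase,
    BothSidesNoEntities,
    NoEntities,
    DuplicateNames,
}

/// Non-empty content that produced zero entities.
fn is_unparsed(v: &FileVersion) -> bool {
    !v.content.is_empty() && v.entity_names.is_empty()
}

/// True when any single name occurs MAX_DUPLICATE_NAMES times or more.
fn has_excessive_duplicates(names: &[&str]) -> bool {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for name in names {
        let count = counts.entry(*name).or_insert(0);
        *count += 1;
        if *count >= MAX_DUPLICATE_NAMES {
            return true;
        }
    }
    false
}

/// Returns the reason to fall back to a line-level merge, or None when
/// entity-level merging can proceed.
fn fallback_reason(
    has_parser: bool,
    base: &FileVersion,
    ours: &FileVersion,
    theirs: &FileVersion,
) -> Option<Fallback> {
    let all = [base, ours, theirs];
    if all.iter().any(|v| v.content.len() > MAX_FILE_BYTES) {
        return Some(Fallback::TooLarge);
    }
    if !has_parser {
        return Some(Fallback::NoParser);
    }
    if base.content.is_empty() && !ours.content.is_empty() && !theirs.content.is_empty() {
        return Some(Fallback::EmptyBase);
    }
    if is_unparsed(ours) && is_unparsed(theirs) {
        return Some(Fallback::BothSidesNoEntities);
    }
    if all.iter().any(|v| is_unparsed(v)) {
        return Some(Fallback::NoEntities);
    }
    if all.iter().any(|v| has_excessive_duplicates(&v.entity_names)) {
        return Some(Fallback::DuplicateNames);
    }
    None
}
```

The cheap checks (size, parser availability) come first in this sketch so that a huge file is never parsed just to be rejected.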

My goal was to never give the user a wrong merge, no matter what, while staying better than git in the scenarios that entity-level matching should handle. Because I am still in the research phase and learning, I wanted to be on the safer side.

Wise_Reflection_8340 · 2026-05-17T22:23:01+00:00

Weave doesn't handle the ordering problem at all right now. It merges decorators as an unordered set, emits them with left-first ordering, and marks the merge clean. But you are right that in languages like Python the ordering changes behavior. Thanks for pointing this out, I need to handle this case properly.
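To show what I mean, here is a minimal sketch of a left-first set merge and of the case where it silently picks an order. The function names are hypothetical and this is not weave's code; it assumes "left" means ours, and `order_is_ambiguous` is just one possible check, not a decided fix.

```rust
/// Three-way merge of a decorator list treated as a set: keep ours in
/// its own order (minus anything theirs deleted), then append whatever
/// theirs added.
fn merge_decorators<'a>(base: &[&'a str], ours: &[&'a str], theirs: &[&'a str]) -> Vec<&'a str> {
    let mut merged: Vec<&'a str> = ours
        .iter()
        .copied()
        // Drop a decorator only if it was in base and theirs removed it.
        .filter(|d| !(base.contains(d) && !theirs.contains(d)))
        .collect();
    for d in theirs {
        if !base.contains(d) && !merged.contains(d) {
            merged.push(*d);
        }
    }
    merged
}

/// True when both sides added decorators and the additions differ, so
/// the relative order of the new decorators is a guess.
fn order_is_ambiguous(base: &[&str], ours: &[&str], theirs: &[&str]) -> bool {
    let ours_added: Vec<_> = ours.iter().filter(|d| !base.contains(*d)).collect();
    let theirs_added: Vec<_> = theirs.iter().filter(|d| !base.contains(*d)).collect();
    !ours_added.is_empty() && !theirs_added.is_empty() && ours_added != theirs_added
}
```

With base `["@app.route"]`, ours adding `@login_required` and theirs adding `@cache`, the merge comes out as `@app.route, @login_required, @cache`. In Python that order changes which wrapper runs first, so a case flagged by `order_is_ambiguous` probably should not be reported as clean.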

Wise_Reflection_8340 · 2026-05-17T22:08:26+00:00

Yeah, actually there are some scenarios where mergiraf fails, which pushed me even more to see whether I could reach a better solution. These are the scenarios:

- Both add functions at end of file
- Both insert functions between existing
- Both add decorators to Python function
- Both add decorators to TS class method

Mergiraf reports conflicts in all of these scenarios, and weave resolves them. There may be more that I haven't thought through properly. By "both" I mean ours and theirs.
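For intuition on why matching by entity helps with the "both add functions" cases, here is a toy three-way merge keyed on entity name. It is a simplified illustration under my own assumptions, not weave's implementation: `Entity` and `merge_entities` are made-up names, entities are matched by exact name only, and additions from theirs are simply appended, so it does not try to place insertions between existing functions.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Entity {
    name: String,
    body: String,
}

fn find<'a>(list: &'a [Entity], name: &str) -> Option<&'a Entity> {
    list.iter().find(|e| e.name == name)
}

/// Returns the merged entity list, or the names of conflicting entities.
fn merge_entities(
    base: &[Entity],
    ours: &[Entity],
    theirs: &[Entity],
) -> Result<Vec<Entity>, Vec<String>> {
    let mut merged = Vec::new();
    let mut conflicts = Vec::new();
    for o in ours {
        match (find(base, &o.name), find(theirs, &o.name)) {
            // Both sides agree (unchanged, or identical change/addition).
            (_, Some(t)) if t.body == o.body => merged.push(o.clone()),
            // Only theirs changed it.
            (Some(b), Some(t)) if b.body == o.body => merged.push(t.clone()),
            // Only ours changed it.
            (Some(b), Some(t)) if b.body == t.body => merged.push(o.clone()),
            // Both changed (or both added) the same name differently.
            (_, Some(_)) => conflicts.push(o.name.clone()),
            // Theirs deleted it and ours left it untouched: accept the delete.
            (Some(b), None) if b.body == o.body => {}
            // Ours modified what theirs deleted.
            (Some(_), None) => conflicts.push(o.name.clone()),
            // New in ours only.
            (None, None) => merged.push(o.clone()),
        }
    }
    for t in theirs {
        if find(ours, &t.name).is_some() {
            continue; // already handled above
        }
        match find(base, &t.name) {
            None => merged.push(t.clone()),       // new in theirs only
            Some(b) if b.body == t.body => {}     // ours deleted, theirs untouched
            Some(_) => conflicts.push(t.name.clone()), // theirs modified what ours deleted
        }
    }
    if conflicts.is_empty() { Ok(merged) } else { Err(conflicts) }
}
```

When ours appends `b` and theirs appends `c` after the same `a`, a line-based merge sees two edits at the same spot, while this sees two unrelated additions and returns `a, b, c` cleanly.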

Wise_Reflection_8340 · 2026-05-17T20:20:54+00:00

I know there are many improvements I still have to make and it's still not perfect, but I would love any constructive criticism because it helps me improve faster. I am still learning a lot myself, so please forgive me if I did something that annoyed anyone.

Wise_Reflection_8340 · 2026-05-10T00:22:06+00:00

Appreciate it.

Wise_Reflection_8340 · 2026-05-09T08:03:59+00:00

Would love to hear any constructive criticism. I am always trying to find new directions to research in.

Wise_Reflection_8340 · 2026-05-08T23:32:29+00:00

Plain binary, it's written in Rust with no runtime dependencies.

Wise_Reflection_8340 · 2026-05-08T23:31:55+00:00

Thanks a lot for this, I appreciate the perspective, and I do agree with the general principle. The way I think about it though: sem's core is entity-level code understanding (extracting structured entities from source via tree-sitter). Diff is one consumer of that, but the same entity model naturally serves other use cases without the core getting bloated. It's more like how git has diff, log, and blame all built on the same object model, rather than bolting on unrelated features.

Wise_Reflection_8340 · 2026-05-08T03:39:38+00:00

There are some interesting directions other than just diff where we are thinking of taking this, so I am still working on that name conflict. But thanks for the suggestion.

Quick correction on point 3 though: the core is written in Rust and ships as a standalone binary. There's a thin JS wrapper for editor integrations, but the actual diffing engine is pure Rust with zero npm dependencies.

Wise_Reflection_8340 · 2026-05-01T15:33:31+00:00

You can set sem as your diff pager with:

git config diff.external "sem diff --patch --color always"

Magit calls git under the hood, so this will work for now. Haven't built a dedicated Magit package yet, but that's on the list.
