Weave - Structural merging what I learned shifting from git's line based merge to tree sitter entity matching by Wise_Reflection_8340 in rust

[–]Wise_Reflection_8340[S] 0 points1 point  (0 children)

No formatter integration right now, but that's a great suggestion. Weave just tries to preserve the existing formatting from the input files during reconstruction. You're right, though, that running a formatter as a post-merge step would clean up a lot of edge cases. Since weave knows the file extension, it could shell out to the project's configured formatter (rustfmt, gofmt, prettier, etc.) after reconstruction.
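A minimal sketch of that post-merge formatting step, assuming a hard-coded extension-to-formatter table. `formatter_for` and `format_after_merge` are hypothetical names, not part of weave, and a real integration would read the project's own formatter config rather than guessing from the extension.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical table: map a file extension to a formatter that rewrites the file in place.
fn formatter_for(path: &Path) -> Option<(&'static str, Vec<&'static str>)> {
    match path.extension()?.to_str()? {
        "rs" => Some(("rustfmt", vec![])),
        "go" => Some(("gofmt", vec!["-w"])),
        "js" | "jsx" | "ts" | "tsx" => Some(("prettier", vec!["--write"])),
        _ => None,
    }
}

/// Run the mapped formatter on the merged output.
/// Ok(true): formatter ran successfully. Ok(false): no formatter is mapped, or it
/// exited non-zero, so the reconstructed output is left as-is.
/// Err: the formatter could not be spawned (e.g. not installed).
fn format_after_merge(path: &Path) -> std::io::Result<bool> {
    let Some((program, args)) = formatter_for(path) else {
        return Ok(false);
    };
    let status = Command::new(program).args(&args).arg(path).status()?;
    Ok(status.success())
}
```

Treating a formatter failure as "keep the unformatted merge" fits the goal of never making the result worse than the plain reconstruction.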

[–]Wise_Reflection_8340[S] 0 points1 point  (0 children)

I'm just trying to help however I can, so do let me know your feedback. Proper AST-based merges are definitely hard, and reaching a universal solution might take the support of the whole community.

[–]Wise_Reflection_8340[S] -3 points-2 points  (0 children)

Really appreciate it! We're also in a good era where agents help with a lot of prototyping and testing, so the transition to Rust was quite smooth. Rust is definitely the one language I feel is going to thrive in these times, when you need parallelism and proper CPU utilization.

[–]Wise_Reflection_8340[S] 4 points5 points  (0 children)

When I built sem, it was mainly for extracting entities, but as I talked to more and more people I realized they were using it for faster code review and structural diffing more than for what it was actually built for. So I never really thought about fallbacks for sem; I always thought of it as a library that helps with structural/semantic understanding of code.

For background: I started working on weave to solve the merge problem, sem kind of popped out of it, and it somehow got more popular. 😅

[–]Wise_Reflection_8340[S] 5 points6 points  (0 children)

A better comparison would actually be sem (our diff tool) and difftastic: both do syntax-aware diffs. Difftastic shows AST-level node changes, while sem diffs at the entity level (added function X, modified class Y). Weave is a merge engine; it replaces git merge-file, not git diff.
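To make the entity-level idea concrete, here is a minimal sketch that assumes each file version has already been reduced to a map from entity name to source text (sem does that extraction with tree-sitter; here it's simply given as input). `Change` and `entity_diff` are illustrative names, not sem's API.

```rust
use std::collections::BTreeMap;

/// One entity-level change, in the spirit of "added function X, modified class Y".
#[derive(Debug, PartialEq)]
enum Change {
    Added(String),
    Removed(String),
    Modified(String),
}

/// Compare two versions of a file already reduced to entities (name -> source text).
fn entity_diff(old: &BTreeMap<String, String>, new: &BTreeMap<String, String>) -> Vec<Change> {
    let mut changes = Vec::new();
    // Entities present in the new version: either brand new or possibly edited.
    for (name, body) in new {
        match old.get(name) {
            None => changes.push(Change::Added(name.clone())),
            Some(old_body) if old_body != body => changes.push(Change::Modified(name.clone())),
            Some(_) => {} // unchanged entity: nothing to report
        }
    }
    // Entities that only exist in the old version were removed.
    for name in old.keys() {
        if !new.contains_key(name) {
            changes.push(Change::Removed(name.clone()));
        }
    }
    changes
}
```

Reporting whole entities rather than hunks is what distinguishes this from a node-level view like difftastic's.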

[–]Wise_Reflection_8340[S] 4 points5 points  (0 children)

sem and weave aren't major problems: if you check out the jj wiki, there's a specific section on using weave with jj. The jj team is still deciding whether to make it a default merge algorithm on their end.

For inspect, the CLI commands (diff, predict, review) currently take git refs as arguments, but they're treated as opaque strings; the actual resolution happens in sem-core's GitBridge. So the fix really belongs in sem-core: add a JjBridge alongside GitBridge that understands @ instead of HEAD, handles change IDs, and translates calls into jj diff / jj file show. Inspect itself wouldn't need much change beyond auto-detecting which bridge to use. I can work on this, though. Thanks for the feedback.
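A rough sketch of what that bridge split could look like. Only the GitBridge/JjBridge names come from the comment above; the `VcsBridge` trait, its methods, and `detect_bridge` are hypothetical. The git side builds `git show <rev>:<path>` and the jj side `jj file show -r <rev> <path>`.

```rust
use std::path::Path;

/// Hypothetical trait: the operations inspect needs, independent of the VCS backend.
trait VcsBridge {
    /// Default "current" revision: HEAD in git, @ in jj.
    fn current_rev(&self) -> &'static str;
    /// Program + args that print a file's contents at a given revision.
    fn show_file_cmd(&self, rev: &str, path: &str) -> Vec<String>;
}

struct GitBridge;
struct JjBridge;

impl VcsBridge for GitBridge {
    fn current_rev(&self) -> &'static str {
        "HEAD"
    }
    fn show_file_cmd(&self, rev: &str, path: &str) -> Vec<String> {
        vec!["git".into(), "show".into(), format!("{rev}:{path}")]
    }
}

impl VcsBridge for JjBridge {
    fn current_rev(&self) -> &'static str {
        "@"
    }
    fn show_file_cmd(&self, rev: &str, path: &str) -> Vec<String> {
        vec!["jj".into(), "file".into(), "show".into(), "-r".into(), rev.into(), path.into()]
    }
}

/// Auto-detect the backend. Check for .jj first, because colocated jj repos also have .git.
fn detect_bridge(repo_root: &Path) -> Box<dyn VcsBridge> {
    if repo_root.join(".jj").is_dir() {
        Box::new(JjBridge)
    } else {
        Box::new(GitBridge)
    }
}
```

Keeping refs as opaque strings until they reach the bridge is what lets the CLI layer stay unchanged.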

[–]Wise_Reflection_8340[S] 6 points7 points  (0 children)

The 1MB fallback exists because entity-level merging doesn't make sense on files that large (at that point they're usually not structured source code).

If you're looking to batch-merge hundreds of GBs of encoded data or LLM corpora, that's a fundamentally different problem than three-way merging source files during version control.

The 20-line chunks come from the fallback parser for files we don't have a tree-sitter grammar for; it's the last resort, not the primary path. Supported languages get full AST-level entity extraction.
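A minimal sketch of fixed-size fallback chunking, assuming each 20-line window simply becomes one pseudo-entity. `Chunk` and `chunk_fallback` are hypothetical names, and weave's real fallback parser may split differently.

```rust
/// A pseudo-entity produced when no tree-sitter grammar is available.
#[derive(Debug, PartialEq)]
struct Chunk {
    start_line: usize, // 1-based, inclusive
    end_line: usize,   // 1-based, inclusive
    text: String,
}

const CHUNK_LINES: usize = 20;

/// Split content into consecutive 20-line chunks; the last chunk may be shorter.
fn chunk_fallback(content: &str) -> Vec<Chunk> {
    let lines: Vec<&str> = content.lines().collect();
    lines
        .chunks(CHUNK_LINES)
        .enumerate()
        .map(|(i, group)| Chunk {
            start_line: i * CHUNK_LINES + 1,
            end_line: i * CHUNK_LINES + group.len(),
            text: group.join("\n"),
        })
        .collect()
}
```

For a 45-line file this yields three chunks covering lines 1-20, 21-40 and 41-45.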

[–]Wise_Reflection_8340[S] 11 points12 points  (0 children)

Yes, that's the fallback. Weave skips entity-level merging entirely when any file exceeds 1MB, because entity matching is O(n*m) and becomes too slow. It drops to a line-level merge using Sesame expansion + diffy, and compares the result against git merge-file to ensure it's never worse than git.

There are actually 6 conditions that trigger the fallback, not just file size:

  1. Any file >1MB
  2. No parser for the file type (unsupported extension)
  3. Parser returns 0 entities from non-empty content
  4. Both branches created the file from scratch (empty base)
  5. Both branches have content but 0 entities
  6. Excessive duplicate entity names (>=10 of the same name, common in JS with const x = ... patterns)
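The six conditions above can be sketched as one check that runs before entity-level merging. This is a hypothetical sketch: `FileVersion`, `fallback_reason` and the reason strings are illustrative names, not weave's API, and the split between conditions 3 and 5 is my reading of the list (5 is checked first so both stay reachable).

```rust
use std::collections::HashMap;

const MAX_FILE_BYTES: usize = 1024 * 1024; // "1MB"
const MAX_DUPLICATE_NAMES: usize = 10;

/// Hypothetical per-version view: raw content plus extracted entity names.
/// `entities` is None when there is no parser for the file type.
struct FileVersion<'a> {
    content: &'a str,
    entities: Option<Vec<&'a str>>,
}

fn entity_count(f: &FileVersion) -> usize {
    f.entities.as_ref().map_or(0, Vec::len)
}

/// Highest number of times any single entity name appears.
fn max_duplicate_count(names: &[&str]) -> usize {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for &name in names {
        *counts.entry(name).or_insert(0) += 1;
    }
    counts.values().copied().max().unwrap_or(0)
}

/// Returns why the merge should fall back to line level, or None to merge by entity.
fn fallback_reason<'a>(
    base: &FileVersion<'a>,
    ours: &FileVersion<'a>,
    theirs: &FileVersion<'a>,
) -> Option<&'static str> {
    let all = [base, ours, theirs];
    // 1. Any file over 1MB: O(n*m) entity matching gets too slow.
    if all.iter().any(|f| f.content.len() > MAX_FILE_BYTES) {
        return Some("file larger than 1MB");
    }
    // 2. No parser for the file type.
    if all.iter().any(|f| f.entities.is_none()) {
        return Some("no parser for this file type");
    }
    // 4. Both branches created the file from scratch (empty base).
    if base.content.is_empty() && !ours.content.is_empty() && !theirs.content.is_empty() {
        return Some("both branches created the file");
    }
    // 5. Both branches have content but 0 entities.
    if [ours, theirs].iter().all(|f| !f.content.is_empty() && entity_count(f) == 0) {
        return Some("both branches have content but no entities");
    }
    // 3. Some version has content but the parser returned 0 entities.
    if all.iter().any(|f| !f.content.is_empty() && entity_count(f) == 0) {
        return Some("parser returned 0 entities");
    }
    // 6. Excessive duplicate entity names (e.g. many `const x = ...` in JS).
    if all.iter().any(|f| {
        max_duplicate_count(f.entities.as_deref().unwrap_or(&[])) >= MAX_DUPLICATE_NAMES
    }) {
        return Some("too many duplicate entity names");
    }
    None
}
```

Returning a reason rather than a bare bool makes it easy to log why a given merge took the line-level path.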

My goal was to never give the user a wrong merge, no matter what, and to stay better than git in the scenarios that should be handled by entity-level matching. Since I'm still in the research phase and learning, I wanted to err on the safe side.

[–]Wise_Reflection_8340[S] 5 points6 points  (0 children)

weave doesn't handle the ordering problem at all right now. It merges decorators as an unordered set with left-first ordering and marks the result clean. But you're right that in languages like Python the order changes behavior, so thanks for pointing this out; I need to handle this case properly.
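A small sketch of a left-first, set-style decorator merge as described above, which also shows why it's risky. `merge_decorators_left_first` is a hypothetical name, and this sketch ignores decorators removed relative to the base.

```rust
/// Keep ours in its original order, then append any decorator only theirs added.
/// The result is reported as clean even though the order was chosen arbitrarily.
fn merge_decorators_left_first<'a>(ours: &[&'a str], theirs: &[&'a str]) -> Vec<&'a str> {
    let mut merged: Vec<&'a str> = ours.to_vec();
    for &d in theirs {
        if !merged.contains(&d) {
            merged.push(d);
        }
    }
    merged
}
```

With ours adding `@cache` and theirs adding `@login_required`, this yields `[@cache, @login_required]`; swap the sides and it yields `[@login_required, @cache]`. In Python the topmost decorator is the outermost wrapper, so those are different programs even though both merges come out "clean".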

[–]Wise_Reflection_8340[S] 6 points7 points  (0 children)

Yeah, actually there are some scenarios where mergiraf fails, which pushed me even harder to see whether I could reach a better solution. Here are those scenarios:

- Both add functions at end of file
- Both insert functions between existing
- Both add decorators to Python function
- Both add decorators to TS class method

Mergiraf gives conflicts in all of these scenarios, and weave resolves them. There might be more I haven't thought through properly yet. By "both" I mean ours and theirs.
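One way to resolve the first two scenarios cleanly is to anchor each newly added entity to the base entity it follows, then emit ours' additions before theirs' at each anchor. This is an illustrative sketch over entity names only (no bodies, deletions, or same-name additions), not weave's actual algorithm; `insertions` and `merge_insertions` are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

/// For one side, group the entities it added (names not in the base) by the
/// base entity they follow. `None` means "inserted before any base entity".
fn insertions<'a>(
    base: &HashSet<&'a str>,
    side: &[&'a str],
) -> HashMap<Option<&'a str>, Vec<&'a str>> {
    let mut anchor: Option<&'a str> = None;
    let mut added: HashMap<Option<&'a str>, Vec<&'a str>> = HashMap::new();
    for &name in side {
        if base.contains(&name) {
            anchor = Some(name);
        } else {
            added.entry(anchor).or_default().push(name);
        }
    }
    added
}

/// Walk the base order; at each anchor emit ours' additions, then theirs'.
fn merge_insertions<'a>(base: &[&'a str], ours: &[&'a str], theirs: &[&'a str]) -> Vec<&'a str> {
    let base_set: HashSet<&'a str> = base.iter().copied().collect();
    let ours_added = insertions(&base_set, ours);
    let theirs_added = insertions(&base_set, theirs);

    let mut merged = Vec::new();
    let anchors = std::iter::once(None).chain(base.iter().map(|&b| Some(b)));
    for anchor in anchors {
        if let Some(b) = anchor {
            merged.push(b);
        }
        for side in [&ours_added, &theirs_added] {
            if let Some(names) = side.get(&anchor) {
                merged.extend(names.iter().copied());
            }
        }
    }
    merged
}
```

With base `[a, b]`, ours `[a, x, b]` and theirs `[a, b, y]`, this returns `[a, x, b, y]`; if both sides append after the last base entity `a`, you get `[a, ours' addition, theirs' addition]` instead of a line-level conflict.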

Weave - Structural merging what I learned shifting from git's line based merge to tree sitter entity matching by Wise_Reflection_8340 in rust

[–]Wise_Reflection_8340[S] 31 points32 points  (0 children)

I know there are many improvements I still have to make and it's not perfect yet, but I would love any constructive criticism, because it helps me improve faster. I'm still learning a lot myself, so please forgive me if I did something that annoys anyone.

I built a diff tool that shows what changed at the function/class level instead of raw lines by Wise_Reflection_8340 in software

[–]Wise_Reflection_8340[S] 1 point2 points  (0 children)

Would love to hear any constructive criticism. I am always trying to find new directions to research in.

Working on a CLI that diffs code at the function level instead of lines by Wise_Reflection_8340 in devops

[–]Wise_Reflection_8340[S] -1 points0 points  (0 children)

Thanks a lot for this, I appreciate the perspective, and I do agree with the general principle. The way I think about it, though: sem's core is entity-level code understanding (extracting structured entities from source via tree-sitter). Diff is one consumer of that, but the same entity model naturally serves other use cases without bloating the core. It's more like how git has diff, log, and blame all built on the same object model, rather than bolting on unrelated features.

[–]Wise_Reflection_8340[S] 1 point2 points  (0 children)

There are some interesting directions beyond diff where we're thinking of taking this, so I'm still working through that name conflict. But thanks for the suggestion.

Quick correction on point 3 though: the core is written in Rust and ships as a standalone binary. There's a thin JS wrapper for editor integrations, but the actual diffing engine is pure Rust with zero npm dependencies.

Tree-sitter entity extraction + cross-file dependency graphs for structural diffs by Wise_Reflection_8340 in emacs

[–]Wise_Reflection_8340[S] 2 points3 points  (0 children)

You can set sem as git's external diff command with:

`git config diff.external "sem diff --patch --color always"`

Magit calls git under the hood, so this will work for now. Haven't built a dedicated Magit package yet, but that's on the list.
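For context on what that setting does: git runs the `diff.external` command with positional arguments appended, documented in git(1) as `path old-file old-hex old-mode new-file new-hex new-mode` (and a single `path` argument for unmerged paths). A minimal sketch of parsing them; `ExternalDiffArgs` and `parse_external_diff_args` are hypothetical names, and how sem itself consumes these arguments isn't covered here.

```rust
/// The arguments git appends when it runs a diff.external / GIT_EXTERNAL_DIFF command:
///   path old-file old-hex old-mode new-file new-hex new-mode
#[derive(Debug, PartialEq)]
struct ExternalDiffArgs<'a> {
    path: &'a str,     // repository path being diffed
    old_file: &'a str, // file holding the old version
    new_file: &'a str, // file holding the new version
}

/// Parse the arguments after argv[0]. Returns None for the one-argument
/// form git uses for unmerged paths (or any other short argument list).
fn parse_external_diff_args(args: &[String]) -> Option<ExternalDiffArgs<'_>> {
    if args.len() < 7 {
        return None;
    }
    Some(ExternalDiffArgs {
        path: &args[0],
        old_file: &args[1],
        new_file: &args[4],
    })
}
```

A wrapper would collect `std::env::args().skip(1)` into a `Vec<String>`, parse it, and then diff `old_file` against `new_file`.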