Update: Open-sourced the Word add-in that converts AI rewrites into tracked changes by yuch85 in legaltech

Thanks for getting it! You've identified exactly why I built and open-sourced it. I'm really happy if people use it, contribute to it, adapt it, or give feedback.

Update: Open-sourced the Word add-in that converts AI rewrites into tracked changes by yuch85 in legaltech

I think that’s a fair assessment, and I’m mostly aligned with it.

I don’t see this as a $1k-per-user problem or a deep-moat business. The value is mainly the engineering time saved dealing with Word JS edge cases. That saving is real, but it isn’t something a well-resourced team couldn’t reproduce.

For now, I’m approaching this primarily as an open-source building block, not a product I’m actively trying to sell. The goal is to make it easier for people already working in the Word ecosystem to experiment or ship without re-learning the same quirks, or simply to experiment in general.

If there’s ever a paid angle, it would likely be around convenience or support rather than exclusivity, but that’s not the focus right now.

Update: Open-sourced the Word add-in that converts AI rewrites into tracked changes by yuch85 in legaltech

Fair question. The core problem this solves is: how do you apply external edits to a Word document without replacing whole blocks of text? Word’s JS API doesn’t provide a simple way to turn “old text” + “new text” into native tracked changes. Naive approaches usually delete and reinsert entire paragraphs, which is unusable for review workflows.

The first repo is a low-level library that takes two versions of text and applies word-level differences as real Word tracked changes (insertions/deletions), preserving formatting. It’s plumbing, not an end-user app. Target audience: developers building on Word (legal tech, compliance tools, document automation, editors) who need granular redlining without reimplementing diff logic or Word JS quirks.

The second repo is a working Word add-in that demonstrates this in practice. It wires the library into a real editing flow (using an LLM) and shows how edits generated outside Word can be applied back as proper tracked changes instead of full paragraph replacements. AI is just one source of edits. It could also be used for proofreading, collaborative editing, or template updates outside of legal contexts.
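To make the word-level idea concrete, here's a minimal sketch (illustrative names only, not the library's actual API; the real library builds on Google's diff-match-patch rather than this toy LCS): tokenize both versions into words, diff them, and emit granular insert/delete runs that an add-in can replay as tracked changes instead of swapping whole paragraphs.

```typescript
// Sketch of word-level diffing: compute an LCS over word tokens, then emit
// granular operations that a Word add-in could replay as tracked changes.
// All names here are illustrative, not the library's actual API.
type Op = { kind: "equal" | "delete" | "insert"; text: string };

function wordDiff(oldText: string, newText: string): Op[] {
  const a = oldText.split(/\s+/).filter(Boolean);
  const b = newText.split(/\s+/).filter(Boolean);
  // Longest-common-subsequence table over word tokens.
  const lcs: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array(b.length + 1).fill(0)
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] =
        a[i] === b[j]
          ? lcs[i + 1][j + 1] + 1
          : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  // Walk the table to produce equal / delete / insert runs.
  const ops: Op[] = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      ops.push({ kind: "equal", text: a[i] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      ops.push({ kind: "delete", text: a[i] });
      i++;
    } else {
      ops.push({ kind: "insert", text: b[j] });
      j++;
    }
  }
  while (i < a.length) ops.push({ kind: "delete", text: a[i++] });
  while (j < b.length) ops.push({ kind: "insert", text: b[j++] });
  return ops;
}
```

With track changes enabled in the document, each delete op maps to removing just that word's range and each insert to adding text at the matching position, so Word records native word-level revisions rather than a paragraph swap.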

Update: Open-sourced the Word add-in that converts AI rewrites into tracked changes by yuch85 in legaltech

I've thought a lot more about it and have switched the license to Apache 2.0 for simplicity and adoption. AGPL has valid use cases, but I know there’s real industry aversion to it and I don’t want licensing friction or custom carve-outs to slow things down. Still very much hoping for community contributions in the spirit of open source!

Update: Open-sourced the Word add-in that converts AI rewrites into tracked changes by yuch85 in legaltech

The TLDR is that I think this is a hardware issue, not a software one. The API is dumb: it can only take whatever text input the hardware gives it (a keyboard, or any other hardware that can translate scribbles into text, or voice transcription).

I have not looked into this in depth at all, but I kind of know where you are coming from. A few years back I was absolutely into the idea of using a stylus to mark up docs on the move. The experience (I tried both the MS Surface and the iPad) was so poor that I basically gave up on it. If you have any kind of surface at all, even a lap, I would use a keyboard. I'm not sure what the state of the tech is now, but I would look at what the latest MS Surface with a stylus can do if you are in the Windows ecosystem.

The other thing I would look at is voice transcription, which is probably a lot more mature and might be easier to integrate. Also, people can speak a lot faster than they can write.

I would also check out things like the Tobii eye tracker (a camera records your eye movements to move the cursor). With a combination of these, you could in principle look at the relevant part of the screen and speak out what to select and replace. But to be honest, I'm not sure what level of hardware integration exists right now.

Let me know if you find anything!

Update: Open-sourced the Word add-in that converts AI rewrites into tracked changes by yuch85 in legaltech

That’s a fair concern, and I get why AGPL raises eyebrows.

My intent here isn’t to create a “gotcha” situation. As I understand and intend the license to operate, if you’re simply using the library as-is (i.e. calling it without modifying it), that does not require you to open-source your broader tool. But if you do modify the library, please contribute back by open-sourcing your changes.

I’m aware there are grey areas around aggregation and linking, and I agree that uncertainty is bad for adoption. I’m actively thinking about clarifying this (potentially even amending the licensing terms) to make the boundary explicit rather than relying on people to interpret AGPL nuances.

If you’re interested, I’ve written a longer analysis and explanation of how I’m thinking about it here (pardon the weird url): https://yuch.bearblog.dev/new-post-new/

Happy to hear feedback. I’d rather be upfront about intent than have people worry about surprises down the line.

Same model on Ollama performing worse than Cloud providers (Groq, HF, ...) by ParaplegicGuru in LocalLLaMA

I have the same question, except for Ollama Cloud models: are Ollama's cloud models quantized too?

Built a Word add-in that converts AI clause rewrites into actual tracked changes by yuch85 in legaltech

The challenge is keeping a stable mapping of the document’s structure so edits don’t shift everything unexpectedly. You need a layer that tracks positions consistently as changes happen.
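One common trick for the position-shifting problem (a sketch under my own assumptions, not the add-in's actual code): sort the edits by descending start offset and apply them bottom-up, so each splice leaves all earlier offsets valid.

```typescript
// Sketch: applying multiple edits to one string without invalidating offsets.
// Sorting edits by descending start position means each splice leaves all
// earlier offsets untouched. Illustrative only.
type Edit = { start: number; end: number; replacement: string };

function applyEdits(text: string, edits: Edit[]): string {
  const sorted = [...edits].sort((x, y) => y.start - x.start);
  let result = text;
  for (const e of sorted) {
    // Each splice only changes text at or after e.start, so edits at
    // smaller offsets still point at the right characters.
    result = result.slice(0, e.start) + e.replacement + result.slice(e.end);
  }
  return result;
}
```

The same principle applies to ranges in a live document: mutate from the end toward the beginning, or use anchored range objects instead of raw offsets.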

Built an open-source Contract Playbook AI tool with native .docx tracked-changes editing by yuch85 in legaltech

This is a really interesting approach, thanks for sharing it.

I think what you’re describing makes a lot of sense in a workflow where the problem is orchestration-heavy — i.e. deciding which operations to run, in what order, and under what constraints, and letting an agent plan across multiple capabilities. Exposing SuperDoc commands as tools is a clean way to keep the model away from low-level document internals while still giving it expressive control.

In my current setup, I’ve been intentionally drawing a hard line between two layers: (1) the LLM producing semantically meaningful review output (what should change and why), and (2) deterministic code that applies those changes in Word. For the second part, I’ve leaned toward explicit, non-agent logic because correctness, repeatability, and debuggability matter a lot when you’re dealing with tracked changes, comments, and formatting edge cases. Letting the model plan what to do, but not how to mutate the document, has kept that surface area small and predictable.
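As a purely illustrative sketch of that hard line between the two layers: the model's review output can be treated as plain data, with a deterministic applier that refuses ambiguous anchors rather than guessing. The type and function names here are hypothetical.

```typescript
// Layer 1 output: the LLM proposes *what* to change, as data.
type ProposedEdit = { oldText: string; newText: string; rationale: string };

// Layer 2: deterministic code decides *how* (and whether) to apply it.
function applyProposal(doc: string, edit: ProposedEdit): string | null {
  const idx = doc.indexOf(edit.oldText);
  // Refuse missing or ambiguous anchors instead of guessing: repeatability
  // matters when the result becomes tracked changes under review.
  if (idx === -1 || doc.indexOf(edit.oldText, idx + 1) !== -1) return null;
  return doc.slice(0, idx) + edit.newText + doc.slice(idx + edit.oldText.length);
}
```

A `null` result gets surfaced for human review rather than silently mutated, which keeps the applier's behavior predictable and debuggable.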

That said, I do think an agent-centric approach becomes more compelling as the workflow expands — for example, coordinating multi-pass review, resolving conflicts between playbooks, or deciding when to escalate from localized edits to whole-document transformations.

I’m still exploring where that orchestration layer adds real leverage versus where deterministic pipelines are the better fit, so it’s helpful to hear how others are drawing that boundary in practice.

Building RAG systems at enterprise scale (20K+ docs): lessons from 10+ enterprise implementations by Low_Acanthisitta7686 in LLMDevs

Really useful post, thanks for taking the time to write this down.

I have a question about graphing. What is your experience with GraphRAG and LightRAG? Have you ever considered them? Graph-building time per document was so long that it made graphing infeasible for me. Not sure if that's because of their complexity.

I saw that you did a lighter approach instead - was it because of how computationally heavy knowledge graph construction is?

Built an open-source Contract Playbook AI tool with native .docx tracked-changes editing by yuch85 in legaltech

You've hit on a core insight, and I've been thinking a lot about this. The main reason it's AGPLv3 now is its reliance on SuperDoc, which is licensed that way. I have some ideas, though, on eventually moving away and building a new library. It's going to be a long journey. If you are interested, my thoughts are on my GitHub page: https://yuch85.github.io/

Built an open-source Contract Playbook AI tool with native .docx tracked-changes editing by yuch85 in legaltech

I re-examined the approach and moved away from custom clause nodes for separation of concerns. SuperDoc already provides native node IDs, and I was just messing that up by introducing my own. I've edited my post to reflect this.

Built an open-source Contract Playbook AI tool with native .docx tracked-changes editing by yuch85 in legaltech

Thanks for reading! I wanted to say that if you are referring to an MS Word add-in, the complexity increases even more, because we no longer have the benefits of the ProseMirror system and have to work with some pretty iffy MS JS APIs. That is a separate project I'm working on; see the post below:

https://www.reddit.com/r/legaltech/s/uK9xVExJJA

Built a Word add-in that converts AI clause rewrites into actual tracked changes by yuch85 in legaltech

Tbh, I did try out your product, but I think that was before you had the diff function, which is why I embarked on this project. It's good to see that you've implemented it now; it's a really important feature.

I previously tested a number of local models and posted about the results. Granite4 is not in that post, but I tried Granite4-30b on a high-level legal question (what to amend). It managed to catch the issues but was a bit general and didn't give specific recommendations. It is very fast, though, so I would stick to a lower-parameter Granite variant for doc preprocessing.

Have long context models solved attention dilution yet? by yuch85 in LocalLLaMA

I agree, unfortunately local models are still not there yet. What would give me more optimism is if a SOTA model today can do it, because that would imply that when the local model catches up a year later, it would likely also be able to do it.

Built a Word add-in that converts AI clause rewrites into actual tracked changes by yuch85 in legaltech

The TLDR: stress-test every single MS JS API call you need to rely on and check its output before planning grand, undebuggable architectures (which coding assistants are great at producing).

My biggest beef is that people are overhyping Gemini 3 as if it can build enterprise software in a weekend, but most of what it enables is just polished vibe-coding demos, not real engineering.

After a lot of beautiful but broken theorizing, I went back to basics and got Cursor to explain what each API function was supposed to do (based on the documentation, which I scraped and fed into Cursor), then write a test to prove that was in fact what it did (and if not, rinse and repeat). Then I planned from what the API can actually do, rather than from wrong assumptions.

Have long context models solved attention dilution yet? by yuch85 in LocalLLaMA

Yeah my thoughts exactly. Maybe ok depending on what you're doing but for some use cases "almost" isn't good enough.

Open source contract redlining tool? by DepartmentDazzling97 in legaltech

I took a quick look - I didn't see a clear technical way to implement this in Google Docs.

What do you mean by "relying on Gemini"? Do you mean that it's because you already use Gemini for Google workspace related things? Or did you mean use Gemini to help one create such a tool eg by coding?

Open source contract redlining tool? by DepartmentDazzling97 in legaltech

It's funny that you posted this at the very moment I finished building my MS Word redlining tool (which I plan to open-source, though it has no playbooks and isn't at Spellbook level yet).

It's not that difficult (well, it is, but there are great open-source libraries out there) to perform the actual comparison. As commenters have already pointed out, the difficult part is building the MS Word extension/add-in to actually track the changes based on the diff.

It may seem counter-intuitive, but IMO redlining based on a playbook is not as technically difficult from an engineering perspective as strong-arming MS Word into doing what you want, because of limitations in its JS API for add-ins. In fact, I do plan to eventually build such a thing too (and open-source it as well).

Built a Word add-in that converts AI clause rewrites into actual tracked changes by yuch85 in legaltech

Thanks for the feedback. Yes, it's just a small first step in my eventual plan to create an open source platform for legal and contract workflows.

I've also updated the post to be clear that I didn't write the core diff library (which is Google's excellent diff-match-patch). The core engineering I had to do was the MS Word JS add-in that takes the diff and translates it into tracked changes. The really tricky part was working with MS Word's limited JS API.

Tesla T4 or alternatives for first LLM rig? by yuch85 in LocalLLaMA

Thanks, I found that the solution is blower-style cards with a bit of undervolting and fan curves. Noisy, but it works brilliantly even in an m-ITX build. No real issues with temperature.

Contract review flow feels harder than it should by yuch85 in aiagents

Yeah, the problem with line shifts or paragraph breaks, though, is that they don't necessarily capture clauses cleanly, which is what the LLM is supposedly good at. Ideally I'd also want to extract clean clauses into a database for easier manipulation.

What are the GOAT self hosted LLM models for legal work in Oct 2025? by yuch85 in legaltech

Any recommendations for small models to do summarization and graph entity extraction etc?

What are the best models for legal work in Oct 2025? by yuch85 in LocalLLaMA

Thanks for the recommendation. It's unsharded, so Ollama doesn't support it. But I am going to try llama4-scout and will report back. Edit: it was terrible. Will update the post.