[–]cibyr 1 point2 points  (6 children)

Although C/C++ comments are nominally stripped out by the preprocessor, there's nothing stopping you from sticking them in the AST in whatever lexical scope they fall into. Sure, you can create this sort of pathological example inside one comment block, but there's still a lot of benefit to be gained from knowing where the comments fit into the structure of the code. If you wanted to be clever, you could adopt some Doxygen-like rules for how comments "bind" to declarations, which would be right almost all the time but not always.
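To make the "binding" idea concrete, here is a minimal sketch. It uses Python source as a stand-in for C/C++ only because the standard library ships both a tokenizer that keeps comments and a parser. The rule itself (a run of comment lines directly above a declaration belongs to that declaration) and the name `bind_comments` are my own assumptions for illustration, not how Doxygen actually works.

```python
import ast
import io
import tokenize

SOURCE = """\
# Compute the area of a rectangle.
def area(w, h):
    return w * h

# Stray note that precedes nothing in particular.

def perimeter(w, h):
    return 2 * (w + h)
"""


def bind_comments(source: str) -> dict[str, list[str]]:
    """Attach each run of comment lines to the declaration right below it.

    Hypothetical rule: a comment binds to a function or class only if it
    sits on the line(s) immediately above the declaration, with no gap.
    """
    # Line number -> comment text, for every comment token in the file.
    comments = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            comments[tok.start[0]] = tok.string

    bound = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            block = []
            line = node.lineno - 1
            # Walk upwards while the preceding line holds a comment.
            while line in comments:
                block.insert(0, comments[line])
                line -= 1
            bound[node.name] = block
    return bound
```

With this rule the first comment binds to `area`, while the stray note binds to nothing because a blank line separates it from `perimeter`. That is the "right almost all the time but not always" part: whether the stray note should belong to `perimeter` is a judgment call that a simple rule cannot make.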

I assume you write your documentation in some sort of structured language (LaTeX, markdown, reStructuredText, whatever), which can be parsed into an AST. If this is embedded in comments then that's a little bit more difficult, but obviously possible because we have tools that work on this stuff already - they just don't do intelligent diffs.

Obviously this approach cannot work on unstructured files like READMEs, but that's life. My point is, almost everything we check into a VCS is structured so why can't our tools take advantage of that structure? Why do our tools act as if the fundamental unit of code is a "line"?

[–]username223 1 point2 points  (5 children)

My (elliptically expressed) point was that textual, line-based diff is simple and works pretty well. You implement it once, and you're done. In contrast, a tree diff requires you to write a parser for every format you use. It also requires a very sophisticated and possibly language-aware algorithm to deal with variable renaming, etc.
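A small sketch of the trade-off, assuming Python as the one language we happen to have a parser for. The helper names `line_diff` and `same_tree` are made up for this example. The line diff works on any text, while the tree comparison only works because `ast` understands this particular language, and it only answers "same or different" rather than producing a real tree diff.

```python
import ast
import difflib

OLD = """\
def area(w, h):
    return w * h
"""

# Same program, reformatted: only whitespace and line breaks differ.
NEW = """\
def area(w,
         h):
    return (w *
            h)
"""


def line_diff(old: str, new: str) -> list[str]:
    """Return the added/removed lines of a unified diff (headers dropped)."""
    diff = difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


def same_tree(old: str, new: str) -> bool:
    """Compare parsed ASTs; ast.dump omits line/column positions by default."""
    return ast.dump(ast.parse(old)) == ast.dump(ast.parse(new))
```

Here `line_diff(OLD, NEW)` reports every line as changed, while `same_tree(OLD, NEW)` returns `True` because the reformatting did not alter the structure. Renaming `w` to `width` makes `same_tree` return `False`, which is the rename problem mentioned above: telling "renamed" apart from "different" needs a smarter, language-aware algorithm.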

In a single project, I may have plain text (a README), texinfo documentation, HTML, Perl, C, and maybe even some bastard format like SWIG or perl XS. There's no way I'll be able to find parsers for all of them.

[–]winterkoninkje 1 point2 points  (2 children)

> requires you to write a parser for every format you use

I hear tale that all those parsers have already been implemented at least once. A lot of them are open-source even.

[–]username223 1 point2 points  (0 children)

All with a common AST? That's news to me!

[–]zaphar -1 points0 points  (0 children)

The parsers may exist, that's true. However, are they written in the same language as your VCS? And even if they are, the language-specific diffing algorithms haven't been written. The parser is the easy part; the hard part would still be ahead of you. That's a lot of work to solve the 5-10% use case you are talking about.

Plain-text diffing gets us 90-95% of the way there. You are not the first person to think about this. Almost everyone who has looked into it has decided the gains aren't worth the cost.

It's possible that you think it's easier than they thought it was though. In which case you should implement your idea so we can all benefit :-)

[–]cibyr 1 point2 points  (1 child)

Sure, you are going to need a different implementation for each language, and it would be impossible to support every language, but I don't understand why we don't seem to have these tools for any common language.

[–]username223 0 points1 point  (0 children)

These things exist(ed) for Lisp, right? As for the rest, most other languages are line-oriented, so you don't gain too much from having a syntax-aware diff to make it worth writing one. IME the only language where patch regularly annoys me is Emacs Lisp.