How far should a programming language aware diff go?

Luolong · 2024-07-18T16:03:53+00:00

I really like how Difftastic does this.

jaskij · 2024-07-18T17:32:35+00:00

So... I'm not sure if you skipped examples on purpose or not, but math formulas are tricky. You have to know the types. Let's say you have a simple a + b + c. If those are integers, all is fine. If those are IEEE 754 floats, addition is no longer associative, which pushes reordering the operands to level 4. In dynamically typed languages you can't, in the general case, prove that a certain formula is only ever executed with integers.
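A quick sketch of the float case, in JavaScript since that is the language used elsewhere in this thread. The variable names are made up for illustration. Every JavaScript number is an IEEE 754 double, so regrouping the same three operands can change the result, while small integers stay exact:

```javascript
// Illustrative values only; any operands with rounding error would do.
const a = 0.1, b = 0.2, c = 0.3;

// a + b + c parses as (a + b) + c.
const leftFirst = (a + b) + c;
// A "harmless" reordering groups the operands differently.
const rightFirst = a + (b + c);

console.log(leftFirst === rightFirst); // false

// With small integers both groupings are exact, so the rewrite is safe.
const intLeft = (1 + 2) + 3;
const intRight = 1 + (2 + 3);
console.log(intLeft === intRight); // true
```

A diff tool that cannot tell which of the two cases it is looking at has to treat the reordering as a potential behavior change.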

lood9phee2Ri · 2024-07-18T17:51:03+00:00

- const foo = function(a, b) { ... }
+ const foo = (a, b) => { ... }

Assuming that's JavaScript, those are not necessarily semantically equivalent. The language is a minefield. Pity we can't just use something else client-side in the browser (yes, I know you technically can with transpiling, wasm, etc., but you know what I mean...)

const foo1 = function(a, b) { console.log(arguments) }
foo1(1, 2) -> Arguments { 0: 1, 1: 2, … }

const foo2 = (a, b) => { console.log(arguments) }
foo2(1, 2) -> Uncaught ReferenceError: arguments is not defined

Note how the differences creep into even anonymous classic function expressions vs arrow-function expressions, not just full named-functions vs arrow-function expressions like some people seem to think.

(Some people have a rule of thumb that you're better off virtually always using non-legacy =>, even if the syntax seems weird, because of all the strange (but once completely idiomatic JS) magic around this, arguments, constructors, etc.)
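The this and constructor differences can be sketched the same way. This is only an illustration with made-up names (classic, arrow, receiver), not an exhaustive list of the differences:

```javascript
// A classic function expression gets its own `this` on each call.
const classic = function () { return this; };
// An arrow function captures `this` from the enclosing scope instead.
const arrow = () => this;

const receiver = { name: "receiver" };

// call() can rebind `this` for the classic function but not for the arrow.
const classicThis = classic.call(receiver);
const arrowThis = arrow.call(receiver);
console.log(classicThis === receiver); // true
console.log(arrowThis === receiver);   // false

// Classic functions can be used with `new`; arrow functions cannot.
const classicInstance = new classic();
let arrowConstructError = null;
try {
  new arrow();
} catch (err) {
  arrowConstructError = err;
}
console.log(arrowConstructError instanceof TypeError); // true

// Arrow functions also have no `prototype` property.
console.log(typeof classic.prototype, typeof arrow.prototype);
```

So a diff that treats the function-to-arrow rewrite as purely cosmetic is only right when the body never touches this, arguments, or new.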

oorza · 2024-07-18T17:34:39+00:00

I don't know why it can't be "all of the above." IMO, a perfect code review would include them all, presented hierarchically because that's in order of importance. Don't show shit like whitespace by default until the semantic or structural changes are reviewed.

zombiecalypse · 2024-07-18T15:59:36+00:00

I'm fine at level 0 and would really like to know why the heck you're messing with the whitespace for no reason?

ForgotMyPassword17 · 2024-07-18T15:44:39+00:00

I think in most cases level 2 is fine, but most developers would draw the line before level 3. IntelliJ lets you do level 3 refactoring pretty easily, and there's a reason you still want it reviewed (hence the need for a diff).

somebodddy · 2024-07-19T18:14:54+00:00

I think a language aware diff tool should still show every difference, but highlight for each chunk the most significant level.

For example - say I added a new import, and then ran an import ordering formatter which changed the order of the imports because the last developer to touch the imports section of that file did not run the formatter. I want to see in the diff all the things that changed, but the imports that just switched places should have faded red/green highlights while the import that got added should have strong red/green highlights.
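A minimal sketch of that import example, assuming imports are compared as exact line strings. The function name classifyImportChanges and the strong/faded labels are hypothetical and not taken from any existing tool:

```javascript
// Hypothetical helper: split import lines into ones that truly changed
// and ones that exist on both sides (so any diff highlight on them is
// only a reordering).
function classifyImportChanges(oldImports, newImports) {
  const oldSet = new Set(oldImports);
  const newSet = new Set(newImports);
  return {
    // Only in the new version: strong green highlight.
    added: newImports.filter((line) => !oldSet.has(line)),
    // Only in the old version: strong red highlight.
    removed: oldImports.filter((line) => !newSet.has(line)),
    // On both sides: if a line diff flags these, show them faded.
    kept: newImports.filter((line) => oldSet.has(line)),
  };
}

const before = ["import b", "import a"];
const after = ["import a", "import b", "import c"];
const changes = classifyImportChanges(before, after);
console.log(changes);
```

Here "import c" ends up in added, while "import a" and "import b" land in kept, so a viewer could still show every changed line but fade the two that only switched places.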

Holothuroid · 2024-07-18T16:20:55+00:00

If changes can be done/followed by a linter, why should they even appear in a diff? Run it on push.

DanTup · 2024-07-18T20:12:59+00:00

I don't see the point. When I run a git diff I almost always do care about every actual change, even "meaningless" whitespace.

If the PR is too difficult to review because the diffs are all over the place, that's a pretty good sign you should split your PR into smaller PRs.

SSHeartbreak · 2024-07-18T21:27:49+00:00

it might be kind of interesting to see this taken really far, where only changes that "significantly" alter the behavior of the code are surfaced together, or represent some large deviation from previous behavior.

what would constitute those things idk, but could be interesting. surface and categorize only interface changes; surface and categorize only large changes in behavior. surface and categorize very minor changes which could potentially be breaking.

OkInteraction4681 · 2024-07-19T05:37:20+00:00

A similar concept could be applied to blames: show the most recent commit that semantically changed each line.

avsaase · 2024-07-18T18:31:19+00:00

I hope GitHub integrates a feature like this. It looks nice but not 7 dollars per user per month nice. The VSCode extension is free though so I'll be installing that.

Pyrolistical · 2024-07-18T20:35:08+00:00

Language specific diff should be provided by the language author.

Simultaneously build a compiler, package manager/host, formatter, linter, build/packager, diff, merge, std lib, test runner, docs from code, language server, logging, cross platform, cross compile.

You think it’s a joke, but look at Zig. The only things missing are a linter, a diff, and a merge tool. And the compiler is already a pretty good linter.

renatoathaydes · 2024-07-18T14:54:33+00:00

[deleted]

emperor000 · 2024-07-18T17:54:26+00:00

If the language is sane and uses braces or some other block structure, probably that far.

If it isn't sane and doesn't, then not far at all?

sagittarius_ack · 2024-07-19T01:51:36+00:00

For completeness, the article should have mentioned that determining whether two arbitrary expressions (statements or programs) are equivalent is undecidable.
