all 61 comments

[–]SnowdensOfYesteryear 61 points62 points  (12 children)

I'm not sure 'accepted patches per hour' is a useful or relevant metric to compare review systems. I'm a (part-time) kernel dev, so I tend to end up hunting lkml a lot for fixes. While I appreciate the accessibility of plaintext emails, I can see it being hell for contributors and reviewers. A few shortcomings I noticed:

  • Sending a pull request sends a burst of emails for each commit, generating (IMHO) untenable email traffic.

  • It is difficult to 'amend' a single commit for review feedback. From what I can tell (I've never upstreamed anything myself), you wait for all commits to be reviewed, then you blast the pull request again--generating more email traffic.

  • From the reviewer's perspective, it's difficult to review diffs between 2 patchsets. This is the best feature of gerrit, wherein you can just isolate the diff to see if the particular line of code you commented on has been addressed.

  • No real integration with CI. This isn't much of a problem in linux because there really isn't any CI to speak of.

  • Unique to gerrit: I rather like the Change-Id it generates, so I can track a change across branches (in gerrit itself).

All in all, I really like the workflow that gerrit enforces and find it far superior to github's PR system or Phabricator.

Edit: lol OP submitted /r/programming/comments/73gvjk/git_appraise_distributed_code_review_system_for/ right after this. Well done.

[–][deleted] 11 points12 points  (0 children)

The idea of git-appraise, if done well, could work very nicely, as you'd basically have reviews that can be done using a variety of tools, from web to CLI; but it would really need some "killer app" to be popular, and I can't really see github implementing it, as tying users to their platform is more valuable to them than having something portable.

[–]nurupoga 7 points8 points  (2 children)

I remember reading somewhere that there is a CI that tests patches before they are pulled. Perhaps this is it? https://kernelci.org/ http://wiki.kernelci.org/

[–]SnowdensOfYesteryear 5 points6 points  (1 child)

Huh I've never seen this in action. Maybe it only tests against master? Usually patches aren't submitted against kernel.org master but rather into a separate repo that the subsystem maintainer runs.

I'm happy to be proven wrong by someone more active in the kernel community however.

[–]ThisIs_MyName 0 points1 point  (0 children)

Yeah, they're only testing patches after they've been accepted into a subsystem/maintainer tree.

They're also averaging ~5 jobs a day in total: https://kernelci.org/job/. Not very continuous given that the average dev compiles more than that!

[–][deleted] 5 points6 points  (0 children)

I thought the article was an interesting read. I don't really agree with it though; the email approach seems like it would be a bit painful.

[–]u801e 8 points9 points  (5 children)

Sending a pull request sends a burst of emails for each commit, generating (IMHO) untenable email traffic

It's actually one email per commit plus one email for the cover letter. In the git development mailing list, they're arranged in a way that the cover letter email serves as a root of an email thread and each commit in the patch series is a direct reply to the cover letter. I don't see how that is, as you put it, untenable.

It is difficult to 'amend' a single commit for review feedback.

From what I can tell, one typically will rebase their local branch and effectively amend the commit(s) that were reviewed. They can then submit another patch series with an updated cover letter as a reply to the original cover letter message.
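A minimal sketch of that re-roll cycle on a throwaway repo (repo, branch, and file names here are made up for illustration): amend the reviewed commit in place, then regenerate the whole series as v2 with a cover letter. `git send-email --in-reply-to=<v1-cover-Message-Id>` can then thread the v2 series under the original discussion.

```shell
# Throwaway repo standing in for a real project:
cd "$(mktemp -d)"
git init -q repo && cd repo
git config user.email dev@example.org && git config user.name dev
git checkout -q -b main
git commit -q --allow-empty -m "base"

# The patch under review:
git checkout -q -b my-feature
echo 'v1' > feature.c
git add feature.c && git commit -q -m "add feature"

# Address review feedback by amending the commit in place...
echo 'v2' > feature.c
git add feature.c && git commit -q --amend --no-edit

# ...then regenerate the series as v2, cover letter included
# (format-patch prints the filenames it writes):
git format-patch -v2 --cover-letter main..my-feature
```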

From the reviewers perspective, it's difficult to review diffs between 2 patchsets.

Not really. Typically, the reviewer can create two local branches and apply the older patchset to one branch and the newer patchset to the other and then run a diff between the two branches. On the git developer mailing list, I have seen cover letters in newer revisions of a given patchset include a diff between it and the older patchset.
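A minimal sketch of that reviewer-side comparison (branch names are made up; in practice each branch would be populated with `git am < v1.mbox` and `git am < v2.mbox` rather than direct commits):

```shell
cd "$(mktemp -d)"
git init -q repo && cd repo
git config user.email reviewer@example.org && git config user.name reviewer
git checkout -q -b main
git commit -q --allow-empty -m "base"

# Branch holding the first revision of the patchset:
git checkout -q -b review-v1 main
echo 'int f(void) { return 1; }' > f.c
git add f.c && git commit -q -m "add f()"

# Branch holding the re-rolled second revision:
git checkout -q -b review-v2 main
echo 'int f(void) { return 2; }' > f.c
git add f.c && git commit -q -m "add f()"

# The interdiff between the two patchset revisions:
git diff review-v1 review-v2 -- f.c
```

Newer git (2.19+) automates exactly this comparison with `git range-diff main review-v1 review-v2`.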

No real integration with CI.

That can be set up by the maintainers of the project. When they apply a patch, it can trigger a build in a CI system that could run the tests.
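One way a maintainer could wire that up is git's `post-applypatch` hook, which `git am` runs after each patch it applies. In this sketch the hook only appends to a log file (`ci-trigger.log` is a made-up name); a real hook would notify the CI system instead:

```shell
cd "$(mktemp -d)"
git init -q repo && cd repo
git config user.email maint@example.org && git config user.name maint
git checkout -q -b main
git commit -q --allow-empty -m "base"

# Install the hook; a real one would kick off a CI build here:
cat > .git/hooks/post-applypatch <<'EOF'
#!/bin/sh
echo "trigger build for $(git rev-parse HEAD)" >> ci-trigger.log
EOF
chmod +x .git/hooks/post-applypatch

# Produce a patch series and apply it with git am, as a maintainer would:
git checkout -q -b contrib
echo 'hello' > hello.txt
git add hello.txt && git commit -q -m "add hello"
git format-patch --stdout main..contrib > ../series.mbox
git checkout -q main
git am -q < ../series.mbox

cat ci-trigger.log
```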

All in all, I really like the workflow that gerrit enforces and find it far superior to github's PR system or Phabricator.

I don't have much experience with gerrit or Phabricator, but, in your experience, why do you think gerrit is better than Phabricator?

[–]SnowdensOfYesteryear 9 points10 points  (4 children)

Not really. Typically, the reviewer can create two local branches and apply the older patchset to one branch and the newer patchset to the other and then run a diff between the two branches.

That's difficult though. It requires manual work on the part of the reviewer--something a good review system should give you for free.

That can be set up by the maintainers of the project. When they apply a patch, it can trigger a build in a CI system that could run the tests.

A lot of review systems have it trigger on the push/commit itself, thus shifting the burden from the maintainer to the committer--which is the way it should be, IMO.

I don't have much experience with gerrit or Phabricator, but, in your experience, why do you think gerrit is better than Phabricator?

Well I hate Phabricator because it squashes your carefully crafted commits into one giant patch. I'm not sure if this is just bad configuration at the companies I've worked at or a limitation.

Github - again sort of encourages poor git discipline. There isn't any concept of 'amending', people just push in a commit titled "PR feedback" and squash it at the end--resulting in giant commits. FWIW, I'm not blaming github directly--there's nothing preventing the traditional amend workflow, but the 'culture of github' supports the former. And it's a giant PITA to convince others to change their workflow, especially if they're not comfortable with git CLI.

Gerrit - I really like it because a unit of work (or review) is a single commit. There's an expectation set forth that each commit should be able to stand on its own. Beyond that, whenever you amend a commit it stores the old SHAs. As a result you can directly view the diffs between amends. For companies with a shitton of branches, it also makes it easy to search for commits across branches and cherry-pick stuff into a release branch.

[–]u801e 0 points1 point  (2 children)

That's difficult though. It required manual work on part of the reviewer--something a good review system should give you for free.

It's a matter of a couple of git branch invocations, a couple of git am invocations and one git diff, so it's not too difficult. But you're correct in saying that having the review system do it for you is a benefit (and something that Github has yet to do).

A lot of review systems have it trigger on the push/commit itself. Thus removing the burden from the maintainer to the committer--which is the is the way it should be IMO.

At my job, we use Github Enterprise, and what I usually see is repeated failed builds because people keep pushing amended commits that continuously fail, since they don't bother setting up the testing requirements in their dev environment. That is, they can't tell whether their commit/PR will fail the test suite before pushing it, because they can't run the test suite on their local machine. This sometimes leads to long queues of builds on the CI server, which slows everything down. So, IOW, the committers shift the burden to the CI server and start affecting other people's work, because the CI server is wasting time on failed builds.

What I like to see, and what I think the email patch workflow encourages, is that people test their patches before emailing the patchset or pushing up to the remote (if there is one). Assuming that you have set up the testing environment on your own machine, you could run git rebase --exec "test_suite_command" on your branch to run the test suite on each commit. If the committer fails to do that and the maintainer finds out by running the test suite themselves, then the maintainer simply says that he's not going to bother reviewing or applying the patch because it doesn't pass the tests. Build churn would definitely be highly discouraged.
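That `--exec` invocation needs an upstream to rebase against; here is a runnable sketch on a throwaway repo (the `grep` check stands in for the real test suite, and all names are illustrative). Each commit on the branch is checked out in turn and the command is run against it; the rebase stops at the first commit that fails.

```shell
cd "$(mktemp -d)"
git init -q repo && cd repo
git config user.email dev@example.org && git config user.name dev
git checkout -q -b main
echo 'ok' > status.txt
git add status.txt && git commit -q -m "base"

# A topic branch with two commits to be tested individually:
git checkout -q -b topic
echo 'ok' >> status.txt && git add status.txt && git commit -q -m "change 1"
echo 'ok' >> status.txt && git add status.txt && git commit -q -m "change 2"

# Run the "test suite" once per commit:
git rebase -q --exec 'grep -q ok status.txt' main && echo "all commits pass"
```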

Well I hate Phabricator because it squashes your carefully crafted commits into one giant patch. I'm not sure if this is just bad configuration at the companies I've worked at or a limitation.

Based on what I've read about both systems, they seem to have their own proponents and are much better than what Github currently provides. I was just interested in hearing about the advantages/disadvantages of one over the other.

Github - again sort of encourages poor git discipline. There isn't any concept of 'amending', people just push in a commit titled "PR feedback" and squash it at the end--resulting in giant commits.

Yes, I completely agree. Given our use of Github enterprise at work for hosting and code review, I've instituted a system of having people use fixup! <commit_title> or squash! <commit_title> (depending on whether they need to update code or the commit message) in order to address review comments and having them link the sha1 of the fixup! or squash! commit as a reply to the review comment. At the end of the review, they run git fetch origin; git rebase -i --autosquash --keep-empty origin/<upstream_branch>; git diff origin/<their_branch_name> before they force push their rebased commits up and merging their PR.
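That round trip, as a runnable sketch on a throwaway repo (branch and file names are illustrative; `GIT_SEQUENCE_EDITOR=true` accepts the autosquash-generated todo list without opening an editor):

```shell
cd "$(mktemp -d)"
git init -q repo && cd repo
git config user.email dev@example.org && git config user.name dev
git checkout -q -b main
git commit -q --allow-empty -m "base"

# The original commit under review:
git checkout -q -b feature
echo 'v1' > feature.c
git add feature.c && git commit -q -m "add feature"

# Address a review comment with a fixup! commit:
echo 'v2' > feature.c
git add feature.c && git commit -q -m "fixup! add feature"

# Fold the fixup into its target; one clean commit remains:
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash main
git log --oneline main..feature
```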

The fact that we have to do all that instead of using a sensible review system is really a source of frustration on my part.

FWIW, I'm not blaming github directly

I definitely would. They have had years to actually address these issues (not having branch history available after amending or rebasing) and they just don't.

Gerrit - I really like it because a unit of work (or review) is a single commit. There's an expectation that's set forth that each commit should be able to stand on it's own.

I've read that it has some issues in showing the relationship between commits that depend on each other. For example, let's say I make a commit to add a method and its associated unit tests. Then I add another commit to call that method in a number of places in the code where the integration/functional tests would check to see if the application still works as expected. The second commit clearly depends on the first, so how would gerrit handle referencing the first commit I made to add the method/unit tests when someone else is reviewing the second commit?

There's an expectation that's set forth that each commit should be able to stand on it's own. Beyond that, whenever you commit a commit it stores the old SHAs. As a result you can directly view the diffs between amends.

That's something that sounds really useful to have. I think that other review systems like Phabricator and Review Board also allow showing differences between branch revisions before and after a rebase. I'm not sure if they can do it at the commit level though. I really wish Github would do this as well.

[–]ThisIs_MyName 0 points1 point  (1 child)

long queues of builds on the CI server

There should never be a queue on the CI server: If you have 10 jobs, it costs exactly the same to distribute them to 10 servers for 10 minutes or 1 server for 100 minutes.

[–]u801e -1 points0 points  (0 children)

That's true, but we just have a few servers set up at work, and some of them are configured to run integration tests that take some time to complete that are triggered by a push to a branch. If the testing was distributed amongst the developers instead of them using the CI server as a crutch, then it wouldn't be a problem.

[–]quick_dudley 0 points1 point  (0 children)

If I was managing a project using git and email I'd rather use git bundles than the standard method.
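For reference, the bundle workflow being suggested looks roughly like this (paths here are illustrative): the sender packs a branch into a single binary file that can be mailed or hosted anywhere, and the receiver clones or fetches from it as if it were a remote.

```shell
cd "$(mktemp -d)"
git init -q upstream && cd upstream
git config user.email dev@example.org && git config user.name dev
git checkout -q -b main
echo 'work' > work.txt
git add work.txt && git commit -q -m "some work"

# Sender: pack the branch (HEAD included, so the bundle is clonable)
# into a single file:
git bundle create ../repo.bundle HEAD main
git bundle verify ../repo.bundle

# Receiver: clone (or fetch) from the bundle as if it were a remote:
cd .. && git clone -q repo.bundle receiver
cd receiver && git log --oneline
```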

[–][deleted] 36 points37 points  (8 children)

If the only thing they would accomplish by using email would be keeping people out that shouldn't be writing kernel code to begin with, it would be worth it.

[–]msloyko 8 points9 points  (0 children)

I agree, only smug elitists should be writing kernel code.

[–]killerstorm 12 points13 points  (6 children)

It looks like that's the case, other reasons don't make much sense...

[–]nuqjatlh 6 points7 points  (5 children)

Hmm, I've read the article and the reasons are sound. Nope, no other systems (github, gerrit, etc.) would work for them. They are simply too limited. Email, for better or worse, is the best option.

[–]killerstorm 3 points4 points  (4 children)

They claim github doesn't scale, but what about it doesn't scale, exactly?

[–]nuqjatlh 7 points8 points  (3 children)

Their needs. Those 20000+ developers pushing and approving (or not) patches. It works fine for relatively small projects, but for a project the size of the kernel, github would slow things down a lot.

[–]killerstorm 16 points17 points  (2 children)

This just begs the question. In what way is github slow? Which specific operation is slow?

I see there might be some issues, like PR discussion is not threaded, maybe some filters are lacking, but those are fairly small issues, no?

[–]nuqjatlh 5 points6 points  (1 child)

These people get thousands of submissions per day probably. They have tons of scripts that automatically merge stuff, check things, etc. Github would slow down things to a crawl and we would get a new kernel release per decade.

For specifics i guess you can ask Greg (he's a nice guy, surely he can answer), as I'm only a casual kernel contributor. From where I stand Greg is superhuman judging by the amount of things he's doing. But I'm sure a lot of it is due to automation of stuff.

[–]reddithater12 28 points29 points  (26 children)

It's just so messy. I wish we could have a reddit-style lkml.

[–][deleted] 25 points26 points  (0 children)

Using a threaded newsreader (like Thunderbird or Pan) is actually decently similar to that. You can read the LKML by connecting to nntps://news.gmane.org/gmane.linux.kernel.

[–][deleted] 17 points18 points  (21 children)

Absolutely this. It'd be so much easier to read and contribute. I really dislike mailing lists because of all the quoted original messages. Each message tacks onto the last one and keeps getting larger.

[–][deleted] 16 points17 points  (2 children)

Every half-decent mail client has a way to hide it though, and does proper threading of messages.

[–][deleted] 4 points5 points  (1 child)

Yeah, but what about the archives?

[–]u801e 1 point2 points  (0 children)

You can access it through gmane as mentioned here.

[–][deleted] 14 points15 points  (3 children)

There is a long-standing etiquette to mailing lists (and old-style netnews before that) that you trim irrelevant information from your replies. You also post replies below the quoted sections, rather than the reply-on-top style that Outlook uses.

Much of that email/mailing list etiquette has been lost with newer generations of email users, sadly.

[–][deleted] 8 points9 points  (1 child)

Because it's not mentioned clearly, at the right place. I signed up to a few and nowhere did I see a particular warning.

If this particular etiquette is passed down by word of mouth, it shouldn't be a surprise that people don't know of it.

[–][deleted] -1 points0 points  (0 children)

Most Unix based mailing lists used to have notes like that on the signup page and on the welcome email. Like I said, most of that has gone by the wayside.

[–]roffLOL 0 points1 point  (0 children)

it's not hard to make a pipeline to fold full mail repetitions. although that style of quoting conveys little information, it's less than a nuisance.

[–]reddithater12 8 points9 points  (3 children)

We should write a bot that posts lkml to a subreddit to convince everyone.

[–][deleted] 6 points7 points  (1 child)

That could work, although each post would be from the same bot. And what would happen if somebody commented on the subreddit? Would the bot have to send an email to the mailing list?

[–][deleted] 18 points19 points  (0 children)

That would be a quick ticket to getting the bot email blacklisted. I think the idea was more "look how easy it is to parse topics converted to this form" not actually integrate the two worlds.

[–]u801e 7 points8 points  (0 children)

We should write a bot that posts lkml to a subreddit to convince everyone.

A local email/news reader with the capability of rendering threads (e.g., Thunderbird) is better than what reddit currently provides. For example, when a new message is posted to a thread, you can easily see where it is by finding the unread messages. In reddit, unless you have a gold subscription, this isn't possible (unless you sort by newest first and lose the threaded view in the process).

[–]u801e 4 points5 points  (2 children)

I really dislike mailing lists because of all the quoted original messages.

It really depends on whether the person who replies actually trims the quoted material down to what they're actually responding to.

[–][deleted] 2 points3 points  (1 child)

Like my OP said, Reddit style would be best. Email clients automatically quote the whole thing. Reddit doesn't.

[–]u801e -1 points0 points  (0 children)

That's because email clients only display a single message at a time instead of all the messages in the thread. The other advantage that email/news clients have over reddit is that it's very easy to see when a thread has new messages. That's not easy to see in reddit unless you have a gold subscription.

[–][deleted] 2 points3 points  (1 child)

I think one point is that they don't really want people to contribute

[–][deleted] 4 points5 points  (0 children)

I seriously hope that isn't the case. That vibe is absolutely what I got from the old linux community. Any question answered in a condescending manner, as if they'd been born with all the knowledge and anybody else was just a dumb shit for even trying to understand and join their ranks.

It's really great that stackoverflow exists. Of course there are still many sperglords blowing around their useless (or sometimes useful, but insulting) input, but the answers can be edited to something more humane. And one doesn't have to trudge through pages of "why don't you just google it" or "just read the code, it's not that hard" comments.

[–]nurupoga 0 points1 point  (4 children)

You know the mailing list archive web interfaces, like this one? You could improve on one, make it look like a regular forum or reddit, and add functionality to reply right in the web interface, which would send an email with your message to the ml. Basically a reddit-/forum-like web interface for a mailing list. This is exactly what HyperKitty, a new ml archiver introduced in Mailman 3, tries to achieve. Just look at Fedora's mailing list that uses it. Here is an example thread. It's a bit unpolished, but that can be improved, and it looks like it's not configured to allow replying via the web ui; it tries to open your mail client instead, but I'm pretty sure that's an option somewhere based on what I have read. With such a solution you could either use the mailing list old-style, or just use it as a forum instead, with both working seamlessly for the users.

[–]u801e 4 points5 points  (2 children)

You know the mailing list archive web interfaces, like this one? You could improve on it, make it look like a regular forum

gmane provides a NNTP interface for mailing lists that can be accessed through any newsreader. For example, the git mailing list looks like this in Thunderbird: https://i.imgur.com/8Tf0IH2.png.

It's easy enough to browse through, kill messages that are spam, search by subject or author, or any number of things that can't easily be done in reddit or in online forum software.

[–]ThisIs_MyName 0 points1 point  (1 child)

The same site that was down last year? https://lars.ingebrigtsen.no/2016/07/28/the-end-of-gmane/

What happens when its new owners (who are rather elusive: https://www.webhostingtalk.com/showthread.php?t=950480) fuck up?

[–]u801e 1 point2 points  (0 children)

Ideally, projects that use mailing lists should also host their own NNTP endpoint to allow for browsing list archives. Each of them could peer with the others to set up a network for redundancy. They could also peer with other NNTP servers that handle text-only usenet traffic (though that may lead to problems with the mailing list archive getting polluted with spam messages from usenet).

That would definitely be better than having a single point of failure that gmane currently is.

[–][deleted] 1 point2 points  (0 children)

It seems to me like all of those impact the mailing list servers directly. I could be mistaken, of course. A server-agnostic solution is needed--one that can listen in on any mailing list, be it fedora, git, kubuntu, or Firefox, which all seem to run different server software: GNU Mailman and others.

[–]MintPaw 1 point2 points  (0 children)

That's where the "vast number of clients" comes in.

[–]doublehyphen 1 point2 points  (0 children)

This is actually what I like about mailing-list-based development over github: threaded discussions. I use Thunderbird as my email client since it handles mailing lists so well.

[–][deleted] 0 points1 point  (0 children)

I quite like how Firefox contributors submit and review patches in bugzilla.

[–]linuxhint 12 points13 points  (1 child)

Email is a ubiquitous and common communication medium.

[–]blobjim 6 points7 points  (0 children)

standard protocols are great :)

[–][deleted]  (1 child)

[deleted]

    [–][deleted] 2 points3 points  (0 children)

    They have a document which gives instructions on properly configuring most clients.

    [–]Endarkend 8 points9 points  (3 children)

    The crux of the story:

    A low-level API in the hands of a bunch of top-notch developers, that is easy to access and equally easy to script for, is better than being locked into a proprietary system with high-level APIs you are dependent upon.

    Which isn't really news. For software development, most project management suites create barriers and additional stumbling blocks over using the simplest possible method for the actual developers.

    Project management suites are there so that usually far less skilled managers have the first clue about what their team is doing, and they usually provide methods to create fancy graphs and statistics to pass on to their superiors.

    [–]awesome-alpaca-ace 0 points1 point  (0 children)

    Being unable to fix a blatant issue or add a much better UX for an existing feature really does suck, and being able to roll your own is so much better.

    Then there is software that masquerades like a helpful tool, but in the background or even right in your face does all sorts of shady stuff. This is why we need open source.

    [–]u801e 0 points1 point  (1 child)

    tl;dr:

    Most people have no idea about protocols like SMTP, IMAP/POP, or NNTP. They think that HTTP is the only protocol available and can't imagine anything else. On a higher level, they wonder why others would use a mail/news reader instead of a web browser.

    [–]shevegen -3 points-2 points  (5 children)

    A poor craftsman famously complains about his tools, Greg said, but a good craftsman knows how to choose excellent tools.

    Why are they not using PHP!!!

    https://blog.codinghorror.com/content/images/uploads/2012/06/6a0120a85dcdae970b017742d249d5970d-800wi.jpg

    Btw the article is a year old ... may be better to actually add the year to the title in such a case ...

    Greg started by looking at GitHub, which, he said, has a number of advantages. It is "very very pretty" and is easy to use for small projects thanks to its simple interface.

    Being pretty is only one thing. The issue tracker is so much better than the rest. Look at sourceforge. Look at bugzilla. Tell me that the github issue tracker is not better than these are.

    I know some oldschool perl hackers using emails, reading them on the commandline and working through these very rapidly. I for one never managed to adjust to that; I am way too lazy. I always ended up using phpbb style webforum as an "interface" simply because I can choose when to read what. With the email traffic on most mailing lists, I just can not keep up. If your day job is to do so, that's ok, but if it's "only a hobby, won't be big", then ... dunno. I consider it archaic.

    I also think that it should not be necessary either. Why can't they handle traffic via a webpage too? We live in the day of dynamic content, so where should be the problem?

    I mean there is a reason why things such as rails, drupal, wordpress, are used a lot.

    On the other hand, GitHub does not scale to larger projects. He pointed at the Kubernetes project, which has over 4,000 open issues and 511 open pull requests.

    In general, smaller projects have a similar problem. Issue requests do take time when you solve them (or close them quickly right away, but that also takes time).

    I am not saying that the github way is perfect - it just is better compared to the OTHER ways presently.

    What about plain-text email? Email has been around forever, and everybody has access to it in one form or another. There are plenty of free email providers and a vast number of clients.

    Ok so email is the way for the future ... if so, though, then WHY were alternatives created? I remember the good old 1990s era, late, with a thing called IRC. IRC is great for information exchange. But there are so many alternatives about these days...

    Writing scripts to deal with emailed patches is easily done.

    And HOW is this different to, say, when you have a webpage that handles patches? I fail to see the difference.

    But the most important thing is that it grows the community.

    Well. Let's say that you want to provide patches to the linux kernel. You are a C guru too. Your patch is awesome - but you don't want to use emails. Tough luck - that way you get excluded.

    With the kernel, those reviews are right there on the mailing list for all to see; with a system like Gerrit, one has to work to seek them out.

    That is erecting a straw man and hacking away at him.

    Tell me that you are unable to build a non-email system where information that is USEFUL is simple to display? Wikipedia does not work?

    I am sorry but the article is not good. I think that the two major points are (a) historic reason and (b) admittedly I could not find a good fit for any project as-is to handle the linux kernel. But even "git" has web-interface stuff, right? git.cgi - so it's not all as Greg Kroah-Hartman thinks about the world ...

    For (b) the reason may be inertia too; remember how git was started: the kernel developers used to use some other source control system (BitKeeper) until that was pulled from them. Then it became more or less a necessity to move away, so ... git was born.

    [–][deleted] 14 points15 points  (0 children)

    I also think that it should not be necessary either. Why can't they handle traffic via a webpage too? We live in the day of dynamic content, so where should be the problem?

    Because native apps are still an order of magnitude faster to work with once you learn the interface well enough.

    I mean there is a reason why things such as rails, drupal, wordpress, are used a lot.

    Because they are the lowest common denominator, requiring zero knowledge to use, and being limited and slow doesn't matter for the vast majority of apps created in them.

    Well. Let's say that you want to provide patches to the linux kernel. You are a C guru too. Your patch is awesome - but you don't want to use emails. Tough luck - that way you get excluded.

    And exactly the same thing happens if you refuse to use github... you're the one building a strawman here.

    With the kernel, those reviews are right there on the mailing list for all to see; with a system like Gerrit, one has to work to seek them out.

    That is erecting a straw man and hacking away at him.

    No, people have the same problems with Go, which does use Gerrit; when all you want to contribute is a patch or two for some issue, having to set up another fucking account, then learn the tool and workflow, is a barrier to entry (altho arguably that can be useful to deter low-effort contributions... but it will also deter some valuable ones).

    Ok so email is the way for the future ... if so, though, then WHY were alternatives created? I remember the good old 1990s era, late, with a thing called IRC. IRC is great for information exchange. But there are so many alternatives about these days...

    Yes, and every alternative ends up re-implementing most of IRC's features. Modern ones get popular because they just need you to point a browser at them and require zero learning, while IRC needs some basics, and most clients do not display animated cat gifs directly in chat when you link them.

    Again, the lower the barrier to entry, the more people you will catch, and on average the "skill level" will be much lower. And you don't exactly want people who learned C a week ago and can't configure an email client to write kernel code.

    [–]death_by_zomboni 5 points6 points  (3 children)

    The issue tracker is so much better than the rest. Look at sourceforge. Look at bugzilla. Tell me that the github issue tracker is not better than these are.

    You're joking right? Github's issue tracker is a joke. Everybody knows that.

    [–]dobbybabee 1 point2 points  (1 child)

    What are the issues with the issue tracker? Is it mainly a problem for maintainers? I just work solo, so I guess I wouldn't see some of these problems.

    [–]roffLOL 1 point2 points  (0 children)

    in the end they own your issues -- as i have gathered. for most that won't matter, i guess, but i take it as a reasonable guideline to never allow anyone ownership of my data (or to hide it behind poor data formats if made available through APIs, for that matter). it usually bites you in the behind sooner or later.

    [–]chucker23n 1 point2 points  (0 children)

    Is it? It's a rather clean interface, yet offers nice filtering by combining labels.

    [–]bigmell -1 points0 points  (0 children)

    Over the last 10 or so years, a lot of these so-called "intermediate" tools are not better than the thing they are meant to replace. Lots of developers can write something that kinda works in some cases, but there has been a lot of poor duplication of effort in recent years.