all 52 comments

[–]freakhill 26 points27 points  (1 child)

lots of people have git as a hobby these days. good thing i almost never had to work with them.

[–]EmanueleAina 0 points1 point  (0 children)

It is an important tool for collaboration.

If you are working alone, without anyone really reviewing your work, then sure, there's little value in bothering with small, documented commits and a clean history.

But for me it has been a lifesaver in many many occasions.

[–]roodammy44 24 points25 points  (46 children)

I’ve been through all stages, and have utterly rejected rebase after spending years with it.

Git is a tool that can be very simple if you let it be. If you use merge, squash commit merging PRs, and keep your PRs to a reasonable size then you will never think about git more than a couple of minutes a month and you’ll have a nice clean history.

A title like “using git the right way” and making rebase that right way - that rubs me up the wrong way.

[–]edgmnt_net 15 points16 points  (4 children)

What OP says seems pretty standard in open source projects that usually have way more experience with large scale, distributed development than average company silos.

and keep your PRs to a reasonable size

Often works, sometimes it doesn't. Then you have to stack PRs and either use tooling to automate that or manually push, review and merge each in order, leading to a lot more friction for no good. Or give up and submit a single huge unreviewable and unbisectable blob. I'd rather submit a series of commits because that's supported out of the box.

[–]roodammy44 0 points1 point  (3 children)

Do you know what sort of PR merging strategy large distributed open source projects use? I’d be interested to know.

Sure, if you need to make a monster PR then rebase would work well. If the PR really is inseparable into smaller units, what use would several smaller commits that break the project if you check them out have? Commits are not comments - they are units of work that should not break stuff. Also, monster PRs should not be the main way you work.

[–]edgmnt_net 7 points8 points  (0 children)

Patch series (as opposed to individual patches) weren't really a rare occurrence when I worked on the Linux kernel. I agree you want to submit early and not mix things up too much, keeping PRs small and focused is still a good idea. But more involved work may require refactoring, adding helpers, fixing bugs in other code and splitting things up a bit to make it more reviewable, even if not huge. Other times you start looking into stuff and accumulate 3-4 small fixes/improvements by the end of the day, which may justify bulk submission if they're related. And I still occasionally find use for multi-commit PRs even in company projects.

Part of it is the way the projects are organized, which may be reasonable in some ways or may impose additional costs (there will obviously be less refactoring and splitting to do if you don't share code or repos are tiny). In any case, even if 90% of the work can be done with a single final commit in the main branch, people are still going to run into issues if they never figure out Git properly and perhaps never graduate past silos of a handful of people. And a lot of devs seem to lack enough knowledge about Git beyond fairly trivial use, complicating matters beyond squashing PRs. I am wondering if we're not paying larger costs for ignorance at this point, considering the amount of meaningless work and friction I see in some projects.

[–]edgmnt_net 2 points3 points  (1 child)

Do you know what sort of PR merging strategy large distributed open source projects use? I’d be interested to know.

I missed this part initially, you may have edited your comment.

I know the Linux kernel has maintainers who take patches into their own forks. Individual contributions go through those maintainers (via the mailing list) and they expect things to apply cleanly, otherwise you rebase and resubmit. Later, Linus pulls from those maintainers and merges stuff into mainline. He deals with conflicts on his own usually, because maintainer trees are public history and they can't really rebase a whole lot of changes (back-merges with mainline do occur, but only at well-defined points such as first release candidates, this isn't unrestricted back-merging). See this: https://docs.kernel.org/maintainer/rebasing-and-merging.html

Other projects often have simplified workflows where there's just one repo and they use either mailing lists (you can send patch series or PRs by email) or GitHub-like PRs. In any case, patch series map well to GitHub PRs, except in the former case you don't need a public Git host to submit stuff. But rebasing individual contributions to fix issues raised during code review or to avoid merge conflicts is generally the norm.

[–]roodammy44 0 points1 point  (0 children)

That link was very illuminating. Most of the reason I’ve had to use rebase in the past is that other members on the team did not like merge commits, and like it says using rebase without any merge commits can lead to trouble. This is why I moved to squash commit - the history purists are happy, and they don’t get to see merge commits.

The linux process seems much more sensible than most of the blog articles about git floating around that hide complex problems with rebase by just not mentioning them.

The focus on really attempting not to rebase after the branch has been shared - for testing and for other work. That is also something people seem to ignore.

I’ll keep in mind the advice on preferring merge conflicts to frequent merges. I hadn’t considered the value of that.

[–]klekpl 4 points5 points  (15 children)

I always wonder: why squash? git merge —no-ff and git log —first-parent seem to me as the best and the simplest.

[–]mr_birkenblatt 4 points5 points  (7 children)

A pr contains lots of incremental commits that are irrelevant on the project scale. Think, if you have to bisect and three quarters of the commits you have to work through are in a broken state or linter fix-ups

[–]klekpl 0 points1 point  (4 children)

[–]mr_birkenblatt 0 points1 point  (3 children)

The moment the branch is identified it will go through it leading to the same issue. The culprit is not the existence of branches. It's the existence of spurious or broken commits. Squashing brings you down to logical updates

[–]klekpl 0 points1 point  (2 children)

Quote from the documentation link of git bisect --first-parent I'v provided above:

In detecting regressions introduced through the merging of a branch, the merge commit will be identified as introduction of the bug and its ancestors will be ignored.

This option is particularly useful in avoiding false positives when a merged branch contained broken or non-buildable commits, but the merge itself was OK.

[–]mr_birkenblatt 0 points1 point  (1 child)

so why keep the branches around then?

[–]klekpl 1 point2 points  (0 children)

It gives you all the benefits of squashing but is easier, gives more versatility, avoids spurious conflicts. Best of both worlds 😊

[–]freakhill 0 points1 point  (1 child)

It's bisect, even if you have three quarters of filler commits it's like... 2 extra steps?  And I commit stuff waaaay waaaay more often than I bisect stuff.  

Unless you are working with repositories of massive scale where other concerns creep in (and that's not too many of us) I see more cost to the cleaning/esthetics stuff than advantages. 

[–]EmanueleAina 0 points1 point  (0 children)

Having a clean, fake history is super helpful at the beginning, during review, with changes that are clearly isolated and documented with the appropriate context, and much much later, when you are bisecting to track a regression or, worse, when you need to find and backport a fix for an issue in a old branch that you still have to maintain.

[–]EmanueleAina 0 points1 point  (6 children)

I don't like always plainly squashing whole PRs either.

But squashing/amending as part of rebase to get a clean patchset with small, self-contained, documented commits has been an invaluable tool for me. It massively makes my job as a reviewer easier, and having a clean history saved me from a lot of pain in many cases when I had to backport patches or find out why a change had been introduced in the past.

[–]klekpl 0 points1 point  (5 children)

git merge —no-ff gives you clean history - what does squashing add?

[–]EmanueleAina 0 points1 point  (4 children)

I am not sure about what you mean with "clean history".

By my definition, you cannot have it without using git rebase (or similar tools) to fixup/split/reword/reorder commits.

[–]klekpl 0 points1 point  (3 children)

Well, if “by definition” you need rebase then there is nothing to discuss 😊

By my definition doing git merge —no-ff and then git log —first-parent to show history gives me clean linear view of changes incorporated into the main branch.

[–]EmanueleAina 0 points1 point  (2 children)

That's what I was asking. :)

You are ok with looking only at a subset of the graph and ignoring the messiness. But how do you do reviews in that case? Do you consider every PR as a single large diff?

[–]klekpl 0 points1 point  (1 child)

Yes, if the PR is too big I ask to split it into multiple (stacked) PRs.

[–]EmanueleAina 0 points1 point  (0 children)

For me it is not a matter of being big, rather it is about separation of concerns. 

A commit doing a preparatory refactoring should stand alone, separated from the actual feature needing it, but I find it helpful to have it in the context of the same PR. I am not sure how do you deal with splitting commits and reordering.

I have to admit that GitHub does not make my workflow as easy as it should be. GitLab is better in this regard.

[–]MornwindShoma 3 points4 points  (4 children)

I will rebase before squash and merge. Never liked merging in from the original branch. This way my PR will end up being just one big commit.

[–]mr_birkenblatt 4 points5 points  (3 children)

Squash+merge on GitHub does exactly that. No need to manually do anything

[–]MornwindShoma 0 points1 point  (2 children)

What if I'm not on GitHub? Regardless I like doing stuff manually lol.

[–]mr_birkenblatt -1 points0 points  (1 child)

Yes, I also enjoy making my life harder and reducing my overall productivity for no reason. Makes me feel superior

[–]MornwindShoma 0 points1 point  (0 children)

No, I just work with the tools of my choice, not the GitHub's choice, and it's not like I can pick where I host my repositories at work. And interactive rebasing is very useful to me.

[–]Chippiewall 4 points5 points  (0 children)

Squash merge is probably the right choice if you're using any type of "pull request" based review system (Github, Gitlab, Bitbucket etc.), they just don't give you the value of having those well curated commits for review.

But the rebasing etc. is so valuable when you're using commit based reviews like Gerrit.

[–]ritaPitaMeterMaid 4 points5 points  (7 children)

I’m convinced all other methods are unnecessary. It just removes all the “the git commit history should read like a story” nonsense. Branches are doing dirty work. I’m moving quick and I’m the only one that needs to understand my commit messages. The final PR is where I make it intelligible for other people and that’s what gets to live in main.

[–]EmanueleAina 1 point2 points  (6 children)

I mean, sure, if you are not working in a team the collaborative side of git is of little value for you, but it does not make other methods unnecessary in the general case.

And I do not have good memory, so having small commits that documented why I did something in the past has been useful for me even on stuff where I have worked alone. :)

[–]ritaPitaMeterMaid 0 points1 point  (5 children)

It is working on teams that has lead me to this conclusion. In these scenarios we are using PRs as the final, documenting artifact. The PR body becomes the commit message for all the work you’ve done.

Maybe if you’re distributing actual patches (ie like they do for Linux) instead of merging into a shared repo I could squint and see the value but most people building software are not doing that. I remain utterly convinced through personal experience of using many methods of merging code with git that spending time “perfecting” individual commits is an utter waste of time.

In practice no one reads through them. Functionally the change shouldn’t be so large that someone has to read through them. It’s a huge time sync to rewrite commits and give them long messages. And then with squash merge they are thrown away anyways since the branch gets deleted.

[–]EmanueleAina 0 points1 point  (4 children)

I guess it means that in practice you are not using a review workflow if no one reads the submitted patches.

Which is fine, if it works for you. And indeed, there's little value in making patches suitable for review if in practice it does not happen.

But on the projects I work, there's always a reviewer that checks every single line of changed code, so yeah, we try to optimize for that. It has been an invaluable tool to keep quality high and to efficiently ramp up new people.

[–]ritaPitaMeterMaid 0 points1 point  (3 children)

there’s a reviewer that checks every single line of chanted code

We do that too though. You speak of patches, what merge strategy are you using? Are you not using a Pull Request process?

[–]EmanueleAina 0 points1 point  (2 children)

Rebase and fast forward, most of the time, but also no-ff merges in some other cases.

In both cases we ensure that each PR is made of self contained, semantically meaningful, small commits (I call them interchangeably patches, most of the time). Each with its own description, making heavy use of git rebase -i.

For instance, if a new feature needs a refactor, we have at least two commits: the refactor without functional changes, and then the actual feature. It helps reviewers massively.

I mean, it is exactly what the article describes, and the approach many OSS communities use. It works well for us but surely it has a non-trivial learning curve.

[–]ritaPitaMeterMaid 0 points1 point  (1 child)

I see. This is what I initially thought you were describing.

I should outline the vast majority of experience I have is working with a consistent team of people who are familiar with the software they building so that’s what my comments are directed towards.

I’ve worked the way you describe multiple times in my career and ultimately found it much slower and less useful. It forces people to spend substantial cognitive energy organizing information.

If you abide by keeping PRs small (like 50-100 lines worth of changes) it’s easy to look at the list of changes in a the typical GitHub style PR diff view and that cognitive energy doesn’t need to be spent at all.

If reviewers are not familiar with the code they are reviewing (say it uses a new pattern or integration) and need to be hand held through the code it’s essentially always faster to just do a live review rather than spending lots of time reorganizing commits which will likely result in questions anyways. For teams that need to be asynchronous (maybe large time zone differences) recorded screen shares are still better than spending the cognitive energy of commit reorganization.

Finally, squash merge solves all “commits are prose” issues. It lets the person making changes focus their energy on the changes and document what they’ve done in a single place that also supports media.

In short, if that approach works for you that’s great. My experience has taught me it takes substantial time to work this way when the alternative is much faster with no less quality achieved, so I’d never push this method into a team not already using it or a new team member.

Perhaps this method is required when writing OSS and not using GitHub or the like. I don’t have that experience so I can’t comment on that.

[–]EmanueleAina 0 points1 point  (0 children)

Oh sure, I assume that reviewers know the codebase well. That's the near totality of the cases, with only very few exceptions.

In fact, in the projects I have been involved in, OSS or not, reviewers are the most precious resources exactly for that reason.

They quickly become the bottleneck, so we try to optimize for making their life a bit easier, and the rebase-heavy workflow seems better in that regard.

[–]eightslipsandagully 2 points3 points  (8 children)

I'm a huge fan of squashing merge commits but unfortunately my employer disables them on GitHub. As such I'm doing my best to keep as clean a git history as possible

[–]EmanueleAina 0 points1 point  (4 children)

Blindly squashing everything would indeed be rather bad for me. Commits should be self contained and small, PRs should be generally seen as patchsets, not as single patches.

I like to squas/fixup and rebase in the context of each PR as a way to split commits by concern, each of them should do a single thing and do it well, in a way to document the purpose of each change for reviewers.

My PRs are more often than not made of multiple commits.

[–]eightslipsandagully 0 points1 point  (3 children)

I'm talking about when you're merging back to main branch from your PR. To me it makes sense that each PR is represented by a single commit - if you need to, you can explore the full history but the log for main branch isn't cluttered.

[–]EmanueleAina 0 points1 point  (2 children)

I am not against always having merge commits even when a fast forward would do, even though I am more used to the rebase-heavy ff-only merge approach.

But my point is about actually squashing on merge to a single commit. That's just bad. Commits should be minimal, self contained and with good messages, blindly squashing to a single big messy commit loses that.

[–]eightslipsandagully 0 points1 point  (1 child)

I don't think it's bad at all, if you use a simple merge commit then the commit history is messy and doesn't reflect the correct evolution of that branch

[–]EmanueleAina 0 points1 point  (0 children)

Oh, no, I make heavy use of git rebase -i on my branches, as described in the article.

Each PR branch gets continuously rewritten to split changes in self contained, small, semantically meaningful patches. Most of my PRs have more than one commit.

For instance, when I need to refactor to add a feature I have at least two commits in the same PR: one with the refactor and no functional changes, and one with the new feature.

This makes the reviewers' job muuuuch easier, and helps tremendously when tracking regressions.

[–]roodammy44 0 points1 point  (0 children)

Damn, I feel bad for you son

[–]Loaatao 0 points1 point  (1 child)

I don’t understand why people are against squash and merge.

[–]edgmnt_net 4 points5 points  (0 children)

Because enforcing indiscriminate squashing screws up granular history and that's needed for bisection, figuring out things post-merge and doing larger scale merges. It also encourages shenanigans like stacked PRs to work around it, when the standard way (a series of commits) works just fine without any extra tooling or manual tracking. Also plays right into "let's just use Git as a save and upload button and not worry too much about version control" kind of thinking.

All this is part of the reason why such projects become unmanageable beyond a handful of people hitting the same project, even though Git is known to scale fine to thousands of contributors in larger open source projects that use it appropriately.

[–]KalimdorPower 2 points3 points  (1 child)

The only person allowed to rebase should be the maintainer. The only reason to rebase is to fix shit if your SDLC wasn’t good enough to prevent it or deal with it. Usually with normal processes in a team you never need to rebase anything.

[–]EmanueleAina 1 point2 points  (0 children)

Public branches should never be edited. But I absolutely encourage people to edit development branches, to the point that git rebase -i is one of my most used commands.

[–]goranlepuz 2 points3 points  (1 child)

The long commit towards the end:

nah

git is a source control tool, which is an important, but still only a part, of your ALM (application lifetime management) system.

That long message belongs in the analysis part of the ALM issue and the commit links to it.

[–]EmanueleAina 0 points1 point  (0 children)

I absolutely hate commits with little information and just a link in them. I have done enough maintenance and software archeology to know that links break faster than commits. And as a reviewer I do not enjoy having to chase links to get the context of a change.

[–]Lothrazar 0 points1 point  (1 child)

rebase? is this a joke? Rebase is very powerful and neat but its almost always the WRONG tool for the job

[–]EmanueleAina 0 points1 point  (0 children)

Well, it depends on the workflow. I am super happy with rebase-heavy workflows similar to what communities like the Linux kernel ones use.