[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [DISCUSS] Committer Guidelines / Hygene before merging PRs

Here is my attempt to summarize the discussion, please see the TBDs.

I would work on a PR with respective contributor and committer guideline updates next.



- Clear history with purpose and origin of changes
- Ability to perform a granular rollback, if necessary
- Efficiency and quality of review (avoid tiny or out-of-context changes or huge mega-changes)
- Efficiency of authoring (don't want to wait on a review for a tiny bit because GitHub makes it very hard to stack up reviews in sequence / don't want to have major changes blocked because of difficulty of review)
- Ease of new contribution ("OK for committers to do more, while new/one-time contributors shouldn't need to know or obey any policy"). TBD: I think that quote needs clarification. IMO contributors are expected to read and adhere to contribution guidelines.

Clean history:

- Commit messages should tag JIRAs and be otherwise descriptive. It should not be necessary to find a merge or first PR commit to find out what caused a change
- We want to squash "Fixup!", "Address comments" type of commits to achieve a clean history
- We prefer that PR authors squash such commits after review is complete. This expectation should be described clearly in the Contributor's Guide.
- The process should not burden committer due to back and forth with author and deal with cleaning up PR history and other cosmetics. We want to reduce committer overhead. But note that we don't want to shift the burden to first time contributors either.
- Committer can use the "squash and merge" option (or modify the PR commits in other ways). This should address the overhead concern. TBD: I would suggest that the author, when this is explicitly not desired, needs to indicate it upfront.
- Committer is ultimately responsible (committer guidelines) and we "trust the committer's judgment"

Granularity of changes:

- We prefer small independent, incremental PRs with descriptive, isolated commits. Each commit is a single clear change.
- It is OK to keep separate commits for different logical pieces of the code, which make reviewing and revisiting code easier
- Making commits isolated is a good practice, PR author should be able to relatively easily split the PR upon reviewer's request
- For multiple commits in a PR, every commit that gets merged should compile and pass tests

Git playbook 

- How to rollback multiple commits from single PR
- Simple step-by-step instructions for authors to cleanup history

On Mon, Oct 1, 2018 at 11:50 AM Rui Wang <ruwang@xxxxxxxxxx> wrote:
+1 to add JIRA issue as the part of commit message. it requires less effort but will help a lot on our commit history. I used to not do that but I will start to add JIRA info to my future commits.


On Mon, Oct 1, 2018 at 2:59 AM Alexey Romanenko <aromanenko.dev@xxxxxxxxx> wrote:
+1 to what Anton said, I’m fully agree with this. 

Looking on the current commit history we can notice that there are many commits like “fix”, “add/edit/remove”, “clean up”, etc, that, seems, were introduced after the review process iterations. I think this makes commit history too much verbose and more difficult to follow. 

Also, most of such commits don’t have Jira issue name as part of commit message. So it requires to find a merge or fist PR commit in case we want to find out what caused this change. 

I believe we can improve our commit guidelines in this way and it should help to have commit history more clean and easy to read. 

On 1 Oct 2018, at 06:34, Kenneth Knowles <kenn@xxxxxxxxxx> wrote:

SGTM. I generally favor "trust the committer's judgment" aka (2)(d) when it is obvious. We had some small communication problem about this before so I just wanted to be careful to ask the author when it is not obvious.


On Sat, Sep 29, 2018 at 9:06 AM Robert Bradshaw <robertwb@xxxxxxxxxx> wrote:
There's two separate topics of discussion going on here. 

(1) Is it acceptable for PRs to have more than one commit. We've discussed this before, and the consensus seems to be that this is both allowed and intentional. 

(2) What to do about PRs that are clearly "[BEAM-NNNN] Implement feature" + (fixup/address comments/lint/...)*. I think this is easier (or at least quicker) to make a call on. Our options are 

(a) Merge as is.
(b) Ask the author to do a final squash. (Doing a squash before an approval, however, makes reviewing more difficult, so this enforces another round.)
(c) Having the committer manually pull, squash, (force push to PR?), and merge. 
(d) Use the "squash and merge" button in this case. 

Clearly, we shouldn't be doing any squashing unless the intent is obvious, but when it is, I think (d) is the sanest and safest route. 

- Robert

On Fri, Sep 28, 2018 at 9:53 PM Kenneth Knowles <kenn@xxxxxxxxxx> wrote:
On Fri, Sep 28, 2018 at 10:29 AM Thomas Weise <thw@xxxxxxxxxx> wrote:
+1 for stating the goal of clear provenance and granular rollback.
Also of course efficiency and quality of review (we don't to review tiny or out-of-context changes or huge mega-changes), efficiency of authoring (don't want to wait on a review for a tiny bit because GitHub makes it very hard to stack up reviews in sequence / don't want to have major changes blocked because of difficulty of review), ease of new contribution (OK for committers to do more IMO, while new/one-time contributors shouldn't need to know or obey any policy).

I think this discussion helps to remind/identify best practices how to get there. Where appropriate we should augment guidelines for both, contributor and committer. 

Kenn: would you elaborate on the "1 commit = 1 review (and sometimes even = 1 ticket)" a bit more. Is that a problem of insufficient task / ticket granularity or something else?

The problem isn't a failure to define tasks at the right granularity, but that they naturally and fundamentally exist with a different granularity (unit of change -> unit of review -> unit of tracking/delivery). I'd guess there's often a 5x jump in size from commit -> review -> ticket. There are many many easy-to-isolate changes that are wasteful to independent review or track.

To get some concrete facts, I bet the one things we could probably find research on is the ideal review size. And we could also scrape logs for messages with bullet points (often each bullet is basically what would have been a commit). Generally getting a sense of what these ratios are in practice and idealized would be kind of interesting but maybe overkill.


Charles: my plan is to translate the outcome of this discussion to guideline updates and your log filter trick will be part of it. Though my hope is that it won't be needed if we get closer to the goals above.


On Fri, Sep 28, 2018 at 9:44 AM Kenneth Knowles <kenn@xxxxxxxxxx> wrote:
Anton makes a good point. We have been talking about policy for what we do, but the real issue is what we want to come out of it: a clear history for seeing where code came from and granular rollback. I think in both cases the key thing is that each commit is a single clear change. How they get there is not the point.

I have worked on multiple projects with a 1 commit = 1 review (and sometimes even = 1 ticket). These pretty much never have a good history. The best case is that each commit has a message that is a bullet point of many separate changes, because it is simply too inefficient to review each logical change separately. But since the messages become less useful, it encourages a culture of not even bothering to write meaningful messages at all.

Note that is trivial for a committer-reviewer to edit the history any way they like without the button. "Allow edits by maintainers" is on by default. The "Squash and merge" button just adds a button for something we can already do.

Charles: super useful! Worth noting that for a PR with a good history it will skip meaningful commits (but still give the summary line, which is nice).


On Fri, Sep 28, 2018 at 8:54 AM Anton Kedin <kedin@xxxxxxxxxx> wrote:
Is there an actual problem caused by squashing or not squashing the commits that we face in the project? I personally have never needed to revert something complicated that would be problematic either way (and don't have a strong opinion about which way we should do it). From what I see so far in the thread it doesn't look like reverting is a frequent major pain for anyone. Maybe it is exactly because we're mostly following some best practice and it makes it easy. If someone has concrete examples from their experience in the project, please share them, this way it would be easier to justify the choice.

The PR and commit cleanliness, size and isolation are probably more important thing to have guidance and maybe enforcement for. There are well known practices and guidelines that I think we should follow, and I think they will make squashing or not squashing mostly irrelevant. For example, if we accept that commits should have description that actually describes what commit does, then "!fixup", "address comments" and similar should not be part of the history and should be squashed before submitting the PR no matter which way we decide to go in general. Also, I think that making commits isolated is also a good practice, and PR author should be able to relatively easily split the PR upon reviewer's request. And if we choose to keep whole PRs small and incremental with descriptive isolated commits, then there won't be too much difference how many commits there are.


On Fri, Sep 28, 2018 at 8:21 AM Andrew Pilloud <apilloud@xxxxxxxxxx> wrote:
I brought up this discussion a few months ago from the other side: I don't like my commits being squashed. I try to create logical commits that each passes tests and could be broken up into multiple PRs. Keeping those changes intact is useful from a history perspective and squashing may break other PRs I have in flight. If the intent is clear (one commit with a descriptive message and a bunch of "fixups"), then feel free to squash, otherwise ask first. When you do squash, it would be good to leave a comment as to how the author can avoid having their commits squashed in the future.


On Fri, Sep 28, 2018 at 7:29 AM Chamikara Jayalath <chamikara@xxxxxxxxxx> wrote:

On Thu, Sep 27, 2018 at 9:51 AM Robert Bradshaw <robertwb@xxxxxxxxxx> wrote:
I agree that we should create a good pointer for cleaning up PRs, and request (though not require) that authors do it. It's unfortunate though that squashing during a review makes things difficult to follow, so adds one more round trip. 

We could consider for those PRs that make sense as a single logical commit (most, but not all, of them) simply using the "squash and merge" button even though it technically doesn't create a merge commit. 

+1 for allowing "squash and merge" as an option. Most of the reviews (at least for me) consist of a single valid commit and several additional commits that get piled up during the review process which obviously should not be included in the commit history. Going through another round here just to ask the author to fixup everything is unnecessarily time consuming.

- Cham

On Fri, Sep 21, 2018 at 9:24 PM Daniel Oliveira <danoliveira@xxxxxxxxxx> wrote:
As a non-committer I think some automated squashing of commits sounds best since it lightens the load of regular contributors, by not having to always remember to squash, and lightens the load of committers so it doesn't take as long to have your PR approved by one.

But for now I think the second best route would be making it PR author's responsibility to squash fixup commits. Having that expectation described clearly in the Contributor's Guide, along with some simple step-by-step instructions for how to do so should be enough. I mainly support this because I've been doing the squashing myself since I saw a thread about it here a few months ago. It's not nearly as huge a burden on me as it probably is for committers who have to merge in many more PRs, it's very easy to learn how to do, and it's one less barrier to having my code merged in.

Of course I wouldn't expect that committers wait for PR authors to squash their fixup commits, but I think leaving a message like "For future pull requests you should squash any small fixup commits, as described here: <link>" should be fine.
I was also thinking about the possibility of wanting to revert 
individual commits from a merge commit. The solution you propose works, 
but only if you want to revert everything.

Does this happen often? I might not have enough context since I'm not a committer, but it seems to me that often the person performing a revert is not the original author of a change and doesn't have the context or time to pick out an individual commit to revert.

On Wed, Sep 19, 2018 at 1:32 PM Maximilian Michels <mxm@xxxxxxxxxx> wrote:
I tend to agree with you Lukasz. Of course we should try to follow the
guide lines as much as possible but if it requires an extra back and
forth with the PR author for a cosmetic change, it may not be worth the

On 19.09.18 22:17, Lukasz Cwik wrote:
> I have to say I'm guilty of not following the merge guidelines,
> sometimes doing merges without rebasing/flatten commits.
> I find that it is a few extra mins of my time to fix someones PR history
> if they have more then one logical commit they want to be separate and
> it usually takes days for the PR author to do merging  with the extra
> burden as a committer to keep track of another PR and its state (waiting
> for clean-up) is taxing. I really liked the idea of the mergebot (even
> though it didn't work out in practice) because it could do all the
> policy work on my behalf.
> Anything that reduces my overhead as a committer is useful as for the
> 100s of PRs that I have merged, I've only had to rollback a couple so
> I'm for Charle's suggestion which makes the rollback flow slightly more
> complicated for a significantly easier PR merge workflow.
> On Wed, Sep 19, 2018 at 1:13 PM Charles Chen <ccy@xxxxxxxxxx
> <mailto:ccy@xxxxxxxxxx>> wrote:
>     What I mean is that if you get the first-parent commit using "git
>     log --first-parent", it will incorporate any and all fix up commits
>     so we don't need to worry about missing any.
>     On Wed, Sep 19, 2018, 1:07 PM Maximilian Michels <mxm@xxxxxxxxxx
>     <mailto:mxm@xxxxxxxxxx>> wrote:
>         Generally, +1 for isolated commits which are easy to revert.
>          > I don't think it's actually harder to roll back a set of
>         commits that are merged together.
>         I think Thomas was mainly concerned about "fixup" commits to
>         land in
>         master (as part of a merge). These indeed make reverting commits
>         more
>         difficult because you have to check whether you missed a "fixup".
>          > Ideally every commit should compile and pass tests though, right?
>         That is definitely what we should strive for when doing a merge
>         against
>         master.
>          > Perhaps the bigger issue is that we need better documentation
>         and a playbook on how to do this these common tasks in git.
>         We do actually have basic documentation about this but most
>         people don't
>         read it. For example, the commit message of a Merge commit
>         should be:
>         Merge pull request #xxxx: [BEAM-yyyy] Issue title
>         But most merge commits don't comply with this rule :) See
>         https://beam.apache.org/contribute/committer-guide/#merging-it
>         On 19.09.18 21:34, Reuven Lax wrote:
>          > Ideally every commit should compile and pass tests though, right?
>          >
>          > On Wed, Sep 19, 2018 at 12:15 PM Ankur Goenka
>         <goenka@xxxxxxxxxx <mailto:goenka@xxxxxxxxxx>
>          > <mailto:goenka@xxxxxxxxxx <mailto:goenka@xxxxxxxxxx>>> wrote:
>          >
>          >     I agree with the cleanliness of the Commit history.
>          >     "Fixup!", "Address comments", "Address even more
>         comments" type of
>          >     comments does not convey meaningful information and are
>         not very
>          >     useful. Its a good idea to squash them.
>          >     However, I think its ok to keep separate commits for
>         different
>          >     logical pieces of the code which make reviewing and
>         revisiting code
>          >     easier.
>          >     Example PR: Support X in the pipeline
>          >     Commit 1: Restructuring a bunch of code without any
>         logical change.
>          >     Commit 2: Changing validation logic for pipeline.
>          >     Commit 3: Supporting new field "X" for pipeline.
>          >
>          >     On Wed, Sep 19, 2018 at 11:27 AM Charles Chen
>         <ccy@xxxxxxxxxx <mailto:ccy@xxxxxxxxxx>
>          >     <mailto:ccy@xxxxxxxxxx <mailto:ccy@xxxxxxxxxx>>> wrote:
>          >
>          >         To be concrete, it is very easy to revert a commit in
>         any case:
>          >
>          >          1. First, use "git log --first-parent" to find the
>         first-parent
>          >             commit corresponding to a PR merge (this is a
>         one-to-one
>          >             correspondence).
>          >          2. Use "git revert -m 1 <commitid>" to revert the
>         commit; this
>          >             selects the first parent as the base for a merge
>         commit (in
>          >             the case where a single commit needs to be
>         reverted, just
>          >             use "git revert <commitid>" without the "-m 1" flag).
>          >
>          >         In any case, as a general good engineering practice,
>         I do agree
>          >         that it is highly desirable to have small independent PRs
>          >         instead of large jumbo PRs whenever possible.
>          >
>          >         On Wed, Sep 19, 2018 at 11:20 AM Charles Chen
>         <ccy@xxxxxxxxxx <mailto:ccy@xxxxxxxxxx>
>          >         <mailto:ccy@xxxxxxxxxx <mailto:ccy@xxxxxxxxxx>>> wrote:
>          >
>          >             I don't think it's actually harder to roll back a
>         set of
>          >             commits that are merged together.  Git has the
>         notion of
>          >             first-parent commits (you can see, for example,
>         "git log
>          >             --first-parent", which filters out the intermediate
>          >             commits).  In this sense, PRs still get merged as
>         one unit
>          >             and this is preserved even if intermediate
>         commits are
>          >             kept.  Perhaps the bigger issue is that we need
>         better
>          >             documentation and a playbook on how to do this
>         these common
>          >             tasks in git.
>          >
>          >             On Wed, Sep 19, 2018 at 9:27 AM Thomas Weise
>         <thw@xxxxxxxxxx <mailto:thw@xxxxxxxxxx>
>          >             <mailto:thw@xxxxxxxxxx <mailto:thw@xxxxxxxxxx>>>
>         wrote:
>          >
>          >                 Wanted to bring this up as reminder as well as
>          >                 opportunity to discuss potential changes to our
>          >                 committer guide. It has been a while since
>         last related
>          >                 discussion and we welcomed several new
>         committers since
>          >                 then.
>          >
>          >                 Finishing up pull requests pre-merge:
>          >
>          >
>         https://beam.apache.org/contribute/committer-guide/#finishing-touches
>          >
>          >                 PRs are worked on over time and may
>         accumulate many
>          >                 commits. Sometimes because scope expands,
>         sometimes just
>          >                 to separate independent changes but most of
>         the time the
>          >                 commits are just fixups that are added as
>         review progresses.
>          >                 It is important that the latter get squashed
>         prior to PR
>          >                 merge, as otherwise we lost the ability to
>         roll back
>          >                 changes by reverting a single commit and also
>         generally
>          >                 cause a lot of noise in the commit history
>         that does not
>          >                 help other contributors. To be clear, I refer
>         to the
>          >                 "Fixup!", "Address comments", "Address even more
>          >                 comments" type of entries :)
>          >
>          >                 I would also propose that every commit gets
>         tagged with
>          >                 a JIRA (except those fixups that will be
>         squashed).
>          >                 Having the JIRA and possibly other tags makes
>         it easier
>          >                 for others not involved in the PR to identify
>         changes
>          >                 after they were merged, for example when
>         looking at the
>          >                 revision history or annotated source.
>          >
>          >                 As for other scenarios of jumbo PRs with many
>         commits,
>          >                 there are probably situations where work
>         needs to be
>          >                 broken down into smaller units, making life
>         better for
>          >                 both, contributor and reviewer(s). Ideally,
>         every PR
>          >                 would have only one commit, but that may be a
>         bit much
>          >                 to mandate? Is the general expectation
>         something we need
>          >                 to document more clearly?
>          >
>          >                 Thanks,
>          >                 Thomas
>          >