Re: Recruiting more maintainers for Apache Arrow


Hi Wes,

to contribute an outsider's POV: while it is clear what's expected if you'd
like to make a PR, it's not at all clear to me where I would start if I
wanted to help with PR reviews without being heavily involved with the
community or being a full maintainer. Should I just grab a PR, test it, and
comment on the changes? I wouldn't be sure whether I was stepping on
someone's toes, tbh. So, in my view it would help if:

* there were some kind of informal reviewer assignment system, e.g. I say
"I'd like to review this PR", and Wes/Uwe/Antoine reply: "sure, give it a
shot". This would be mentioned prominently in the contributor guide.

* afterwards there were some kind of feedback-on-feedback arrangement,
although it would increase the workload for the existing maintainers in
the short term, of course

Cheers,
Dimitri.

On Sun, Jul 1, 2018 at 1:09 AM Donald E. Foss <donald.foss@xxxxxxxxx> wrote:

> For what it's worth, this email thread and your summary writeup, Wes, are
> a significant call to action on their own.
>
> I've been passive, not by choice, but by policy. Given the significance
> and need of this project, I'll see what I can do on my side. It will be at
> least a week given the US holiday.
>
> Donald E. Foss
>
> > On Jun 30, 2018, at 2:15 PM, Marco Neumann <marco@xxxxxxxxxxxxxx.INVALID> wrote:
> >
> > Hey,
> >
> > first of all, thanks a lot for your work, Uwe's, and that of the
> > mergers and contributors. Now, to the maintainer problem:
> >
> > # Arrow as "a library"
> > One thing that makes Arrow special is that it is not a single library
> > but many (one for each language), and many of them are not only a
> > binding to a C/C++ lib, but partly a complete re-implementation of the
> > protocol, e.g.:
> >
> > - C++: one core, but also contains Python specialties
> > - Java: another core
> > - Rust: yet another core
> > - Python: a binding to C++ but also a lot more stuff because of Pandas
> > ...
> >
> > And you two are maintaining all of them, and I doubt that you have the
> > capacity and knowledge to do this at the desired level of quality
> > (which is natural, not a personal issue or offense). So I would call
> > this "pseudo-maintenance", since you're solely the gatekeepers who do
> > some shallow reviewing and have the burden of doing the housekeeping
> > and the merging. So why accept these language bindings in the first
> > place without bringing a core maintainer in place? For example, let's
> > say someone proposes a binding to Haskell now. That should not be
> > accepted as part of the official Apache implementation without a
> > dedicated maintainer (ideally the PR author would be that person, but
> > there may be others who step up).
> >
> > Right now, it might be too late to remove some of the incomplete / WIP
> > implementations that don't have a core maintainer though.
> >
> > # GitHub
> > Another special thing to consider is that Arrow is (ab)using GitHub as
> > a code hosting platform. Even for a contributor, this has some
> > obviously uncool consequences:
> >
> > - you have yet another issue hosting system to log in to
> > - there is yet another information channel to keep track of (this ML
> >  for example, which has a semi-informative web interface telling you
> >  that you can only log in using Google but not how to subscribe to
> >  the list)
> > - links to issues don't work in the known magic way
> > - you're merging the PRs by closing them, which is not a very nice way
> >  because it does not reflect the contributors' work in the project
> >  overview and personal profiles, and exactly this is a large part of
> >  the GitHub community (btw: merging PRs without using GitHub's merge
> >  button IS possible, as bors/bors-ng prove)
> >
> > So as a potential maintainer, this is already a bummer, since I know
> > that there are things less comfortable than the system I would get
> > from any normal GitHub or GitLab project.
> >
> > I'm not really sure how to solve this or if it should be solved (read
> > about the laziness aspect in "Contribution VS Maintenance" below)
> >
> > # Time / Payment
> > Yes, this is indeed a big issue. From what I can tell from the open
> > source projects I was involved in, for large contributor crowds you
> > normally have full- or half-time positions in place for the core
> > maintainers (look at the Mozilla projects, the Blender Foundation,
> > GNOME / Red Hat). So at some point I think maintaining isn't a
> > part-time / hobby thing anymore (without belittling the hard work of
> > hobby contributors, on the contrary). I don't have a link at hand, but
> > I recall some discussion about GitHub and its importance for hiring
> > (since it acts as a CV) after MS bought it, and some of the responses
> > were "doing all this work in your free time is a privilege of wealthy,
> > mostly-white men", which, without my endorsing the statement in this
> > really bare form, already shows a problem of the open source world.
> >
> > # Contribution VS Maintenance
> > The very "nice" thing about patch/PR contribution is that you do your
> > work and then you can walk away, and it's the maintainer's problem to
> > release the artifact, upgrade/migrate your code and ensure that the
> > tests you've written never break. It's comfortable. Being a maintainer
> > means the opposite of all that. And in the end, you get blamed for not
> > supporting certain features (see the open source paragraph here:
> > https://blog.ghost.org/5/ ) or for security disasters (remember the
> > OpenSSL disaster).
> >
> > I think together with the previous point this means we have to get
> > companies to pay for that work, and not just dump their features into
> > an OSS repo.
> >
> > # Path to Maintainership
> > So I think (from my narrow point of view!) that many people expect that
> > the path from "outsider" to "maintainer" takes the route over "a lot of
> > patch/PR contributions". If I'm reading your mail right, that is not
> > necessarily the case for Apache projects, and I think that's great. The
> > "review PRs" path sounds great, but I think neither GitHub nor any
> > other platform I'm aware of does a good job of getting people to do
> > so. I mean, I see a PR and I can leave a review, but for me it is not
> > really clear what consequences this has (naturally, random people
> > don't have a veto on changes). So I can jump in when I think something
> > is wrong, but I cannot approve a PR. This makes sense, but it poses
> > the question of "how?!". I mean, it is pretty clear how to become a
> > patch/PR contributor, but it is not clear how to become a maintainer,
> > at least not in an easy way. (I'm sure it's written down somewhere.)
> >
> > So, overall I think a clear call to action at the top of the README
> > could help. Like "Hey, we're looking for maintainers, you could start
> > by reviewing some PRs, and after some reviews maintainers will just be
> > the last gatekeeper, and after some more time you can even merge PRs
> > on your own".
> >
> > # My personal contribution
> > Triggered by this call for help, I'll try to get more involved in
> > Python, C++ and Rust reviews.
> >
> > So, these are some thoughts that I hope may help.
> >
> > Thanks again for addressing this issue and your time and passion,
> > Marco
> >
> >> On 2018/06/30 14:57:42, Wes McKinney <w...@xxxxxxxxx> wrote:
> >> hi folks,
> >>
> >> Arrow has grown by leaps and bounds over the last 2.5 years. We are
> >> approaching our 2000th patch and on track to surpass 200 unique
> >> contributors by year end.
> >>
> >> All this contribution growth is great, but it has a hidden cost: the
> >> maintenance. The burden of maintaining the project: particularly
> >> reviewing and merging patches, has fallen on a very small number of
> >> people. From the commit logs, we can see how many patches each
> >> committer has merged:
> >>
> >> $ git shortlog -csn d5aa7c46692474376a3c31704cfc4783c86338f2..master
> >>  1289  Wes McKinney
> >>   268  Uwe L. Korn
> >>    74  Korn, Uwe
> >>    54  Antoine Pitrou
> >>    52  Julien Le Dem
> >>    39  Philipp Moritz
> >>    18  Kouhei Sutou
> >>    18  Steven Phillips
> >>    13  Bryan Cutler
> >>    11  Jacques Nadeau
> >>    10  Phillip Cloud
> >>     8  Brian Hulette
> >>     5  Robert Nishihara
> >>     5  adeneche
> >>     4  GitHub
> >>     3  Sidd
> >>     3  siddharth
> >>     1  AbdelHakim Deneche
> >>     1  Your Name Here
> >>
> >> So Uwe and I have merged ~84% of the patches in the project so far.
> >> This isn't a completely accurate reflection of the maintainer burden,
> >> since many others contribute to code reviews and other aspects of
> >> patch maintenance, and you have to be a committer to earn a place on
> >> this list.
> >>
> >> I'm not sure what's the best way to address this problem. The quality
> >> of our code review has declined at times as we struggle to keep up
> >> with the flow of patches -- I don't think this is good. Having the
> >> patch queue pile up isn't great either. Personally, I'm having a
> >> difficult time balancing project maintenance and patch authoring,
> >> particularly in the last 6 months.
> >>
> >> Unfortunately, many people believe that writing patches is the primary
> >> mode of contribution to an open source project. Apache projects
> >> explicitly state that non-patch contributions are valued in earning
> >> karma (committership and PMC membership). We're starting to have more
> >> corporate contributors come out of the woodwork, and while it's great
> >> for contributors to be paid to write patches for the project, they are
> >> rarely given the time and space to contribute meaningfully to
> >> maintenance.
> >>
> >> Any thoughts about how we can grow the maintainership? Somehow we need
> >> to reach ~5-6 core maintainers over the next year.
> >>
> >> Thanks,
> >> Wes
>