
Re: Recruiting more maintainers for Apache Arrow

For what it's worth, this email thread and your summary writeup, Wes, are a significant call to action on their own. 

I've been passive, not by choice, but by policy. Given the significance and need of this project, I'll see what I can do on my side. It will be at least a week given the US holiday. 

Donald E. Foss

> On Jun 30, 2018, at 2:15 PM, Marco Neumann <marco@xxxxxxxxxxxxxx.INVALID> wrote:
> Hey,
> first of all, thanks a lot for your, Uwe's, the mergers' and the
> contributors' work. Now, to the maintainer problem:
> # Arrow as "a library"
> One thing that makes Arrow special is that it is not a single library
> but many libraries (one for each language), and many of them are not
> only a binding to a C/C++ lib but partly a complete re-implementation
> of the protocol, e.g.:
> - C++: one core, but also contains Python specialties
> - Java: another core
> - Rust: yet another core
> - Python: a binding to C++ but also a lot more stuff because of Pandas
> ...
> And you two are maintaining all of them, and I doubt that you have the
> capacity and knowledge to do this at the desired level of quality
> (which is natural, not a personal issue or offense). So this I would
> call "pseudo-maintenance", since you're solely the gatekeeper that does
> some shallow reviewing and has the burden of doing the housekeeping and
> the merging. So why accept these language bindings in the first
> place without having a core maintainer in place? For example, let's
> say someone proposes a binding to Haskell now. That should not be
> accepted as part of the official Apache implementation without a
> dedicated maintainer (ideally the PR author would be that person, but
> there may be others who step up).
> Right now, it might be too late to remove some of the incomplete / WIP
> implementations that don't have a core maintainer though.
> # GitHub
> Another special thing to consider is that Arrow is (ab)using GitHub as
> a code hosting platform. Even as a contributor, this has some obviously
> unpleasant consequences:
> - you have yet another issue hosting system to log in to
> - there is yet another information channel to keep track of (this ML
>  for example, which has a semi-informative web interface telling you
>  that you can only log in using Google but does not tell you how to
>  subscribe to the list)
> - links to issues don't work in the known magic way
> - you're merging the PRs by closing them, which is not a very nice way
>  at all because it does not reflect the contributors' work in
>  the project overview and personal profiles, but exactly this is a
>  large part of the GitHub community (btw: merging PRs without using
>  GitHub's merge button IS possible, as bors/bors-ng prove)
> So as a potential maintainer, this is already a bummer, since I know
> that things are less comfortable than the system I would get from
> any normal GitHub or GitLab project.
> I'm not really sure how to solve this or if it should be solved (read
> about the laziness aspect in "Contribution VS Maintenance" below)
> # Time / Payment
> Yes, this is indeed a big issue. From what I can tell from the open
> source projects I was involved in, for large contributor crowds you
> normally have full-time or half-time positions in place for the core
> maintainers (look at the Mozilla projects, the Blender Foundation, Gnome
> / Red Hat). So at some point I think maintaining isn't a part-time /
> hobby thing anymore (w/o downgrading the hard work of hobby
> contributors, on the contrary). I don't have a link at hand, but I recall
> some discussion about GitHub and its importance for hiring (since it
> acts as a CV) after MS bought it, and some of the responses were
> "doing all this work in your free time is a privilege of wealthy,
> mostly-white men", which, without my endorsing the statement in this
> really bare form, already shows a problem of the open source world.
> # Contribution VS Maintenance
> The very "nice" thing about patch/PR contribution is that you do your
> work and then you can walk away, and it's the maintainer's problem to
> release the artifact, upgrade/migrate your code and ensure that the
> tests you've written never break. It's comfortable. Being a maintainer
> means all the opposite things. And in the end, you get blamed for not
> supporting certain features (see the open source paragraph here
> https://blog.ghost.org/5/ ) or for security disasters (remember the
> OpenSSL disaster).
> I think together with the previous point this means we have to get
> companies to pay for that work, and not just dump their features into
> an OSS repo.
> # Path to Maintainership
> So I think (from my narrow point of view!) that many people expect that
> the path from "outsider" to "maintainer" takes the route over "a lot of
> patch/PR contributions". If I'm reading your mail right, that is not
> necessarily the case for Apache projects and I think that's great. The
> "review PRs" path sounds great, but I think neither GitHub nor any
> platform I'm aware of does a good job of getting people to do so. I mean,
> I see a PR and I can leave a review, but for me it is not really clear
> what consequences this has (naturally, random people don't have a veto on
> changes). So I can jump in when I think something is wrong, but I
> cannot approve a PR. This makes sense, but it poses the question of
> "how?!". I mean, it is pretty clear how to become a patch/PR
> contributor, but it is not clear how to become a maintainer, at
> least not in an easy way. (I'm sure it's written down somewhere.)
> So, overall I think a clear Call for Action at the top of the README
> could help. Like "Hey, we're looking for maintainers, you could start
> by reviewing some PRs and after some reviews maintainers will just be
> the last gatekeeper and after some more time, you can even merge PRs on
> your own".
> # My personal contribution
> Triggered by this call for help, I'll try to get more involved in
> Python, C++ and Rust reviews.
> So, these are some thoughts that I hope may help.
> Thanks again for addressing this issue and your time and passion,
> Marco
>> On 2018/06/30 14:57:42, Wes McKinney <w...@xxxxxxxxx> wrote: 
>> hi folks,
>> Arrow has grown by leaps and bounds over the last 2.5 years. We are
>> approaching our 2000th patch and on track to surpass 200 unique
>> contributors by year end.
>> All this contribution growth is great, but it has a hidden cost: the
>> maintenance. The burden of maintaining the project, particularly
>> reviewing and merging patches, has fallen on a very small number of
>> people. From the commit logs, we can see how many patches each
>> committer has merged:
>> $ git shortlog -csn d5aa7c46692474376a3c31704cfc4783c86338f2..master
>>  1289  Wes McKinney
>>   268  Uwe L. Korn
>>    74  Korn, Uwe
>>    54  Antoine Pitrou
>>    52  Julien Le Dem
>>    39  Philipp Moritz
>>    18  Kouhei Sutou
>>    18  Steven Phillips
>>    13  Bryan Cutler
>>    11  Jacques Nadeau
>>    10  Phillip Cloud
>>     8  Brian Hulette
>>     5  Robert Nishihara
>>     5  adeneche
>>     4  GitHub
>>     3  Sidd
>>     3  siddharth
>>     1  AbdelHakim Deneche
>>     1  Your Name Here
>> So Uwe and I have merged ~84% of the patches in the project so far.
>> This isn't a completely accurate reflection of the maintainer burden,
>> since many others contribute to code reviews and other aspects of
>> patch maintenance, and you have to be a committer to earn a place on
>> this list.
>> I'm not sure what's the best way to address this problem. The quality
>> of our code review has declined at times as we struggle to keep up
>> with the flow of patches -- I don't think this is good. Having the
>> patch queue pile up isn't great either. Personally, I'm having a
>> difficult time balancing project maintenance and patch authoring,
>> particularly in the last 6 months.
>> Unfortunately, many people believe that writing patches is the primary
>> mode of contribution to an open source project. Apache projects
>> explicitly state that non-patch contributions are valued in earning
>> karma (committership and PMC membership). We're starting to have more
>> corporate contributors come out of the woodwork, and while it's great
>> for contributors to be paid to write patches for the project, they are
>> rarely given the time and space to contribute meaningfully to
>> maintenance.
>> Any thoughts about how we can grow the maintainership? Somehow we need
>> to reach ~5-6 core maintainers over the next year.
>> Thanks,
>> Wes