
Re: Why are large code drops damaging to a community?

On Fri, Oct 19, 2018 at 10:32 PM James Dailey <jamespdailey@xxxxxxxxx> wrote:
> +1 on this civil discourse.
> I would like to offer that sometimes large code drops are unavoidable and
> necessary.  Jim's explanation of httpd contribution of type 1 is a good
> example.

I suspect that Jim's example of Robert Thau's httpd contribution might
not have been necessary had he been working with the source control
tools and methods available today.  That was in 1995.  Even if we
assume it was necessary in 1995, it was risky and probably only
succeeded because of an abundance of communication and real goodwill
with the community.  I'm guessing, though.  I haven't given up hope
that Jim will provide more details.

> I think we would find that many projects started with a large code drop
> (maybe more than one) - a sufficient amount of code - to get a project
> started.  When projects are young it would be normal and expected for this
> to happen. It quickly gets a community to a "thing" that can be added to.

Many do start with a large code drop.  They go through incubation, and
sometimes fail to attract communities.  If they fail to attract
communities, they eventually reach the attic.  Many projects also
succeed in starting at the ASF with an empty repository and building
from zero.  These are often very successful at attracting communities.
People like to work on things in which they are empowered to
participate in decision making.  People like the feeling of trust and
community which results.  PLC4X, which Chris Dutz is working on, is an
excellent example.

> It obviously depends on the kinds of components, tools, frameworks, etc
> that are being developed. Game theory is quite apropos - you need a
> sufficient incentive for *timely* collaboration, of hanging together.
> Further, if your "thing" is going to be used directly in market (i.e. with
> very little of a product wrapper ), then there is a strong *disincentive*
> to share back the latest and greatest. The further from market immediacy
> the easier it is to contribute. Both the Collaboration space and
> Competitive space are clearly delineated, whereas in a close to market
> immediacy situation you have too much overlap and therefore a built in
> delay of code contribution to preserve market competitiveness.

This is one important reason that there are very few full applications
at the ASF.  Fortunately for Fineract, we don't have to function as a
full application.  This might be an argument for our community to stay
away from customer-facing front-ends.

If working together with others on your code would cause you important
market disadvantages, then you probably don't want to take part in
open source as it is conceived at the Apache Software Foundation.  If
a vendor's goal is simply to publish their source, then a plain old
GitHub account is probably the least expensive method available to
that vendor.  If a vendor's goal is to dominate a project, then there
are 501(c)(6)s out there which have the latitude to make that sort of
collaboration possible.  Those are valid and reasonable approaches.
The ASF isn't trying to compete in those spaces.  The ASF wants its
projects to be built as a community of equals.

> So, combining the "sufficient code to attract contribution" metric with the
> market-immediacy metric and you can predict engagement by outside vendors
> (or their contributors) in a project.

A "sufficient code" metric isn't one I've ever personally used to
decide which project to contribute to.  I don't believe vendors use
this either.  I think some developers are even turned off by the
existence of massive amounts of code.  It triggers the "not invented
here" complex many developers have. : o)  But I'm happy to look at
data that shows I'm wrong.

Perhaps you mean a "sufficient business functionality" metric?  But
even then, what I've seen more often is a "sufficiently healthy
community" metric.  That is: how long has the project existed, and how
long is it likely to continue to exist?  These are the questions I get asked
by people outside of our project.

> In such a situation, it is better, in
> my view, to accept any and all branched code even if it is dev'd off-list.

The Apache Software Foundation does not "accept any and all" off-list
development.  There are reasonable arguments to be had about the grey
areas and how far they extend, but there is *never* carte blanche.

> This allows for inspection/ code examination and further exploration - at a
> minimum.  Accepting on a branch is neither the same as accepting for
> release, nor merging to master branch.

Inspection/examination can also be accomplished on GitHub.  We don't
need to accept code into an Apache project in order to be able to see
it.  If a vendor wishes to propose merging a branch, we can look at
that code and determine whether it is inside or outside our project's
grey area *before* we accept the code.  If the code is large, that
proposal can be made available in reviewable-sized chunks by the
vendor, as Upayavira described doing.  This not only has the advantage
of improving code quality, it also means community members have a
chance to participate meaningfully in the decision-making process and
learn how to work within the new code.  Learning by doing is always
the deepest learning.

> Now, the assumption that the code is better than what the community has
> developed has to be challenged.  It could be that the branched code should
> be judged only on the merits of the code (is it better and more complete),
> or it could be judged on the basis that it "breaks the current build".
> There can be a culture of a project to accept such code drops with the
> caveat that if the merges cannot be done by the submitting group, then the
> project will have a resistance to such submissions (you break it, you fix
> it), or alternatively that there will be a small group of people that are
> sourced from such delayed-contribution types - that work on doing the
> merges.  The key seems to be to create the incentive to share code before
> others do, to avoid being the one that breaks the build.

Shall we call this the "preemptive architecture" approach?  Certainly
programmers sometimes engage in this kind of adversarial behavior.
I've seen it.  I've never been happy with the consequences.  I would
not wish to work for a company which was racing to get there first
with the largest possible chunk of code, where the price of losing
the race is months spent resolving merge conflicts.  Not to mention
that there's more to software engineering than just lines of code to
merge.  Entire problem-solving approaches can be in conflict.  And
resolutions of these will involve more than just staring at a
three-way merge of files all day.

The consequences for employees aside, it doesn't seem likely that
anybody in the scenario you describe actually has time to test the new
code or its merge, nor to participate in the community aspects of
teaching others about the code being contributed and collaborating
with people outside their company.  I'd be very concerned about the
quality of code produced using this approach.

I hope we can avoid incentivizing vendors to treat their employees and
our projects this way.

Best Regards,