
Re: Splitting the repo


On Wed, Oct 10, 2018 at 9:21 PM Kenneth Knowles <kenn@xxxxxxxxxx> wrote:
>
> I think Robert's initial question needs to be focused on a particular split.

Yes, thanks for bringing this back to the original question.

> I agree that a "single project spanning multiple repos" does not make sense. But separate projects in separate repos is pretty widely used :-). The point of separate repos IMO would be to empower (and force) them to act as separate projects.
>
> Every monorepo I have worked in has struggled with modularity problems. But conversely, a project with poor modularity can thrive in a monorepo because it is feasible to make changes across all the bits that are tightly coupled. Because it is a subtext whenever a Google employee talks about monorepos, I want to call out that Google's uniquely massive and interesting monorepo requires a tremendous amount of bespoke infrastructure to manage coupling, testing, ownership, etc*. It is not analogous to a large repo on GitHub.
>
> So... which pieces are "not separate enough" and why and how do we want to make them separate?
>
> I can think of some candidates that could benefit from some kind of "separateness":
>
>  - IOs or collections of IOs: separate release cadence, only build on stable SDK releases (potential for diamond dep problems)
>  - Portability protos: forces them to be highly stable and forces runners to adapt to major iterations
>  - Language SDKs: easier to build a community of devs with a clearly familiar project structure and toolchain
>
> Maybe the kinds of separation that folks want do not have to be a separate repo, as mentioned. But it is still important that most infrastructure and UI is geared towards a certain scale of project (not just repo): issue tracking, pull request management, mailing lists, ownership, selective test execution, triaging test failures, etc.

+1. I don't think the subcomponents of Beam are yet independent or
large enough to merit being separate projects. (One criterion for being
a separate project is having its own website; otherwise, where should
the website sources live?) Another criterion is the point at which
there is more gain than pain in allowing users to mix and match
different versions of different projects (and the forcing function of
being highly stable becomes more of an asset than a hindrance).

We may of course get there in time, but I don't think we're there yet
(certainly not until portability settles down at least), and consensus
seems to be that better divisions in the existing repo would resolve
most people's concerns at the moment.

> At this point, I see strong arguments in both directions and think that a specific proposal of a specific split at the right time deserves an individualized discussion.
>
> Kenn
>
> *Other issues include governance and effectiveness for shipping user-friendly libraries
>
>
>
>
> On Wed, Oct 10, 2018 at 11:12 AM Ankur Goenka <goenka@xxxxxxxxxx> wrote:
>>
>> Hi,
>>
>> I think the subtext here is that development is hard in general, and I agree. Major causes are the diversity of languages, the complexity of the project, and legacy code.
>> To alleviate language-related issues, we are trying to have modular code, which we already have to a certain extent.
>> On the other hand, tooling is still evolving and needs improvement. I also feel that tooling is a moving target and it's good to keep reevaluating it.
>> Tooling is a problem for everyone (the whole community) and we are actively trying to solve it. Gradle is a big step towards that.
>> I personally contribute to multiple languages. Many of the PRs have changes spanning languages and have to be merged as a whole. I personally feel that having a unified build system makes it easier to do the checks and make sure things work.
>> Even after Gradle, I am still able to set up IntelliJ for Java, PyCharm for Python and GoLand for Go as I would have done earlier (before Gradle). I am also able to run "python setup.py sdist" as I was able to do before Gradle.
>> Gradle is also acting as the top-level task manager, and most of the Python tasks are just plain shell commands stitched together.
>> The only real problem that I face in my setup is the vendored Java jars, which only impact Java development.
>> Documenting a separate environment-specific setup for each language is probably sufficient to address the issue.
>>
>> I also agree with Max that splitting the repo will cause more pain than gain.
>>
>> ~Ankur
>>
>>
>>
>> On Wed, Oct 10, 2018 at 7:56 AM Romain Manni-Bucau <rmannibucau@xxxxxxxxx> wrote:
>>>
>>>
>>>
>>>
>>> On Wed, Oct 10, 2018 at 14:59, Maximilian Michels <mxm@xxxxxxxxxx> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I agree that splitting up Beam into separate repositories would cause
>>>> more pain than gain.
>>>>
>>>> To a large degree we already have independent modules, e.g. runners/* or
>>>> sdks/*. Although this is not the case for the core. It would be
>>>> desirable to break it up further.
>>>
>>>
>>> I think this part is OK for everyone.
>>>
>>>>
>>>>
>>>>  > possibly even with their own build system (unified only through a
>>>>  > top-level "build everything" script that descends into each subdir and
>>>>  > runs the appropriate command).
>>>>
>>>> This is almost what we have. Yes, there are some dependencies on the
>>>> Beam Gradle Plugin, but even if we had completely independent build
>>>> directories, you'd still want to have a shared config/tasks across the
>>>> projects (which might bring you back to a setup similar to what we have).
>>>>
>>>> One of the pain points seems to be the portability which "polluted" some
>>>> parts of the project (e.g. legacy Runners). As mentioned in this thread
>>>> that could have been solved with an abstraction. But the lack of
>>>> abstraction also forced us to adopt the portable pipeline code quicker.
>>>
>>>
>>> Not at all. Assume we have a full build that does portability and then three concurrent builds (Go, Python, Java).
>>> Then we have the "current step" in the CI, but developers are never affected by it, and the build does not mess up their machines either.
>>>
>>> Today the main blocker is that the default "profile" (script) does not match the developer persona, so there is no real hope of getting external contributions
>>> from outside Google-related people, as mentioned by the previous figures, which is sad for a project promising unification and collaboration between communities, IMHO.
>>>
>>>>
>>>>
>>>> -Max
>>>>
>>>> On 10.10.18 10:51, Romain Manni-Bucau wrote:
>>>> > Yep for the split
>>>> >
>>>> > For the clean point, it is closely linked to the build tools and the fake
>>>> > environment needed for modules that are not native to the build tool (for
>>>> > instance Go under Gradle, which is Java-first). This is why having a real
>>>> > build which is natural per language would be beneficial IMO.
>>>> >
>>>> > On Wed, Oct 10, 2018 at 11:38, Jean-Baptiste Onofré <jb@xxxxxxxxxxxx> wrote:
>>>> >
>>>> >     Correct, it's more "module splitting" than repositories indeed.
>>>> >
>>>> >     Regards
>>>> >     JB
>>>> >
>>>> >     On 10/10/2018 10:35, Robert Bradshaw wrote:
>>>> >      > Gotcha. So this is more about dividing the code (particularly core) into
>>>> >      > finer modules, rather than splitting the modules into separate
>>>> >      > repositories, right?
>>>> >      >
>>>> >      > On Wed, Oct 10, 2018 at 10:29 AM Jean-Baptiste Onofré <jb@xxxxxxxxxxxx> wrote:
>>>> >      >
>>>> >      >     The purpose is that we have a monolithic core today mostly providing
>>>> >      >     abstract classes.
>>>> >      >
>>>> >      >     The idea is to have something more API oriented with interface/SPI.
>>>> >      >
>>>> >      >     Our users would then be able to pick the part of the core they want,
>>>> >      >     resulting in lighter artifacts, and for us, it gives a more flexible
>>>> >      >     approach.
>>>> >      >
>>>> >      >     Regards
>>>> >      >     JB
>>>> >      >
>>>> >      >     On 10/10/2018 10:26, Robert Bradshaw wrote:
>>>> >      >     > My question was not whether we should split the repo, but why? (Dividing
>>>> >      >     > things into more (or fewer) modules within a single repo is a separate
>>>> >      >     > question.) Maybe I'm just not following what you mean by "more API
>>>> >      >     > oriented." It would force stabler APIs.
>>>> >      >     >
>>>> >      >     > On Wed, Oct 10, 2018 at 10:18 AM Jean-Baptiste Onofré <jb@xxxxxxxxxxxx> wrote:
>>>> >      >     >
>>>> >      >     >     Hi,
>>>> >      >     >
>>>> >      >     >     +1, I even think we could split the core further.
>>>> >      >     >
>>>> >      >     >     I discussed introducing core-sql, core-schema, core-sdf, ...
>>>> >      >     >     with Luke and Reuven.
>>>> >      >     >
>>>> >      >     >     It's not a huge effort, and would allow us to move forward on Beam's
>>>> >      >     >     "more API oriented" approach.
>>>> >      >     >
>>>> >      >     >     Regards
>>>> >      >     >     JB
>>>> >      >     >
>>>> >      >     >     On 10/10/2018 10:12, Robert Bradshaw wrote:
>>>> >      >     >     > Hi everyone,
>>>> >      >     >     >
>>>> >      >     >     > While IMHO it's too early to even be able to split the repo, it's not
>>>> >      >     >     > too early to talk about it, and I wanted to spin this off to keep the
>>>> >      >     >     > other thread focused.
>>>> >      >     >     >
>>>> >      >     >     > In particular, I am trying to figure out exactly what is hoped to be
>>>> >      >     >     > gained by splitting things up. In my experience, a single project that
>>>> >      >     >     > spans multiple repos has always come with excessive overhead and pain.
>>>> >      >     >     > Of note, we recently merged the website and dataflow-worker into the
>>>> >      >     >     > main repo *exactly* to avoid this pain (though the latter was
>>>> >      >     >     > particularly bad due to one of the repos being private).
>>>> >      >     >     >
>>>> >      >     >     > If need be, I don't see any reason we can't have a single repo with
>>>> >      >     >     > directories
>>>> >      >     >     >
>>>> >      >     >     > model/
>>>> >      >     >     > website/
>>>> >      >     >     > java/
>>>> >      >     >     > go/
>>>> >      >     >     > ...
>>>> >      >     >     >
>>>> >      >     >     > possibly even with their own build system (unified only through a
>>>> >      >     >     > top-level "build everything" script that descends into each subdir and
>>>> >      >     >     > runs the appropriate command). I'm not saying we should do this (there
>>>> >      >     >     > is value in having a single consistent build system, etc.) but it's
>>>> >      >     >     > possible. We could probably even make separate releases out of this
>>>> >      >     >     > single repo (if we wanted, though given that our releases are time-based
>>>> >      >     >     > rather than feature-based, I don't see much advantage here).
>>>> >      >     >     >
>>>> >      >     >     > Also, there was the comment.
>>>> >      >     >     >
>>>> >      >     >     > On Wed, Oct 10, 2018 at 7:35 AM Romain Manni-Bucau <rmannibucau@xxxxxxxxx> wrote:
>>>> >      >     >     >>
>>>> >      >     >     >> Side note: Beam portability would be saner if added on top of the
>>>> >      >     >     >> others, rather than the opposite, which is what is done today.
>>>> >      >     >     >
>>>> >      >     >     > I think you brought this up before, Romain. I'm still trying to wrap my
>>>> >      >     >     > head around what you mean here. Could you elaborate on what such a
>>>> >      >     >     > structure would look like?
>>>> >      >     >
>>>> >      >     >     --
>>>> >      >     >     Jean-Baptiste Onofré
>>>> >      >     >     jbonofre@xxxxxxxxxx
>>>> >      >     >     http://blog.nanthrax.net
>>>> >      >     >     Talend - http://www.talend.com
>>>> >      >     >
>>>> >      >
>>>> >      >     --
>>>> >      >     Jean-Baptiste Onofré
>>>> >      >     jbonofre@xxxxxxxxxx
>>>> >      >     http://blog.nanthrax.net
>>>> >      >     Talend - http://www.talend.com
>>>> >      >
>>>> >
>>>> >     --
>>>> >     Jean-Baptiste Onofré
>>>> >     jbonofre@xxxxxxxxxx
>>>> >     http://blog.nanthrax.net
>>>> >     Talend - http://www.talend.com
>>>> >