
Re: Splitting the repo

On Wed, Oct 10, 2018 at 10:25 AM Romain Manni-Bucau <rmannibucau@xxxxxxxxx> wrote:
On the split point: a mono-repo works for me as well. The main point is "N separate builds".

On the portable thing: currently each runner integrates with the portable API, so it impacts every runner. The needed code is the same everywhere, since in the end it is mainly a DoFn (a bit of a caricature, but that is the big picture), so the portable implementation could be a single one built on top of any runner. The gains are:

1. Don't pollute Java users
2. A single code base to maintain
3. The runner can be upgraded without changing this layer (contract-based integration rather than coupled integration, so smoother updates in all layers)
4. Simpler code (at least in design)
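A minimal Java sketch of the layering described above, under stated assumptions: each runner exposes a small contract (a hypothetical `RunnerContract` with a single per-element `runParDo` method, standing in for "mainly a DoFn"), and one portability layer (hypothetical `PortableLayer`) is written once against that contract. None of these names are Beam APIs, and the SDK harness is replaced by a local function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical contract a runner exposes to a shared portability layer. */
interface RunnerContract {
  /** Applies a per-element function (the "DoFn" in the big picture) to every input element. */
  <InT, OutT> List<OutT> runParDo(List<InT> input, Function<InT, OutT> fn);
}

/** A toy in-memory runner; a real runner would distribute this work. */
class InMemoryRunnerX implements RunnerContract {
  @Override
  public <InT, OutT> List<OutT> runParDo(List<InT> input, Function<InT, OutT> fn) {
    List<OutT> output = new ArrayList<>();
    for (InT element : input) {
      output.add(fn.apply(element));
    }
    return output;
  }
}

/** Portability layer written once against the contract, not against a specific runner. */
class PortableLayer {
  private final RunnerContract runner;

  PortableLayer(RunnerContract runner) {
    this.runner = runner;
  }

  /** sdkFn stands in for handing elements to an SDK harness; here it is just a local function. */
  List<String> execute(List<String> input, Function<String, String> sdkFn) {
    return runner.runParDo(input, sdkFn);
  }
}
```

With this shape, upgrading a runner only requires that it keep implementing the contract; the portability layer itself does not change.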

Hope it is clearer

Right now the basic structure is

  Beam Runners Core Library
  Beam RunnerX Adapter Code
  Java RunnerX

Where the APIs in brackets are what the various components use to talk to each other, and the latter two are in Java. It sounds like what you're advocating for is the (Java) Beam Runners Core Library (along with its API). Am I understanding correctly? Of course, some things are easier to abstract away than others. For example, how SDK processes (when not run in-process) are launched, including staging their dependencies, and monitored is squarely in the domain of the particular runner, though we can move as much common helper code as possible to higher levels.
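To illustrate the split between shared helper code and runner-specific concerns, here is a hypothetical Java sketch: `RunnerAdapterBase` stands in for common code in a runners-core layer, and `RunnerXAdapter` for the adapter of one runner. Both names, and the string "handle" returned by a launch, are assumptions for illustration, not Beam's actual runners-core API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical shared code living in a "runners core" layer. */
abstract class RunnerAdapterBase {
  private final List<String> workers = new ArrayList<>();

  /** Common logic written once: validation and bookkeeping, then delegation to the runner. */
  public final String startWorker(String environment, List<String> dependencies) {
    if (environment == null || environment.isEmpty()) {
      throw new IllegalArgumentException("environment must be set");
    }
    String handle = launchSdkWorker(environment, dependencies);
    workers.add(handle);
    return handle;
  }

  /** Read-only view of the handles of workers started so far. */
  public final List<String> workers() {
    return Collections.unmodifiableList(workers);
  }

  /** Runner-specific: staging dependencies and launching/monitoring the SDK process. */
  protected abstract String launchSdkWorker(String environment, List<String> dependencies);
}

/** Hypothetical adapter for one runner ("RunnerX" above). */
class RunnerXAdapter extends RunnerAdapterBase {
  @Override
  protected String launchSdkWorker(String environment, List<String> dependencies) {
    // A real runner would stage the dependencies and start a process or container here;
    // this sketch only returns a descriptive handle.
    return "runnerx:" + environment + ":" + dependencies.size();
  }
}
```

The template-method shape keeps the launch step abstract, so each runner owns process startup and monitoring while validation and bookkeeping are written once.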

On Wed, Oct 10, 2018 11:18, Jean-Baptiste Onofré <jb@xxxxxxxxxxxx> wrote:

+1, and I even think we could split the core deeper.

I discussed with Luke and Reuven introducing core-sql, core-schema,
core-sdf, ...

It's not a huge effort, and it would allow us to move forward on Beam's
"more API-oriented" approach.


On 10/10/2018 10:12, Robert Bradshaw wrote:
> Hi everyone,
> While IMHO it's too early to even be able to split the repo, it's not too
> early to talk about it, and I wanted to spin this off to keep the other
> thread focused.
> In particular, I am trying to figure out exactly what is hoped to be
> gained by splitting things up. In my experience, a single project that
> spans multiple repos has always come with excessive overhead and pain.
> Of note, we recently merged the website and dataflow-worker into the
> main repo *exactly* to avoid this pain (though the latter was
> particularly bad due to one of the repos being private).
> If need be, I don't see any reason we can't have a single repo with
> directories
> model/
> website/
> java/
> go/
> ...
> possibly even with their own build system (unified only through a
> top-level "build everything" script that descends into each subdir and
> runs the appropriate command). I'm not saying we should do this (there
> is value in having a single consistent build system, etc.) but it's
> possible. We could probably even make separate releases out of this
> single repo (if we wanted, though given that our releases are time-based
> rather than feature-based, I don't see much advantage here).
> Also, there was this comment:
> On Wed, Oct 10, 2018 at 7:35 AM Romain Manni-Bucau
> <rmannibucau@xxxxxxxxx <mailto:rmannibucau@xxxxxxxxx>> wrote:
>> Side note: Beam portability would be saner if it were added on top of the
>> other runners rather than the opposite, which is what is done today.
> I think you brought this up before, Romain. I'm still trying to wrap my
> head around what you mean here. Could you elaborate on what such a
> structure would look like? 

Jean-Baptiste Onofré
Talend - http://www.talend.com
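The top-level "build everything" script mentioned in the quoted message could look roughly like the following Java sketch. It assumes each top-level subdirectory (java/, go/, ...) is identified by a marker file; the marker-to-command table (build.gradle, go.mod, setup.py) and the class name `BuildAll` are hypothetical conventions, not Beam's actual layout. `plan` only computes what would run; `runAll` executes each command in its subdirectory and stops at the first failure.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical top-level "build everything" driver for a mono-repo with per-language subdirectories. */
class BuildAll {
  // Assumed convention: a marker file in each subdirectory identifies its build tool.
  // Markers are checked in insertion order; the first one found wins.
  private static final Map<String, List<String>> COMMANDS = new LinkedHashMap<>();

  static {
    COMMANDS.put("build.gradle", Arrays.asList("./gradlew", "build"));
    COMMANDS.put("go.mod", Arrays.asList("go", "build", "./..."));
    COMMANDS.put("setup.py", Arrays.asList("python", "setup.py", "build"));
  }

  /** Maps each top-level subdirectory name to the command that would run there, sorted by name. */
  static Map<String, List<String>> plan(Path repoRoot) throws IOException {
    List<Path> dirs;
    try (Stream<Path> children = Files.list(repoRoot)) {
      dirs = children.filter(Files::isDirectory).collect(Collectors.toList());
    }
    Map<String, List<String>> plan = new TreeMap<>();
    for (Path dir : dirs) {
      for (Map.Entry<String, List<String>> entry : COMMANDS.entrySet()) {
        if (Files.exists(dir.resolve(entry.getKey()))) {
          plan.put(dir.getFileName().toString(), entry.getValue());
          break; // one build command per subdirectory
        }
      }
    }
    return plan;
  }

  /** Descends into each planned subdirectory and runs its command; returns the first non-zero exit code, or 0. */
  static int runAll(Path repoRoot) throws IOException, InterruptedException {
    for (Map.Entry<String, List<String>> step : plan(repoRoot).entrySet()) {
      Process process = new ProcessBuilder(step.getValue())
          .directory(repoRoot.resolve(step.getKey()).toFile())
          .inheritIO()
          .start();
      int exitCode = process.waitFor();
      if (exitCode != 0) {
        return exitCode;
      }
    }
    return 0;
  }
}
```

For example, in a checkout laid out as assumed, `BuildAll.runAll(Paths.get("/path/to/beam"))` would run `./gradlew build` in java/ and `go build ./...` in go/, while a directory with no recognized marker file is skipped. Keeping `plan` separate from `runAll` makes the dispatch logic testable without invoking any build tool.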