
Re: [DISCUSS] Gradle for the build ?

On Wed, Oct 10, 2018 at 06:35, Kenneth Knowles <kenn@xxxxxxxxxx> wrote:
Here are some things I hear a lot:
  • Beam technical decision making is too Java-centric
  • Gradle is a lot less Java-centric than Maven
Hmm, isn't it the opposite, being Groovy-based? Maven tends to have the same set of plugins as Gradle, so technically and factually it is a kind of status quo.
  • Gradle is still a Java-centric tool, but at least it isn't so slow/wasteful

Not what I experienced, and it seems JB had the same experience. Gradle is comparable to Maven for a full build and is way slower in IDEA because of the lack of integration and support (by design).
I can understand long-time Java devs wanting to have their familiar and dominant toolchain, with good tooling and integrations (ignoring any objective merits of Gradle for Java dev). But for non-Java devs you are meeting in the middle. Actually, you are not even in the middle - you are still in Java land. For example, setup.py can do arbitrary things, so we could use it to do the Java build! What do you think of trying this out? (j/k).

What I would like is to better support language-native development workflows, and make sure Gradle is lightweight glue that is easy to use. Make the configurations as obvious to read as we can, with as little hackery as possible.

Agreed, and I think it may be time, now that Beam's Python/Go support has become substantial, to split into N repos. Release lifecycles are different, and the codebases have no real link except the portability layer, which can get its own repo, so it is probably worth a PoC:

1. beam-portability
2. beam-java
3. beam-go
4. beam-python

Side note: Beam portability would be saner if it were added on top of the others, rather than the opposite, which is what is done today.


We can make Gradle a lot better (for the above, and for Java too). From my work on the core of our Gradle support code, I can name a few issues that I'm sure remain even after many months away from the code:

1. We were learning as we built it out. We don't always do things in the natural way.
2. We made our own abstractions to streamline converting lots of modules quickly. Clarity and efficiency of the build were not the primary concerns. (to be clear: it was an amazing undertaking to get this done and the decision was right for the moment)
2a. We tried to centralize a lot of policy, leading to the centralized bits containing the union of all complexity of all modules (or maybe it is multiplicative).
3. We tried very hard to match the mvn build exactly, rather than doing the best thing in Gradle.
4. We've built a lot of imperative code for "telling Gradle what to do" and that adds a lot of complexity compared with using Groovy as primarily a configuration language.
5. We have turned on things like "always rebuild" when we don't know the dependencies, rather than putting in the work to get the dependencies right.

A lot of the above also makes it hard for IDEs to grok the config, since we deviate from the "golden path" a lot.
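
As a rough illustration of item 5, here is a toy Python sketch of the difference between "always rebuild" and an up-to-date check driven by declared inputs. It is not how Gradle is implemented; the `Task` class, the `fingerprint` helper, and the `always_rebuild` flag are hypothetical names invented for this example.

```python
import hashlib


def fingerprint(inputs):
    """Hash the declared inputs (file name -> content) in a stable order."""
    digest = hashlib.sha256()
    for name in sorted(inputs):
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(inputs[name].encode("utf-8"))
        digest.update(b"\0")
    return digest.hexdigest()


class Task:
    """Hypothetical build task; always_rebuild mimics disabling up-to-date checks."""

    def __init__(self, name, always_rebuild=False):
        self.name = name
        self.always_rebuild = always_rebuild
        self.last_fingerprint = None
        self.executions = 0

    def run(self, inputs):
        current = fingerprint(inputs)
        # Skip the work only when the declared inputs are unchanged
        # and the task has not opted out of the check.
        if not self.always_rebuild and current == self.last_fingerprint:
            return "UP-TO-DATE"
        self.executions += 1
        self.last_fingerprint = current
        return "EXECUTED"
```

A task created with `always_rebuild=True` does its work on every run, while a task with accurately declared inputs is skipped until one of them changes. The skip is only safe if the declared inputs are complete, which is the work item 5 says was deferred.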

It would be awesome if module owners took on the task of making their modules have an awesome incremental Gradle build.

Well, incremental build only matters when you run a full subpart of a job, and it is very fragile: note that in months it has not happened. In practice, what is a dev workflow?

1. loop { dev in the IDE, run test }
2. run the full module or build to validate nothing has been broken
3. PR (+ back to 1 if comments)

This means that incremental support is only relevant for Jenkins, where the performance difference is not significant and where you can't use incremental builds because you want a fully reliable build.
So in the end, incremental build support is not that significant for end users and contributors. The lack of IDE support, however, is simply a blocker.

So my 2 cents would be to stop trying to be good theoretically at the cost of losing users, and to embrace a community-driven approach to choices.


On Tue, Oct 9, 2018 at 3:38 AM Romain Manni-Bucau <rmannibucau@xxxxxxxxx> wrote:
For me the vendoring issue is OK because it should belong to another shade module released with Beam when needed. It is not an uncommon practice.

Now the lack of IDE integration for tests/debugging (using the Gradle runner is a workaround and still hurts because of its slowness compared to a native run) is a clear showstopper for me.

Also, from a community perspective, Gradle adoption is far from mainstream (even Spark is built with Maven), so it does not serve Beam in the end.

The Maven build didn't have any issue except the duration AFAIK; Gradle has 2 blockers plus several small drawbacks (custom build with no standard, no tooling without script execution, bad integration in enterprise chains like security auditing, etc.). Overall the Gradle build time is close to the Maven one (last time I tested, it was within 15%), so it is not worth it when you see the time you lose when developing anything. It is key to keep in mind that Jenkins is cheaper than human time.

On Tue, Oct 9, 2018 at 13:22, Robert Bradshaw <robertwb@xxxxxxxxxx> wrote:
On Tue, Oct 9, 2018 at 10:04 AM Jean-Baptiste Onofré <jb@xxxxxxxxxxxx> wrote:
Hi guys,

I know that's a hot topic, but I have to bring this discussion to the table.

Thank you for bringing this up and revisiting it now that we have some experience. 
Some months ago, we discussed migrating our build from Maven to
Gradle. One of the key expected improvements was the time to build.
We proposed to do a PoC to evaluate the impacts and improvements, but
this PoC was actually a direct migration on master.

Now, I would like to bring facts here:

1. Build time
On my machine, the build time is roughly 1h15. It's pretty long, and
regarding what the build is doing, I don't see huge improvement provided
by Gradle.

I rarely, if ever, build from scratch so perhaps I have not been impacted by this nearly as much. (In particular, build and test times seem to have gone way down for me, probably due to better incremental support, but that's just anecdotal.) 

Is this worse than it was on maven, or just not as much better as was hoped? 
2. Build reliability
Even worse, most of the time, we need to use --no-parallel and
--no-daemon to have a reliable build (it's basically recommended for
release). It has an impact on build time, and we lose part of Gradle

I think this is a matter of incorrect dependency declarations (and is not unique to Gradle). I'd have loved to have been able to go with a build system that simply didn't let you have incorrect dependency declarations, but that wasn't an option for other reasons. 

I wonder if there's some automatic tooling we could leverage to fix (and keep fixed) this. Regardless, this is unfinished work that remains to be done so we can realize the full benefits. 
3. Release and repositories
Even though a couple of releases have been performed with Gradle, it's not
obvious to see improvements around artifact handling. I got my
repository polluted twice (that's part of the trick Gradle uses around
the repository to speed up the build).

Could you clarify what improvements we were expecting here? I thought the goal was that we could publish the same artifacts, with no regression. 
4. IDE integration
We already had some comments on the mailing lists about the IDE
integration. Clearly, the situation is not good on that front either. The
IDE integration (especially IntelliJ) is not good enough right now.

This is important. To be honest, I also had issues back in the day getting the Maven setup working well out of the box in IntelliJ and Eclipse (mostly with respect to things like shadowing and protobufs), so we shouldn't fall prey to the golden age fallacy. 

It seems the recent move to vendoring has caused more issues here; I'm not sure that would be fixed just moving back to maven (or how to resolve it going forward). 

On the other hand, just last week I set up a new computer according to https://cwiki.apache.org/confluence/display/BEAM/IntelliJ+Tips and that seems to be working fine. 
We are working hard to grow the community, and from a contributor
perspective, our build system is not good today IMHO.
As a contributor, I resumed my work on some PRs, and I'm spending so
much time on the build, far more than on the PR code itself.

So, obviously, the situation is not perfect, at least from a contributor

The purpose of this thread is not, once again, to have a bunch of replies
leading nowhere. I would like to be more "pushy", so let's try to be
concrete. So basically, we only have two options:

1. Improve the build, working hard on the Gradle front. Not sure if it makes
much sense from a contributor perspective, as Maven is really well known
to most contributors (and easier to start with IMHO).
2. Go back to Maven. That's clearly my preferred approach. IDE integration
is better, and Maven is well known to the contributors, as already said.
The effort is not so huge. We tried to use Gradle, we don't have the
expected results now, that's not a problem, it's part of a project's lifetime.

Thoughts ?

I'd like to add some perspective as a primarily non-Java contributor (having contributed "only" a couple thousand lines to the Java side over the last couple of years): Maven feels much more Java-centric (as is our repo, but that's a separate issue). Also, as someone not coming in with years of Maven (or Gradle, for that matter) experience before working on Beam, I have found Gradle much more intuitive and easier to learn, especially when it comes to developing changes that span multiple modules (which for some reason I guess I tend to do a lot of, but with so many of our core SDK tests being validates runner, that's likely to hit people just starting out as well). Now of course I don't want to discount the Java community; indeed it's arguably the largest and most important one at this point. But I also think that Beam's ability to not be limited to that one language (and ecosystem) is one of its huge selling points and differentiators (see the whole portability effort, which is for both SDKs and Runners). 

Even from a Java perspective, neither is so obscure as to be a significant barrier, and both are customizable enough that the average new developer is probably going to be looking at the documentation to see what commands to run to build/test/validate this project. So this is probably something documentation can address. 

But if we're going to reap the benefits and minimize the downsides of this migration, we have to finish the job. 

- Robert