Re: Gandiva snapshot releases


Hi All,

I spent some time today getting to know Crossbow, and it looks great!

To unblock ourselves immediately, we are going to do the Ubuntu deploy
first, followed by the Mac deploy and the fat jar deployment.

To confirm our understanding, we would be doing the following:

1. Create a queue repo similar to the one here
(https://github.com/praveenbingo/crossbow) but under the dremio org.
2. Have the repo kick off Crossbow builds for each OS that we want.
3. In addition to the OS builds, there would be another build that just
waits for the OS builds to finish (with some timeout) and, once they are
done, packages the fat jar and deploys it to Maven.
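
The wait-then-package build in step 3 could be sketched roughly as follows.
This is only an illustration in Python, not Crossbow code: wait_for_artifacts,
merge_jars and the is_ready callback are hypothetical names, and how a build
actually learns that an OS jar has been uploaded is left to the caller.

```python
import time
import zipfile


def wait_for_artifacts(is_ready, names, timeout_s, interval_s=30.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll until is_ready(name) is true for every name, else raise TimeoutError.

    is_ready is a caller-supplied (hypothetical) check, e.g. "has the linux
    jar been uploaded yet?".
    """
    deadline = clock() + timeout_s
    pending = set(names)
    while True:
        # Drop every artifact that has become available since the last poll.
        pending = {name for name in pending if not is_ready(name)}
        if not pending:
            return
        if clock() >= deadline:
            raise TimeoutError(f"still waiting for: {sorted(pending)}")
        sleep(interval_s)


def merge_jars(os_jars, fat_jar):
    """Copy the entries of each per-OS jar into one jar, skipping duplicate paths."""
    seen = set()
    with zipfile.ZipFile(fat_jar, "w", zipfile.ZIP_DEFLATED) as out:
        for jar in os_jars:
            with zipfile.ZipFile(jar) as src:
                for info in src.infolist():
                    if info.filename in seen:
                        continue  # shared classes or manifest: keep the first copy
                    seen.add(info.filename)
                    out.writestr(info, src.read(info))
    return fat_jar
```

The packaging build would call wait_for_artifacts first and run merge_jars
only if it returns; a TimeoutError fails the build instead of deploying a jar
that is missing a platform.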

The only thing I am unclear about is the Maven deploy tokens. Since I am
not a committer with permission to push to the Maven repo, I would need keys
to be configured in the dremio/crossbow environment variables.
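
For the deploy keys, one possible arrangement is for the build to read them
from environment variables and write a Maven settings.xml at deploy time. A
minimal sketch, assuming hypothetical variable names OSSRH_USERNAME and
OSSRH_PASSWORD and a server id of "ossrh" (the id would have to match whatever
repository id the POM declares, which would need to be checked):

```python
import os
from xml.sax.saxutils import escape


def render_settings(env=os.environ, server_id="ossrh"):
    """Build a Maven settings.xml string holding one <server> credential entry."""
    # OSSRH_USERNAME / OSSRH_PASSWORD are assumed names for secret variables
    # set in the CI configuration; a KeyError here means they are missing.
    user = escape(env["OSSRH_USERNAME"])
    password = escape(env["OSSRH_PASSWORD"])
    return (
        "<settings>\n"
        "  <servers>\n"
        "    <server>\n"
        f"      <id>{escape(server_id)}</id>\n"
        f"      <username>{user}</username>\n"
        f"      <password>{password}</password>\n"
        "    </server>\n"
        "  </servers>\n"
        "</settings>\n"
    )
```

The rendered file could then be handed to Maven with its --settings option, so
the credentials never have to be committed to the repo.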

Wes - do Siddharth/Jacques have permission to push to the Maven repo, and
can I use the same credentials?

It also looks like the release scripts here
<https://github.com/apache/arrow/blob/master/dev/release/01-perform.sh>
would need to be changed as well if we want to deploy the fat jar as part
of releases.

Krisztián - can you please review the proposed steps and let me know if they
look correct to you?

Thx.


On Wed, Oct 10, 2018 at 11:33 PM Praveen Kumar <praveen@xxxxxxxxxx> wrote:

> Hi Wes,
>
> I'll take this to completion. Will send out a proposal tomorrow.
>
> Thx.
>
> On Wed, Oct 10, 2018, 23:32 Wes McKinney <wesmckinn@xxxxxxxxx> wrote:
>
>> hi folks,
>>
>> How would you like to proceed on this? I'm tracking many projects
>> right now so I want to make sure someone else is "in charge" on this
>> effort
>>
>> Thanks,
>> Wes
>> On Sat, Oct 6, 2018 at 10:37 AM Wes McKinney <wesmckinn@xxxxxxxxx> wrote:
>> >
>> > > We could create a worker pool like abstraction where the workers are
>> > > the CI services, but that would require a scheduler to poll the finished
>> > > jobs then submit the dependent ones. This sounds a bit inconvenient, where
>> > > would that scheduler run: locally, on a CI or self hosted?
>> >
>> > Inevitably we're going to need to build some kind of job scheduler,
>> > whether it uses Airflow or Luigi or some other tool of our own
>> > devising.
>> >
>> > Apache Arrow is eventually going to need a host where we can manage
>> > such workflows. I'm looking into the possibility of a physical
>> > CUDA-equipped host that could be made available to Arrow developers to
>> > use for testing and benchmarking. I may need to run the machine out of
>> > my home (we did something similar for pandas -- physical machine that
>> > we can SSH into).
>> >
>> > All this idealism aside -- we take the shortest path possible for this
>> > particular packaging job, and make improvements as we can going
>> > forward.
>> > On Sat, Oct 6, 2018 at 9:31 AM Krisztián Szűcs
>> > <szucs.krisztian@xxxxxxxxx> wrote:
>> > >
>> > > I see now, so the jar would contain all of the three shared libraries.
>> > >
>> > > We could create a worker pool like abstraction where the workers are the
>> > > CI services, but that would require a scheduler to poll the finished jobs
>> > > then submit the dependent ones. This sounds a bit inconvenient, where would
>> > > that scheduler run: locally, on a CI or self hosted?
>> > >
>> > > Another approach would be to use the worker to schedule the next task,
>> > > in a similar fashion to how dask's worker_client [1] launches tasks from
>> > > tasks. There could be synchronization problems though. This approach
>> > > requires bootstrapping crossbow on each CI job, but that would:
>> > > - make crossbow less CI dependent (to use azure pipelines as well)
>> > > - unify the artifact uploading and downloading logic, which is required
>> > >   in order to support dependent tasks
>> > > - greatly reduce redundancy in task definitions
>> > >
>> > > What do you think? I'd prefer the second one.
>> > >
>> > > [1] https://github.com/dask/distributed/blob/master/docs/source/task-launch.rst
>> > >
>> > > On Sat, Oct 6, 2018 at 10:57 AM Wes McKinney <wesmckinn@xxxxxxxxx> wrote:
>> > >
>> > > > It seems the complicated part of this will be having a dependent task
>> > > > that packages up the 3 shared libraries, one for each platform, after
>> > > > the individual packaging tasks are run. How would you propose handling
>> > > > that?
>> > > > On Fri, Oct 5, 2018 at 8:03 AM Krisztián Szűcs
>> > > > <szucs.krisztian@xxxxxxxxx> wrote:
>> > > > >
>> > > > > Ohh, just read the thread, sorry!
>> > > > >
>> > > > > So crossbow is located here:
>> > > > > https://github.com/apache/arrow/tree/master/dev/tasks
>> > > > > I suggest "forking" the python-wheels directory, which contains three
>> > > > > templated ymls for osx, win and linux builds. For building on linux,
>> > > > > something like the following should be sufficient:
>> > > > > https://gist.github.com/kszucs/39154876d60c4109ff59b678afd65b19
>> > > > > Then you need another entry in the tasks.yml, for example:
>> > > > > jar-gandiva-linux:
>> > > > >   platform: linux
>> > > > >   template: gandiva-jars/travis.linux.yml
>> > > > >   params:
>> > > > >     # arbitrary params which are available from the templated yml
>> > > > >     ...
>> > > > >   artifacts:
>> > > > >     # these are the expected artifacts from the build
>> > > > >     - gandiva-SNAPSHOT-{version}.jar
>> > > > >     ...
>> > > > >
>> > > > > Of course crossbow is wired towards the current packaging requirements,
>> > > > > so we will likely need to adjust it to the newly appearing requirements.
>> > > > >
>> > > > > Feel free to reach me on gitter @kszucs.
>> > > > > On Oct 4 2018, at 2:02 pm, Wes McKinney <wesmckinn@xxxxxxxxx> wrote:
>> > > > > >
>> > > > > > hi Praveen,
>> > > > > > Probably the best way to accomplish this is to use our new Crossbow
>> > > > > > infrastructure for task automation on Travis CI and Appveyor rather
>> > > > > > than trying to do all of this within the CI entries. This is how we
>> > > > > > are producing all of our binary artifacts for releases now --
>> > > > > > presumably in future ASF releases, we will want to include a
>> > > > > > platform-independent Gandiva JAR in our release votes, so this all
>> > > > > > needs to end up in Crossbow anyway. The intent is for the Crossbow
>> > > > > > system to take on responsibility for all packaging automation rather
>> > > > > > than using the normal CI for that.
>> > > > > >
>> > > > > > Krisztian, do you have time to help Praveen and the Gandiva crew with
>> > > > > > this project? This will be an important test to document and improve
>> > > > > > Crossbow for such use cases
>> > > > > >
>> > > > > > Thanks
>> > > > > > Wes
>> > > > > > On Thu, Oct 4, 2018 at 7:14 AM Praveen Kumar <praveen@xxxxxxxxxx> wrote:
>> > > > > > >
>> > > > > > > Hi Folks,
>> > > > > > > As part of https://issues.apache.org/jira/browse/ARROW-3385, we are
>> > > > > > > planning to perform a snapshot release of the Gandiva Jar on each
>> > > > > > > commit to master. This would be a platform independent jar that
>> > > > > > > contains the core gandiva library and its jni bridge packaged for
>> > > > > > > Mac, Windows and *nix platforms.
>> > > > > > >
>> > > > > > > The current plan is to deploy separate snapshot jars for each OS
>> > > > > > > through entries in the Gandiva CI matrix and then have a combine
>> > > > > > > step that pulls in each OS specific jar and builds a jar that has
>> > > > > > > all the native libraries. This build/deploy would happen only for
>> > > > > > > commits on the master branch and not for pull requests.
>> > > > > > >
>> > > > > > > Does the plan sound ok? If not, please let us know if there is a
>> > > > > > > better way to achieve the same.
>> > > > > > >
>> > > > > > > If it sounds ok, can someone please help with the following:
>> > > > > > > 1. It looks like we only do travis builds and not appveyor for
>> > > > > > > master in arrow. Any reason for this?
>> > > > > > > 2. Even if we did appveyor, is there a way to sequence the builds,
>> > > > > > > like waiting for appveyor to complete before kicking off travis?
>> > > > > > > We would need the dll to be pre-built.
>> > > > > > > 3. Someone would need to configure the credentials to use for the
>> > > > > > > ossrh deployment. The credentials would need access to deploy to
>> > > > > > > org.apache.arrow.
>> > > > > > > Thanks ahead!
>> > > >
>>
>