Re: Gandiva snapshot releases


Hi Krisztián,

Thanks for reviewing.

Yup, that is our plan too; we are targeting the Ubuntu release first. We
will pick up the Mac build and the combiner later, as required.

For the frequency of deployments, we would deploy at least once a day,
with the flexibility to trigger a deployment manually too.

Thx.

On Thu, Oct 11, 2018 at 9:41 PM Krisztián Szűcs <szucs.krisztian@xxxxxxxxx>
wrote:

> On Thu, Oct 11, 2018 at 12:58 PM Praveen Kumar <praveen@xxxxxxxxxx> wrote:
>
> > Hi All,
> >
> > I spent some time today understanding Crossbow and it looks great!
> >
> > To unblock ourselves immediately, we are going to do the ubuntu deploy
> > first, followed by the mac deploy and the fat jar deployment.
> >
> > To confirm our understanding we would be doing the following
> >
> > 1. Create a queue repo similar to the one here
> > (https://github.com/praveenbingo/crossbow) but under the dremio org.
> >
> Correct, although we might want a centralized crossbow repo to deploy
> scheduled (e.g. nightly) packages.
>
> > 2. Have the repo kick off crossbow builds for each OS that we would want.
> >
> Correct. To run the tasks: `python crossbow.py submit gandiva-osx
> gandiva-ubuntu`
> It returns the build identifier, e.g. `build-123`
>
> > 3. In addition to the OS builds, there would be another build which would
> > just wait for the OS builds to finish (with some timeout) and, once done,
> > package the fat jar and deploy it to maven.
> >
> Basically yes, but depending on the build times it might be worth building
> the fat jar locally instead (of course you can trigger another task which
> does the same thing, just remotely). Currently the artifact downloading is
> built into the `sign` command, but we can quickly factor that out:
> `python crossbow.py sign build-123`
>
> I'd like to generalize task dependencies, but this is definitely the
> quickest to start with.
>
> >
> > The only thing that I am unclear about is the maven deploy tokens. Since I
> > am not a committer with permissions to push to the maven repo, I would need
> > keys to be configured in the dremio/crossbow environment variables.
> >
> How often do we want to ship fat jars?
>
> >
> > Wes - do Siddharth/Jacques have permissions to push to the maven repo, and
> > can I use the same?
> >
> > Also looks like the release scripts here
> > <https://github.com/apache/arrow/blob/master/dev/release/01-perform.sh>
> > would need to be changed as well if we want to deploy the fat jar as part
> > of releases.
> >
> Correct.
>
> >
> > Krisztián - can you please review the proposed steps and let me know if
> > they look correct to you?
> >
>  Absolutely!
>
> BTW if you want to unblock yourself first, then it's enough to have a
> single task which builds the ubuntu libs and the fat jar (in a single CI
> build), and we can handle the dependent task (fat jar building) after we
> introduce another child (mac or win). So we could skip the third step in
> the first iteration.
>
> >
> > Thx.
> >
> >
> > On Wed, Oct 10, 2018 at 11:33 PM Praveen Kumar <praveen@xxxxxxxxxx> wrote:
> >
> > > Hi Wes,
> > >
> > > I'll take this to completion. Will send out a proposal tomorrow.
> > >
> > > Thx.
> > >
> > > On Wed, Oct 10, 2018, 23:32 Wes McKinney <wesmckinn@xxxxxxxxx> wrote:
> > >
> > >> hi folks,
> > >>
> > >> How would you like to proceed on this? I'm tracking many projects
> > >> right now so I want to make sure someone else is "in charge" on this
> > >> effort
> > >>
> > >> Thanks,
> > >> Wes
> > >> On Sat, Oct 6, 2018 at 10:37 AM Wes McKinney <wesmckinn@xxxxxxxxx> wrote:
> > >> >
> > >> > > We could create a worker pool like abstraction where the workers are
> > >> > > the CI services, but that would require a scheduler to poll the
> > >> > > finished jobs then submit the dependent ones. This sounds a bit
> > >> > > inconvenient, where would that scheduler run: locally, on a CI or
> > >> > > self hosted?
> > >> >
> > >> > Inevitably we're going to need to build some kind of job scheduler,
> > >> > whether it uses Airflow or Luigi or some other tool of our own
> > >> > devising.
> > >> >
> > >> > Apache Arrow is eventually going to need a host where we can manage
> > >> > such workflows. I'm looking into the possibility of a physical
> > >> > CUDA-equipped host that could be made available to Arrow developers
> > >> > to use for testing and benchmarking. I may need to run the machine
> > >> > out of my home (we did something similar for pandas -- physical
> > >> > machine that we can SSH into).
> > >> >
> > >> > All this idealism aside -- we take the shortest path possible for
> > >> > this particular packaging job, and make improvements as we can going
> > >> > forward.
> > >> > On Sat, Oct 6, 2018 at 9:31 AM Krisztián Szűcs
> > >> > <szucs.krisztian@xxxxxxxxx> wrote:
> > >> > >
> > >> > > I see now, so the jar would contain all of the three shared
> > >> > > libraries.
> > >> > >
> > >> > > We could create a worker pool like abstraction where the workers are
> > >> > > the CI services, but that would require a scheduler to poll the
> > >> > > finished jobs then submit the dependent ones. This sounds a bit
> > >> > > inconvenient, where would that scheduler run: locally, on a CI or
> > >> > > self hosted?
> > >> > >
> > >> > > Another approach would be to use the worker to schedule the next
> > >> > > task, similar to how dask's worker_client [1] launches tasks from
> > >> > > tasks. There could be synchronization problems though. This
> > >> > > approach requires bootstrapping crossbow on each CI job, but that
> > >> > > would:
> > >> > > - make crossbow less CI dependent (to use azure pipelines as well)
> > >> > > - unify the artifact uploading and downloading logic, which is
> > >> > >   required in order to support dependent tasks
> > >> > > - leave far less redundancy in task definitions
> > >> > >
> > >> > > What do you think? I'd prefer the second one.
> > >> > >
> > >> > > [1] https://github.com/dask/distributed/blob/master/docs/source/task-launch.rst
> > >> > >
> > >> > > On Sat, Oct 6, 2018 at 10:57 AM Wes McKinney <wesmckinn@xxxxxxxxx> wrote:
> > >> > >
> > >> > > > It seems the complicated part of this will be having a dependent
> > >> > > > task that packages up the 3 shared libraries, one for each
> > >> > > > platform, after the individual packaging tasks are run. How would
> > >> > > > you propose handling that?
> > >> > > > On Fri, Oct 5, 2018 at 8:03 AM Krisztián Szűcs
> > >> > > > <szucs.krisztian@xxxxxxxxx> wrote:
> > >> > > > >
> > >> > > > > Ohh, just read the thread, sorry!
> > >> > > > >
> > >> > > > > So crossbow is located here:
> > >> > > > > https://github.com/apache/arrow/tree/master/dev/tasks
> > >> > > > > I suggest "forking" the python-wheels directory, which contains
> > >> > > > > three templated ymls for osx, win and linux builds. For building
> > >> > > > > on linux something like the following should be sufficient:
> > >> > > > > https://gist.github.com/kszucs/39154876d60c4109ff59b678afd65b19
> > >> > > > > Then you need another entry in the tasks.yml, for example:
> > >> > > > > jar-gandiva-linux:
> > >> > > > >   platform: linux
> > >> > > > >   template: gandiva-jars/travis.linux.yml
> > >> > > > >   params:
> > >> > > > >     # arbitrary params which are available from the templated yml
> > >> > > > >     ...
> > >> > > > >   artifacts:
> > >> > > > >     # these are the expected artifacts from the build
> > >> > > > >     - gandiva-SNAPSHOT-{version}.jar
> > >> > > > >     ...
> > >> > > > >
> > >> > > > > Of course crossbow is wired towards the current packaging
> > >> > > > > requirements, so we likely need to adjust it to the newly
> > >> > > > > appearing requirements.
> > >> > > > >
> > >> > > > > Feel free to reach me on gitter @kszucs.
> > >> > > > > On Oct 4 2018, at 2:02 pm, Wes McKinney <wesmckinn@xxxxxxxxx> wrote:
> > >> > > > > >
> > >> > > > > > hi Praveen,
> > >> > > > > > Probably the best way to accomplish this is to use our new
> > >> > > > > > Crossbow infrastructure for task automation on Travis CI and
> > >> > > > > > Appveyor rather than trying to do all of this within the CI
> > >> > > > > > entries. This is how we are producing all of our binary
> > >> > > > > > artifacts for releases now -- presumably in future ASF
> > >> > > > > > releases, we will want to include a platform-independent
> > >> > > > > > Gandiva JAR in our release votes, so this all needs to end up
> > >> > > > > > in Crossbow anyway. The intent is for the Crossbow system to
> > >> > > > > > take on responsibility for all packaging automation rather
> > >> > > > > > than using the normal CI for that.
> > >> > > > > >
> > >> > > > > > Krisztian, do you have time to help Praveen and the Gandiva
> > >> > > > > > crew with this project? This will be an important test to
> > >> > > > > > document and improve Crossbow for such use cases
> > >> > > > > >
> > >> > > > > > Thanks
> > >> > > > > > Wes
> > >> > > > > > On Thu, Oct 4, 2018 at 7:14 AM Praveen Kumar <praveen@xxxxxxxxxx> wrote:
> > >> > > > > > >
> > >> > > > > > > Hi Folks,
> > >> > > > > > > As part of https://issues.apache.org/jira/browse/ARROW-3385,
> > >> > > > > > > we are planning to perform a snapshot release of the Gandiva
> > >> > > > > > > Jar on each commit to master. This would be a platform
> > >> > > > > > > independent jar that contains the core gandiva library and
> > >> > > > > > > its jni bridge packaged for Mac, Windows and *nix platforms.
> > >> > > > > > >
> > >> > > > > > > The current plan is to deploy separate snapshot jars for each
> > >> > > > > > > OS through entries in the Gandiva CI matrix and then have a
> > >> > > > > > > combine step that pulls in each OS specific jar and builds a
> > >> > > > > > > jar that has all the native libraries. This build/deploy
> > >> > > > > > > would happen only for commits on the master branch and not
> > >> > > > > > > for pull requests.
> > >> > > > > > >
> > >> > > > > > > Does the plan sound ok? If not, please let us know if there
> > >> > > > > > > is a better way to achieve the same.
> > >> > > > > > >
> > >> > > > > > > If it sounds ok, can someone please help with the following:
> > >> > > > > > > 1. It looks like we only do travis builds and not appveyor
> > >> > > > > > > for master in arrow. Any reason for this?
> > >> > > > > > > 2. Even if we did appveyor, is there a way to sequence the
> > >> > > > > > > builds, e.g. wait for appveyor to complete before kicking off
> > >> > > > > > > travis? We would need the dll to be pre-built.
> > >> > > > > > > 3. Someone would need to configure the credentials to use for
> > >> > > > > > > the ossrh deployment. The credentials would need access to
> > >> > > > > > > deploy to org.apache.arrow.
> > >> > > > > > > Thanks ahead!
> > >> > > >
> > >>
> > >
> >
>
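
The flow discussed in this thread (per-OS builds, a dependent build that
waits for them with a timeout, then one combined fat jar) could be sketched
roughly as follows. This is only an illustration and not part of crossbow:
`wait_for_artifacts`, `combine_jars`, the `is_ready` callback and all file
names are hypothetical, and the only assumption about jars is that they are
zip archives.

```python
import time
import zipfile
from pathlib import Path
from typing import Callable, Iterable, List


def wait_for_artifacts(
    expected: Iterable[str],
    is_ready: Callable[[str], bool],
    timeout: float = 3600.0,
    interval: float = 30.0,
) -> List[str]:
    """Poll until every expected artifact is ready, or raise TimeoutError.

    `is_ready` is a hypothetical callback; it might check a download
    directory or a release page for the named artifact.
    """
    names = list(expected)
    pending = list(names)
    deadline = time.monotonic() + timeout
    while True:
        pending = [name for name in pending if not is_ready(name)]
        if not pending:
            return names
        if time.monotonic() >= deadline:
            raise TimeoutError("still waiting for: " + ", ".join(pending))
        time.sleep(interval)


def combine_jars(platform_jars: Iterable[Path], fat_jar: Path) -> List[str]:
    """Merge per-platform jars into one jar, keeping the first copy of each entry."""
    seen = set()
    with zipfile.ZipFile(fat_jar, "w", zipfile.ZIP_DEFLATED) as out:
        for jar in platform_jars:
            with zipfile.ZipFile(jar) as src:
                for info in src.infolist():
                    if info.filename in seen:
                        continue  # shared entries (classes, META-INF) appear in every jar
                    seen.add(info.filename)
                    out.writestr(info, src.read(info))
    return sorted(seen)
```

Polling keeps the dependent step independent of any particular CI service,
at the cost of occupying a worker while it waits, which is the trade-off
raised earlier in the thread.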