
Re: Gandiva snapshot releases


Hi Kristian/Wes,

Can you please advise on the deploy tokens? Also, do you want to include the
Arrow jars in the snapshot deploy?

Thx.

On Fri, Oct 12, 2018 at 11:50 AM Praveen Kumar <praveen@xxxxxxxxxx> wrote:

> Hi Kristian,
>
> Thanks for reviewing.
>
> Yup that is our plan too, we are targeting the ubuntu release first. We
> will pick the mac and the combiner as required later.
>
> For the frequency of deployments, we would be deploying at least once a day,
> with the flexibility to trigger manually too.
>
> Thx.
>
> On Thu, Oct 11, 2018 at 9:41 PM Krisztián Szűcs <szucs.krisztian@xxxxxxxxx>
> wrote:
>
>> On Thu, Oct 11, 2018 at 12:58 PM Praveen Kumar <praveen@xxxxxxxxxx>
>> wrote:
>>
>> > Hi All,
>> >
>> > I spent some time today understanding Crossbow and it looks great!
>> >
>> > To unblock ourselves immediately, we are going to do the ubuntu deploy
>> > first, followed by the mac deploy and the fat jar deployment.
>> >
>> > To confirm our understanding we would be doing the following
>> >
>> > 1. Create a queue repo similar to the one here
>> > (https://github.com/praveenbingo/crossbow) but under the dremio org.
>> >
>> Correct, although we might want a centralized crossbow repo to deploy
>> scheduled (e.g. nightly) packages.
>>
>> > 2. Have the repo kick off crossbow builds for each OS that we would
>> > want.
>> >
>> Correct. To run the tasks:
>> `python crossbow.py submit gandiva-osx gandiva-ubuntu`
>> It returns the build identifier, e.g. `build-123`.
>>
>> > 3. In addition to OS builds, there would be another build which would
>> > just be waiting for the OS builds to finish (with some timeout) and once
>> > done will package the fat jar and deploy to maven.
>> >
>> Basically yes, but depending on the build times it might be worth building
>> the fat jar locally instead (of course you can trigger another task which
>> does the same thing, just remotely). Currently the artifact downloading is
>> built into the `sign` command, but we can quickly factor that out:
>> `python crossbow.py sign build-123`
>>
>> I'd like to generalize task dependencies, but this is definitely the
>> quickest to start with.
>>
>> >
>> > The only thing that I am unclear about is the maven deploy tokens. Since I
>> > am not a committer with permissions to push to the maven repo, I would
>> > need keys to be configured in the dremio/crossbow environment variables.
>> >
>> How often do we want to ship fat jars?
>>
>> >
>> > Wes - do Siddharth/Jacques have permissions to push to the maven repo,
>> > and can I use the same?
>> >
>> > Also looks like the release scripts here
>> > <https://github.com/apache/arrow/blob/master/dev/release/01-perform.sh>
>> > would need to be changed as well if we want to deploy the fat jar as
>> > part of releases.
>> >
>> Correct.
>>
>> >
>> > Kristian - can you please review the proposed steps and let me know if
>> > they look correct to you?
>> >
>>  Absolutely!
>>
>> BTW if you want to unblock yourself first, then it's enough to have a
>> single task which builds the ubuntu libs and the fat jar (in a single CI
>> build), and we can handle the dependent task (fat jar building) after we
>> introduce another child (mac or win). So we could skip the third step in
>> the first iteration.
>>
>> >
>> > Thx.
>> >
>> >
>> > On Wed, Oct 10, 2018 at 11:33 PM Praveen Kumar <praveen@xxxxxxxxxx>
>> wrote:
>> >
>> > > Hi Wes,
>> > >
>> > > I'll take this to completion. Will send out a proposal tomorrow.
>> > >
>> > > Thx.
>> > >
>> > > On Wed, Oct 10, 2018, 23:32 Wes McKinney <wesmckinn@xxxxxxxxx> wrote:
>> > >
>> > >> hi folks,
>> > >>
>> > >> How would you like to proceed on this? I'm tracking many projects
>> > >> right now so I want to make sure someone else is "in charge" on this
>> > >> effort
>> > >>
>> > >> Thanks,
>> > >> Wes
>> > >> On Sat, Oct 6, 2018 at 10:37 AM Wes McKinney <wesmckinn@xxxxxxxxx>
>> > wrote:
>> > >> >
>> > >> > > We could create a worker pool like abstraction where the workers
>> > >> > > are the CI services, but that would require a scheduler to poll
>> > >> > > the finished jobs then submit the dependent ones. This sounds a
>> > >> > > bit inconvenient, where would that scheduler run: locally, on a
>> > >> > > CI or self hosted?
>> > >> >
>> > >> > Inevitably we're going to need to build some kind of job scheduler,
>> > >> > whether it uses Airflow or Luigi or some other tool of our own
>> > >> > devising.
>> > >> >
>> > >> > Apache Arrow is eventually going to need a host where we can manage
>> > >> > such workflows. I'm looking into the possibility of a physical
>> > >> > CUDA-equipped host that could be made available to Arrow developers
>> > >> > to use for testing and benchmarking. I may need to run the machine
>> > >> > out of my home (we did something similar for pandas -- physical
>> > >> > machine that we can SSH into).
>> > >> >
>> > >> > All this idealism aside -- we take the shortest path possible for
>> > >> > this particular packaging job, and make improvements as we can going
>> > >> > forward.
>> > >> > On Sat, Oct 6, 2018 at 9:31 AM Krisztián Szűcs
>> > >> > <szucs.krisztian@xxxxxxxxx> wrote:
>> > >> > >
>> > >> > > I see now, so the jar would contain all of the three shared
>> > >> > > libraries.
>> > >> > >
>> > >> > > We could create a worker pool like abstraction where the workers
>> > >> > > are the CI services, but that would require a scheduler to poll
>> > >> > > the finished jobs then submit the dependent ones. This sounds a
>> > >> > > bit inconvenient, where would that scheduler run: locally, on a
>> > >> > > CI or self hosted?
>> > >> > >
>> > >> > > Another approach would be to use the worker to schedule the next
>> > >> > > task, in a similar fashion to how dask's worker_client [1]
>> > >> > > launches tasks from tasks. There could be synchronization problems
>> > >> > > though. This approach requires bootstrapping crossbow on each CI
>> > >> > > job, but that would:
>> > >> > > - make crossbow less CI dependent (to use azure pipelines as well)
>> > >> > > - unify the artifact uploading and downloading logic, which is
>> > >> > >   required in order to support dependent tasks
>> > >> > > - require far less redundancy in task definitions
>> > >> > >
>> > >> > > What do You think? I'd prefer the second one.
>> > >> > >
>> > >> > > [1] https://github.com/dask/distributed/blob/master/docs/source/task-launch.rst
>> > >> > >
>> > >> > > On Sat, Oct 6, 2018 at 10:57 AM Wes McKinney <wesmckinn@xxxxxxxxx> wrote:
>> > >> > >
>> > >> > > > It seems the complicated part of this will be having a dependent
>> > >> > > > task that packages up the 3 shared libraries, one for each
>> > >> > > > platform, after the individual packaging tasks are run. How would
>> > >> > > > you propose handling that?
>> > >> > > > On Fri, Oct 5, 2018 at 8:03 AM Krisztián Szűcs
>> > >> > > > <szucs.krisztian@xxxxxxxxx> wrote:
>> > >> > > > >
>> > >> > > > > Ohh, just read the thread, sorry!
>> > >> > > > >
>> > >> > > > > So crossbow is located here:
>> > >> > > > > https://github.com/apache/arrow/tree/master/dev/tasks
>> > >> > > > > I suggest "forking" the python-wheels directory, which contains
>> > >> > > > > three templated ymls for osx, win and linux builds. For building
>> > >> > > > > on linux, something like the following should be sufficient:
>> > >> > > > > https://gist.github.com/kszucs/39154876d60c4109ff59b678afd65b19
>> > >> > > > > Then you need another entry in the tasks.yml, for example:
>> > >> > > > > jar-gandiva-linux:
>> > >> > > > >   platform: linux
>> > >> > > > >   template: gandiva-jars/travis.linux.yml
>> > >> > > > >   params:
>> > >> > > > >     # arbitrary params which are available from the templated yml
>> > >> > > > >     ...
>> > >> > > > >   artifacts:
>> > >> > > > >     # these are the expected artifacts from the build
>> > >> > > > >     - gandiva-SNAPSHOT-{version}.jar
>> > >> > > > >     ...
>> > >> > > > >
>> > >> > > > > Of course crossbow is wired towards the current packaging
>> > >> > > > > requirements, so we will likely need to adjust it to the newly
>> > >> > > > > appearing requirements.
>> > >> > > > >
>> > >> > > > > Feel free to reach me on gitter @kszucs.
>> > >> > > > > On Oct 4 2018, at 2:02 pm, Wes McKinney <wesmckinn@xxxxxxxxx> wrote:
>> > >> > > > > >
>> > >> > > > > > hi Praveen,
>> > >> > > > > > Probably the best way to accomplish this is to use our new
>> > >> > > > > > Crossbow infrastructure for task automation on Travis CI and
>> > >> > > > > > Appveyor rather than trying to do all of this within the CI
>> > >> > > > > > entries. This is how we are producing all of our binary
>> > >> > > > > > artifacts for releases now -- presumably in future ASF
>> > >> > > > > > releases, we will want to include a platform-independent
>> > >> > > > > > Gandiva JAR in our release votes, so this all needs to end up
>> > >> > > > > > in Crossbow anyway. The intent is for the Crossbow system to
>> > >> > > > > > take on responsibility for all packaging automation rather
>> > >> > > > > > than using the normal CI for that.
>> > >> > > > > >
>> > >> > > > > > Krisztian, do you have time to help Praveen and the Gandiva
>> > >> > > > > > crew with this project? This will be an important test to
>> > >> > > > > > document and improve Crossbow for such use cases
>> > >> > > > > >
>> > >> > > > > > Thanks
>> > >> > > > > > Wes
>> > >> > > > > > On Thu, Oct 4, 2018 at 7:14 AM Praveen Kumar <praveen@xxxxxxxxxx> wrote:
>> > >> > > > > > >
>> > >> > > > > > > Hi Folks,
>> > >> > > > > > > As part of
>> > >> > > > > > > https://issues.apache.org/jira/browse/ARROW-3385, we are
>> > >> > > > > > > planning to perform a snapshot release of the Gandiva Jar on
>> > >> > > > > > > each commit to master. This would be a platform-independent
>> > >> > > > > > > jar that contains the core gandiva library and its jni
>> > >> > > > > > > bridge packaged for Mac, Windows and *nix platforms.
>> > >> > > > > > >
>> > >> > > > > > > The current plan is to deploy separate snapshot jars for
>> > >> > > > > > > each OS through entries in the Gandiva CI matrix and then
>> > >> > > > > > > have a combine step that pulls in each OS-specific jar and
>> > >> > > > > > > builds a jar that has all the native libraries. This
>> > >> > > > > > > build/deploy would happen only for commits on the master
>> > >> > > > > > > branch and not for pull requests.
>> > >> > > > > > >
>> > >> > > > > > > Does the plan sound OK? If not, please let us know if there
>> > >> > > > > > > is a better way to achieve the same.
>> > >> > > > > > >
>> > >> > > > > > > If it sounds OK, can someone please help with the following:
>> > >> > > > > > > 1. It looks like we only do travis builds and not appveyor
>> > >> > > > > > > for master in arrow. Any reason for this?
>> > >> > > > > > > 2. Even if we did appveyor, is there a way to sequence the
>> > >> > > > > > > builds? Like wait for appveyor to complete before kicking
>> > >> > > > > > > off travis? We would need the dll to be pre-built.
>> > >> > > > > > > 3. Someone would need to configure the credentials to use
>> > >> > > > > > > for the ossrh deployment. The credentials would need access
>> > >> > > > > > > to deploy to org.apache.arrow.
>> > >> > > > > > > Thanks ahead!
>> > >> > > >
>> > >>
>> > >
>> >
>>
>
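
The dependent step discussed in this thread (step 3: wait for the per-OS
builds to finish, with some timeout, then package the fat jar) can be sketched
roughly as below. This is only an illustration, not part of Crossbow: the
function `wait_for_builds`, the `get_status` callback and the status strings
"success"/"failure" are all hypothetical names chosen for the example. Only
the task names `gandiva-ubuntu` and `gandiva-osx` come from the thread.

```python
import time
from typing import Callable, Dict, Iterable


def wait_for_builds(
    get_status: Callable[[str], str],   # hypothetical: returns a task's state
    task_names: Iterable[str],
    timeout: float = 3600.0,            # seconds to wait before giving up
    interval: float = 30.0,             # seconds between polling rounds
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, str]:
    """Poll until every task reports a final state, or raise on timeout."""
    pending = set(task_names)
    results: Dict[str, str] = {}
    deadline = clock() + timeout
    while pending:
        for name in sorted(pending):
            status = get_status(name)
            # "success" and "failure" are assumed final states for this sketch
            if status in ("success", "failure"):
                results[name] = status
        pending.difference_update(results)
        if not pending:
            break
        if clock() >= deadline:
            raise TimeoutError(f"still pending: {sorted(pending)}")
        sleep(interval)
    return results
```

A caller would start the fat jar packaging only if every entry in the returned
mapping is "success". Where such a poller should run (locally, on a CI, or
self-hosted) is the open question raised in the thread.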