Re: Gandiva Initiative
This is cool, thanks for putting together a prototype!
> it would be great if we could find a good solution to integrate the two projects and build systems
At the moment I'm thinking of Gandiva as analogous to Plasma, a
subcomponent of the C++ codebase to stand alongside the core Arrow
codebase (or it could go in arrow/gandiva, too), so everything would
get built and shipped as a single artifact containing several shared
libraries. Similarly, when the user writes "pip install pyarrow", they
would receive all of the libraries including Gandiva ready to go. It
looks like this is basically already what you've done in your PR.
To do that, we would have to conduct an IP clearance to import the
code and then get to refactoring to incorporate the components into
the Arrow codebase. I'll be standing by to help with that effort if
the Gandiva developers wish to go that route.
On Fri, Jun 22, 2018 at 5:22 AM, Philipp Moritz <pcmoritz@xxxxxxxxx> wrote:
> This is really exciting, thanks a lot for sharing!
> In case anybody wants to try this out from Python, I wrote up some Cython
> bindings (very limited so far, but they can already be used to construct
> some computation graphs and do some benchmarks):
> They are developed in the Arrow repo for now; it would be great if we could
> find a good solution to integrate the two projects and build systems
> seamlessly (for example, setting up a Cython environment in the Gandiva repo
> in a way that interoperates well with PyArrow would be hard right now).
> -- Philipp.
> On Thu, Jun 21, 2018 at 4:26 PM, Wes McKinney <wesmckinn@xxxxxxxxx> wrote:
>> hi Jacques,
>> This is very exciting! LLVM codegen for Arrow has been on my wishlist
>> since the early days of the project. I always considered it more of a
>> "when" question than an "if".
>> I will take a closer look at the codebase to make some comments, but
>> my biggest initial question is whether we could work to make Gandiva
>> the official community-supported LLVM framework for creating
>> JIT-compiled Arrow kernels. In the Ursa Labs (a new lab I am building
>> to focus 90+% on Apache Arrow development) tech roadmap we discussed
>> the need for a subgraph compiler using LLVM:
>> I would be interested in getting involved in the project, and I
>> expect in time many others will, as well. An obvious question would be
>> whether you would be interested in donating the project to Apache
>> Arrow and continuing the work there. We would benefit from common
>> build, testing/CI, and packaging/deployment infrastructure. I'm keen
>> to see JIT-powered predicate pushdown in Parquet files, for example.
>> Philipp and I could look into building a Gandiva backend for compiling
>> a subset of expressions originating from Ibis, a lazy-evaluation DSL
>> system with an API similar to pandas.
>> On Thu, Jun 21, 2018 at 4:13 PM, Dimitri Vorona
>> <email@example.com> wrote:
>> > Hey Jacques,
>> > Great stuff! I'm actually researching the integration of Arrow and Flight
>> > into a main memory database which also uses LLVM for dynamic query
>> > generation! Excited to have a more detailed look at Gandiva!
>> > Cheers,
>> > Dimitri.
>> > On Thu, Jun 21, 2018, 21:15 Jacques Nadeau <jacques@xxxxxxxxxx> wrote:
>> >> Hey Guys,
>> >> Dremio just open sourced a new framework for processing data in Arrow
>> >> structures[1], built on top of the Apache Arrow C++ APIs and leveraging
>> >> LLVM (Apache licensed). It also includes Java APIs that leverage the
>> >> Arrow Java libraries. I expect the developers who have been working on
>> >> it will introduce themselves soon. To read more about it, take a look at
>> >> Ravindra's blog post (he's the lead developer driving this work)[2].
>> >> Hopefully people will find this interesting/useful.
>> >> Let us know what you all think!
>> >> thanks,
>> >> Jacques
>> >> [1] https://github.com/dremio/gandiva
>> >> [2] https://www.dremio.com/announcing-gandiva-initiative-