Re: Gandiva Initiative


I think JIT-compiling of kernels operating on Arrow data is an important
development path, but just for the record, LLVM doesn't have a stable
C++ API (the API changes at each feature release).  Just something to
keep in mind for the ensuing packaging discussions ;-)

(It also raises interesting questions, such as "what happens if a user
wants to use both PyArrow and Numba in a given process, and they don't
target the same LLVM API version?")



On 22/06/2018 at 01:26, Wes McKinney wrote:
> hi Jacques,
> This is very exciting! LLVM codegen for Arrow has been on my wishlist
> since the early days of the project. I always considered it a question
> of "when" rather than "if".
> I will take a closer look at the codebase to make some comments, but
> my biggest initial question is whether we could work to make Gandiva
> the official community-supported LLVM framework for creating
> JIT-compiled Arrow kernels. In the Ursa Labs (a new lab I am building
> to focus 90+% on Apache Arrow development) tech roadmap we discussed
> the need for a subgraph compiler using LLVM:
> https://ursalabs.org/tech/#subgraph-compilation-code-generation.
> I would be interested in getting involved in the project, and I
> expect in time many others will, as well. An obvious question would be
> whether you would be interested in donating the project to Apache
> Arrow and continuing the work there. We would benefit from common
> build, testing/CI, and packaging/deployment infrastructure. I'm keen
> to see JIT-powered predicate pushdown in Parquet files, for example.
> Phillip and I could look into building a Gandiva backend for compiling
> a subset of expressions originating from Ibis, a lazy-evaluation DSL
> system with an API similar to pandas
> (https://github.com/ibis-project/ibis).
> best
> Wes
> On Thu, Jun 21, 2018 at 4:13 PM, Dimitri Vorona
> <alendit@xxxxxxxxxxxxxx.invalid> wrote:
>> Hey Jacques,
>> Great stuff! I'm actually researching the integration of Arrow and Flight
>> into a main memory database which also uses LLVM for dynamic query
>> generation! Excited to have a more detailed look at Gandiva!
>> Cheers,
>> Dimitri.
>> On Thu, Jun 21, 2018, 21:15 Jacques Nadeau <jacques@xxxxxxxxxx> wrote:
>>> Hey Guys,
>>> Dremio just open sourced a new framework for processing data in Arrow data
>>> structures [1], built on top of the Apache Arrow C++ APIs and leveraging
>>> LLVM (Apache licensed). It also includes Java APIs that leverage the Apache
>>> Arrow Java libraries. I expect the developers who have been working on this
>>> will introduce themselves soon. To read more about it, take a look at
>>> Ravindra's blog post (he's the lead developer driving this work): [2].
>>> Hopefully people will find this interesting/useful.
>>> Let us know what you all think!
>>> thanks,
>>> Jacques
>>> [1] https://github.com/dremio/gandiva
>>> [2] https://www.dremio.com/announcing-gandiva-initiative-for-apache-arrow/