Re: Timeline for Arrow 0.12.0 release


Cool. I will continue to add primitive operations, but I am now adding them
in a separate source file to keep them apart from the core array code.

I'm not sure how important it will be to support Rust data sources with
Gandiva. I can see that each language should be able to construct the
logical query plan to submit to Gandiva and let Gandiva handle execution. I
think the more interesting part is how we support language-specific
lambda functions as part of that logical query plan. Maybe it is possible
to compile the lambda down to LLVM (I haven't started learning about LLVM
in detail yet so this is wild speculation on my part). Another option is
for Gandiva to support calling into shared libraries, which may be
simpler for languages that support building C-native shared libraries (Rust
supports this with zero overhead).
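
To make the shared-library option concrete, here is a minimal sketch of a
Rust function exported with a C ABI. The name add_scalar_i32, its signature,
and the 0 / -1 status convention are hypothetical, chosen only for
illustration. This is not an existing Gandiva interface, and Gandiva would
still need its own mechanism for locating and calling such a symbol.

```rust
use std::slice;

/// Hypothetical user-defined kernel: adds `rhs` to each of the `len` values
/// that `values` points to, in place. Returns 0 on success, -1 if `values`
/// is null.
///
/// # Safety
/// `values` must be null or point to `len` valid, writable `i32`s.
// On toolchains older than Rust 1.82, write `#[no_mangle]` instead.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn add_scalar_i32(values: *mut i32, len: usize, rhs: i32) -> i32 {
    if values.is_null() {
        return -1;
    }
    // SAFETY: the caller guarantees `values` points to `len` writable i32s.
    let data = unsafe { slice::from_raw_parts_mut(values, len) };
    for v in data.iter_mut() {
        // Wrapping arithmetic avoids a panic unwinding across the C boundary.
        *v = v.wrapping_add(rhs);
    }
    0
}
```

Building this with crate-type = ["cdylib"] in Cargo.toml produces a shared
library that exports the unmangled symbol add_scalar_i32, which a C or C++
host could resolve with dlopen/dlsym. Raw pointers plus a length are used
because Rust slices and Vec have no stable C representation.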

Andy.




On Sun, Dec 9, 2018 at 11:42 AM Wes McKinney <wesmckinn@xxxxxxxxx> wrote:

> hi Andy,
>
> I can see an argument for having some basic native function kernel
> support in Rust. One of the things that Gandiva has begun is a
> Protobuf-based serialized representation of projection
> and filter expressions. In the long run I would like to see a more
> complete relational algebra / logical query plan that can be submitted
> for execution. There are complexities, though, such as bridging
> iteration of data sources written in Rust, say, with a query engine
> written in C++. You would need to provide some kind of a callback
> mechanism for the query engine to request the next chunk of a dataset
> to be materialized.
>
> It will be interesting to see what contributors will be motivated
> enough to build over the next few years. At the end of the day, Apache
> projects are do-ocracies.
>
> - Wes
> On Fri, Dec 7, 2018 at 6:22 AM Andy Grove <andygrove73@xxxxxxxxx> wrote:
> >
> > I've added one PR to the list (https://github.com/apache/arrow/pull/3119)
> > to update the project to use Rust 2018 Edition.
> >
> > I'm also considering removing one PR from the list and would like to get
> > opinions here.
> >
> > I have a PR (https://github.com/apache/arrow/pull/3033) to add some basic
> > math and comparison operators to primitive arrays. These are baby steps
> > towards implementing more query execution capabilities such as projection,
> > selection, etc., but Chao made a good point that other Rust implementations
> > don't have these kinds of capabilities and I am now wondering if this is a
> > distraction. We already have Gandiva and the new efforts in Ursa Labs and
> > it would probably make more sense to look at having Rust bindings for the
> > query execution capabilities there rather than having a competing (and less
> > capable) implementation in Rust.
> >
> > Thoughts?
> >
> > Andy.
> >
> >
> >
> >
> >
> > On Thu, Dec 6, 2018 at 8:42 PM paddy horan <paddyhoran@xxxxxxxxxxx> wrote:
> >
> > > Other than Andy’s PR below I’m going to try and find time to work on
> > > ARROW-3827; I’ll bump it to 0.13 if I can’t find the time early next week.
> > > There is nothing else in the 0.12 backlog for Rust.  It would be nice to
> > > get the parquet merge in though.
> > >
> > >
> > >
> > > Paddy
> > >
> > >
> > >
> > > ________________________________
> > > From: Andy Grove <andygrove73@xxxxxxxxx>
> > > Sent: Thursday, December 6, 2018 10:20:48 AM
> > > To: dev@xxxxxxxxxxxxxxxx
> > > Subject: Re: Timeline for Arrow 0.12.0 release
> > >
> > > I have PRs pending for all the Rust issues that I want to get into 0.12.0
> > > and would appreciate some reviews so I can go ahead and merge:
> > >
> > > https://github.com/apache/arrow/pull/3033 (covers ARROW-3880 and ARROW-3881
> > > - add math and comparison operations to primitive arrays)
> > > https://github.com/apache/arrow/pull/3096 (ARROW-3885 - Rust release process)
> > > https://github.com/apache/arrow/pull/3111 (ARROW-3838 - CSV Writer)
> > >
> > > With these in place I plan on writing a tutorial for reading a CSV file,
> > > performing some operations on primitive arrays and writing the output to a
> > > new CSV file.
> > >
> > > I am deferring ARROW-3882 (casting for primitive arrays) to 0.13.0.
> > >
> > > Thanks,
> > >
> > > Andy.
> > >
> > > On Tue, Dec 4, 2018 at 7:57 PM Andy Grove <andygrove73@xxxxxxxxx> wrote:
> > >
> > > > I'd love to tackle the three related issues for supporting simple
> > > > math/comparison operations on primitive arrays and casting primitive
> > > > arrays, but since the change to use Rust's specialization feature I'm a
> > > > bit stuck and need some assistance applying the math operations to the
> > > > numeric types and not the boolean primitives. I have added a comment to
> > > > https://github.com/apache/arrow/pull/3033 ... if I can get help solving
> > > > this for that PR then I should be able to handle the others. I'll also do
> > > > some research and try to figure this out myself.
> > > >
> > > > Andy.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Tue, Dec 4, 2018 at 7:03 PM Wes McKinney <wesmckinn@xxxxxxxxx> wrote:
> > > >
> > > >> Andy, Paddy, or other Rust developers -- could you review the 6 issues
> > > >> in TODO in the 0.12 backlog and either assign them or move them to the
> > > >> next release if they aren't going to be completed this week or next?
> > > >>
> > > >>
> > > >> On Fri, Nov 30, 2018 at 4:34 PM Wes McKinney <wesmckinn@xxxxxxxxx> wrote:
> > > >> >
> > > >> > hi folks,
> > > >> >
> > > >> > Tomorrow is December 1. The last major Arrow release (0.11.0) took
> > > >> > place on October 8. Given how much work has happened in the project in
> > > >> > the last ~2 months, I think it would be great to complete the next
> > > >> > major release before the end-of-year holidays set in.
> > > >> >
> > > >> > I've been curating the JIRA backlog the last couple of weeks, and have
> > > >> > just created a 0.12.0 release wiki page to help us stay organized:
> > > >> >
> > > >> > https://cwiki.apache.org/confluence/display/ARROW/Arrow+0.12.0+Release
> > > >> >
> > > >> > Given that there are only 3 full working weeks between now and
> > > >> > Christmas, I think we should be in position to cut a release by the
> > > >> > end of the week of December 10, i.e. by Friday December 14. Not all of
> > > >> > the TODO issues have to be completed to make the release, but it would
> > > >> > be good to push to complete as much as possible. Please help by
> > > >> > reviewing the backlog, and if possible, assigning issues to yourself
> > > >> > that you'd like to pursue in the next 2 weeks.
> > > >> >
> > > >> > Let me know if this sounds reasonable, or if you have any concerns.
> > > >> >
> > > >> > Thanks
> > > >> > Wes
> > > >>
> > > >
> > >
>