Re: Timeline for Arrow 0.12.0 release


I think we should aim for time-based releases in general (rather than a
specific set of features), but delaying this one sounds good to me.

Regards

Antoine.


On 12/12/2018 at 01:34, Wes McKinney wrote:
> hi all,
> 
> I'm looking at the 0.12 backlog and I am not too comfortable with the
> things that would have to be cut to get a release out next week.
> Additionally, not a lot of developers are going to be working the week
> of December 24 because of the Christmas and New Year's holidays, so
> even if we did release, it might not get seen by a lot of people until
> after the New Year.
> 
> Based on this, I would suggest we push to complete as much work as
> possible (from the 0.12 backlog and beyond) by the end of the year,
> and release as soon as possible in 2019. Of course, anyone is welcome
> to contribute work that is not found in the 0.12 milestone =)
> 
> Any objections?
> 
> Thanks
> Wes
> On Mon, Dec 10, 2018 at 8:04 AM Andy Grove <andygrove73@xxxxxxxxx> wrote:
>>
>> Cool. I will continue to add primitive operations but I am now adding this
>> in a separate source file to keep it separate from the core array code.
>>
>> I'm not sure how important it will be to support Rust data sources with
>> Gandiva. I can see that each language should be able to construct the
>> logical query plan to submit to Gandiva and let Gandiva handle execution. I
>> think the more interesting part is how do we support language-specific
>> lambda functions as part of that logical query plan. Maybe it is possible
>> to compile the lambda down to LLVM (I haven't started learning about LLVM
>> in detail yet so this is wild speculation on my part). Another option is
>> for Gandiva to support calling into shared libraries and that maybe is
>> simpler for languages that support building C-native shared libraries (Rust
>> supports this with zero overhead).
>>
>> Andy.
>>
>>
>>
>>
>> On Sun, Dec 9, 2018 at 11:42 AM Wes McKinney <wesmckinn@xxxxxxxxx> wrote:
>>
>>> hi Andy,
>>>
>>> I can see an argument for having some basic native function kernel
>>> support in Rust. One of the things that Gandiva has begun is a
>>> Protobuf-based serialized representation of projection
>>> and filter expressions. In the long run I would like to see a more
>>> complete relational algebra / logical query plan that can be submitted
>>> for execution. There are complexities, though, such as bridging
>>> iteration of data sources written in Rust, say, with a query engine
>>> written in C++. You would need to provide some kind of a callback
>>> mechanism for the query engine to request the next chunk of a dataset
>>> to be materialized.
>>>
>>> It will be interesting to see what contributors will be motivated
>>> enough to build over the next few years. At the end of the day, Apache
>>> projects are do-ocracies.
>>>
>>> - Wes
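
The callback mechanism Wes describes above could look roughly like the following. This is only a sketch under stated assumptions: `Chunk`, `NextChunkFn`, `VecSource`, `vec_source_next` and `engine_sum` are hypothetical names invented for illustration, not Gandiva or Arrow APIs, and `engine_sum` merely stands in for an engine written in another language that pulls data through a C-ABI function pointer.

```rust
use std::os::raw::c_void;

/// Hypothetical chunk descriptor handed across the language boundary.
#[repr(C)]
pub struct Chunk {
    pub data: *const i64,
    pub len: usize,
}

/// Hypothetical callback type: the engine calls it to pull the next chunk.
/// Returns 1 when `out` was filled and 0 when the source is exhausted.
pub type NextChunkFn = extern "C" fn(state: *mut c_void, out: *mut Chunk) -> i32;

/// Rust-side data source: a list of in-memory chunks plus a cursor.
pub struct VecSource {
    chunks: Vec<Vec<i64>>,
    next: usize,
}

impl VecSource {
    pub fn new(chunks: Vec<Vec<i64>>) -> Self {
        VecSource { chunks, next: 0 }
    }
}

/// Callback implementation matching `NextChunkFn`.
pub extern "C" fn vec_source_next(state: *mut c_void, out: *mut Chunk) -> i32 {
    if state.is_null() || out.is_null() {
        return 0;
    }
    // SAFETY: the caller must pass a pointer to a live `VecSource`
    // and a pointer to a writable `Chunk`.
    let source = unsafe { &mut *(state as *mut VecSource) };
    match source.chunks.get(source.next) {
        Some(chunk) => {
            unsafe {
                (*out).data = chunk.as_ptr();
                (*out).len = chunk.len();
            }
            source.next += 1;
            1
        }
        None => 0,
    }
}

/// Stand-in for the engine: drains the source through the callback and sums it.
pub fn engine_sum(next: NextChunkFn, state: *mut c_void) -> i64 {
    let mut chunk = Chunk { data: std::ptr::null(), len: 0 };
    let mut total = 0i64;
    while next(state, &mut chunk) == 1 {
        // SAFETY: the callback filled `data` with a pointer valid for `len`
        // elements, and the source is not modified while we read it.
        let values = unsafe { std::slice::from_raw_parts(chunk.data, chunk.len) };
        total += values.iter().sum::<i64>();
    }
    total
}
```

The engine side only ever sees an opaque `state` pointer and a function pointer, which is what would let the iteration logic stay in Rust while execution happens elsewhere.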
>>> On Fri, Dec 7, 2018 at 6:22 AM Andy Grove <andygrove73@xxxxxxxxx> wrote:
>>>>
>>>> I've added one PR to the list (https://github.com/apache/arrow/pull/3119)
>>>> to update the project to use Rust 2018 Edition.
>>>>
>>>> I'm also considering removing one PR from the list and would like to get
>>>> opinions here.
>>>>
>>>> I have a PR (https://github.com/apache/arrow/pull/3033) to add some basic
>>>> math and comparison operators to primitive arrays. These are baby steps
>>>> towards implementing more query execution capabilities such as projection,
>>>> selection, etc., but Chao made a good point that other Rust implementations
>>>> don't have these kinds of capabilities and I am now wondering if this is a
>>>> distraction. We already have Gandiva and the new efforts in Ursa Labs and
>>>> it would probably make more sense to look at having Rust bindings for the
>>>> query execution capabilities there rather than having a competing (and less
>>>> capable) implementation in Rust.
>>>>
>>>> Thoughts?
>>>>
>>>> Andy.
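
For readers unfamiliar with what "basic math and comparison operators on primitive arrays" means in practice, here is a minimal sketch. It assumes a nullable array can be modelled as a slice of `Option<T>`; the function names `add` and `lt` are illustrative and are not taken from the PR or from the Arrow Rust crate.

```rust
use std::ops::Add;

/// Element-wise addition of two nullable arrays.
/// A null on either side yields a null; `None` is returned if lengths differ.
pub fn add<T: Add<Output = T> + Copy>(
    left: &[Option<T>],
    right: &[Option<T>],
) -> Option<Vec<Option<T>>> {
    if left.len() != right.len() {
        return None;
    }
    Some(
        left.iter()
            .zip(right)
            .map(|(l, r)| match (l, r) {
                (Some(a), Some(b)) => Some(*a + *b),
                _ => None,
            })
            .collect(),
    )
}

/// Element-wise `<` comparison producing a nullable boolean array.
/// A null on either side yields a null; `None` is returned if lengths differ.
pub fn lt<T: PartialOrd>(
    left: &[Option<T>],
    right: &[Option<T>],
) -> Option<Vec<Option<bool>>> {
    if left.len() != right.len() {
        return None;
    }
    Some(
        left.iter()
            .zip(right)
            .map(|(l, r)| match (l, r) {
                (Some(a), Some(b)) => Some(a < b),
                _ => None,
            })
            .collect(),
    )
}
```

A comparison like `lt` is the building block for selection (filtering rows), while `add` is the kind of kernel a projection would evaluate.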
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Dec 6, 2018 at 8:42 PM paddy horan <paddyhoran@xxxxxxxxxxx> wrote:
>>>>
>>>>> Other than Andy’s PR below I’m going to try and find time to work on
>>>>> ARROW-3827. I’ll bump it to 0.13 if I can’t find the time early next week.
>>>>> There is nothing else in the 0.12 backlog for Rust.  It would be nice to
>>>>> get the parquet merge in though.
>>>>>
>>>>>
>>>>>
>>>>> Paddy
>>>>>
>>>>>
>>>>>
>>>>> ________________________________
>>>>> From: Andy Grove <andygrove73@xxxxxxxxx>
>>>>> Sent: Thursday, December 6, 2018 10:20:48 AM
>>>>> To: dev@xxxxxxxxxxxxxxxx
>>>>> Subject: Re: Timeline for Arrow 0.12.0 release
>>>>>
>>>>> I have PRs pending for all the Rust issues that I want to get into 0.12.0
>>>>> and would appreciate some reviews so I can go ahead and merge:
>>>>>
>>>>> https://github.com/apache/arrow/pull/3033 (covers ARROW-3880 and ARROW-3881 - add math and comparison operations to primitive arrays)
>>>>> https://github.com/apache/arrow/pull/3096 (ARROW-3885 - Rust release process)
>>>>> https://github.com/apache/arrow/pull/3111 (ARROW-3838 - CSV Writer)
>>>>>
>>>>> With these in place I plan on writing a tutorial for reading a CSV file,
>>>>> performing some operations on primitive arrays and writing the output to a
>>>>> new CSV file.
>>>>>
>>>>> I am deferring ARROW-3882 (casting for primitive arrays) to 0.13.0
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Andy.
>>>>>
>>>>> On Tue, Dec 4, 2018 at 7:57 PM Andy Grove <andygrove73@xxxxxxxxx> wrote:
>>>>>
>>>>>> I'd love to tackle the three related issues for supporting simple
>>>>>> math/comparison operations on primitive arrays and casting primitive arrays
>>>>>> but since the change to use the Rust specialization feature I'm a bit stuck
>>>>>> and need some assistance applying the math operations to the numeric types
>>>>>> and not the boolean primitives. I have added a comment to
>>>>>> https://github.com/apache/arrow/pull/3033 ... if I can get help solving
>>>>>> for this PR then I should be able to handle the others. I'll also do some
>>>>>> research and try and figure this out myself.
>>>>>>
>>>>>> Andy.
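
One way to apply math operations to numeric types but not to booleans on stable Rust, without specialization, is a marker trait implemented only for the numeric element types. The sketch below illustrates that general technique; it is not necessarily how the PR resolved the problem, and `PrimitiveType`, `NumericType` and this cut-down `PrimitiveArray` are hypothetical stand-ins rather than the Arrow crate's actual definitions.

```rust
use std::ops::Add;

/// Hypothetical trait implemented by every primitive element type, booleans included.
pub trait PrimitiveType: Copy {}

/// Hypothetical marker trait implemented only by the numeric element types.
pub trait NumericType: PrimitiveType + Add<Output = Self> {}

macro_rules! impl_numeric {
    ($($t:ty),*) => {
        $(
            impl PrimitiveType for $t {}
            impl NumericType for $t {}
        )*
    };
}

impl_numeric!(i8, i16, i32, i64, u8, u16, u32, u64, f32, f64);

// `bool` is a primitive type but deliberately not a `NumericType`.
impl PrimitiveType for bool {}

/// Minimal stand-in for a primitive array (no null bitmap, for brevity).
pub struct PrimitiveArray<T: PrimitiveType> {
    values: Vec<T>,
}

impl<T: PrimitiveType> PrimitiveArray<T> {
    pub fn new(values: Vec<T>) -> Self {
        PrimitiveArray { values }
    }

    pub fn len(&self) -> usize {
        self.values.len()
    }

    pub fn values(&self) -> &[T] {
        &self.values
    }
}

// Math operations exist only when the element type is numeric, so calling
// `add` on a `PrimitiveArray<bool>` is rejected at compile time.
impl<T: NumericType> PrimitiveArray<T> {
    /// Element-wise sum; returns `None` if the lengths differ.
    pub fn add(&self, other: &Self) -> Option<Self> {
        if self.len() != other.len() {
            return None;
        }
        let values = self
            .values
            .iter()
            .zip(&other.values)
            .map(|(a, b)| *a + *b)
            .collect();
        Some(PrimitiveArray { values })
    }
}
```

With this layout a boolean array can still be constructed and inspected, but it simply has no `add` method.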
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Dec 4, 2018 at 7:03 PM Wes McKinney <wesmckinn@xxxxxxxxx> wrote:
>>>>>>
>>>>>>> Andy, Paddy, or other Rust developers -- could you review the 6 issues
>>>>>>> in TODO in the 0.12 backlog and either assign them or move them to the
>>>>>>> next release if they aren't going to be completed this week or next?
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Nov 30, 2018 at 4:34 PM Wes McKinney <wesmckinn@xxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> hi folks,
>>>>>>>>
>>>>>>>> Tomorrow is December 1. The last major Arrow release (0.11.0) took
>>>>>>>> place on October 8. Given how much work has happened in the project in
>>>>>>>> the last ~2 months, I think it would be great to complete the next
>>>>>>>> major release before the end-of-year holidays set in.
>>>>>>>>
>>>>>>>> I've been curating the JIRA backlog the last couple of weeks, and have
>>>>>>>> just created a 0.12.0 release wiki page to help us stay organized:
>>>>>>>>
>>>>>>>> https://cwiki.apache.org/confluence/display/ARROW/Arrow+0.12.0+Release
>>>>>>>>
>>>>>>>> Given that there are only 3 full working weeks between now and
>>>>>>>> Christmas, I think we should be in position to cut a release by the
>>>>>>>> end of the week of December 10, i.e. by Friday December 14. Not all of
>>>>>>>> the TODO issues have to be completed to make the release, but it would
>>>>>>>> be good to push to complete as much as possible. Please help by
>>>>>>>> reviewing the backlog, and if possible, assigning issues to yourself
>>>>>>>> that you'd like to pursue in the next 2 weeks.
>>>>>>>>
>>>>>>>> Let me know if this sounds reasonable, or if you have any concerns.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Wes