[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Refactoring the Rust API

hi Andy,

I gave a read through the Rust implementation. I have never programmed
in Rust (hope to change that someday!), so some of the programming
constructs are lost on me, but I focused on the Arrow columnar
questions. My notes / questions follow


## High level comments

* Do you plan to provide support for memory pools in the same style as C++? I
  see some work already along these lines, but it will impact the API in a
  number of places

## array.rs


* Should len and null_count always return u64, even on 32-bit arch?
* Do you intend to support zero-copy slicing in the way that C++ supports it?
  Probably doesn't need to get done now

L47 (ListArray)

* In the format docs, for List<T>, T can be any type, including other nested
  types. What does this mean here?
* What is the difference between List<T> and ListArray<T> in Rust, can you add
  a comment?


* Are there benefits to having min and max be methods on PrimitiveArray
  vs. top-level functions?


* This Add operation does not account for nulls


* What is a "BufferArray"?


* Maybe call Struct attribute "columns" instead "fields"?
* Same with changing column(i) -> field(i)

## bitmap.rs


* I would expect a bitmap to be initialized with all 0's instead of all 1's, or
  perhaps this should be a flag to do one or the other

## builder.rs

* Similar question as above re: usize vs u64

## datatypes.rs


* It feels like JSON serde for data types, fields, schemas belongs in its own
  module (this thinking might just be influenced by the design of the Java/C++

## list.rs


* Seems like you might want to call this PrimitiveList, since it only supports
  primitive child types. How will lists of arbitrary arrays be handled? (per
  notes above, and per mailing list discussions)
* The inner cells of a list may be null; this interface does not appear to
  support that

## list_builder.rs

* Same comment, should this be PrimitiveListBuilder?
* How to build lists with a nested type child?
* How to append null lists?

On Fri, May 4, 2018 at 6:07 AM, Andy Grove <andygrove73@xxxxxxxxx> wrote:
> Here's my blog post explaining the refactor:
> https://andygrove.io/2018/05/apache-arrow-traits-generics/
> The Reddit thread is going to be here for anyone wanting to see the
> feedback:
> https://www.reddit.com/r/rust/comments/8gy45t/refactoring_apache_arrow_to_use_traits_and/
> Thanks,
> Andy.
> On Wed, May 2, 2018 at 5:10 PM, Andy Grove <andygrove73@xxxxxxxxx> wrote:
>> There was an interesting blog post posted to Reddit a couple days ago that
>> is very relevant to this refactor. The author is building a dataframe
>> library in Rust and started out with an enum to represent arrays and then
>> moved to using generic traits with the enum.
>> https://www.reddit.com/r/rust/comments/8g2274/dataframes_
>> traits_enums_generics_and_dynamic/
>> I don't think that approach would work for us and I'm tempted to write a
>> up a blog post myself explaining the current refactor and why it is needed.
>> I think I'll try and do that this weekend. I'm keen to get a discussion
>> going around the refactor to make sure we don't need to do another refactor
>> in the future.
>> Andy.
>> On Sun, Apr 29, 2018 at 9:59 AM, Andy Grove <andygrove73@xxxxxxxxx> wrote:
>>> So it turns out this refactor isn't as disruptive as I thought and I
>>> mostly have it working already.
>>> The buffer/builder/list types barely change at all other than the fact
>>> that we no longer need all those macros after moving to generics.
>>> It really is only array.rs that is pretty much a rewrite.
>>> Also, in my earlier email I got my dates wrong. I am aiming to have this
>>> PR ready by Monday May 7th. The real test for me is integrating it with
>>> DataFusion to make sure I haven't missed anything.
>>> Here's the branch where I'm working on this:
>>> https://github.com/andygrove/arrow/tree/refactor_rust_api
>>> Thanks,
>>> Andy.
>>> On Sat, Apr 28, 2018 at 2:10 PM, Andy Grove <andygrove73@xxxxxxxxx>
>>> wrote:
>>>> I filed a PR to track this (https://issues.apache.org/jir
>>>> a/browse/ARROW-2521) but thought it was worth raising on the mailing
>>>> list too.
>>>> I am running into limitations now of the way that Array is represented
>>>> as an enum and I am unable to implement List<List<T>> with the current
>>>> design.
>>>> When Krisztian Szucs and I were working on the initial code we had two
>>>> different approaches and we went with this enum approach at the time
>>>> because we weren't able to make the other approach (traits + generics) work.
>>>> Now that I'm further along the Rust learning curve, I can make the trait
>>>> + generic approach work and I'm currently prototyping in a separate repo,
>>>> and it is looking good so far. I have been able to create a struct array
>>>> containing different type fields including List<List<T>>.
>>>> I think I'm ready to start the refactor for real in my fork. We only
>>>> have ~1k LOC so I don't think it will take too long, but because I'm doing
>>>> this in my spare time I am going to estimate that I will have it complete
>>>> in just over one week, aiming for having it complete by 4/30.
>>>> I think it's fine to continue merging small PRs in the meanwhile but I
>>>> think we should hold off any major changes in the coming week.
>>>> Thanks,
>>>> Andy.