Re: [JAVA] Supporting zero copy arrow-vector

ArrowBuf is final on purpose, both to ensure a single implementation and for
performance reasons. An ArrowBuf is primarily a memory address and a length,
and it needs zero indirection when reading or writing that memory.

It does, however, wrap several types of substructures as long as they have
that property. For example, an ArrowBuf almost always currently wraps a
Netty UnsafeDirectLittleEndian object. At that level you could propose a
way to wrap more types of memory addresses+lengths.
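
To make the "address + length" idea concrete, here is a minimal sketch of a
read-only view over memory that is owned elsewhere. This is not ArrowBuf or
any Arrow API: ReadOnlyIntView and ZeroCopyDemo are hypothetical names, and
the sketch goes through java.nio.ByteBuffer (bounds-checked) rather than raw
addresses, so it only illustrates the no-copy wrapping, not ArrowBuf's
zero-indirection access path.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical read-only view of little-endian ints over existing memory. */
final class ReadOnlyIntView {
    // Shares memory with the source buffer; no bytes are copied.
    private final ByteBuffer data;

    ReadOnlyIntView(ByteBuffer source, int offset, int length) {
        // duplicate() gives independent position/limit over the same memory.
        ByteBuffer dup = source.duplicate();
        dup.clear();
        dup.position(offset);
        dup.limit(offset + length);
        // slice() narrows the window to [offset, offset + length).
        this.data = dup.slice().asReadOnlyBuffer().order(ByteOrder.LITTLE_ENDIAN);
    }

    int valueCount() {
        return data.capacity() / Integer.BYTES;
    }

    int get(int index) {
        // Absolute read; does not move any position.
        return data.getInt(index * Integer.BYTES);
    }
}

class ZeroCopyDemo {
    public static void main(String[] args) {
        // Stand-in for a buffer received from the network.
        ByteBuffer wire = ByteBuffer.allocateDirect(16).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < 4; i++) {
            wire.putInt(i * Integer.BYTES, i * 10);
        }

        // View bytes 4..12, i.e. the second and third ints.
        ReadOnlyIntView view = new ReadOnlyIntView(wire, 4, 8);
        System.out.println(view.get(0) + " " + view.get(1)); // prints "10 20"

        // A write to the source is visible through the view: same memory.
        wire.putInt(4, 99);
        System.out.println(view.get(0)); // prints "99"
    }
}
```

The view stays valid only as long as the underlying buffer does, so whoever
owns that memory still has to manage its lifetime.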

On Thu, Sep 6, 2018, 10:26 PM Zhenyuan Zhao <zzymtn@xxxxxxxxx> wrote:

> Hello Team,
> I'm working on using Arrow as an intermediate format for transferring
> columnar data from server to client. In this case, the client only needs
> to read from the format, so I would like to avoid any unnecessary copy of
> the data. Looking into Arrow, while arrow-format/flatbuffers does support
> zero copy, the current arrow-vector Java implementation does not. I was
> trying to hack zero copy for read-only scenarios, but saw two main
> blockers:
>    1. ArrowBuf is the only buffer implementation, used exclusively across
>       ArrowReader/ArrowRecordBatch/Vectors. It's final, which means there
>       isn't a way for me to override its logic in order to wrap some
>       existing buffer. It's absolutely necessary to use ArrowBuf for write
>       scenarios due to buffer allocation, but for reads, I was hoping a
>       vector could just serve as a view on top of an existing memory buffer
>       (like a Java ByteBuffer or Netty ByteBuf). That seems safe for the
>       read-only case.
>    2. As a result of #1 described above, the only layer which seems
>       reusable is arrow-format. I would then have to implement what is
>       effectively a read-only copy of arrow-vector that references the
>       existing buffer. Setting aside the effort of doing that, it
>       introduces a big gap in keeping up with future changes/fixes made to
>       arrow-vector.
> Wondering if you guys have put any thoughts into such readonly scenarios.
> Any suggestion how I can approach this myself?
> Thanks