Re: [JAVA] Supporting zero copy arrow-vector
ArrowBuf is final on purpose. This ensures a single implementation and is
also done for performance reasons. An ArrowBuf is primarily a memory address
and a length, and it wants zero indirection when reading or writing that memory.
It does, however, wrap several types of substructures as long as they have
that property. For example, an ArrowBuf almost always currently wraps a
Netty UnsafeDirectLittleEndian object. At that level you could propose a
way to wrap more types of memory addresses+lengths.
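To make the "memory address + length, read-only view" idea concrete, here is
a rough sketch using only java.nio. It is not Arrow code: ReadOnlyIntView and
ZeroCopyDemo are hypothetical names invented for illustration, and the
fixed-width int layout is an assumption. It only shows that a read-only view
can share memory with an existing buffer instead of copying it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, NOT part of Arrow: a read-only view of 32-bit
// little-endian values stored back to back inside an existing buffer.
final class ReadOnlyIntView {
    private final ByteBuffer data; // shares memory with the source buffer

    private ReadOnlyIntView(ByteBuffer data) {
        this.data = data;
    }

    // Wraps `length` bytes starting at `offset` without copying them.
    static ReadOnlyIntView wrap(ByteBuffer source, int offset, int length) {
        ByteBuffer dup = source.duplicate();  // own position/limit, same memory
        dup.limit(offset + length);
        dup.position(offset);
        ByteBuffer slice = dup.slice().asReadOnlyBuffer();
        slice.order(ByteOrder.LITTLE_ENDIAN); // Arrow data is little-endian by default
        return new ReadOnlyIntView(slice);
    }

    int valueCount() {
        return data.capacity() / Integer.BYTES;
    }

    int get(int index) {
        // Absolute read: does not move the buffer position.
        return data.getInt(index * Integer.BYTES);
    }
}

final class ZeroCopyDemo {
    public static void main(String[] args) {
        // Stand-in for bytes received from the server.
        ByteBuffer wire = ByteBuffer.allocateDirect(16).order(ByteOrder.LITTLE_ENDIAN);
        wire.putInt(0, 10).putInt(4, 20).putInt(8, 30).putInt(12, 40);

        // View the two middle values; no bytes are copied.
        ReadOnlyIntView view = ReadOnlyIntView.wrap(wire, 4, 8);
        System.out.println(view.valueCount() + " values, first = " + view.get(0));

        // A write to the source is visible through the view (shared memory).
        wire.putInt(4, 99);
        System.out.println("after source write, first = " + view.get(0));
    }
}
```

The view holds nothing but a slice of the original memory, which is the
property described above. How such a wrapper would plug into ArrowBuf or the
allocator is left open here.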
On Thu, Sep 6, 2018, 10:26 PM Zhenyuan Zhao <zzymtn@xxxxxxxxx> wrote:
> Hello Team,
> I'm working on using Arrow as an intermediate format for transferring columnar
> data from server to client. In this case, the client only needs to read
> from the format, so I would like to avoid any unnecessary copy of the data.
> Looking into Arrow, while arrow-format/flatbuffers does support zero copy,
> the current arrow-vector Java implementation does not. I was trying to hack
> zero copy for read-only scenarios, but saw two main blockers:
> 1. ArrowBuf is the only buffer implementation, used exclusively across
> ArrowReader/ArrowRecordBatch/Vectors. It's final, which means there isn't a
> way for me to override its logic in order to wrap some existing buffer.
> It's absolutely necessary to use ArrowBuf for write scenarios due to
> allocation, but for reads I was hoping a vector could just serve as a view
> on top of an existing memory buffer (like a Java ByteBuffer or a Netty
> ByteBuf). That seems safe for the read-only case.
> 2. As a result of #1 described above, the only layer which seems reusable
> is arrow-format. Then I have to implement what is effectively a read-only
> copy of arrow-vector that references an existing buffer. Putting aside the
> effort of doing that, it introduces a big burden to keep up with future
> changes/fixes made to arrow-vector.
> Wondering if you guys have put any thought into such read-only scenarios.
> Any suggestions on how I can approach this myself?