
[JAVA] Supporting zero copy arrow-vector

Hello Team,

I'm working on using Arrow as an intermediate format for transferring columnar
data from a server to a client. In this case the client only needs to read
the data, so I would like to avoid any unnecessary copies. Looking into Arrow,
while arrow-format/flatbuffers does support zero copy, the current
arrow-vector Java implementation does not. I tried to hack in zero copy for
read-only scenarios, but ran into two main blockers:


1. ArrowBuf is the only buffer implementation, and it is used exclusively
   across ArrowReader/ArrowRecordBatch/Vectors. It's final, so there is no way
   for me to override its logic in order to wrap an existing buffer. Using
   ArrowBuf is absolutely necessary for write scenarios because of buffer
   allocation, but for reads I was hoping a vector could just serve as a view
   on top of an existing memory buffer (like a Java ByteBuffer or a Netty
   ByteBuf). That seems safe for the read-only case.

2. As a result of item 1 above, the only layer that seems reusable is
   arrow-format. I would then have to implement what is effectively a
   read-only copy of arrow-vector that references the existing buffer.
   Putting aside the effort of doing that, it would make it hard to keep up
   with future changes/fixes made to arrow-vector.
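
To illustrate the kind of read-only view meant in the first point, here is a
minimal sketch that uses only java.nio. It is not arrow-vector code:
ReadOnlyIntView and ZeroCopyDemo are hypothetical names, and the sketch assumes
a 32-bit integer column whose validity bitmap and data have already been
located as two ByteBuffer regions. The layout follows the Arrow columnar
format (little-endian values, validity bits numbered from the least
significant bit).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical read-only view of a 32-bit integer column; not an arrow-vector class. */
final class ReadOnlyIntView {
    private final ByteBuffer validity; // one bit per value, least-significant bit first
    private final ByteBuffer data;     // four bytes per value, little-endian
    private final int valueCount;

    ReadOnlyIntView(ByteBuffer validity, ByteBuffer data, int valueCount) {
        // slice() shares the underlying memory instead of copying it;
        // asReadOnlyBuffer() rejects writes made through this view.
        this.validity = validity.slice().asReadOnlyBuffer();
        this.data = data.slice().asReadOnlyBuffer().order(ByteOrder.LITTLE_ENDIAN);
        this.valueCount = valueCount;
    }

    int getValueCount() {
        return valueCount;
    }

    boolean isNull(int index) {
        checkIndex(index);
        // A cleared validity bit marks the slot as null.
        return (validity.get(index >> 3) & (1 << (index & 7))) == 0;
    }

    int get(int index) {
        if (isNull(index)) {
            throw new IllegalStateException("value at index " + index + " is null");
        }
        // Absolute read: does not move the buffer position.
        return data.getInt(index * 4);
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= valueCount) {
            throw new IndexOutOfBoundsException("index " + index);
        }
    }
}

/** Hypothetical demo: wraps existing buffers and reads values without copying. */
final class ZeroCopyDemo {
    public static void main(String[] args) {
        // Stand-in for memory that already holds a received column: 10, null, 30.
        ByteBuffer data = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
        data.putInt(0, 10).putInt(4, 0).putInt(8, 30);
        ByteBuffer validity = ByteBuffer.wrap(new byte[] {0b0000_0101});

        ReadOnlyIntView view = new ReadOnlyIntView(validity, data, 3);
        for (int i = 0; i < view.getValueCount(); i++) {
            System.out.println(view.isNull(i) ? "null" : String.valueOf(view.get(i)));
        }
    }
}
```

Because the view only slices the buffers, no bytes are copied and later
changes to the underlying memory are visible through it. The sketch leaves out
everything else a real vector needs, such as other data types, nested vectors
and reference counting.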

I'm wondering whether you have given any thought to such read-only scenarios.
Any suggestions on how I could approach this myself?