[JAVA] Supporting zero copy arrow-vector
Hello Team,
I'm working on using Arrow as an intermediate format for transferring
columnar data from server to client. In this case the client only needs to
read from the format, so I would like to avoid any unnecessary copy of the
data.
Looking into Arrow: while arrow-format/FlatBuffers does support zero copy,
the current arrow-vector Java implementation does not. I tried to hack in
zero copy for read-only scenarios, but ran into two main blockers:
1. ArrowBuf is the only buffer implementation, and it is used throughout
ArrowReader/ArrowRecordBatch/Vectors. It is final, so there is no way for
me to override its logic in order to wrap an existing buffer. Using
ArrowBuf is absolutely necessary for write scenarios because of buffer
allocation, but for reads I was hoping a vector could just serve as a view
on top of an existing memory buffer (like a Java ByteBuffer or a Netty
ByteBuf). That seems safe for the read-only case.
2. As a result of #1 above, the only layer that seems reusable is
arrow-format. I would then have to implement what is effectively a
read-only copy of arrow-vector that references the existing buffer.
Setting aside the effort involved, that would make it hard to keep up with
future changes and fixes made to arrow-vector.
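A minimal sketch of what such a read-only view could look like for a
nullable 32-bit integer column, using only java.nio. ReadOnlyIntView and
its methods are hypothetical names chosen for illustration and are not part
of arrow-vector. The sketch assumes the Arrow buffer layout (little-endian
values, validity bitmap with the least-significant bit first) and that the
caller has already located the validity and data buffers.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical read-only view over existing buffers; not an arrow-vector class. */
final class ReadOnlyIntView {
    private final ByteBuffer validity; // 1 bit per value, least-significant bit first
    private final ByteBuffer data;     // 4 bytes per value, little-endian
    private final int valueCount;

    ReadOnlyIntView(ByteBuffer validity, ByteBuffer data, int valueCount) {
        // slice() and asReadOnlyBuffer() share the caller's memory: no bytes are copied.
        this.validity = validity.slice().asReadOnlyBuffer();
        // Byte order is not inherited by derived buffers, so set it explicitly.
        this.data = data.slice().asReadOnlyBuffer().order(ByteOrder.LITTLE_ENDIAN);
        this.valueCount = valueCount;
    }

    int getValueCount() {
        return valueCount;
    }

    boolean isNull(int index) {
        checkIndex(index);
        // A cleared validity bit marks the slot as null.
        return (validity.get(index >> 3) & (1 << (index & 7))) == 0;
    }

    int get(int index) {
        if (isNull(index)) {
            throw new IllegalStateException("value at index " + index + " is null");
        }
        return data.getInt(index * 4);
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= valueCount) {
            throw new IndexOutOfBoundsException("index " + index);
        }
    }

    public static void main(String[] args) {
        // Stand-in for memory that already holds a received column: [10, null, 30].
        ByteBuffer data = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
        data.putInt(0, 10);
        data.putInt(8, 30);
        ByteBuffer validity = ByteBuffer.wrap(new byte[] {0b0000_0101});

        ReadOnlyIntView view = new ReadOnlyIntView(validity, data, 3);
        System.out.println(view.get(0));    // 10
        System.out.println(view.isNull(1)); // true
        System.out.println(view.get(2));    // 30
    }
}
```

Because the view only wraps the caller's buffers, a later write to the
underlying memory is visible through it, so this is only safe while the
source buffer stays alive and unmodified for the lifetime of the view.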
I'm wondering whether you have given any thought to such read-only
scenarios. Any suggestions on how I could approach this myself?
Thanks