
Re: [JAVA] Supporting zero copy arrow-vector


Interesting, so basically I can still use the public constructor

public ArrowBuf(AtomicInteger refCnt, BufferLedger ledger,
UnsafeDirectLittleEndian byteBuf, BufferManager manager,
ArrowByteBufAllocator alloc, int offset, int length, boolean isEmpty)

Instead of subclassing ArrowBuf, I would override
BufferLedger/UnsafeDirectLittleEndian/BufferManager so that the ArrowBuf
references an existing buffer. That is a much more plausible option, as it
would reuse the existing Vectors. All I would need is to implement my own
deserializer. Did I understand you correctly?
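
To make the read-only idea concrete, here is a minimal sketch in plain
java.nio of a view that reads little-endian ints straight out of an existing
buffer without copying. ReadOnlyIntView is a hypothetical name and is not part
of Arrow; the sketch does not touch ArrowBuf, BufferLedger, or the allocator,
so it only illustrates the kind of access I am after, not how it would plug
into the Vectors.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical read-only int view over an existing buffer (not an Arrow class). */
final class ReadOnlyIntView {
    private final ByteBuffer data;   // shares memory with the source buffer
    private final int valueCount;

    ReadOnlyIntView(ByteBuffer source, int offset, int valueCount) {
        if (offset < 0 || valueCount < 0
                || offset + (long) valueCount * Integer.BYTES > source.limit()) {
            throw new IndexOutOfBoundsException("range outside source buffer");
        }
        // duplicate() gives an independent position/limit over the same bytes.
        ByteBuffer dup = source.duplicate();
        dup.position(offset);
        dup.limit(offset + valueCount * Integer.BYTES);
        // slice() and asReadOnlyBuffer() reset the byte order, so set it last.
        this.data = dup.slice().asReadOnlyBuffer().order(ByteOrder.LITTLE_ENDIAN);
        this.valueCount = valueCount;
    }

    int get(int index) {
        if (index < 0 || index >= valueCount) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return data.getInt(index * Integer.BYTES);
    }

    int getValueCount() {
        return valueCount;
    }

    public static void main(String[] args) {
        // Stand-in for bytes received from the server.
        ByteBuffer wire = ByteBuffer.allocateDirect(16).order(ByteOrder.LITTLE_ENDIAN);
        wire.putInt(0, 7).putInt(4, 11).putInt(8, 13).putInt(12, 17);

        ReadOnlyIntView view = new ReadOnlyIntView(wire, 4, 2);
        System.out.println(view.get(0)); // prints 11

        // A write to the source is visible through the view: no copy was made.
        wire.putInt(4, 99);
        System.out.println(view.get(0)); // prints 99
    }
}
```

The view never allocates or copies the payload; it only narrows the window
onto the source buffer, which is the property I would want a read-only vector
to have.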

Thanks

On Fri, Sep 7, 2018 at 7:09 AM Jacques Nadeau <jacques@xxxxxxxxxx> wrote:

> It is on purpose that ArrowBuf is final. This is done to ensure a single
> implementation and for performance reasons. ArrowBuf is primarily a memory
> address and a length, and it wants zero indirection when reading or writing
> that memory.
>
> It does, however, wrap several types of substructures as long as they have
> that property. For example, an ArrowBuf almost always currently wraps a
> Netty UnsafeDirectLittleEndian object. At that level you could propose a
> way to wrap more types of memory addresses+lengths.
>
> On Thu, Sep 6, 2018, 10:26 PM Zhenyuan Zhao <zzymtn@xxxxxxxxx> wrote:
>
> > Hello Team,
> >
> > I'm working on using Arrow as an intermediate format for transferring
> > columnar data from server to client. In this case, the client only needs
> > to read from the format, so I would like to avoid any unnecessary copy of
> > the data. Looking into Arrow, while arrow-format/flatbuffers does support
> > zero copy, the current arrow-vector Java implementation does not. I was
> > trying to hack in zero copy for read-only scenarios, but saw two main
> > blockers:
> >
> >    1. ArrowBuf is the only buffer implementation, and it is used
> >       exclusively across ArrowReader/ArrowRecordBatch/Vectors. It's final,
> >       which means there isn't a way for me to override its logic in order
> >       to wrap some existing buffer. It's absolutely necessary to use
> >       ArrowBuf for write scenarios due to buffer allocation, but for
> >       reads, I was hoping a vector could just serve as a view on top of an
> >       existing memory buffer (like a Java ByteBuffer or Netty ByteBuf).
> >       That seems safe for the read-only case.
> >
> >    2. As a result of #1 described above, the only layer which seems
> >       reusable is arrow-format. I would then have to implement what is
> >       effectively a read-only copy of arrow-vector that references an
> >       existing buffer. Setting aside the effort of doing that, it makes it
> >       hard to keep up with future changes/fixes made to arrow-vector.
> >
> > I'm wondering whether you have put any thought into such read-only
> > scenarios. Any suggestions on how I could approach this myself?
> >
> > Thanks
> >
>