Re: [JAVA] Supporting zero copy arrow-vector


Thanks. That's crystal clear to me now.

On Fri, Sep 7, 2018 at 1:16 PM Jacques Nadeau <jacques@xxxxxxxxxx> wrote:

> I opened a JIRA to describe what I think needs to be done here. Check it
> out:
>
> https://issues.apache.org/jira/browse/ARROW-3191
>
>
> On Fri, Sep 7, 2018 at 10:47 AM Wes McKinney <wesmckinn@xxxxxxxxx> wrote:
>
> > Seems like you should be able to construct an UnsafeDirectByteBuf from
> > a MappedByteBuffer, and then wrap that with UnsafeDirectLittleEndian
> > to get zero-copy access to a memory map. Does that sound right?
> >
> > https://github.com/netty/netty/blob/4.1/buffer/src/main/java/io/netty/buffer/UnpooledUnsafeDirectByteBuf.java
> >
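Only the JDK half of this suggestion can be shown without depending on Netty or Arrow internals: mapping a file read-only and reading little-endian values in place, with no copy onto the Java heap. This is a minimal sketch; the class `MappedInt32Reader` and its methods are hypothetical, and wrapping the mapped region in UnpooledUnsafeDirectByteBuf / UnsafeDirectLittleEndian is not shown here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: read int32 values straight out of a memory-mapped file.
public final class MappedInt32Reader {

    // Maps the whole file read-only. The OS pages data in on demand; nothing
    // is copied onto the Java heap. A single mapping is limited to
    // Integer.MAX_VALUE bytes.
    static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // Arrow data is little-endian; ByteBuffer defaults to big-endian.
            buf.order(ByteOrder.LITTLE_ENDIAN);
            // The mapping remains valid after the channel is closed.
            return buf;
        }
    }

    // Sums `count` int32 values starting at `byteOffset` using absolute reads
    // (the buffer's position is untouched and no intermediate array is made).
    static long sumInt32(ByteBuffer buf, int byteOffset, int count) {
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += buf.getInt(byteOffset + 4 * i);
        }
        return sum;
    }

    // Demo: writes `values` as little-endian int32 to a temp file, maps it
    // back and sums it in place.
    static long writeMapAndSum(int... values) {
        try {
            Path tmp = Files.createTempFile("mmap-demo", ".bin");
            // Best effort: some platforms refuse to delete a file while it is mapped.
            tmp.toFile().deleteOnExit();
            ByteBuffer out = ByteBuffer.allocate(4 * values.length).order(ByteOrder.LITTLE_ENDIAN);
            for (int v : values) {
                out.putInt(v);
            }
            Files.write(tmp, out.array());
            return sumInt32(mapReadOnly(tmp), 0, values.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeMapAndSum(1, 2, 3, -4)); // prints 2
    }
}
```

The point of the absolute `getInt(index)` reads is that the mapped memory itself is the data structure being read, which is the property a zero-copy ArrowBuf wrapper would need to preserve.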
> > On Fri, Sep 7, 2018 at 12:46 PM Zhenyuan Zhao <zzymtn@xxxxxxxxx> wrote:
> > >
> > > Interesting, so basically I can still use the public constructor
> > >
> > >     public ArrowBuf(AtomicInteger refCnt, BufferLedger ledger,
> > >         UnsafeDirectLittleEndian byteBuf, BufferManager manager,
> > >         ArrowByteBufAllocator alloc, int offset, int length, boolean isEmpty)
> > >
> > > Instead, I would override BufferLedger/UnsafeDirectLittleEndian/
> > > BufferManager to make them reference an existing buffer. That is a much
> > > more plausible option, as it will reuse the Vectors. All I need is to
> > > implement my own deserializer. Did I get you right?
> > >
> > > Thanks
> > >
> > > On Fri, Sep 7, 2018 at 7:09 AM Jacques Nadeau <jacques@xxxxxxxxxx> wrote:
> > >
> > > > ArrowBuf is final on purpose. It is done to ensure a single
> > > > implementation and for performance reasons. ArrowBuf is primarily a
> > > > memory address and a length, and it wants zero indirection when
> > > > reading or writing that memory.
> > > >
> > > > It does, however, wrap several types of substructures as long as they
> > > > have that property. For example, an ArrowBuf currently almost always
> > > > wraps a Netty UnsafeDirectLittleEndian object. At that level you could
> > > > propose a way to wrap more types of memory addresses+lengths.
> > > >
> > > > On Thu, Sep 6, 2018, 10:26 PM Zhenyuan Zhao <zzymtn@xxxxxxxxx> wrote:
> > > >
> > > > > Hello Team,
> > > > >
> > > > > I'm working on using Arrow as an intermediate format for transferring
> > > > > columnar data from server to client. In this case, the client only
> > > > > needs to read from the format, so I would like to avoid any
> > > > > unnecessary copies of the data. Looking into Arrow, while
> > > > > arrow-format/flatbuffers does support zero copy, the current
> > > > > arrow-vector Java implementation does not. I was trying to hack zero
> > > > > copy in for read-only scenarios, but saw two main blockers:
> > > > >
> > > > >  1. ArrowBuf is the only buffer implementation, and it is used
> > > > >     exclusively across ArrowReader/ArrowRecordBatch/Vectors. It's
> > > > >     final, which means there isn't a way for me to override its logic
> > > > >     in order to wrap some existing buffer. It's absolutely necessary
> > > > >     to use ArrowBuf for write scenarios due to buffer allocation, but
> > > > >     for reads I was hoping a vector could simply serve as a view on
> > > > >     top of an existing memory buffer (like a Java ByteBuffer or a
> > > > >     Netty ByteBuf). That seems safe for the read-only case.
> > > > >
> > > > >  2. As a result of #1 described above, the only layer that seems
> > > > >     reusable is arrow-format. I would then have to implement what is
> > > > >     effectively a read-only copy of arrow-vector that references an
> > > > >     existing buffer. Setting aside the effort involved, it would make
> > > > >     it hard to keep up with future changes/fixes made to
> > > > >     arrow-vector.
> > > > >
> > > > > I'm wondering if you have put any thought into such read-only
> > > > > scenarios. Any suggestions on how I could approach this myself?
> > > > >
> > > > > Thanks
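The "vector as a view on top of an existing buffer" idea in item 1 can be sketched independently of arrow-vector. Below is a hypothetical read-only view of a nullable int32 column in the Arrow columnar layout: a validity bitmap (bit i, least-significant bit first, set means slot i is non-null) plus a little-endian values buffer with 4 bytes per slot. `Int32ColumnView` is invented for illustration, and locating the two buffers (e.g. from record batch metadata) is assumed to have happened already.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical read-only view of a nullable int32 column in the Arrow
// columnar layout. Not an arrow-vector class: it only wraps buffers owned
// by someone else and never copies them.
public final class Int32ColumnView {
    private final ByteBuffer validity; // LSB-first bitmap, 1 = non-null; null means "no nulls"
    private final ByteBuffer values;   // little-endian int32, 4 bytes per slot
    private final int valueCount;

    public Int32ColumnView(ByteBuffer validity, ByteBuffer values, int valueCount) {
        if (valueCount < 0) {
            throw new IllegalArgumentException("negative valueCount");
        }
        if (values.remaining() < 4L * valueCount) {
            throw new IllegalArgumentException("values buffer too small");
        }
        if (validity != null && validity.remaining() < (valueCount + 7L) / 8) {
            throw new IllegalArgumentException("validity bitmap too small");
        }
        // slice() and asReadOnlyBuffer() create views over the same memory, not copies.
        this.validity = validity == null ? null : validity.slice().asReadOnlyBuffer();
        this.values = values.slice().asReadOnlyBuffer().order(ByteOrder.LITTLE_ENDIAN);
        this.valueCount = valueCount;
    }

    public int getValueCount() {
        return valueCount;
    }

    public boolean isNull(int index) {
        if (index < 0 || index >= valueCount) {
            throw new IndexOutOfBoundsException("index " + index + " of " + valueCount);
        }
        if (validity == null) {
            return false;
        }
        int bits = validity.get(index >>> 3);      // byte holding slot `index`
        return ((bits >>> (index & 7)) & 1) == 0;  // bit (index % 8), LSB first
    }

    public int get(int index) {
        if (isNull(index)) {
            throw new IllegalStateException("slot " + index + " is null");
        }
        return values.getInt(index * 4);
    }

    public long sumNonNull() {
        long sum = 0;
        for (int i = 0; i < valueCount; i++) {
            if (!isNull(i)) {
                sum += get(i);
            }
        }
        return sum;
    }
}
```

For example, with the one-byte bitmap `0b0000_1101` and values `10, 0, 30, 40`, slot 1 reads as null and `sumNonNull()` returns 80. Because such a view allocates nothing, it sidesteps the allocator/ledger machinery entirely, which is also why it only covers the read path.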