Re: Progress on Arrow RPC a.k.a. Arrow Flight


Cutting through the layers of GRPC will have to be done per language.
Assuming that each GRPC language implementation does a good job of
separating message encapsulation from the base library, this should be
straightforward-ish. I'm more hopeful about this now that I see non-protobuf
protocols being built on top of the base GRPC [1]. Working out how to do
this in each language will probably take time spent looking at the GRPC
internals for that language, but it can be a secondary step once you get the
protocol working (you can just pay for extra copies until then).

In my Java approach I believe I do one read copy and zero write copies
(needs more testing), which was my target. (Getting to zero-copy on read
means a lot more complexity because your socket reading has to be
protocol-aware: even our bespoke layer in Dremio doesn't try to do that. I'd
guess KRPC does the same but haven't reviewed the code to confirm.)
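
To make the sidecar idea concrete, here is a minimal Python sketch of why
putting the large bytes field last helps. It is not the Java code from the
patch and does not use GRPC: it hand-encodes the standard protobuf wire
format for a message with two length-delimited fields. The field numbers
(HEADER_FIELD, BODY_FIELD) and the function names are assumptions made up
for illustration; they are not taken from flight.proto. On write, the small
prefix is built separately and the body buffer is handed over by reference.
On read, the body comes back as a view into the received buffer, so the only
copy is the one that filled that buffer from the socket.

```python
LENGTH_DELIMITED = 2  # protobuf wire type for bytes/string/sub-messages
HEADER_FIELD = 1      # hypothetical field number for the small metadata
BODY_FIELD = 2        # hypothetical field number for the trailing body


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(buf, pos: int):
    """Decode a varint from buf at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def frame_message(header: bytes, body: memoryview) -> list:
    """Return the buffers to write, in order, without copying `body`."""
    prefix = bytearray()
    prefix += encode_varint((HEADER_FIELD << 3) | LENGTH_DELIMITED)
    prefix += encode_varint(len(header))
    prefix += header
    # Because the body is the last field, everything before it fits in
    # one small prefix; the body itself is passed through untouched.
    prefix += encode_varint((BODY_FIELD << 3) | LENGTH_DELIMITED)
    prefix += encode_varint(body.nbytes)
    return [bytes(prefix), body]


def split_message(wire: memoryview):
    """Parse both fields; the body is returned as a view into `wire`."""
    fields = {}
    pos = 0
    while pos < len(wire):
        tag, pos = decode_varint(wire, pos)
        if tag & 0x7 != LENGTH_DELIMITED:
            raise ValueError("sketch only handles length-delimited fields")
        length, pos = decode_varint(wire, pos)
        fields[tag >> 3] = wire[pos:pos + length]  # slice, not a copy
        pos += length
    return bytes(fields[HEADER_FIELD]), fields[BODY_FIELD]
```

The list returned by frame_message could be passed to a gather-style write
such as socket.sendmsg, which accepts several buffers at once. A stock
protobuf parser would still read the concatenated bytes as an ordinary
message, which is the point of keeping protobuf as the encapsulation.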

Will try to get some more slides/readme and a proper proposed patch up soon.

[1] https://grpc.io/blog/flatbuffers



On Mon, May 28, 2018 at 1:05 AM, Wes McKinney <wesmckinn@xxxxxxxxx> wrote:

> hey Jacques,
>
> This is great news, I look forward to digging into this. My biggest
> initial question is the Protobuf encapsulation, specifically:
>
> https://github.com/jacques-n/arrow/blob/flight/java/flight/src/main/protobuf/flight.proto#L99
>
> My understanding of Protocol Buffers is that on read, the "data_body"
> memory would be copied out of the serialized protobuf that came across
> the wire. Your comment in the .proto says this "comes last in the
> definition to help with sidecar patterns" -- my read is that it would
> be up to us to do our own sidecar implementation, similar to how
> Apache Kudu has zero-copy sidecars in their KRPC system [1] (the
> comment there describes pretty much exactly the problem we have). I
> saw that you also replied on a GRPC thread about this issue [2]. Could
> you summarize what (if anything) stands in the way to get zero-copy on
> write and read?
>
> - Wes
>
> [1]: https://github.com/apache/kudu/blob/master/src/kudu/rpc/rpc_sidecar.h#L34
> [2]: https://github.com/grpc/grpc-java/issues/1054#issuecomment-391692087
>
> On Thu, May 24, 2018 at 6:57 AM, Jacques Nadeau <jacques@xxxxxxxxxx> wrote:
> > FYI, if you want to see an example server you can run with a GRPC
> > generated client, you can run the ExampleFlightServer located at [1].
> > Very basic 'test' with that class and client is located at [2].
> >
> > [1] https://github.com/jacques-n/arrow/tree/flight/java/flight/src/main/java/org/apache/arrow/flight/example
> > [2] https://github.com/jacques-n/arrow/blob/flight/java/flight/src/test/java/org/apache/arrow/flight/example/TestExampleServer.java
> >
> >
> > On Thu, May 24, 2018 at 11:51 AM, Jacques Nadeau <jacques@xxxxxxxxxx> wrote:
> >
> >> Hey All,
> >>
> >> I used my Strata talk today as a forcing function to make additional
> >> progress on a GRPC-based Arrow RPC protocol [1]. I’m calling it “Apache
> >> Arrow Flight”. You can take a look at the work here [2]. I’ll work to
> >> clean up my work and explain my thoughts about the protocol in the
> >> coming days. High-level: use protobuf as an encapsulation format so that
> >> any client that is supported in GRPC will work. However, we can optimize
> >> the read/write path for targeted languages and hand-control the
> >> serialization/deserialization and memory handling. (I did that in this
> >> Java patch [3][4][5].) I also looked at starting to use GRPC generated
> >> bindings within Python but it looks like some glue code may be needed in
> >> the C++ layer since Python delegates down frequently. I am also still
> >> trying to understand GRPC back-pressure patterns and whether the
> >> protocol realistically needs to change to cover real-world
> >> high-performance use cases.
> >>
> >> I’ll send out some slides about the ideas and update README, etc. soon.
> >>
> >> Thanks,
> >> Jacques
> >>
> >> [1] https://github.com/jacques-n/arrow/blob/flight/java/flight/src/main/protobuf/flight.proto
> >> [2] http://github.com/jacques-n/arrow/
> >> [3] https://github.com/jacques-n/arrow/tree/flight/java/flight/src/main/java/org/apache/arrow/flight/grpc
> >> [4] https://github.com/jacques-n/arrow/blob/flight/java/flight/src/main/java/org/apache/arrow/flight/ArrowMessage.java#L253
> >> [5] https://github.com/jacques-n/arrow/blob/flight/java/flight/src/main/java/org/apache/arrow/flight/ArrowMessage.java#L185
> >>
> >>
>