
Re: Progress on Arrow RPC a.k.a. Arrow Flight

hey Jacques,

This is great news, I look forward to digging into this. My biggest
initial question is the Protobuf encapsulation, specifically:


My understanding of Protocol Buffers is that on read, the "data_body"
memory would be copied out of the serialized protobuf that came across
the wire. Your comment in the .proto says this "comes last in the
definition to help with sidecar patterns" -- my read is that it would
be up to us to do our own sidecar implementation, similar to how
Apache Kudu has zero-copy sidecars in their KRPC system [1] (the
comment there describes pretty much exactly the problem we have). I
saw that you also replied on a GRPC thread about this issue [2]. Could
you summarize what (if anything) stands in the way of getting zero-copy on
write and read?
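
To make the sidecar idea concrete, here is a minimal Python sketch of what
I have in mind. It assumes a hypothetical frame layout (a 4-byte length
prefix, a small header, then the raw body); this is not the Flight .proto
layout and not Kudu's KRPC format, and `write_frame` / `read_frame` are
names made up for illustration. The point is only that the body can be
handed to the transport, and sliced back out on read, without being copied.

```python
import struct

# Hypothetical frame layout (an assumption, not the Flight wire format):
#   4-byte big-endian header length | header bytes | body bytes
_LEN = struct.Struct(">I")


def write_frame(header: bytes, body) -> list:
    """Return the buffers to send; the body is wrapped, not concatenated."""
    # A transport with scatter/gather writes could send these parts as-is,
    # so the (potentially large) body is never copied into a new buffer.
    return [_LEN.pack(len(header)), header, memoryview(body)]


def read_frame(buf):
    """Split a received buffer into (header, body); body is a zero-copy view."""
    view = memoryview(buf)
    (header_len,) = _LEN.unpack_from(view, 0)
    start = _LEN.size
    header = bytes(view[start:start + header_len])  # small, so a copy is cheap
    body = view[start + header_len:]                # slice of buf, no copy
    return header, body
```

Keeping the body last means the reader only has to parse a small prefix
before it knows where the payload starts, which I take to be the intent of
putting "data_body" at the end of the message.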

- Wes

[1]: https://github.com/apache/kudu/blob/master/src/kudu/rpc/rpc_sidecar.h#L34
[2]: https://github.com/grpc/grpc-java/issues/1054#issuecomment-391692087

On Thu, May 24, 2018 at 6:57 AM, Jacques Nadeau <jacques@xxxxxxxxxx> wrote:
> FYI, if you want to see an example server you can run with a GRPC generated
> client, you can run the ExampleFlightServer located at [1]. Very basic
> 'test' with that class and client is located at [2].
> [1] https://github.com/jacques-n/arrow/tree/flight/java/flight/src/main/java/org/apache/arrow/flight/example
> [2] https://github.com/jacques-n/arrow/blob/flight/java/flight/src/test/java/org/apache/arrow/flight/example/TestExampleServer.java
> On Thu, May 24, 2018 at 11:51 AM, Jacques Nadeau <jacques@xxxxxxxxxx> wrote:
>> Hey All,
>> I used my Strata talk today as a forcing function to make additional
>> progress on a GRPC-based Arrow RPC protocol [1]. I’m calling it “Apache
>> Arrow Flight”. You can take a look at the work here [2]. I’ll work to clean
>> up my work and explain my thoughts about the protocol in the coming days.
>> High-level: use protobuf as an encapsulation format so that any client that
>> is supported in GRPC will work. However, we can optimize the read/write
>> path for targeted languages and hand-control the
>> serialization/deserialization and memory handling. (I did that in this Java
>> patch [3][4][5].) I also looked at starting to use GRPC generated bindings
>> within Python but it looks like some glue code may be needed in the C++
>> layer since Python delegates down frequently. I also am still trying to
>> understand GRPC back-pressure patterns and whether the protocol
>> realistically needs to change to cover real-world high performance use
>> cases.
>> I’ll send out some slides about the ideas and update README, etc. soon.
>> Thanks,
>> Jacques
>> [1] https://github.com/jacques-n/arrow/blob/flight/java/flight/src/main/protobuf/flight.proto
>> [2] http://github.com/jacques-n/arrow/
>> [3] https://github.com/jacques-n/arrow/tree/flight/java/flight/src/main/java/org/apache/arrow/flight/grpc
>> [4] https://github.com/jacques-n/arrow/blob/flight/java/flight/src/main/java/org/apache/arrow/flight/ArrowMessage.java#L253
>> [5] https://github.com/jacques-n/arrow/blob/flight/java/flight/src/main/java/org/apache/arrow/flight/ArrowMessage.java#L185