Re: Progress on Arrow RPC a.k.a. Arrow Flight


I see; looking more closely, you've sidestepped the standard
Protobuf serialization to write the stream as tagged components:

https://github.com/apache/arrow/compare/master...jacques-n:flight#diff-02cfc9235e22653fce8a7636c9f95507R241

and then read the fields of the message tag by tag:

https://github.com/apache/arrow/compare/master...jacques-n:flight#diff-02cfc9235e22653fce8a7636c9f95507R159
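
To make the "tagged components" idea concrete, here is a minimal sketch of the general technique in Python. It is not the code from the linked Java patch. The field numbers (HEADER_FIELD, BODY_FIELD) and the helper names are hypothetical and chosen for illustration only. What it relies on is the standard Protobuf wire format: each length-delimited field is a varint key, (field_number << 3) | wire_type, followed by a varint length and the raw bytes. Writing those pieces directly avoids first building one combined message buffer, and reading them tag by tag lets the reader take a slice of the body instead of a parsed copy.

```python
import io

LENGTH_DELIMITED = 2  # protobuf wire type for bytes, strings and sub-messages

# Hypothetical field numbers, chosen for this sketch only.
HEADER_FIELD = 1
BODY_FIELD = 2


def encode_varint(value):
    """Encode a non-negative int as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(view, pos):
    """Decode a varint starting at view[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = view[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def write_field(stream, field_number, payload):
    """Write one length-delimited field as key, length, then raw payload."""
    stream.write(encode_varint((field_number << 3) | LENGTH_DELIMITED))
    stream.write(encode_varint(len(payload)))
    stream.write(payload)  # payload is handed over as-is, not re-encoded


def read_fields(buf):
    """Walk the buffer tag by tag; values are memoryview slices of buf."""
    view = memoryview(buf)
    pos = 0
    fields = {}
    while pos < len(view):
        key, pos = decode_varint(view, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type != LENGTH_DELIMITED:
            raise ValueError("this sketch only handles length-delimited fields")
        length, pos = decode_varint(view, pos)
        fields[field_number] = view[pos:pos + length]  # a slice, not a copy
        pos += length
    return fields


# Usage: write two components, then read them back tag by tag.
out = io.BytesIO()
write_field(out, HEADER_FIELD, b"schema")
write_field(out, BODY_FIELD, bytes(300))
fields = read_fields(out.getvalue())
```

Because the bytes produced this way follow the ordinary Protobuf wire format, a stock Protobuf parser should be able to read the same stream, at the cost of copying the body out of it.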

Would it be correct that if a GRPC implementation doesn't provide
sufficient access to the byte stream (or doesn't care enough about
zero copy), you could let GRPC return an instance of the FlightData
structure instead?

I expect we'd want to see a few interoperable implementations (I
suggest Java, C++, Go) to harden the fine details.

- Wes

On Mon, May 28, 2018 at 3:32 PM, Jacques Nadeau <jacques@xxxxxxxxxx> wrote:
> Cutting through the layers of GRPC will be a per-language effort.
> Assuming that each GRPC language implementation does a good job of
> separating message encapsulation from the base library, this should be
> straightforward-ish. My hope around this improves as I see non-protobuf
> protocols being built on top of the base GRPC [1]. Working out how to do
> this in each language will probably take time spent looking at the GRPC
> internals for that language, but that can be a secondary step once you get
> the protocol working (you can just pay for extra copies until then).
>
> In my Java approach I believe I do one read copy and zero write copies
> (needs more testing) which was my target. (Getting to zero-copy on read
> means a lot more complexity because your socket-reading has to be protocol
> aware: even our bespoke layer in Dremio doesn't try to do that. I'd guess
> KRPC does the same but haven't reviewed the code to confirm.)
>
> Will try to get some more slides/readme and a proper proposed patch up soon.
>
> [1] https://grpc.io/blog/flatbuffers
>
>
>
> On Mon, May 28, 2018 at 1:05 AM, Wes McKinney <wesmckinn@xxxxxxxxx> wrote:
>
>> hey Jacques,
>>
>> This is great news, I look forward to digging into this. My biggest
>> initial question is the Protobuf encapsulation, specifically:
>>
>> https://github.com/jacques-n/arrow/blob/flight/java/flight/src/main/protobuf/flight.proto#L99
>>
>> My understanding of Protocol Buffers is that on read, the "data_body"
>> memory would be copied out of the serialized protobuf that came across
>> the wire. Your comment in the .proto says this "comes last in the
>> definition to help with sidecar patterns" -- my read is that it would
>> be up to us to do our own sidecar implementation, similar to how
>> Apache Kudu has zero-copy sidecars in their KRPC system [1] (the
>> comment there describes pretty much exactly the problem we have). I
>> saw that you also replied on a GRPC thread about this issue [2]. Could
>> you summarize what (if anything) stands in the way of getting zero-copy
>> on write and read?
>>
>> - Wes
>>
>> [1]: https://github.com/apache/kudu/blob/master/src/kudu/rpc/rpc_sidecar.h#L34
>> [2]: https://github.com/grpc/grpc-java/issues/1054#issuecomment-391692087
>>
>> On Thu, May 24, 2018 at 6:57 AM, Jacques Nadeau <jacques@xxxxxxxxxx> wrote:
>> > FYI, if you want to see an example server you can run with a
>> > GRPC-generated client, you can run the ExampleFlightServer located at
>> > [1]. A very basic 'test' with that class and client is located at [2].
>> >
>> > [1] https://github.com/jacques-n/arrow/tree/flight/java/flight/src/main/java/org/apache/arrow/flight/example
>> > [2] https://github.com/jacques-n/arrow/blob/flight/java/flight/src/test/java/org/apache/arrow/flight/example/TestExampleServer.java
>> >
>> >
>> > On Thu, May 24, 2018 at 11:51 AM, Jacques Nadeau <jacques@xxxxxxxxxx> wrote:
>> >
>> >> Hey All,
>> >>
>> >> I used my Strata talk today as a forcing function to make additional
>> >> progress on a GRPC-based Arrow RPC protocol [1]. I’m calling it “Apache
>> >> Arrow Flight”. You can take a look at the work here [2]. I’ll work to
>> >> clean up my work and explain my thoughts about the protocol in the
>> >> coming days. High-level: use protobuf as an encapsulation format so
>> >> that any client that is supported in GRPC will work. However, we can
>> >> optimize the read/write path for targeted languages and hand-control
>> >> the serialization/deserialization and memory handling. (I did that in
>> >> this Java patch [3][4][5].) I also looked at starting to use
>> >> GRPC-generated bindings within Python, but it looks like some glue code
>> >> may be needed in the C++ layer since Python delegates down frequently.
>> >> I am also still trying to understand GRPC back-pressure patterns and
>> >> whether the protocol realistically needs to change to cover real-world
>> >> high-performance use cases.
>> >>
>> >> I’ll send out some slides about the ideas and update README, etc. soon.
>> >>
>> >> Thanks,
>> >> Jacques
>> >>
>> >> [1] https://github.com/jacques-n/arrow/blob/flight/java/flight/src/main/protobuf/flight.proto
>> >> [2] http://github.com/jacques-n/arrow/
>> >> [3] https://github.com/jacques-n/arrow/tree/flight/java/flight/src/main/java/org/apache/arrow/flight/grpc
>> >> [4] https://github.com/jacques-n/arrow/blob/flight/java/flight/src/main/java/org/apache/arrow/flight/ArrowMessage.java#L253
>> >> [5] https://github.com/jacques-n/arrow/blob/flight/java/flight/src/main/java/org/apache/arrow/flight/ArrowMessage.java#L185
>> >>
>> >>
>>