OSDir


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Progress on Arrow RPC a.k.a. Arrow Flight


Correct, I'm maintaining standard protobuf encoding so a consumer that
doesn't go byte by byte can still consumer/produce the messages.

More impls: for sure.

On Wed, May 30, 2018 at 9:01 AM, Wes McKinney <wesmckinn@xxxxxxxxx> wrote:

> I see; looking more closely I see you've sidestepped the standard
> Protobuf serialization to write the stream as tagged components:
>
> https://github.com/apache/arrow/compare/master...jacques-n:flight#diff-
> 02cfc9235e22653fce8a7636c9f95507R241
>
> and then reading the fields of the message tag by tag
>
> https://github.com/apache/arrow/compare/master...jacques-n:flight#diff-
> 02cfc9235e22653fce8a7636c9f95507R159
>
> Would it be correct that if a GRPC implementation doesn't provide
> sufficient access to the byte stream (or if it doesn't care enough
> about zero copy) that you could allow GRPC to return an instance of
> the FlightData structure?
>
> I expect we'd want to see a few interoperable implementations (I
> suggest Java, C++, Go) to harden the fine details.
>
> - Wes
>
> On Mon, May 28, 2018 at 3:32 PM, Jacques Nadeau <jacques@xxxxxxxxxx>
> wrote:
> > Cutting through the layers of GRPC will be a per language approach thing.
> > Assuming that each GRPC language implementation does a good job of
> > separating message encapsulation from the base library, this should be
> > straight-forward-ish. Hope improves around this as I see creation of
> > non-protobuf protocols built on top of the base GRPC [1]. How to do this
> in
> > each language will probably take time looking at the GRPC internals for
> > that language but can be a secondary step once you get the protocol
> working
> > (you can just pay for extra copies until then).
> >
> > In my Java approach I believe I do one read copy and zero write copies
> > (needs more testing) which was my target. (Getting to zero-copy on read
> > means a lot more complexity because your socket-reading has to be
> protocol
> > aware: even our bespoke layer in Dremio doesn't try to do that. I'd guess
> > KRPC does the same but haven't reviewed the code to confirm.)
> >
> > Will try to get some more slides/readme and a proper proposed patch up
> soon.
> >
> > [1] https://grpc.io/blog/flatbuffers
> >
> >
> >
> > On Mon, May 28, 2018 at 1:05 AM, Wes McKinney <wesmckinn@xxxxxxxxx>
> wrote:
> >
> >> hey Jacques,
> >>
> >> This is great news, I look forward to digging into this. My biggest
> >> initial question is the Protobuf encapsulation, specifically:
> >>
> >> https://github.com/jacques-n/arrow/blob/flight/java/flight/
> >> src/main/protobuf/flight.proto#L99
> >>
> >> My understanding of Protocol Buffers is that on read, the "data_body"
> >> memory would be copied out of the serialized protobuf that came across
> >> the wire. Your comment in the .proto says this "comes last in the
> >> definition to help with sidecar patterns" -- my read is that it would
> >> be up to us to do our own sidecar implementation, similar to how
> >> Apache Kudu has zero-copy sidecars in their KRPC system [1] (the
> >> comment there describes pretty much exactly the problem we have). I
> >> saw that you also replied on a GRPC thread about this issue [2]. Could
> >> you summarize what (if anything) stands in the way to get zero-copy on
> >> write and read?
> >>
> >> - Wes
> >>
> >> [1]: https://github.com/apache/kudu/blob/master/src/kudu/rpc/
> >> rpc_sidecar.h#L34
> >> [2]: https://github.com/grpc/grpc-java/issues/1054#issuecomment-
> 391692087
> >>
> >> On Thu, May 24, 2018 at 6:57 AM, Jacques Nadeau <jacques@xxxxxxxxxx>
> >> wrote:
> >> > FYI, if you want to see an example server you can run with a GRPC
> >> generated
> >> > client, you can run the ExampleFlightServer located at [1]. Very basic
> >> > 'test' with that class and client is located at [2].
> >> >
> >> > [1]
> >> > https://github.com/jacques-n/arrow/tree/flight/java/flight/
> >> src/main/java/org/apache/arrow/flight/example
> >> > [2]
> >> > https://github.com/jacques-n/arrow/blob/flight/java/flight/
> >> src/test/java/org/apache/arrow/flight/example/TestExampleServer.java
> >> >
> >> >
> >> > On Thu, May 24, 2018 at 11:51 AM, Jacques Nadeau <jacques@xxxxxxxxxx>
> >> wrote:
> >> >
> >> >> Hey All,
> >> >>
> >> >> I used my Strata talk today as a forcing function to make additional
> >> >> progress on a GRPC-based Arrow RPC protocol [1]. I’m calling it
> “Apache
> >> >> Arrow Flight”. You can take a look at the work here [2]. I’ll work to
> >> clean
> >> >> up my work and explain my thoughts about the protocol in the coming
> >> days.
> >> >> High-level: use protobuf as a encapsulation format so that any client
> >> that
> >> >> is supported in GRPC will work. However, we can optimize the
> read/write
> >> >> path for targeted languages and hand control the
> >> >> serialization/deserialization and memory handling. (I did that in
> this
> >> Java
> >> >> patch [3][4][5].) I also looked at starting to use GRPC generated
> >> bindings
> >> >> within Python but it looks like some glue code may be needed in the
> C++
> >> >> layer since Python delegates down frequently. I also am still trying
> to
> >> >> understand GRPC back-pressure patterns and whether the protocol
> >> >> realistically needs to change to cover real-world high performance
> use
> >> >> cases.
> >> >>
> >> >> I’ll send out some slides about the ideas and update README, etc.
> soon.
> >> >>
> >> >> Thanks,
> >> >> Jacques
> >> >>
> >> >> [1] https://github.com/jacques-n/arrow/blob/flight/java/flight/
> >> >> src/main/protobuf/flight.proto
> >> >> [2] http://github.com/jacques-n/arrow/
> >> >> [3] https://github.com/jacques-n/arrow/tree/flight/
> >> >> java/flight/src/main/java/org/apache/arrow/flight/grpc
> >> >> [4] https://github.com/jacques-n/arrow/blob/flight/
> >> >> java/flight/src/main/java/org/apache/arrow/flight/
> >> ArrowMessage.java#L253
> >> >> <https://github.com/jacques-n/arrow/blob/flight/java/flight/
> >> src/main/java/org/apache/arrow/flight/ArrowMessage.java#L253>
> >> >> [5] https://github.com/jacques-n/arrow/blob/flight/
> >> >> java/flight/src/main/java/org/apache/arrow/flight/
> >> ArrowMessage.java#L185
> >> >>
> >> >>
> >>
>