Re: Python SDK worker / portable Flink runner performance improvements

This is really cool news. Pretty awesome to move from the "get it to run" phase to the "get it to run faster" phase of this project.

Streaming testing: Java has a synthetic source for testing (GenerateSequence / CountingSource). Maybe in this case porting it to Python is worth it?


On Wed, Oct 17, 2018 at 2:00 PM Lukasz Cwik <lcwik@xxxxxxxxxx> wrote:
Thanks, this was useful for me since I have been away these past couple of weeks.

On Wed, Oct 17, 2018 at 8:45 AM Thomas Weise <thw@xxxxxxxxxx> wrote:

As you may have noticed, some of the contributors are working on enabling Python support on Flink. The upcoming 2.8 release is going to include much of the functionality, and we are now shifting gears to stability and performance.

There have been some basic fixes already (logging, memory leak), but at this point we still see very low throughput in streaming mode. Improvements are in flight:

There has been discussion and preliminary work to improve support for testing as well (streaming mode). The Python SDK currently doesn't have any (open source) streaming connectors, but we have added a Flink native transform that can be used for testing:

I'm starting this thread here so that it is easier for more folks to get involved and stay in sync.