Re: Python SDK worker / portable Flink runner performance improvements


Thanks Thomas, I think it is important to start looking at performance and improved test coverage.

While we have the basic functionality, state and timers still need to be implemented for the Portable FlinkRunner. Once both are in place, full testing and optimization will be possible:

State:  https://issues.apache.org/jira/browse/BEAM-2918 (pending PR)
Timers: https://issues.apache.org/jira/browse/BEAM-4681

-Max

On 17.10.18 22:59, Lukasz Cwik wrote:
Thanks, this was useful for me since I have been away these past couple of weeks.

On Wed, Oct 17, 2018 at 8:45 AM Thomas Weise <thw@xxxxxxxxxx> wrote:

    Hi,

    As you may have noticed, some contributors are working on
    enabling Python support on Flink. The upcoming 2.8 release is
    going to include much of the functionality, and we are now shifting
    gears to stability and performance.

    There have been some basic fixes already (logging, memory leak), but
    at this point we still see very low throughput in streaming mode.
    Improvements are in flight:

    https://issues.apache.org/jira/browse/BEAM-5760
    https://issues.apache.org/jira/browse/BEAM-5521

    There has been discussion and preliminary work to improve support
    for testing as well (streaming mode). The Python SDK currently
    doesn't have any (open source) streaming connectors, but we have
    added a Flink native transform that can be used for testing:

    https://issues.apache.org/jira/browse/BEAM-5707

    I'm starting this thread here so that it is easier for more folks to
    get involved and stay in sync.

    Thanks,
    Thomas