Re: Portable wordcount on Flink runner broken


Thanks for investigating, Thomas!

Ruoyun, does that solve the WordCount problem you were experiencing?

-Max

On 19.11.18 04:53, Thomas Weise wrote:
With the latest master the problem seems to be fixed. Unfortunately, that was initially masked by build and Docker issues. After getting nowhere, I changed several things at once (the container build had "succeeded" when in fact it did not):

* Update to latest docker
* Increase docker disk space after seeing a spurious, non-reproducible message in one of the build attempts
* Full clean and manually remove Go build residuals from the workspace

After that I could see the Go and container builds execute differently (longer build time), and the result certainly looks better.

HTH,
Thomas



On Sun, Nov 18, 2018 at 2:11 PM Ruoyun Huang <ruoyun@xxxxxxxxxx> wrote:

    I was chasing the same issue (I was using the reference runner job
    server, but got the same error message) and have some clues but no
    conclusion yet.

    When I retain the container instance, the error message says "bad MD5"
    (see the other thread [1] I started on dev last week). My hypothesis,
    based on the symptoms, is that the underlying container expects an
    MD5 to validate staged files, but the job request from the Python SDK
    does not send a file hash. I hope someone can confirm whether that is
    the case (I am still trying to understand why Dataflow does not
    have this issue) and, if so, suggest the best way to fix it.
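
To make the hypothesis above concrete, here is a minimal Python sketch of the kind of check that could produce a "bad MD5" error: the receiving side recomputes the digest of a staged file and compares it with the hash declared in the request. This is not the actual Beam artifact-staging code. The names `md5_hex` and `validate_staged_file`, and the decision to reject an empty hash, are assumptions made purely for illustration.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of the given bytes."""
    return hashlib.md5(data).hexdigest()


def validate_staged_file(data: bytes, declared_md5: str) -> None:
    """Raise ValueError if the declared hash is missing or does not match.

    Hypothetical check: a receiver that insists on a hash would reject
    a request whose sender left the field empty.
    """
    actual = md5_hex(data)
    if not declared_md5:
        raise ValueError("bad MD5: no hash was sent with the staged file")
    if declared_md5.lower() != actual:
        raise ValueError(f"bad MD5: expected {declared_md5}, got {actual}")


if __name__ == "__main__":
    payload = b"hello world\n"
    validate_staged_file(payload, md5_hex(payload))  # matching hash passes
    try:
        validate_staged_file(payload, "")  # sender omitted the hash
    except ValueError as err:
        print(err)
```

If the hypothesis holds, a sender that never fills in the hash would fail this kind of validation on every staged file, which would match the symptom of the container never finishing startup.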


    [1]
    https://lists.apache.org/thread.html/b26560087ff88f142e26d66c8a5a9283558c8e55b5edd705b5e53c9c@%3Cdev.beam.apache.org%3E

    On Fri, Nov 16, 2018 at 7:06 PM Thomas Weise <thw@xxxxxxxxxx> wrote:

        For the last few days, the steps under
        https://beam.apache.org/roadmap/portability/#python-on-flink
        have been broken.

        The gradle task hangs because the job server isn't able to
        launch the docker container.

        ./gradlew :beam-sdks-python:portableWordCount
        -PjobEndpoint=localhost:8099

        [CHAIN MapPartition (MapPartition at
        36write/Write/WriteImpl/DoOnce/Impulse.None/beam:env:docker:v1:0) ->
        FlatMap (FlatMap at
        36write/Write/WriteImpl/DoOnce/Impulse.None/beam:env:docker:v1:0/out.0)
        (8/8)] INFO
        org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory
        - Still waiting for startup of environment
        tweise-docker-apache.bintray.io/beam/python:latest for
        worker id 1

        Unfortunately this isn't covered by tests yet. Is anyone aware
        of what change may have caused this, or looking into resolving it?

        Thanks,
        Thomas



-- ================
    Ruoyun  Huang