
Re: Portable wordcount on Flink runner broken

I was chasing the same issue (I was using the reference runner job server, but got the same error message). I have some clues but no conclusion yet.

With the container instance retained, its error message says "bad MD5" (see the thread [1] where I asked about this on dev last week). My hypothesis, based on the symptoms, is that the container expects an MD5 hash to validate staged files, but the job request from the Python SDK does not include one. I hope someone can confirm whether that is the case (I am still trying to understand why Dataflow does not hit this issue) and, if so, suggest the best way to fix it.

[1] https://lists.apache.org/thread.html/b26560087ff88f142e26d66c8a5a9283558c8e55b5edd705b5e53c9c@%3Cdev.beam.apache.org%3E
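To make the hypothesis concrete, here is a minimal Python sketch of the kind of check the container might perform. It is not Beam's actual container boot code: `md5_hex`, `validate_staged_file`, the hex encoding of the digest, and the file name are all hypothetical, chosen only for illustration. It shows how a request that omits the hash would be rejected with a "bad MD5" error even when the staged file itself is intact.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hex-encoded MD5 digest of a staged file's contents."""
    return hashlib.md5(data).hexdigest()


def validate_staged_file(name: str, data: bytes, expected_md5: str) -> None:
    """Hypothetical container-side check (not Beam's real code).

    Rejects a staged file whose contents do not match the MD5 sent in
    the job request. An empty expected_md5 (the hypothesized case where
    the SDK sends no hash) fails exactly like a wrong hash.
    """
    actual = md5_hex(data)
    if expected_md5 != actual:
        raise ValueError(
            f"bad MD5 for {name}: expected {expected_md5!r}, got {actual!r}"
        )


if __name__ == "__main__":
    payload = b"example staged file contents"
    # Request that carries the correct hash: passes silently.
    validate_staged_file("example_staged_file", payload, md5_hex(payload))
    # Request that carries no hash: rejected with "bad MD5".
    try:
        validate_staged_file("example_staged_file", payload, "")
    except ValueError as err:
        print(err)
```

If this is what happens, the fix would be either to have the SDK compute and send the hash, or to have the container skip validation when no hash is provided.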

On Fri, Nov 16, 2018 at 7:06 PM Thomas Weise <thw@xxxxxxxxxx> wrote:
For the last few days, the steps under https://beam.apache.org/roadmap/portability/#python-on-flink have been broken.

The gradle task hangs because the job server isn't able to launch the docker container. 

./gradlew :beam-sdks-python:portableWordCount -PjobEndpoint=localhost:8099

[CHAIN MapPartition (MapPartition at 36write/Write/WriteImpl/DoOnce/Impulse.None/beam:env:docker:v1:0) -> FlatMap (FlatMap at 36write/Write/WriteImpl/DoOnce/Impulse.None/beam:env:docker:v1:0/out.0) (8/8)] INFO org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory - Still waiting for startup of environment tweise-docker-apache.bintray.io/beam/python:latest for worker id 1

Unfortunately, this isn't covered by tests yet. Is anyone aware of what change may have caused this, or is anyone looking into resolving it?


Ruoyun Huang