
Re: Portable wordcount on Flink runner broken


With the latest master the problem seems to be fixed. Unfortunately, that was at first masked by build and Docker issues (the container build "succeeded" when in fact it did not). After getting nowhere, I changed multiple things at once:

* Updated to the latest Docker
* Increased Docker disk space after seeing a spurious, non-reproducible message in one of the build attempts
* Did a full clean and manually removed Go build residuals from the workspace

After that I could see the Go and container builds execute differently (longer build time), and the result certainly looks better.

HTH,
Thomas

On Sun, Nov 18, 2018 at 2:11 PM Ruoyun Huang <ruoyun@xxxxxxxxxx> wrote:
I was chasing the same issue (I was using the reference runner job server, but got the same error message) and have some clues but no conclusion yet.

With the container instance retained, the error message says "bad MD5" (see the other thread [1] I started on dev last week). My hypothesis, based on the symptoms, is that the underlying container expects an MD5 to validate staged files, but the job request from the Python SDK does not send a file hash. I hope someone can confirm whether that is the case (I am still trying to understand why Dataflow does not have this issue) and, if so, suggest the best way to fix it.
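
To make the hypothesis concrete, here is a minimal sketch of the kind of check that could produce a "bad MD5" error. It is not Beam's actual code: the function names `md5_hex` and `verify_staged_file` and the error wording are assumptions for illustration only.

```python
import hashlib


def md5_hex(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_staged_file(path, expected_md5):
    """Hypothetical validation step: raise if the digest does not match.

    If the submitting side never sends a hash, `expected_md5` would be empty
    and the comparison fails even though the file itself is intact.
    """
    actual = md5_hex(path)
    if actual != expected_md5:
        raise ValueError(
            "bad MD5 for %s: want %r, got %r" % (path, expected_md5, actual)
        )
```

Under this assumption, a staged file arriving with an empty expected hash would be rejected at startup, which would match the symptom of the container never becoming ready.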


[1] https://lists.apache.org/thread.html/b26560087ff88f142e26d66c8a5a9283558c8e55b5edd705b5e53c9c@%3Cdev.beam.apache.org%3E

On Fri, Nov 16, 2018 at 7:06 PM Thomas Weise <thw@xxxxxxxxxx> wrote:
For the last few days, the steps under https://beam.apache.org/roadmap/portability/#python-on-flink have been broken.

The Gradle task hangs because the job server isn't able to launch the Docker container.

./gradlew :beam-sdks-python:portableWordCount -PjobEndpoint=localhost:8099

[CHAIN MapPartition (MapPartition at 36write/Write/WriteImpl/DoOnce/Impulse.None/beam:env:docker:v1:0) -> FlatMap (FlatMap at 36write/Write/WriteImpl/DoOnce/Impulse.None/beam:env:docker:v1:0/out.0) (8/8)] INFO org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory - Still waiting for startup of environment tweise-docker-apache.bintray.io/beam/python:latest for worker id 1

Unfortunately, this isn't covered by tests yet. Is anyone aware of what change may have caused this, or is anyone looking into resolving it?

Thanks,
Thomas



--
================
Ruoyun  Huang