Unfortunately, the Flink server still doesn't work consistently on my machine. Funny thing is, it did work ONCE (:beam-sdks-python:portableWordCount BUILD successful, finished in 18s). When I tried again, things were back to hanging, with the server printing messages like:

"""
[flink-akka.actor.default-dispatcher-25] DEBUG org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Received slot report from instance 1ad9060bcc87cf5fd19c9a233c15a18f.
[flink-akka.actor.default-dispatcher-25] DEBUG org.apache.flink.runtime.jobmaster.JobMaster - Trigger heartbeat request.
[flink-akka.actor.default-dispatcher-23] DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor - Received heartbeat request from 006b3653dc7a24471c115d70c4c55fa6.
[flink-akka.actor.default-dispatcher-25] DEBUG org.apache.flink.runtime.jobmaster.JobMaster - Received heartbeat from e188c32c-cfa5-4b85-bda9-16ce4742c490.
...repeat above forever after 5 minutes.
"""

I am trying to figure out what I did right for that one successful run.

For the step 3 Thomas mentioned, all I did for cleanup was "gradle clean". If there is actually more to do, please kindly let me know.

On Mon, Nov 19, 2018 at 6:00 AM Maximilian Michels <mxm@xxxxxxxxxx> wrote:

Thanks for investigating, Thomas!
Ruoyun, does that solve the WordCount problem you were experiencing?
On 19.11.18 04:53, Thomas Weise wrote:
> With latest master the problem seems fixed. Unfortunately that was first
> masked by build and docker issues. But I changed multiple things at once
> after getting nowhere (the container build "succeeded" when in fact it
> did not):
> * Update to latest docker
> * Increase docker disk space after seeing a spurious, non-reproducible
> message in one of the build attempts
> * Full clean and manually remove Go build residuals from the workspace
> After that I could see the Go and container builds execute differently
> (longer build time) and the result certainly looks better.
> On Sun, Nov 18, 2018 at 2:11 PM Ruoyun Huang <ruoyun@xxxxxxxxxx> wrote:
> I was chasing the same issue (I was using the reference runner job server,
> but got the same error message) and have some clues but no conclusion yet.
> After retaining the container instance, I could see that the error message
> says "bad MD5" (see the other thread I started on dev last week). My hypothesis,
> based on the symptoms, is that the underlying container expects an
> MD5 to validate staged files, but the job request from the Python SDK does
> not send a file hash. I hope someone can confirm whether that is the
> case (I am still trying to understand why Dataflow does not
> have this issue) and, if so, suggest the best way to fix it.
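One way to probe the "bad MD5" hypothesis above is to compute the digest of a staged file locally and compare it with whatever the container reports. The following is only a minimal sketch: the helper name `md5_of_file` and the idea that the container compares a hex MD5 digest are assumptions, not something confirmed in this thread, and the path you pass in would be wherever your staged artifact actually lives.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`.

    Hypothetical helper for debugging only; it reads the file in chunks
    so large staged artifacts do not need to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns empty bytes (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If the locally computed digest differs from the expected one, or no digest is sent at all, that would be consistent with the hypothesis, though it would not by itself prove where the mismatch is introduced.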
> On Fri, Nov 16, 2018 at 7:06 PM Thomas Weise <thw@xxxxxxxxxx> wrote:
> Since last few days, the steps under
> https://beam.apache.org/roadmap/portability/#python-on-flink are
> The gradle task hangs because the job server isn't able to
> launch the docker container.
> ./gradlew :beam-sdks-python:portableWordCount
> [CHAIN MapPartition (MapPartition at
> 36write/Write/WriteImpl/DoOnce/Impulse.None/beam:env:docker:v1:0) ->
> FlatMap (FlatMap at
> (8/8)] INFO
> - Still waiting for startup of environment
> <http://tweise-docker-apache.bintray.io/beam/python:latest> for
> worker id 1
> Unfortunately this isn't covered by tests yet. Is anyone aware of
> what change may have caused this, or looking into resolving it?
> Ruoyun Huang