osdir.com


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: How to use "PortableRunner" in Python SDK?


Works for me on macOS as well.

In case you don't launch the pipeline through Gradle, this would be the command:

python -m apache_beam.examples.wordcount \
  --input=/etc/profile \
  --output=/tmp/py-wordcount-direct \
  --runner=PortableRunner \
  --job_endpoint=localhost:8099 \
  --parallelism=1 \
  --OPTIONALflink_master=localhost:8081 \
  --streaming

We talked about adding the wordcount to pre-commit..

Regarding using ULR vs. Flink runner: There seems to be confusion between PortableRunner using the user supplied endpoint vs. trying to launch a job server. I commented in the doc.

Thomas 



On Wed, Nov 14, 2018 at 3:30 AM Maximilian Michels <mxm@xxxxxxxxxx> wrote:
Hi Ruoyun,

I just ran the wordcount locally using the instructions on the page.
I've tried the local file system and GCS. Both times it ran successfully
and produced valid output.

I'm assuming there is some problem with your setup. Which platform are
you using? I'm on MacOS.

Could you expand on the planned merge? From my understanding we will
always need PortableRunner in Python to be able to submit against the
Beam JobServer.

Thanks,
Max

On 14.11.18 00:39, Ruoyun Huang wrote:
> A quick follow-up on using current PortableRunner.
>
> I followed the exact three steps as Ankur and Maximilian shared in
> https://beam.apache.org/roadmap/portability/#python-on-flink  ;   The
> wordcount example keeps hanging after 10 minutes.  I also tried
> specifying explicit input/output args, either using gcs folder or local
> file system, but none of them works.
>
> Spent some time looking into it but conclusion yet.  At this point
> though, I guess it does not matter much any more, given we already have
> the plan of merging PortableRunner into using java reference runner
> (i.e. :beam-runners-reference-job-server).
>
> Still appreciated if someone can try out the python-on-flink
> <https://beam.apache.org/roadmap/portability/#python-on-flink>instructions
> in case it is just due to my local machine setup.  Thanks!
>
>
>
> On Thu, Nov 8, 2018 at 5:04 PM Ruoyun Huang <ruoyun@xxxxxxxxxx
> <mailto:ruoyun@xxxxxxxxxx>> wrote:
>
>     Thanks Maximilian!
>
>     I am working on migrating existing PortableRunner to using java ULR
>     (Link to Notes
>     <https://docs.google.com/document/d/1S86saZqiDaE_M5wxO0zOQ_rwC6QHv7sp1BmGTm0dLNE/edit#>).
>     If this issue is non-trivial to solve, I would vote for removing
>     this default behavior as part of the consolidation.
>
>     On Thu, Nov 8, 2018 at 2:58 AM Maximilian Michels <mxm@xxxxxxxxxx
>     <mailto:mxm@xxxxxxxxxx>> wrote:
>
>         In the long run, we should get rid of the Docker-inside-Docker
>         approach,
>         which was only intended for testing anyways. It would be cleaner to
>         start the SDK harness container alongside with JobServer container.
>
>         Short term, I think it should be easy to either fix the
>         permissions of
>         the mounted "docker" executable or use a Docker image for the
>         JobServer
>         which comes with Docker pre-installed.
>
>         JIRA: https://issues.apache.org/jira/browse/BEAM-6020
>
>         Thanks for reporting this Ruoyun!
>
>         -Max
>
>         On 08.11.18 00:10, Ruoyun Huang wrote:
>          > Thanks Ankur and Maximilian.
>          >
>          > Just for reference in case other people encountering the same
>         error
>          > message, the "permission denied" error in my original email
>         is exactly
>          > due to dockerinsidedocker issue that Ankur mentioned.     
>         Thanks Ankur!
>          > Didn't make the link when you said it, had to discover that
>         in a hard
>          > way (I thought it is due to my docker installation messed up).
>          >
>          > On Tue, Nov 6, 2018 at 1:53 AM Maximilian Michels
>         <mxm@xxxxxxxxxx <mailto:mxm@xxxxxxxxxx>
>          > <mailto:mxm@xxxxxxxxxx <mailto:mxm@xxxxxxxxxx>>> wrote:
>          >
>          >     Hi,
>          >
>          >     Please follow
>          > https://beam.apache.org/roadmap/portability/#python-on-flink
>          >
>          >     Cheers,
>          >     Max
>          >
>          >     On 06.11.18 01:14, Ankur Goenka wrote:
>          >      > Hi,
>          >      >
>          >      > The Portable Runner requires a job server uri to work
>         with. The
>          >     current
>          >      > default job server docker image is broken because of
>         docker inside
>          >      > docker issue.
>          >      >
>          >      > Please refer to
>          >      >
>         https://beam.apache.org/roadmap/portability/#python-on-flink for
>          >     how to
>          >      > run a wordcount using Portable Flink Runner.
>          >      >
>          >      > Thanks,
>          >      > Ankur
>          >      >
>          >      > On Mon, Nov 5, 2018 at 3:41 PM Ruoyun Huang
>         <ruoyun@xxxxxxxxxx <mailto:ruoyun@xxxxxxxxxx>
>          >     <mailto:ruoyun@xxxxxxxxxx <mailto:ruoyun@xxxxxxxxxx>>
>          >      > <mailto:ruoyun@xxxxxxxxxx <mailto:ruoyun@xxxxxxxxxx>
>         <mailto:ruoyun@xxxxxxxxxx <mailto:ruoyun@xxxxxxxxxx>>>> wrote:
>          >      >
>          >      >     Hi, Folks,
>          >      >
>          >      >           I want to try out Python PortableRunner, by
>         using following
>          >      >     command:
>          >      >
>          >      >     *sdk/python: python -m apache_beam.examples.wordcount
>          >      >       --output=/tmp/test_output   --runner PortableRunner*
>          >      >
>          >      >           It complains with following error message:
>          >      >
>          >      >     Caused by: java.lang.Exception: The user defined
>         'open()' method
>          >      >     caused an exception: java.io.IOException: Cannot
>         run program
>          >      >     "docker": error=13, Permission denied
>          >      >     at
>          >   
>           org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:498)
>          >      >     at
>          >      >
>          >     
>           org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
>          >      >     at
>         org.apache.flink.runtime.taskmanager.Task.run(Task.java:712)
>          >      >     ... 1 more
>          >      >     Caused by:
>          >      >
>          >     
>           org.apache.beam.repackaged.beam_runners_java_fn_execution.com.google.common.util.concurrent.UncheckedExecutionException:
>          >      >     java.io.IOException: Cannot run program "docker":
>         error=13,
>          >      >     Permission denied
>          >      >     at
>          >      >
>          >     
>           org.apache.beam.repackaged.beam_runners_java_fn_execution.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4994)
>          >      >
>          >      >     ... 7 more
>          >      >
>          >      >
>          >      >
>          >      >     My py2 environment is properly configured, because
>         DirectRunner
>          >      >     works.  Also I tested my docker installation by
>         'docker run
>          >      >     hello-world ', no issue.
>          >      >
>          >      >
>          >      >     Thanks.
>          >      >     --
>          >      >     ================
>          >      >     Ruoyun  Huang
>          >      >
>          >
>          >
>          >
>          > --
>          > ================
>          > Ruoyun  Huang
>          >
>
>
>
>     --
>     ================
>     Ruoyun  Huang
>
>
>
> --
> ================
> Ruoyun  Huang
>