Re: How to use "PortableRunner" in Python SDK?


To answer Maximilian's question:

I am on Linux (Debian).

Calling it a 'planned merge' probably made it sound bigger than it is. What I really meant involves less change than that. More specifically:

1) The default behavior, where PortableRunner starts a Flink server on its own. This is confusing to new users.
2) All the related docs and inline comments. Similarly, they can make connecting PortableRunner to a Flink server very confusing.
3) [Probably no longer an issue.] I couldn't get the Flink server example working, and I couldn't get the example working on the Java ULR either. Both will need debugging to resolve, so I figured we could focus on just one thing, the Java ULR part, without worrying about the Flink server. Again, this may not be a valid concern, since the Flink problem is most likely due to my setup.
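
To make points 1) and 2) concrete, here is a minimal sketch of what "not relying on the default" could look like from the user's side: check that a job server is actually listening, then pass it explicitly via --job_endpoint. The address localhost:8099 is only an assumption for illustration (use whatever your job server reports), and the helper names are made up here; they are not part of Beam.

```python
import socket


def split_endpoint(endpoint):
    """Split 'host:port' into (host, port). Hypothetical helper, not a Beam API."""
    host, _, port = endpoint.rpartition(":")
    return host, int(port)


def job_server_reachable(endpoint, timeout=3.0):
    """Return True if a TCP connection to the job server endpoint succeeds."""
    host, port = split_endpoint(endpoint)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


def portable_runner_args(job_endpoint, output):
    """Build wordcount arguments that name the job server explicitly."""
    return [
        "--runner=PortableRunner",
        "--job_endpoint=" + job_endpoint,
        "--output=" + output,
    ]


if __name__ == "__main__":
    endpoint = "localhost:8099"  # assumed address, adjust to your setup
    if job_server_reachable(endpoint):
        args = portable_runner_args(endpoint, "/tmp/test_output")
        print("python -m apache_beam.examples.wordcount " + " ".join(args))
    else:
        print("No job server reachable at " + endpoint)
```

A reachable port only shows that something is listening there, not that it is a healthy job server, but it does separate "nothing is running" from "the job hangs after submission".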


On Wed, Nov 14, 2018 at 3:30 AM Maximilian Michels <mxm@xxxxxxxxxx> wrote:
Hi Ruoyun,

I just ran the wordcount locally using the instructions on the page.
I've tried the local file system and GCS. Both times it ran successfully
and produced valid output.

I'm assuming there is some problem with your setup. Which platform are
you using? I'm on MacOS.

Could you expand on the planned merge? From my understanding we will
always need PortableRunner in Python to be able to submit against the
Beam JobServer.

Thanks,
Max

On 14.11.18 00:39, Ruoyun Huang wrote:
> A quick follow-up on using current PortableRunner.
>
> I followed the exact three steps that Ankur and Maximilian shared in
> https://beam.apache.org/roadmap/portability/#python-on-flink. The
> wordcount example still hangs after 10 minutes. I also tried
> specifying explicit input/output args, using either a GCS folder or the
> local file system, but neither works.
>
> I spent some time looking into it but have no conclusion yet. At this
> point, though, I guess it does not matter much any more, given we already
> have the plan of merging PortableRunner into using the Java reference
> runner (i.e. :beam-runners-reference-job-server).
>
> I would still appreciate it if someone could try out the python-on-flink
> instructions (https://beam.apache.org/roadmap/portability/#python-on-flink)
> in case it is just due to my local machine setup. Thanks!
>
>
>
> On Thu, Nov 8, 2018 at 5:04 PM Ruoyun Huang <ruoyun@xxxxxxxxxx> wrote:
>
>     Thanks Maximilian!
>
>     I am working on migrating the existing PortableRunner to use the Java ULR
>     (notes:
>     https://docs.google.com/document/d/1S86saZqiDaE_M5wxO0zOQ_rwC6QHv7sp1BmGTm0dLNE/edit#).
>     If this issue is non-trivial to solve, I would vote for removing
>     this default behavior as part of the consolidation.
>
>     On Thu, Nov 8, 2018 at 2:58 AM Maximilian Michels <mxm@xxxxxxxxxx> wrote:
>
>         In the long run, we should get rid of the Docker-inside-Docker
>         approach, which was only intended for testing anyway. It would be
>         cleaner to start the SDK harness container alongside the JobServer
>         container.
>
>         Short term, I think it should be easy to either fix the permissions
>         of the mounted "docker" executable or use a Docker image for the
>         JobServer which comes with Docker pre-installed.
>
>         JIRA: https://issues.apache.org/jira/browse/BEAM-6020
>
>         Thanks for reporting this Ruoyun!
>
>         -Max
>
>         On 08.11.18 00:10, Ruoyun Huang wrote:
>          > Thanks Ankur and Maximilian.
>          >
>          > Just for reference, in case other people encounter the same error
>          > message: the "permission denied" error in my original email is
>          > exactly due to the Docker-inside-Docker issue that Ankur mentioned.
>          > Thanks Ankur! I didn't make the connection when you said it and had
>          > to discover it the hard way (I thought my Docker installation was
>          > messed up).
>          >
>          > On Tue, Nov 6, 2018 at 1:53 AM Maximilian Michels <mxm@xxxxxxxxxx> wrote:
>          >
>          >     Hi,
>          >
>          >     Please follow
>          > https://beam.apache.org/roadmap/portability/#python-on-flink
>          >
>          >     Cheers,
>          >     Max
>          >
>          >     On 06.11.18 01:14, Ankur Goenka wrote:
>          >      > Hi,
>          >      >
>          >      > The Portable Runner requires a job server URI to work with. The
>          >      > current default job server Docker image is broken because of the
>          >      > Docker-inside-Docker issue.
>          >      >
>          >      > Please refer to
>          >      > https://beam.apache.org/roadmap/portability/#python-on-flink for
>          >      > how to run a wordcount using the Portable Flink Runner.
>          >      >
>          >      > Thanks,
>          >      > Ankur
>          >      >
>          >      > On Mon, Nov 5, 2018 at 3:41 PM Ruoyun Huang <ruoyun@xxxxxxxxxx> wrote:
>          >      >
>          >      >     Hi, Folks,
>          >      >
>          >      >     I want to try out the Python PortableRunner using the following
>          >      >     command:
>          >      >
>          >      >     sdk/python: python -m apache_beam.examples.wordcount
>          >      >       --output=/tmp/test_output --runner PortableRunner
>          >      >
>          >      >     It complains with the following error message:
>          >      >
>          >      >     Caused by: java.lang.Exception: The user defined 'open()' method caused an exception: java.io.IOException: Cannot run program "docker": error=13, Permission denied
>          >      >         at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:498)
>          >      >         at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
>          >      >         at org.apache.flink.runtime.taskmanager.Task.run(Task.java:712)
>          >      >         ... 1 more
>          >      >     Caused by: org.apache.beam.repackaged.beam_runners_java_fn_execution.com.google.common.util.concurrent.UncheckedExecutionException: java.io.IOException: Cannot run program "docker": error=13, Permission denied
>          >      >         at org.apache.beam.repackaged.beam_runners_java_fn_execution.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4994)
>          >      >
>          >      >         ... 7 more
>          >      >
>          >      >
>          >      >
>          >      >     My py2 environment is properly configured, because DirectRunner
>          >      >     works. I also tested my Docker installation with
>          >      >     'docker run hello-world' and saw no issue.
>          >      >
>          >      >
>          >      >     Thanks.
>          >      >     --
>          >      >     ================
>          >      >     Ruoyun  Huang
>          >      >
>          >
>          >
>          >
>          > --
>          > ================
>          > Ruoyun  Huang
>          >
>
>
>
>     --
>     ================
>     Ruoyun  Huang
>
>
>
> --
> ================
> Ruoyun  Huang
>


--
================
Ruoyun  Huang
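
P.S. For anyone who lands on this thread with the same 'Cannot run program "docker": error=13, Permission denied' message: error 13 is EACCES, which usually means a file named "docker" was found but the process is not allowed to execute it (as opposed to "not found"). Below is a minimal sketch of a check you could run inside the environment where the job server runs. The helper name find_on_path is hypothetical, and this only inspects file permissions; it does not reproduce how the JVM launches processes.

```python
import os


def find_on_path(name, path_env=None):
    """Return (path, is_executable) for the first file called `name` on PATH.

    Unlike a shell lookup, this reports the first match even when it is not
    executable, so a mounted-but-unusable binary is flagged, not skipped.
    Returns (None, False) when no such file exists on PATH.
    """
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    for directory in path_env.split(os.pathsep):
        candidate = os.path.join(directory or os.curdir, name)
        if os.path.isfile(candidate):
            return candidate, os.access(candidate, os.X_OK)
    return None, False


if __name__ == "__main__":
    path, runnable = find_on_path("docker")
    if path is None:
        print("docker: not found on PATH")
    elif runnable:
        print("docker: found at %s and executable by this user" % path)
    else:
        print("docker: found at %s but NOT executable by this user" % path)
```

If this reports the file as found but not executable inside the job server container, that matches the Docker-inside-Docker permission problem discussed above.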