osdir.com



Re: How to use "PortableRunner" in Python SDK?


Hi Ruoyun,

I just ran the wordcount locally using the instructions on the page. I've tried the local file system and GCS. Both times it ran successfully and produced valid output.
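To compare runs, one quick way to check that a wordcount run produced valid output is to merge the output shards and inspect the counts. The sketch below is a hypothetical helper (`read_wordcount_counts` is not part of Beam). It assumes the example writes `word: count` lines into shard files named like `<output>-00000-of-00001`. Adjust the pattern if your shards are named differently.

```python
import glob
import os


def read_wordcount_counts(output_prefix):
    """Merge 'word: count' lines from every shard matching <output_prefix>-*.

    Hypothetical helper. Assumes (not verified against every Beam version)
    that the wordcount example writes one 'word: count' line per word,
    split across shard files that share the --output prefix.
    """
    counts = {}
    for shard in sorted(glob.glob(glob.escape(output_prefix) + "-*")):
        if not os.path.isfile(shard):
            continue
        with open(shard, encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\n")
                if not line:
                    continue
                # rpartition splits on the last ': ', so the count is always the tail.
                word, _, count = line.rpartition(": ")
                counts[word] = counts.get(word, 0) + int(count)
    return counts


if __name__ == "__main__":
    print(read_wordcount_counts("/tmp/test_output"))
```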

I'm assuming there is some problem with your setup. Which platform are you using? I'm on MacOS.

Could you expand on the planned merge? From my understanding we will always need PortableRunner in Python to be able to submit against the Beam JobServer.
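The Python PortableRunner is pointed at a running job server through the `--job_endpoint` pipeline option. The helper below is a hypothetical sketch (`portable_wordcount_argv` is not a Beam API) that only assembles the command line for the wordcount example. The default `localhost:8099` is an assumption about where a locally started job server listens, so substitute your own address.

```python
import sys


def portable_wordcount_argv(output, job_endpoint="localhost:8099", input_path=None):
    """Build argv for running the Beam wordcount example on PortableRunner.

    Hypothetical helper: the default endpoint is an assumption about where
    a locally started job server listens, not a value taken from Beam.
    """
    argv = [
        sys.executable, "-m", "apache_beam.examples.wordcount",
        "--runner=PortableRunner",
        "--job_endpoint=" + job_endpoint,
        "--output=" + output,
    ]
    if input_path is not None:
        argv.append("--input=" + input_path)
    return argv


if __name__ == "__main__":
    # Prints the command instead of running it; pass the list to
    # subprocess.run(...) once a job server is actually up.
    print(" ".join(portable_wordcount_argv("/tmp/test_output")))
```

Building the argument list (rather than a shell string) keeps paths with spaces intact if it is later handed to `subprocess.run`.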

Thanks,
Max

On 14.11.18 00:39, Ruoyun Huang wrote:
A quick follow-up on using current PortableRunner.

I followed exactly the three steps that Ankur and Maximilian shared in <https://beam.apache.org/roadmap/portability/#python-on-flink>, but the wordcount example is still hanging after 10 minutes. I also tried specifying explicit input/output args, using either a GCS folder or the local file system, but neither works.

I spent some time looking into it but have no conclusion yet. At this point, though, I guess it does not matter much anymore, given that we already plan to merge PortableRunner into using the Java reference runner (i.e. :beam-runners-reference-job-server).

I would still appreciate it if someone could try out the python-on-flink <https://beam.apache.org/roadmap/portability/#python-on-flink> instructions, in case it is just due to my local machine setup. Thanks!
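One thing worth ruling out when a portable job hangs is whether anything is listening at the job endpoint at all. A minimal, stdlib-only sketch follows; the function name is hypothetical, and it only tests that a TCP connection succeeds, not that the listener is a Beam job server or that the job will make progress.

```python
import socket


def endpoint_reachable(endpoint, timeout=2.0):
    """Return True if a TCP connection to 'host:port' succeeds within timeout.

    This only shows that *something* accepts connections on that port; it
    does not check that it is a Beam job server.
    """
    host, _, port = endpoint.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts;
        # ValueError covers a non-numeric port.
        return False


if __name__ == "__main__":
    print(endpoint_reachable("localhost:8099"))  # assumed local job server port
```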



On Thu, Nov 8, 2018 at 5:04 PM Ruoyun Huang <ruoyun@xxxxxxxxxx> wrote:

    Thanks Maximilian!

    I am working on migrating the existing PortableRunner to use the Java
    ULR (Link to Notes
    <https://docs.google.com/document/d/1S86saZqiDaE_M5wxO0zOQ_rwC6QHv7sp1BmGTm0dLNE/edit#>).
    If this issue is non-trivial to solve, I would vote for removing
    this default behavior as part of the consolidation.

    On Thu, Nov 8, 2018 at 2:58 AM Maximilian Michels <mxm@xxxxxxxxxx> wrote:

        In the long run, we should get rid of the Docker-inside-Docker
        approach, which was only intended for testing anyway. It would be
        cleaner to start the SDK harness container alongside the JobServer
        container.

        Short term, I think it should be easy to either fix the permissions
        of the mounted "docker" executable or use a Docker image for the
        JobServer which comes with Docker pre-installed.
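To confirm the permissions symptom, one can look at the `docker` file the JobServer process would find on its `PATH`: `error=13` is EACCES, which is what exec typically reports when that file exists but is not executable for the current user. A stdlib-only sketch, assuming it is run in the same environment as the failing process (e.g. inside the JobServer container); `find_docker` is a hypothetical name. It scans `PATH` by hand because `shutil.which` skips non-executable files and would hide exactly this case.

```python
import os


def find_docker(path_env=None):
    """Return (path, is_executable) for the first 'docker' file on PATH.

    Returns (None, False) if no such file exists. Hypothetical diagnostic
    helper; it does not try to run docker.
    """
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    for directory in path_env.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, "docker")
        if os.path.isfile(candidate):
            # X_OK is False when no execute permission applies to this user,
            # the situation in which exec fails with EACCES (errno 13).
            return candidate, os.access(candidate, os.X_OK)
    return None, False


if __name__ == "__main__":
    path, ok = find_docker()
    if path is None:
        print("no 'docker' on PATH")
    elif not ok:
        print(path + " exists but is not executable (expect error=13)")
    else:
        print(path + " is executable")
```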

        JIRA: https://issues.apache.org/jira/browse/BEAM-6020

        Thanks for reporting this Ruoyun!

        -Max

        On 08.11.18 00:10, Ruoyun Huang wrote:
         > Thanks Ankur and Maximilian.
         >
         > Just for reference, in case other people encounter the same error
         > message: the "permission denied" error in my original email is
         > caused exactly by the Docker-inside-Docker issue that Ankur
         > mentioned. Thanks Ankur! I didn't make the connection when you
         > said it and had to discover it the hard way (I thought my Docker
         > installation was messed up).
         >
         > On Tue, Nov 6, 2018 at 1:53 AM Maximilian Michels <mxm@xxxxxxxxxx> wrote:
         >
         >     Hi,
         >
         >     Please follow
         > https://beam.apache.org/roadmap/portability/#python-on-flink
         >
         >     Cheers,
         >     Max
         >
         >     On 06.11.18 01:14, Ankur Goenka wrote:
         >      > Hi,
         >      >
         >      > The Portable Runner requires a job server URI to work
         >      > with. The current default job server Docker image is
         >      > broken because of the Docker-inside-Docker issue.
         >      >
         >      > Please refer to
         >      > https://beam.apache.org/roadmap/portability/#python-on-flink
         >      > for how to run a wordcount using the Portable Flink Runner.
         >      >
         >      > Thanks,
         >      > Ankur
         >      >
         >      > On Mon, Nov 5, 2018 at 3:41 PM Ruoyun Huang <ruoyun@xxxxxxxxxx> wrote:
         >      >
         >      >     Hi, Folks,
         >      >
         >      >     I want to try out the Python PortableRunner using the
         >      >     following command (run from sdks/python):
         >      >
         >      >     python -m apache_beam.examples.wordcount --output=/tmp/test_output --runner PortableRunner
         >      >
         >      >     It fails with the following error message:
         >      >
         >      >     Caused by: java.lang.Exception: The user defined 'open()' method caused an exception: java.io.IOException: Cannot run program "docker": error=13, Permission denied
         >      >         at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:498)
         >      >         at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
         >      >         at org.apache.flink.runtime.taskmanager.Task.run(Task.java:712)
         >      >         ... 1 more
         >      >     Caused by: org.apache.beam.repackaged.beam_runners_java_fn_execution.com.google.common.util.concurrent.UncheckedExecutionException: java.io.IOException: Cannot run program "docker": error=13, Permission denied
         >      >         at org.apache.beam.repackaged.beam_runners_java_fn_execution.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4994)
         >      >         ... 7 more
         >      >
         >      >
         >      >
         >      >     My py2 environment is properly configured, because
         >      >     DirectRunner works. I also tested my Docker installation
         >      >     with 'docker run hello-world' and had no issue.
         >      >
         >      >
         >      >     Thanks.
         >      >     --
         >      >     ================
         >      >     Ruoyun  Huang
         >      >
         >
         >
         >
         > --
         > ================
         > Ruoyun  Huang
         >



    --
    ================
    Ruoyun Huang



--
================
Ruoyun  Huang