
Re: Beam Summit community feedback

Thanks for the pointer to the thread. I didn't know there had already been a discussion. Kubernetes support can be looked at solely from a Runner perspective, but we still have to provide the basic knobs in Beam to make deployment easy.

The approach Henning described here and in the thread (Approach 2: https://lists.apache.org/thread.html/209ddf4d701c8c915e3b411e99773f491a6cd830807d636b470000e8@%3Cdev.beam.apache.org%3E) where the backend and the SDK harness are started concurrently with fixed endpoints would be the way to go. In the Proto we already have the "EXTERNAL" environment for that.
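
To make the fixed-endpoint idea a bit more concrete, here is a minimal sketch (not taken from the thread) of what a per-job pod could look like, written as a Python function that builds the manifest. The image names, the port 50000, and the CONTROL_ENDPOINT variable are all hypothetical. It also assumes the SDK harness connects to the runner, which may not match how a given runner wires things up.

```python
import json

# Hypothetical port, chosen only for illustration: the endpoint that the
# runner and the SDK harness agree on before either of them starts.
CONTROL_PORT = 50000


def per_job_pod(job_name, runner_image, sdk_image, control_port=CONTROL_PORT):
    """Build a Pod manifest with a runner container and an SDK harness container.

    Containers in one pod share a network namespace, so the harness can reach
    the runner on localhost at a port fixed ahead of time. Neither container
    has to start or manage the other.
    """
    endpoint = "localhost:%d" % control_port
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": job_name, "labels": {"app": "beam-job"}},
        "spec": {
            "containers": [
                {
                    "name": "runner",
                    "image": runner_image,
                    "ports": [{"containerPort": control_port}],
                },
                {
                    "name": "sdk-harness",
                    "image": sdk_image,
                    # Assumption: the harness reads its endpoint from an
                    # environment variable with this (made-up) name.
                    "env": [{"name": "CONTROL_ENDPOINT", "value": endpoint}],
                },
            ]
        },
    }


if __name__ == "__main__":
    pod = per_job_pod("wordcount", "example/runner:latest",
                      "example/sdk-harness:latest")
    print(json.dumps(pod, indent=2))
```

Both containers are declared up front and started by Kubernetes together, so no Docker socket has to be mounted and the Beam code does no container management.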

On 08.10.18 20:18, Thomas Weise wrote:
Related thread:


Kubernetes is otherwise more of a runner deployment concern. There are efforts underway in the Flink community to make deployment on Kubernetes easier.

Max: thanks for taking notes!

On Mon, Oct 8, 2018 at 10:43 AM Henning Rohde <herohde@xxxxxxxxxx> wrote:

    Regarding the Kubernetes/Docker story: the current idea for that
    setup is to use a per-job pod for the user/sdk containers + runner
    container, so that running (and scaling) a job will go with the
    grain of that ecosystem. The Beam code on each worker thus wouldn't
    do any container management. This is also how Dataflow essentially
    works. The process-based option assumes that the runner environment
    is what the SDK needs, which is generally not the case.


    On Sun, Oct 7, 2018 at 1:40 PM Alex Van Boxel <alex@xxxxxxxxxxx> wrote:

        Hey Max, I've built up quite some experience with *Kubernetes* over
        the years. The problem you describe seems like a custom operator
        story. The thing is, I don't know enough about the runner and
        bootstrapping story. After the summit I'm quite eager to dive
        into a Beam problem, so if you'd like to collaborate on that topic,
        let me know.

        _/ Alex Van Boxel

        On Fri, Oct 5, 2018 at 4:05 PM Maximilian Michels <mxm@xxxxxxxxxx> wrote:


            What do you think about collecting some of the feedback from
            the community at Beam Summit last week? Here's what I've come
            up with:

            * The Kubernetes / Docker Story

            Multiple users reported that they would like a Beam-Kubernetes
            story. What is the best way to deploy Beam with Kubernetes?
            Will there be built-in support?

            Especially with regard to portability, there are some
            problems, e.g. how to start Beam containerized and bootstrap
            the SDK Harness container from within a container? For local
            testing with the JobServer we support that via mounting the
            Docker socket, but this will be too fragile in production
            scenarios. Now that we have process-based execution, we could
            just use that inside the main container.

            Deployment is a very important topic for users and we should
            try to reduce complexity as much as possible.

            * External SDKs / Scio

            Users have asked why Scio is not part of the main repository.
            Generally, I don't think that has to be the case, same for the
            Runners which are not part of the main repo. However, it does
            raise the question, what will be the future model for
            maintaining SDKs/IOs/Runners? How do we ensure easy
            development and a consistent quality of
            * Documenting Timers & State

            These two have excellent blog posts but are not part of the
            documentation. Since they are part of the model, it would be
            good to eventually update the docs.

            * Better Debuggability of pipelines

            Even a simple WordCount in Beam leads to a quite complex Flink
            execution graph (due to the involved I/O logic). How can we
            make it easier to understand? Will we provide a way to
            visualize the architecture of high-level Beam pipelines? If
            so, do we provide a way to gain insight into how it is mapped
            to the Runner execution model? Users would like to have more
            insight.

            * Current Roadmap

            This was asked in the context of portability. By the end of
            the year we should have at least the FlinkRunner in a ready
            state, with the rest following up. There are a lot of other
            threads in Beam. The
            is a great way to keep up with the project development.

            Looking forward to any other points you might have.