Re: Beam Summit community feedback

Hi Alex,

Would be great to have someone experienced with Kubernetes.

Not sure if it would require a custom Kubernetes Operator. It would probably suffice to have a dedicated Kubernetes mode which starts the Beam environment including Runner and dependencies. From there on, we wouldn't have to start additional containers.

The current portable approach requires us to spawn containers for the SDK Harness at runtime, which wouldn't work on k8s, if I'm not mistaken.


On 07.10.18 22:40, Alex Van Boxel wrote:
Hey Max, I've built up quite some experience with *Kubernetes* over the years. The problem you describe seems like a custom operator story. The thing is, I don't know enough about the runner and bootstrapping story. After the summit I'm quite eager to dive into a Beam problem, so if you'd like to collaborate on that topic, let me know.

_/ Alex Van Boxel

On Fri, Oct 5, 2018 at 4:05 PM Maximilian Michels <mxm@xxxxxxxxxx> wrote:


    What do you think about collecting some of the feedback from the
    community at Beam Summit last week? Here's what I've come across:

    * The Kubernetes / Docker Story

    Multiple users reported that they would like a Beam-Kubernetes story.
    What is the best way to deploy Beam with Kubernetes? Will there be
    built-in support?

    Especially with regard to portability, there are some unsolved
    problems, e.g. how to start Beam containerized and bootstrap the SDK
    Harness container from within a container? For local testing with the
    JobServer we support that via mounting the Docker socket, but this will
    be too fragile in production scenarios. Now that we have process-based
    execution, we could just use that inside the main container.

    Deployment is a very important topic for users and we should try to
    reduce complexity as much as possible.

    * External SDKs / Scio

    Users have asked why Scio is not part of the main repository.
    I don't think that has to be the case, same for the Runners which are
    not part of the main repo. However, it does raise the question, what
    will be the future model for maintaining SDKs/IOs/Runners? How do we
    ensure easy development and a consistent quality of internal/external

    * Documenting Timers & State

    These two have excellent blog posts but are not part of the official
    documentation. Since they are part of the model, it would be good to
    eventually update the docs.

    * Better Debuggability of pipelines

    Even a simple WordCount in Beam leads to a quite complex Flink
    graph (due to the involved I/O logic). How can we make pipelines
    easier to understand? Will we provide a way to visualize the
    architecture of high-level Beam pipelines? If so, do we provide a
    way to gain insight into how it is mapped to the Runner execution
    model? Users would like to have more insight.

    * Current Roadmap

    This was asked in the context of portability. By the end of the year we
    should have at least the FlinkRunner in a ready state, with the rest
    following up. There are a lot of other threads in Beam. The newsletter
    is a great way to keep up with the project development.

    Looking forward to any other points you might have.