Kubernetes is otherwise more of a runner deployment concern. There are
efforts in the Flink community underway to make deployment on Kubernetes
Max: thanks for taking notes!
On Mon, Oct 8, 2018 at 10:43 AM Henning Rohde <herohde@xxxxxxxxxx
Regarding the Kubernetes/Docker story: the current idea for that
setup is to use a per-job pod for the user/sdk containers + runner
container, so that running (and scaling) a job will go with the
grain of that ecosystem. The Beam code on each worker thus wouldn't
do any container management. This is also how Dataflow essentially
works. The process-based option assumes that the runner environment
is what the SDK needs, which is generally not the case.
On Sun, Oct 7, 2018 at 1:40 PM Alex Van Boxel <alex@xxxxxxxxxxx
Hey Max, I've built up quite some experience with *Kubernetes* over
the years. The problem you describe seems like a custom operator
story. The thing is, I don't know enough about the runner and
bootstrapping story. After the summit I'm quite eager to dive
into a Beam problem, so if you'd like to collaborate on that topic,
let me know.
_/ Alex Van Boxel
On Fri, Oct 5, 2018 at 4:05 PM Maximilian Michels
<mxm@xxxxxxxxxx> wrote:
What do you think about collecting some of the feedback from the
community at Beam Summit last week? Here's what I've come
* The Kubernetes / Docker Story
Multiple users reported that they would like a
What is the best way to deploy Beam with Kubernetes? Will
Especially with regard to portability, there are some
problems, e.g. how to start Beam containerized and bootstrap
Harness container from within a container? For local testing
JobServer we support that via mounting the Docker socket,
but this will be too fragile in production scenarios. Now that we have
execution, we could just use that inside the main container.
Deployment is a very important topic for users and we should
reduce complexity as much as possible.
* External SDKs / Scio
Users have asked why Scio is not part of the main
I don't think that has to be the case, same for the Runners
not part of the main repo. However, it does raise the
will be the future model for maintaining SDKs/IOs/Runners?
How do we ensure easy development and a consistent quality of
* Documenting Timers & State
These two have excellent blog posts but are not part of the
documentation. Since they are part of the model, it would be
eventually update the docs.
* Better Debuggability of pipelines
Even a simple WordCount in Beam leads to a quite complex
graph (due to the involved I/O logic). How can we make
easier to understand? Will we provide a way to visualize the
architecture of high-level Beam pipelines? If so, do we
provide a way to gain insight into how it is mapped to the Runner execution
would like to have more insight.
* Current Roadmap
This was asked in the context of portability. By the end of
the year we should have at least the FlinkRunner in a ready state, with
following up. There are a lot of other threads in Beam. The
is a great way to keep up with the project development.
Looking forward to any other points you might have.