Re: [DISCUSS] FLIP-6 Problems


Hi Till,


   1. Does the community have any plans to add task manager isolation to
   the session mode?
   2. Is there an issue tracking this feature? I would like to help contribute.
   3. Thanks for the pointer, but it doesn't help as long as task manager
   isolation is not present.


On Tue, Jun 5, 2018 at 7:28 PM Till Rohrmann <trohrmann@xxxxxxxxxx> wrote:

> Hi Renjie,
>
> 1) you're right that the Flink session mode does not give you proper job
> isolation. It is the same as with Flink 1.4 session mode. If this is a
> strong requirement for you, then I recommend using the per job mode.
>
> 2) At the moment it is also not possible to define per job resource
> requirements when using the session mode. This is a feature which the
> community has started implementing but it is not yet fully done. I assume
> that the community will continue working on it. At the moment, the solution
> would be to use the per job mode to avoid wasting resources.
>
> 3) I think the assigned ResourceID for a TaskManager is shown in the web UI
> and when querying the "/taskmanagers" REST endpoint. The resource id is
> derived from the Mesos task id. Would that help to identify which TM is
> running on which Mesos task?
>
> Cheers,
> Till
>
> On Tue, Jun 5, 2018 at 5:13 AM Renjie Liu <liurenjie2008@xxxxxxxxx> wrote:
>
> > ---------- Forwarded message ---------
> > From: Renjie Liu <liurenjie2008@xxxxxxxxx>
> > Date: Tue, Jun 5, 2018 at 10:43 AM
> > Subject: [DISCUSS] FLIP-6 Problems
> > To: user <user@xxxxxxxxxxxxxxxx>
> >
> >
> > Hi:
> >
> > We've deployed Flink 1.5.0 and tested the new cluster manager; it's really
> > great for Flink to be elastic. However, we've also found some problems that
> > block us from deploying it to a production environment.
> >
> > 1. Task manager isolation. Currently Flink allows different jobs to execute
> > on the same task managers. This is unacceptable in a production environment,
> > since a badly written job may kill task managers and affect other jobs.
> > 2. Per job resource configuration. Currently a Flink session cluster can only
> > allocate task managers of the same size and configuration. This may waste a
> > lot of resources if we have many jobs with different resource requirements.
> > 3. Task manager names are meaningless. This is a problem since we can't
> > monitor the status of containers in a Mesos environment.
> >
> > One solution to the above problems is to use a per job cluster, but a
> > centralized cluster manager can help to manage Flink deployments and jobs
> > better.
> >
> > What do you think about these? If the community agrees with us, we would
> > like to propose a design and implementation to enhance the Flink cluster
> > manager.
> > --
> > Liu, Renjie
> > Software Engineer, MVAD
> >
>
-- 
Liu, Renjie
Software Engineer, MVAD