osdir.com


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [DISCUSS] Proposal of external shuffle service


Thanks for the update Zhijiang! The community is currently quite busy with
the next Flink release. I hope that we can finish the release in two weeks.
After that people will become more responsive again.

Cheers,
Till

On Wed, Oct 31, 2018 at 7:49 AM zhijiang <wangzhijiang999@xxxxxxxxxx> wrote:

> I already created the umbrella jira [1] for this improvement, and attched
> the design doc [2] in this jira.
>
> Welcome for further discussion about the details.
>
> [1] https://issues.apache.org/jira/browse/FLINK-10653
> [2]
> https://docs.google.com/document/d/1Jb0Mf46ace-6cLRQxJzo6VNQQVxn3hwf9Zqmv5pcb34/edit?usp=sharing
>
>
> <https://docs.google.com/document/d/1Jb0Mf46ace-6cLRQxJzo6VNQQVxn3hwf9Zqmv5pcb34/edit?usp=sharing>
> Best,
> Zhijiang
>
> ------------------------------------------------------------------
> 发件人:Zhijiang(wangzhijiang999) <wangzhijiang999@xxxxxxxxxx.INVALID>
> 发送时间:2018年9月11日(星期二) 15:21
> 收件人:dev <dev@xxxxxxxxxxxxxxxx>
> 抄 送:dev <dev@xxxxxxxxxxxxxxxx>
> 主 题:回复:[DISCUSS] Proposal of external shuffle service
>
> Many thanks Till!
>
>
> I would create a JIRA for this feature and design a document attched with it.
> I will let you know after ready! :)
>
> Best,
> Zhijiang
>
>
> ------------------------------------------------------------------
> 发件人:Till Rohrmann <trohrmann@xxxxxxxxxx>
> 发送时间:2018年9月7日(星期五) 22:01
> 收件人:Zhijiang(wangzhijiang999) <wangzhijiang999@xxxxxxxxxx>
> 抄 送:dev <dev@xxxxxxxxxxxxxxxx>
> 主 题:Re: [DISCUSS] Proposal of external shuffle service
>
> The rough plan sounds good Zhijiang. I think we should continue with what
> you've proposed: Open a JIRA issue and creating a design document which
> outlines the required changes a little bit more in detail. Once this is
> done, we should link the design document in the JIRA issue and post it here
> for further discussion.
>
> Cheers,
> Till
>
> On Wed, Aug 29, 2018 at 6:04 PM Zhijiang(wangzhijiang999) <
> wangzhijiang999@xxxxxxxxxx> wrote:
>
> > Glad to receive your positive feedbacks Till!
> >
> > Actually our motivation is to support batch job well as you mentioned.
> >
> > For output level, flink already has the Subpartition abstraction(writer),
> > and currently there are PipelinedSubpartition(memory output) and
> > SpillableSubpartition(one-sp-one-file output) implementations. We can
> > extend this abstraction to realize other persistent outputs (e.g.
> > sort-merge-file).
> >
> > For transport level(shuffle service), the current SubpartitionView
> > abstraction(reader) seems as the brige linked with the output level, then
>
> > the view can understand and read the different output formats. The current
> > NetworkEnvironment seems take the role of internal shuffle service in
> > TaskManager and the transport server is realized by netty inside. This
>
> > component can also be started in other external containers like NodeManager
> > of yarn to take the role of external shuffle service. Further we can
>
> > abstract to extend the shuffle service for transporting outputs by http or
>
> > rdma instead of current netty.  This abstraction should provide the way for
> > output registration in order to read the results correctly, similar with
> > current SubpartitionView.
> >
> > The above is still a rough idea. Next I plan to create a feature jira to
> > cover the related changes if possible. It would be better if getting help
> > from related committers to review the detail designs together.
> >
> > Best,
> > Zhijiang
> >
> > ------------------------------------------------------------------
> > 发件人:Till Rohrmann <trohrmann@xxxxxxxxxx>
> > 发送时间:2018年8月29日(星期三) 17:36
> > 收件人:dev <dev@xxxxxxxxxxxxxxxx>; Zhijiang(wangzhijiang999) <
> > wangzhijiang999@xxxxxxxxxx>
> > 主 题:Re: [DISCUSS] Proposal of external shuffle service
> >
> > Thanks for starting this design discussion Zhijiang!
> >
> > I really like the idea to introduce a ShuffleService abstraction which
>
> > allows to have different implementations depending on the actual use case.
>
> > Especially for batch jobs I can clearly see the benefits of persisting the
> > results somewhere else.
> >
> > Do you already know which interfaces we need to extend and where to
> > introduce new abstractions?
> >
> > Cheers,
> > Till
> >
> > On Mon, Aug 27, 2018 at 1:57 PM Zhijiang(wangzhijiang999)
> > <wangzhijiang999@xxxxxxxxxx.invalid> wrote:
> > Hi all!
> >
>
> > The shuffle service is responsible for transporting upstream produced data
> > to the downstream side. In flink, the NettyServer is used for network
>
> > transport service and this component is started in the TaskManager process.
> > That means the TaskManager can support internal shuffle service which
> > exists some concerns:
> > 1. If a task finishes, the ResultPartition of this task still retains
> > registered in TaskManager, because the output buffers have to be
> > transported by internal shuffle service in TaskManager. That means the
> > TaskManager can not be released by ResourceManager until ResultPartition
> > released. It may waste container resources and can not support well for
> > dynamic resource scenarios.
> > 2. If we want to expand another shuffle service implementation, the
> > current mechanism is not easy to handle, because the output level (result
> > partition) and transport level (shuffle service) are not divided clearly
> > and loss of abstraction to be extended.
> >
> > For above considerations, we propose the external shuffle service which
> > can be deployed on any other external contaienrs, e.g. NodeManager
>
> > container in yarn. Then the TaskManager can be released ASAP ifneeded when
> > all the internal tasks finished. The persistent output files of these
> > finished tasks can be served to transport by external shuffle service in
> > the same machine.
> >
> > Further we can abstract both of the output level and transport level to
>
> > support different implementations. e.g. We realized merging the data of all
>
> > the subpartitions into limited persistent local files for disk improvements
> > in some scenarios instead of one-subpartiton-one-file.
> >
> > I know it may be a big work for doing this, and I just point out some
> > ideas, and wish getting any feedbacks from you!
> >
> > Best,
> > Zhijiang
> >
> >
> >
>
>
>