Re: Airflow - YARN as an executor?


Hey Bolke, I didn't know this; I was under the same impression as Ruslan.
Thanks for sharing.

Sent from my iPhone

> On Apr 24, 2018, at 2:12 PM, Bolke de Bruin <bdbruin@xxxxxxxxx> wrote:
> 
> It actually can nowadays: https://cdn.oreillystatic.com/en/assets/1/event/269/HDFS%20on%20Kubernetes_%20Tech%20deep%20dive%20on%20locality%20and%20security%20Presentation.pptx
> 
> We also have an on-premises setup with Ceph (s3a) and HDFS for when we need the speed, and Kubernetes for our workloads. We are kicking out YARN (and Hive etc., for that matter).
> 
> Bolke
> 
> 
> 
> Sent from my iPad
> 
>> On 24 Apr 2018, at 22:50, Ruslan Dautkhanov <dautkhanov@xxxxxxxxx> wrote:
>> 
>> Kubernetes is a "monolithic" one-level scheduler that can't handle what YARN
>> can, for example scheduling tasks local to the data.
>> Hadoop has multiple levels of data locality (node-local, rack-local), so
>> computation happens close to the data to minimize network
>> data transfer, which is expensive.
>> K8s wasn't designed to handle these scheduling scenarios, as far as I know.
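
The locality levels mentioned above (node-local, then rack-local, then anywhere) can be illustrated with a minimal sketch. This is not YARN's actual scheduling code; `Node`, `locality_level`, and `pick_node` are hypothetical names invented for the illustration, and the three-level ranking is an assumption based on the description in this message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical cluster node: a host name plus the rack it sits in."""
    name: str
    rack: str


def locality_level(candidate, replica_nodes):
    """Rank a candidate node by closeness to the data (lower is better)."""
    if any(candidate.name == r.name for r in replica_nodes):
        return 0  # node-local: a replica of the data lives on this host
    if any(candidate.rack == r.rack for r in replica_nodes):
        return 1  # rack-local: a replica lives in the same rack
    return 2      # off-rack: data must cross rack boundaries


def pick_node(free_nodes, replica_nodes):
    """Choose the free node closest to the data; ties keep list order."""
    return min(free_nodes, key=lambda n: locality_level(n, replica_nodes))


if __name__ == "__main__":
    a, b, c = Node("a", "r1"), Node("b", "r1"), Node("c", "r2")
    # The data block has a single replica on node "a".
    print(pick_node([c, b], [a]).name)
```

With the replica on node `a`, a free node in the same rack is preferred over one in another rack, and `a` itself is preferred whenever it is free.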
>> 
>> For cloud deployments where we don't have a data locality problem (because
>> S3 is used instead of storage local
>> to the servers), k8s might be okay.
>> 
>> Nice comparison [1] of k8s vs. two-level schedulers like YARN and Mesos,
>> although I think it's off-topic.
>> 
>> We're mostly on-prem and we don't see Kubernetes taking over from YARN any time
>> soon.
>> 
>> Thanks.
>> 
>> 
>> 
>> [1]
>> 
>> https://aaltodoc.aalto.fi/bitstream/handle/123456789/27061/master_Ravula_Shashi_2017.pdf?sequence=1
>> 
>> *2.3.2 Monolithic Schedulers *
>> 
>> 
>> 
>> Monolithic schedulers use a single, centralized scheduling algorithm for
>> all jobs. All workload is run through the same scheduler and same
>> scheduling logic. Swarm,
>> Fleet, Borg and Kubernetes adopt monolithic schedulers. Kubernetes
>> improvised on basic monolithic version of Borg and Swarm schedulers. This
>> type of schedulers are not suitable for running heterogeneous modern
>> workloads which include Spark jobs, containers, and other long running jobs,
>> etc.
>> 
>> 
>> 
>> *2.3.3 Two Level Schedulers *
>> 
>> 
>> 
>> Two-level schedulers address the drawbacks of a monolithic scheduler by
>> separating concerns of resource allocation and task placement. An active
>> resource manager offers compute resources to multiple parallel, independent
>> “scheduler frameworks”. The Mesos cluster manager pioneered this approach,
>> and YARN supports a limited version of it. In Mesos, resources are offered
>> to application-level schedulers. This allows for custom, workload-specific
>> scheduling policies. The drawback with this type of scheduling architecture
>> is that the application level frameworks cannot see all the possible
>> placement options anymore. Instead, they only see those options that
>> correspond to resources offered (Mesos) or allocated (YARN) by the resource
>> manager component. This makes priority preemption (higher priority tasks
>> kick out lower priority ones) difficult.
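
The split described in the excerpt above, where a resource manager decides what to offer and a framework scheduler decides where to place tasks, can be sketched roughly as follows. This is a toy model and not the Mesos or YARN API; `Offer`, `ResourceManager`, and `FrameworkScheduler` are hypothetical names, and CPU count is the only resource modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """Hypothetical resource offer: free CPUs on one node."""
    node: str
    cpus: int


class ResourceManager:
    """Level 1: owns the cluster state and decides what each framework sees."""

    def __init__(self, free):
        self.free = dict(free)  # node name -> free CPUs

    def make_offers(self, nodes):
        # Only the listed nodes are offered; the rest stay invisible.
        return [Offer(n, self.free[n]) for n in nodes if self.free.get(n, 0) > 0]

    def accept(self, offer, cpus):
        if cpus > self.free[offer.node]:
            raise ValueError("offer no longer covers the request")
        self.free[offer.node] -= cpus


class FrameworkScheduler:
    """Level 2: workload-specific placement, limited to the offers it received."""

    def __init__(self, preferred_node):
        self.preferred_node = preferred_node

    def place(self, offers, cpus_needed):
        fitting = [o for o in offers if o.cpus >= cpus_needed]
        if not fitting:
            return None
        for o in fitting:
            if o.node == self.preferred_node:
                return o
        return fitting[0]  # preferred node was not offered: settle for another


if __name__ == "__main__":
    rm = ResourceManager({"n1": 4, "n2": 8, "n3": 2})
    fw = FrameworkScheduler(preferred_node="n2")
    # n2 has room, but it is not in this round of offers.
    choice = fw.place(rm.make_offers(["n1", "n3"]), cpus_needed=2)
    rm.accept(choice, 2)
    print(choice.node, rm.free)
```

The framework would rather run on `n2`, but because that node was not among the offers it settles for `n1`, which is the limited-visibility drawback the excerpt points out.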
>> 
>> 
>> 
>> 
>> 
>> -- 
>> Ruslan Dautkhanov
>> 
>>> On Tue, Apr 24, 2018 at 2:22 PM, Bolke de Bruin <bdbruin@xxxxxxxxx> wrote:
>>> 
>>> Happy to have it as a contrib executor. However, I personally think yarn
>>> is a dead end. It has a lot of catching up to do and all the momentum is
>>> with kubernetes.
>>> 
>>> B.
>>> 
>>> Sent from my iPad
>>> 
>>>> On 24 Apr 2018, at 22:13, Ruslan Dautkhanov <dautkhanov@xxxxxxxxx> wrote:
>>>> 
>>>> With Hadoop 3's Docker on YARN support, I think YARN becomes
>>>> somewhat a competitor for Kubernetes.
>>>> 
>>>> Great job on adding k8s support to Airflow.
>>>> 
>>>> Similarly, I can see Airflow integrating with YARN and using
>>>> its infrastructure as an "executor". Has anyone explored the feasibility
>>>> of this approach?
>>>> 
>>>> 
>>>> Thanks!
>>>> Ruslan Dautkhanov
>>>