
Re: WebHdfsSensor doesn't support HDFS HA


Hi Manu,

Thanks for raising this question. There is a PR
<https://github.com/apache/incubator-airflow/pull/3560> for moving to hdfs3.
The existing codebase already contains code that supports HA
<https://github.com/apache/incubator-airflow/blob/53b89b98371c7bb993b242c341d3941e9ce09f9a/airflow/hooks/hdfs_hook.py#L92-L96>,
but that is in the HDFS hook and might not apply to the sensor.
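
For option 2 in your mail (a configurable list of namenodes), a minimal
sketch of the failover logic could look like the code below. It is not taken
from Airflow: poke_with_failover and check_path are hypothetical names, and
check_path stands in for whatever WebHDFS call the sensor uses to test a
path. I am assuming that call raises when it hits a standby or unreachable
namenode.

```python
from typing import Callable, Sequence


def poke_with_failover(
    namenodes: Sequence[str],
    path: str,
    check_path: Callable[[str, str], bool],
) -> bool:
    """Ask each namenode in turn whether `path` exists.

    Returns the answer of the first namenode that responds. Raises
    RuntimeError if none of them responds.
    """
    last_error = None
    for namenode in namenodes:
        try:
            # Hypothetical callback: True/False from the active namenode,
            # an exception from a standby or unreachable one.
            return check_path(namenode, path)
        except Exception as err:
            # Simplification: any error is treated as "try the next one".
            last_error = err
    raise RuntimeError("no namenode answered for %s" % path) from last_error
```

A real sensor would probably catch only the connection and standby errors of
its client library, so that genuine failures are not hidden.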

Personally I'm not familiar with pyarrow.hdfs, so I'm not the one to judge
how mature it is. We need to replace Snakebite for sure since it is only
compatible with Python 2.7.

Cheers, Fokko


On Wed, 29 Aug 2018 at 04:29, Manu Zhang <owenzhang1990@xxxxxxxxx> wrote:

> Hi all,
>
> We've been happily using WebHdfsSensor to monitor the state of upstream
> tasks that write to HDFS, except when there is a namenode switch. I've
> opened https://issues.apache.org/jira/browse/AIRFLOW-2901 to discuss
> HDFS HA support.
>
> I can see two solutions:
>
> 1. use pyarrow.hdfs, which has HA support
> 2. allow users to configure a list of namenodes
>
> WDYT?
>
> Thanks,
> Manu Zhang
>