
Re: WebHdfsSensor doesn't support HDFS HA


Hi Manu,

We have the same use case as you: a primary and a backup namenode. If I
understand your issue correctly, the WebHdfsSensor code iterates over the
Airflow connections to the namenodes to find one that is active.
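
To make the failover idea concrete, here is a minimal sketch of "try each
namenode in turn and use the first one that answers". It is not the actual
Airflow hook code. The names first_active_namenode and webhdfs_probe, the
host names, and port 50070 are all assumptions for illustration. The probe
relies on my understanding that a standby namenode rejects WebHDFS read
requests with an HTTP error, which urllib surfaces as an exception.

```python
from typing import Callable, Iterable, Optional, Tuple
from urllib.request import urlopen

NameNode = Tuple[str, int]  # (host, port); hypothetical type alias


def webhdfs_probe(namenode: NameNode, timeout: float = 5.0) -> bool:
    """Return True if the namenode answers a WebHDFS status request.

    Assumption: a standby or unreachable namenode yields an HTTP error or
    a connection failure, both of which urllib raises as OSError subclasses.
    """
    host, port = namenode
    url = "http://{}:{}/webhdfs/v1/?op=GETFILESTATUS".format(host, port)
    with urlopen(url, timeout=timeout) as resp:
        return resp.status == 200


def first_active_namenode(
    namenodes: Iterable[NameNode],
    is_active: Callable[[NameNode], bool] = webhdfs_probe,
) -> Optional[NameNode]:
    """Return the first namenode for which the probe succeeds, else None."""
    for namenode in namenodes:
        try:
            if is_active(namenode):
                return namenode
        except OSError:
            # Unreachable or rejected the request: try the next candidate.
            continue
    return None
```

Calling first_active_namenode([("nn1.example.com", 50070),
("nn2.example.com", 50070)]) would probe each host in order. Passing the
probe as a parameter keeps the loop testable without a live cluster.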

However, my issue (which I've emailed this list about) was that you cannot
set multiple connections with the same name (e.g. webhdfs_default) through
the CLI, only in the Web interface. I'm planning on submitting a PR soon to
remedy this.

Ben

On Wed, Aug 29, 2018 at 2:57 AM Driesprong, Fokko <fokko@xxxxxxxxxxxxxx>
wrote:

> Hi Manu,
>
> Thanks for raising this question. There is a PR for moving to hdfs3
> <https://github.com/apache/incubator-airflow/pull/3560>. The existing
> codebase already contains code that supports HA
> <https://github.com/apache/incubator-airflow/blob/53b89b98371c7bb993b242c341d3941e9ce09f9a/airflow/hooks/hdfs_hook.py#L92-L96>,
> but it might not apply to the sensor.
>
> Personally I'm not familiar with pyarrow.hdfs, so I'm not the one to judge
> how mature it is. We need to replace Snakebite for sure since it is only
> compatible with Python 2.7.
>
> Cheers, Fokko
>
>
> On Wed, Aug 29, 2018 at 04:29, Manu Zhang <owenzhang1990@xxxxxxxxx> wrote:
>
> > Hi all,
> >
> > We've been happily using WebHdfsSensor to monitor the state of upstream
> > tasks that write to HDFS, except when there is a namenode switch. I've
> > opened https://issues.apache.org/jira/browse/AIRFLOW-2901 to discuss
> > HDFS HA support.
> >
> > I can see two possible solutions:
> >
> > 1. use pyarrow.hdfs, which has HA support
> > 2. allow users to configure a list of namenodes
> >
> > WDYT?
> >
> > Thanks,
> > Manu Zhang
> >
>