Re: Interesting things about how to know it's a DAG file


What about a manifest file that names all the DAGs? Or a naming convention
for the DAG files themselves?
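
As a rough sketch of what either option could look like: the manifest file
name `dags.manifest`, the `_dag.py` suffix, and both function names below are
made up for illustration and are not existing Airflow features.

```python
import os


def dag_files_from_manifest(dags_folder, manifest_name="dags.manifest"):
    """Return the files listed in a manifest, one relative path per line.

    The manifest name and format are hypothetical.
    """
    manifest_path = os.path.join(dags_folder, manifest_name)
    paths = []
    with open(manifest_path) as f:
        for line in f:
            # Drop trailing comments and skip blank lines.
            entry = line.split("#", 1)[0].strip()
            if entry:
                paths.append(os.path.join(dags_folder, entry))
    return paths


def dag_files_by_convention(dags_folder, suffix="_dag.py"):
    """Return every file under dags_folder whose name ends with suffix.

    The suffix is a hypothetical naming convention.
    """
    matches = []
    for root, _dirs, files in os.walk(dags_folder):
        for name in sorted(files):
            if name.endswith(suffix):
                matches.append(os.path.join(root, name))
    return matches
```

Either function only reads file names or a small text file, so nothing has to
be parsed or executed to decide which files are worth loading.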

Alternatively, there could be a single entry point (e.g., index.py) from
which all the DAGs are instantiated. There's probably some complexity in
making that work with the multi-process scheduler model, but it doesn't seem
insurmountable.

On Thu, May 10, 2018 at 10:31 AM, Arthur Wiedmer <arthur.wiedmer@xxxxxxxxx>
wrote:

> Hi Song,
>
> I agree that this is not ideal, but it is difficult to do otherwise without
> parsing/executing the Python code.
>
> Note that an import from airflow should be enough, as long as the string
> "DAG" also appears somewhere in the file, even in a comment. I think we are
> open to other solutions, if anyone on the list has better ideas.
>
>
> Best,
> Arthur
>
>
>
> On Thu, May 10, 2018 at 12:59 AM Song Liu <songliu@xxxxxxxxxxx> wrote:
>
> > Hi,
> >
> > I just created a custom DAG class, named "MyPipeline", by extending the
> > "DAG" class, but Airflow fails to identify the file as a DAG file.
> >
> > I dug into the Airflow implementation in the dag_processing.py file:
> >
> > ```
> > # Heuristic that guesses whether a Python file contains an
> > # Airflow DAG definition.
> > might_contain_dag = True
> > if safe_mode and not zipfile.is_zipfile(file_path):
> >     with open(file_path, 'rb') as f:
> >         content = f.read()
> >     might_contain_dag = all(
> >         [s in content for s in (b'DAG', b'airflow')])
> > ```
> >
> > So if a file contains both of the keywords "DAG" and "airflow", it is
> > treated as a DAG file.
> >
> > Is there a more rigorous way to do this?
> >
> > Thanks,
> > Song
> >
>
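
For anyone who wants to experiment with the heuristic quoted above, here is a
minimal standalone sketch of it. It assumes the logic is exactly as quoted;
the helper `heuristic_says_dag` and the module name `mylib` are hypothetical
and exist only for this illustration.

```python
import os
import tempfile
import zipfile


def might_contain_dag(file_path, safe_mode=True):
    """Standalone copy of the quoted check, for experimentation only."""
    if safe_mode and not zipfile.is_zipfile(file_path):
        with open(file_path, 'rb') as f:
            content = f.read()
        # Both byte strings must appear somewhere in the file.
        return all(s in content for s in (b'DAG', b'airflow'))
    # Zip files, or safe_mode=False, are never filtered out.
    return True


def heuristic_says_dag(source):
    """Hypothetical helper: write source bytes to a temp file and check it."""
    fd, path = tempfile.mkstemp(suffix=".py")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(source)
        return might_contain_dag(path)
    finally:
        os.remove(path)
```

A file that only does `from mylib import MyPipeline` contains neither string,
so `heuristic_says_dag` returns False for it. Adding a comment such as
`# airflow DAG` is enough to make the same file pass.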