CWL support in Airflow
Hi Airflow community,
I’m Michael from Cincinnati Children’s Hospital Medical Center, Barski-Lab.
Our lab is working on bioinformatics workflows written in CWL. We run them on
our local server using Airflow.
Every CWL pipeline is a YAML/JSON file that describes inputs, outputs and the
relationship between the steps. Every step runs a specific tool within Docker.
Workflow inputs are provided through a separate YAML/JSON job file.
More details about the CWL standard can be found here: https://www.commonwl.org/
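To give a feel for the format, here is a minimal sketch in Python. The two-step
workflow, the job file, and the helper name step_dependencies are made up for
illustration and are not taken from our pipelines or packages. The sketch assumes
the simple map form of "steps" and "in", where each source is a plain string. It
loads a workflow written in CWL's JSON form and works out which steps depend on
which:

```python
import json

# Hypothetical two-step workflow, written in CWL's JSON form.
WORKFLOW_JSON = """
{
  "cwlVersion": "v1.0",
  "class": "Workflow",
  "inputs": {"reads": "File"},
  "outputs": {
    "sorted_bam": {"type": "File", "outputSource": "sort/sorted"}
  },
  "steps": {
    "align": {"run": "align.cwl", "in": {"fastq": "reads"}, "out": ["bam"]},
    "sort": {"run": "sort.cwl", "in": {"unsorted": "align/bam"}, "out": ["sorted"]}
  }
}
"""

# Hypothetical job file that supplies a value for every workflow input.
JOB_JSON = """
{"reads": {"class": "File", "path": "sample.fastq"}}
"""


def step_dependencies(workflow):
    """Map each step name to the set of steps whose outputs it consumes."""
    deps = {}
    for name, step in workflow["steps"].items():
        upstream = set()
        for source in step["in"].values():
            # "align/bam" refers to output "bam" of step "align";
            # a bare name such as "reads" refers to a workflow input.
            if "/" in source:
                upstream.add(source.split("/", 1)[0])
        deps[name] = upstream
    return deps


workflow = json.loads(WORKFLOW_JSON)
job = json.loads(JOB_JSON)
dependencies = step_dependencies(workflow)
print(dependencies)  # {'align': set(), 'sort': {'align'}}
```

The resulting mapping is the kind of information needed to wire tasks together
in a DAG: "sort" has to run after "align", and "align" needs only the job inputs.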
Unfortunately, Airflow cannot parse CWL files directly. To address this problem,
we created a Python package that does this for Airflow. In fact, we developed
two packages with similar functionality:
1. CWL Airflow - https://github.com/Barski-lab/cwl-airflow
2. CWL Airflow Parser - https://github.com/datirium/cwl-airflow-parser
Both of them can parse a CWL file and create a DAG based on the pipeline’s structure.
The major difference between the two packages is that the first one creates
a new DAG for every combination of CWL workflow and job file, whereas the second
one creates one DAG for every unique workflow and uses the job file only to trigger
a DagRun with specific parameters. Each approach has its pros and cons.
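To make the difference concrete, here is a simplified sketch. It is not the
actual code of either package; the function names, the DAG naming scheme, and
the file names are assumptions made up for illustration. It only shows how the
same list of (workflow, job) pairs maps to DAGs under each approach:

```python
import os


def base_name(path):
    """Return the file name without directory and extension."""
    return os.path.splitext(os.path.basename(path))[0]


def plan_first_approach(runs):
    """One DAG per (workflow, job) pair; the job is fixed inside the DAG."""
    return {
        f"{base_name(workflow)}_{base_name(job)}": {"workflow": workflow, "job": job}
        for workflow, job in runs
    }


def plan_second_approach(runs):
    """One DAG per workflow; each job only parameterizes a separate DagRun."""
    plan = {}
    for workflow, job in runs:
        plan.setdefault(base_name(workflow), []).append(job)
    return plan


# Hypothetical workflows and job files.
runs = [
    ("chipseq.cwl", "sample_a.yml"),
    ("chipseq.cwl", "sample_b.yml"),
    ("rnaseq.cwl", "sample_c.yml"),
]

first = plan_first_approach(runs)
second = plan_second_approach(runs)

print(sorted(first))   # three DAGs, one per workflow and job combination
print(sorted(second))  # two DAGs, one per workflow
```

With the first approach the number of DAGs grows with every new job file, but
each DAG is fully self-describing. With the second approach the list of DAGs
stays short, and the job contents have to be passed in when the DagRun is
triggered.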
Both programs work well on our server, and we would like to integrate this
functionality into Airflow officially. For this reason, we need advice about the best
way to do this.
It would be nice to make Airflow recognize and parse *.cwl files directly from the DAG
folder. Perhaps I could use Airflow plugins for this purpose?
I would be happy to hear from people who share the idea of making Airflow compatible
with the CWL standard.