Re: Mocking airflow (similar to moto for AWS)


There was a discussion about a unit testing approach last year (2017, I
believe). If you dig through the mail archives, you can find it.

My take is:

- You should test "hooks" against some real system, which can be a docker
container. Make sure the behavior is predictable when talking to that
system. Hook tests are not part of the general CI tests because of the
complexity of the CI setup that would be required, so they are run on local
boxes.
- Maybe add additional "mock" hook tests, mocking out the connected systems.
- Once hooks are tested, operators can use "mocked" hooks that no longer
need access to actual systems. You can then set up an environment where you
have predictable inputs and outputs and test how the operators act on them.
I've used "behave" to do that with very simple record sets, but you can
make these as complex as you want.
- Then you know your hooks and operators work functionally. Testing whether
your workflow works as a whole can be implemented by adding "check"
operators. The benefit here is that you don't test the workflow once; you
test for data consistency every time the dag runs. If you have complex
workflows where the correct behavior of the flow is a worry, you may need
to go deeper into it.
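
As a minimal sketch of the first point, using Python's built-in sqlite3 as
a stand-in for the dockerized system (SqliteHook is a made-up name here,
not a real Airflow hook): seed the real system with known data, then assert
the hook behaves predictably against it.

```python
import os
import sqlite3
import tempfile

# Hypothetical hook under test. In practice this would wrap a
# connection to e.g. a Postgres container started for the test run.
class SqliteHook:
    def __init__(self, dsn):
        self.dsn = dsn

    def get_records(self, sql):
        conn = sqlite3.connect(self.dsn)
        try:
            return conn.execute(sql).fetchall()
        finally:
            conn.close()

# Hook test against the "real" system: seed known data, then check
# that the hook's behavior is predictable.
def test_hook_against_real_db():
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
        conn.execute("INSERT INTO t VALUES (1, 'a')")
        conn.commit()
        conn.close()
        assert SqliteHook(path).get_records("SELECT * FROM t") == [(1, "a")]
    finally:
        os.remove(path)
```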

None of the above depends on DAGs being scheduled, so you avoid the delays
involved in that.
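
The mocked-hook operator tests from the third point might look roughly like
this. Both classes are hypothetical stand-ins, not real Airflow classes (a
real operator would subclass BaseOperator); the point is only that the hook
is patched out, so the operator test needs no external system:

```python
from unittest import mock

# Hypothetical hook: in real code this would talk to an external system.
class MyDbHook:
    def get_records(self, sql):
        raise NotImplementedError("talks to a real database")

# Hypothetical operator whose execute() uses the hook.
class CopyRecordsOperator:
    def __init__(self, sql):
        self.sql = sql

    def execute(self, context):
        hook = MyDbHook()
        return hook.get_records(self.sql)

# Operator test: the hook is mocked, so inputs and outputs are fully
# predictable and no real system is needed.
def test_operator_with_mocked_hook():
    with mock.patch.object(MyDbHook, "get_records",
                           return_value=[(1, "a"), (2, "b")]) as m:
        op = CopyRecordsOperator(sql="SELECT * FROM t")
        result = op.execute(context={})
    m.assert_called_once_with("SELECT * FROM t")
    assert result == [(1, "a"), (2, "b")]
```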

All of the above is implemented in my repo
https://github.com/gtoonstra/airflow-hovercraft, using "behave" as a BDD
method of testing, so you can peruse that.
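
And a rough illustration of the "check" operator idea, again with made-up
names rather than real Airflow classes: because the check runs as a task
inside the dag, consistency is verified on every dag run, not just once.

```python
# Hypothetical check operator: compares row counts between a source and a
# target query and fails the task (raises) on any mismatch.
class RowCountCheckOperator:
    def __init__(self, hook, source_sql, target_sql):
        self.hook = hook
        self.source_sql = source_sql
        self.target_sql = target_sql

    def execute(self, context):
        # Each query is expected to return a single row with a single count.
        source = self.hook.get_records(self.source_sql)[0][0]
        target = self.hook.get_records(self.target_sql)[0][0]
        if source != target:
            raise ValueError(
                f"Row count mismatch: source={source}, target={target}")
        return source
```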

Rgds,

G>


On Thu, Oct 18, 2018 at 2:43 PM Jarek Potiuk <Jarek.Potiuk@xxxxxxxxxxx>
wrote:

> I am also looking to have (I think) similar workflow. Maybe someone has
> done something similar and can give some hints on how to do it the easiest
> way?
>
> Context:
>
> While developing operators I am using example test DAGs that talk to GCP.
> So far our "integration tests" require copying the dag folder and
> restarting the airflow servers, unpausing the dag and waiting for it to
> start. That takes a lot of time, sometimes just to find out that you missed
> one import.
>
> Ideal workflow:
>
> Ideally I'd love to have a "unit" test (i.e. possible to run via nosetests
> or IDE integration/PyCharm) that:
>
>    - should not need the airflow scheduler/webserver started. I guess we
>    need a DB, but an in-memory, on-demand created database might be a good
>    solution
>    - load the DAG from a file specified (not really from/dags directory)
>    - build internal dependencies between the DAG tasks (as specified in the
>    Dag)
>    - run the DAG immediately and fully (i.e. run all the "execute" methods
>    as needed and pass XCOM between tasks).
>    - ideally produce log output in console rather in per-task files.
>
> I thought about using DagRun/DagBag but have not tried it yet, and I'm not
> sure whether you need the whole environment set up (which parts?). Any help
> appreciated :)
>
> J.
>
> On Thu, Oct 18, 2018 at 1:08 AM bielllobera@xxxxxxxxx <
> bielllobera@xxxxxxxxx>
> wrote:
>
> > I think it would be great to have a way to mock airflow for unit tests.
> > The way I approached this was to create a context manager that creates a
> > temporary directory, sets the AIRFLOW_HOME environment variable to this
> > directory (only within the scope of the context manager) and then renders
> > an airflow.cfg to that location. This creates an SQLite database just for the test
> > so you can add variables and connections needed for the test without
> > affecting the real Airflow installation.
> >
> > The first thing I realized is that this didn't work if the imports were
> > outside the context manager, since airflow.configuration and
> > airflow.settings perform all the initialization when they are imported, so
> > the AIRFLOW_HOME variable is already set to the real installation before
> > getting inside the context manager.
> >
> > The workaround for this was to reload those modules, and this works for the
> > tests I have written. However, when I tried to use it for something more
> > complex (I have a plugin that I'm importing) I noticed that inside the
> > operator in this plugin, AIRFLOW_HOME is still set to the real
> > installation, not the temporary one for the test. I thought this must be
> > related to the imports but I haven't been able to figure out a way to fix
> > the issue. I tried patching some methods but I must have been missing
> > something because the database initialization failed.
> >
> > Does anyone have an idea on the best way to mock/patch airflow so that
> > EVERYTHING that is executed inside the context manager uses the temporary
> > installation?
> >
> > PS: This is my current attempt, which works for the tests I defined but not
> > for external plugins:
> > https://github.com/biellls/airflow_testing
> >
> > For an example on how it works:
> >
> > https://github.com/biellls/airflow_testing/blob/master/tests/mock_airflow_test.py
> >
>
>
> --
>
> *Jarek Potiuk, Principal Software Engineer*
> Mobile: +48 660 796 129
>