osdir.com


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

BashOperator and Recycled PIDs


Hi all,

We've fans of Airflow at Flexport and have adopted it for our (pretty
basic) use case. However, we're seeing intermittent exit issues with
BashOperators; specifically, we're seeing SIGTERMS being sent to our
operator and can't track where they're coming from. We think it's due
to *recycled
PIDs*, per the note here:
https://github.com/apache/incubator-airflow/blob/v1-9-stable/airflow/utils/helpers.py#L212

Is this likely, and has anyone encountered this before? If this is
something you're familiar with, I've included a writeup below:

*Description*
We orchestrate a workflow of bash scripts via BashOperators. Every so
often, we observe the operator exiting with signal -15. *Subsequent re-runs
always succeed.* Since our bash script does not explicitly exit with -15
anywhere, we're not sure what the cause could be.

Our BashOperators run a python script which launches a subprocess call to
psql.

Consistent symptoms are:
* Task process receives SIGTERM from unknown source
* the next time the Task runs, it succeeds

Our environment:
* Airflow v1.9.0
* running with CeleryExecutor, 64 concurrent workers on a single EC2
instance
* scheduler & workers run on same instance
* Ubuntu 16.04

*Logs*
Here's the relevant logs. Observe the TaskRunner seems to be receiving a
SIGTERM from an unknown source and getting killed off.

```

[2018-06-12 20:15:51,990] {models.py:1428} INFO - Executing
<Task(BashOperator): 15min_bi_products> on 2018-06-12 20:00:00
[2018-06-12 20:15:51,990] {base_task_runner.py:115} INFO - Running:
['bash', '-c', u'airflow run 15min_build_bi_v1 15min_bi_products
2018-06-12T20:00:00 --job_id 254272 --raw -sd
DAGS_FOLDER/15min_build_bi_v1_with_op.py']
[2018-06-12 20:15:52,304] {base_task_runner.py:98} INFO - Subtask:
[2018-06-12 20:15:52,304] {driver.py:120} INFO - Generating grammar
tables from /usr/lib/python2.7/lib2to3/Grammar.txt
[2018-06-12 20:15:52,325] {base_task_runner.py:98} INFO - Subtask:
[2018-06-12 20:15:52,325] {driver.py:120} INFO - Generating grammar
tables from /usr/lib/python2.7/lib2to3/PatternGrammar.txt
[2018-06-12 20:15:52,549] {base_task_runner.py:98} INFO - Subtask:
[2018-06-12 20:15:52,548] {configuration.py:206} WARNING - section/key
[celery/celery_ssl_active] not found in config
[2018-06-12 20:15:52,549] {base_task_runner.py:98} INFO - Subtask:
[2018-06-12 20:15:52,549] {default_celery.py:41} WARNING - Celery
Executor will run without SSL
[2018-06-12 20:15:52,550] {base_task_runner.py:98} INFO - Subtask:
[2018-06-12 20:15:52,550] {__init__.py:45} INFO - Using executor
CeleryExecutor
[2018-06-12 20:15:52,598] {base_task_runner.py:98} INFO - Subtask:
[2018-06-12 20:15:52,598] {models.py:189} INFO - Filling up the DagBag
from /home/bi_etl_user/bi-etl/airflow/dags/15min_build_bi_v1_with_op.py
[2018-06-12 20:15:52,661] {base_task_runner.py:98} INFO - Subtask:
[2018-06-12 20:15:52,661] {pagerduty_plugin.py:49} WARNING - Pagerduty
Events API v1 will be deprecated on October 19, 2018. Consider
upgrading to v2.
[2018-06-12 20:15:52,772] {base_task_runner.py:98} INFO - Subtask:
[2018-06-12 20:15:52,772] {bash_operator.py:70} INFO - Tmp dir root
location:
[2018-06-12 20:15:52,772] {base_task_runner.py:98} INFO - Subtask:  /tmp
[2018-06-12 20:15:52,773] {base_task_runner.py:98} INFO - Subtask:
[2018-06-12 20:15:52,773] {bash_operator.py:80} INFO - Temporary
script location:
/tmp/airflowtmp3T0rTd//tmp/airflowtmp3T0rTd/15min_bi_productsetb4RO
[2018-06-12 20:15:52,773] {base_task_runner.py:98} INFO - Subtask:
[2018-06-12 20:15:52,773] {bash_operator.py:88} INFO - Running
command:  python $DEPLOY_PATH/bi-etl/scripts/build_table.py
--index="$DEPLOY_PATH/bi-etl/load_bi/modules/bi_products_indexes.sql"
--target-system="bi_data" --target-table="bi_products"
--file="$DEPLOY_PATH/bi-etl/load_bi/modules/bi_products.sql"
[2018-06-12 20:15:52,777] {base_task_runner.py:98} INFO - Subtask:
[2018-06-12 20:15:52,777] {bash_operator.py:97} INFO - Output:
[2018-06-12 20:16:22,830] {helpers.py:233} INFO - Terminating
descendant processes of ['/usr/bin/python /usr/local/bin/airflow run
15min_build_bi_v1 15min_bi_products 2018-06-12T20:00:00 --job_id
254272 --raw -sd DAGS_FOLDER/15min_build_bi_v1_with_op.py'] PID: 10544
[2018-06-12 20:16:22,830] {helpers.py:237} INFO - Terminating
descendant process ['bash',
'/tmp/airflowtmp3T0rTd/15min_bi_productsetb4RO'] PID: 10624
[2018-06-12 20:16:22,836] {helpers.py:195} ERROR -
[2018-06-12 20:16:22,837] {helpers.py:196} INFO - Killed process 10624
with signal 15
[2018-06-12 20:16:22,837] {helpers.py:237} INFO - Terminating
descendant process ['python',
'/home/bi_etl_user/bi-etl/scripts/build_table.py',
'--index=/home/bi_etl_user/bi-etl/load_bi/modules/bi_products_indexes.sql',
'--target-system=bi_data', '--target-table=bi_products',
'--file=/home/bi_etl_user/bi-etl/load_bi/modules/bi_products.sql']
PID: 10625
[2018-06-12 20:16:22,842] {helpers.py:195} ERROR -
[2018-06-12 20:16:22,842] {helpers.py:196} INFO - Killed process 10625
with signal 15
[2018-06-12 20:16:22,842] {helpers.py:237} INFO - Terminating
descendant process ['/bin/sh', '-c', 'psql -h
bi-data-master2.cdm3aqbdbqi4.us-east-1.rds.amazonaws.com -U
bi_etl_user -p 5432 -d flexport_bi -v ON_ERROR_STOP=true  -f
"/data/sql_run/bi_products_index_schema_1528834552.sql" '] PID: 11299
[2018-06-12 20:16:22,845] {base_task_runner.py:98} INFO - Subtask:
[2018-06-12 20:16:22,844] {bash_operator.py:105} INFO - Command exited
with return code -15
[2018-06-12 20:16:22,847] {helpers.py:195} ERROR -
[2018-06-12 20:16:22,848] {helpers.py:196} INFO - Killed process 11299
with signal 15
[2018-06-12 20:16:22,848] {helpers.py:237} INFO - Terminating
descendant process ['/usr/lib/postgresql/9.5/bin/psql', '-h',
'bi-data-master2.cdm3aqbdbqi4.us-east-1.rds.amazonaws.com', '-U',
'bi_etl_user', '-p', '5432', '-d', 'flexport_bi', '-v',
'ON_ERROR_STOP=true', '-f',
'/data/sql_run/bi_products_index_schema_1528834552.sql'] PID: 11300
[2018-06-12 20:16:22,853] {helpers.py:195} ERROR -
[2018-06-12 20:16:22,853] {helpers.py:196} INFO - Killed process 11300
with signal 15
[2018-06-12 20:16:22,853] {helpers.py:242} INFO - Waiting up to 60s
for processes to exit...
[2018-06-12 20:16:22,854] {helpers.py:245} INFO - Done waiting
[2018-06-12 20:16:22,865] {base_task_runner.py:98} INFO - Subtask:
Traceback (most recent call last):
[2018-06-12 20:16:22,866] {base_task_runner.py:98} INFO - Subtask:
File "/usr/local/bin/airflow", line 27, in <module>
[2018-06-12 20:16:22,866] {base_task_runner.py:98} INFO - Subtask:
args.func(args)
[2018-06-12 20:16:22,866] {base_task_runner.py:98} INFO - Subtask:
File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line
392, in run
[2018-06-12 20:16:22,867] {base_task_runner.py:98} INFO - Subtask:
pool=args.pool,
[2018-06-12 20:16:22,867] {base_task_runner.py:98} INFO - Subtask:
File "/usr/local/lib/python2.7/dist-packages/airflow/utils/db.py",
line 50, in wrapper
[2018-06-12 20:16:22,867] {base_task_runner.py:98} INFO - Subtask:
result = func(*args, **kwargs)
[2018-06-12 20:16:22,868] {base_task_runner.py:98} INFO - Subtask:
File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line
1488, in _run_raw_task
[2018-06-12 20:16:22,868] {base_task_runner.py:98} INFO - Subtask:
result = task_copy.execute(context=context)
[2018-06-12 20:16:22,869] {base_task_runner.py:98} INFO - Subtask:
File "/usr/local/lib/python2.7/dist-packages/airflow/operators/bash_operator.py",
line 109, in execute
[2018-06-12 20:16:22,869] {base_task_runner.py:98} INFO - Subtask:
raise AirflowException("Bash command failed")
[2018-06-12 20:16:22,869] {base_task_runner.py:98} INFO - Subtask:
airflow.exceptions.AirflowException: Bash command failed


Thanks!

Jim
```