Initial Design for Supporting fine-grained Connection encryption [Feedback Requested]

In support of adding fine-grained Connection encryption (Jira Issue:
https://issues.apache.org/jira/browse/AIRFLOW-2062) I wanted to gather
feedback on a proposed design, as it affects a few different Airflow
components. A full design doc is coming next week.

The end goal is to allow per-Connection encryption (as opposed to the global
fernet key). This supports providing containerized tasks with independent
credentials to limit access, and enables integration with Key Management
Services (KMS).

At a high level, Connection objects will be augmented with two additional
fields: `KMS_type` and `KMS_extra`, which are modeled (somewhat) after the
existing `conn_type` and `extra` fields. Each Connection can be flagged as
"independently encrypted", which then prompts the user to pick a KMS (from
a predefined list, like Connection type) and enter the relevant
authentication and metadata that KMS requires to operate (mirroring how
choosing a Connection type results in additional configuration).

The credentials to authenticate with the KMS can either be placed manually
(as some key files for Connections are now) or, in the case of
containerized workers, injected as a key file (through file system mapping)
or an environment variable on a per-worker basis. These changes are primarily
in support of the second (containerized workers) model.
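
A minimal sketch of how a worker might pick up injected KMS credentials, assuming a hypothetical environment variable name and key file path (neither is an existing Airflow setting). Checking the environment variable before the mounted file is also just an assumption for illustration:

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical names for this sketch; not existing Airflow settings.
DEFAULT_ENV_VAR = "AIRFLOW_KMS_CREDENTIALS"
DEFAULT_KEY_FILE = "/etc/airflow/kms_credentials"


def load_kms_credentials(
    env_var: str = DEFAULT_ENV_VAR,
    key_file: str = DEFAULT_KEY_FILE,
    environ: Optional[Mapping[str, str]] = None,
) -> bytes:
    """Return the KMS credentials injected into this worker.

    The environment variable is checked first, then the mounted key file.
    That precedence is an assumption made for this sketch.
    """
    env = os.environ if environ is None else environ
    value = env.get(env_var)
    if value:
        return value.encode("utf-8")

    path = Path(key_file)
    if path.is_file():
        # Strip a trailing newline that editors or secret mounts often add.
        return path.read_bytes().strip()

    raise RuntimeError(
        f"No KMS credentials found in ${env_var} or at {key_file}"
    )
```

Passing `environ` explicitly keeps the function easy to test without touching the real process environment.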

When creating an encrypted Connection, Airflow will generate a
cryptographic key, K_conn (likely AES, possibly a separate fernet key), for
that Connection and encrypt the Connection fields with it. It will then
encrypt K_conn using the KMS.
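
The two-step flow above (encrypt the fields with K_conn, then encrypt K_conn with the KMS) can be sketched roughly as follows. Everything here is illustrative: `encrypt_connection`, `decrypt_connection`, and the record keys are hypothetical names, the KMS is represented by plain callables, and `_toy_cipher` is an insecure standard-library placeholder standing in for AES or fernet so the example stays self-contained.

```python
import base64
import hashlib
import json
import secrets
from typing import Callable, Dict


def _toy_cipher(key: bytes, data: bytes) -> bytes:
    """Placeholder cipher, NOT secure: XOR with a SHA-256 counter keystream.

    Applying it twice with the same key returns the original data. A real
    implementation would use AES or fernet here instead.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def encrypt_connection(
    fields: Dict[str, str], kms_encrypt: Callable[[bytes], bytes]
) -> Dict[str, str]:
    """Encrypt the fields with a fresh K_conn, then wrap K_conn via the KMS."""
    k_conn = secrets.token_bytes(32)  # per-Connection key
    ciphertext = _toy_cipher(k_conn, json.dumps(fields).encode("utf-8"))
    wrapped_key = kms_encrypt(k_conn)  # only the wrapped key is stored
    return {
        "encrypted_fields": base64.b64encode(ciphertext).decode("ascii"),
        "wrapped_k_conn": base64.b64encode(wrapped_key).decode("ascii"),
    }


def decrypt_connection(
    record: Dict[str, str], kms_decrypt: Callable[[bytes], bytes]
) -> Dict[str, str]:
    """Unwrap K_conn via the KMS, then decrypt the Connection fields."""
    k_conn = kms_decrypt(base64.b64decode(record["wrapped_k_conn"]))
    plaintext = _toy_cipher(k_conn, base64.b64decode(record["encrypted_fields"]))
    return json.loads(plaintext.decode("utf-8"))
```

Because only the wrapped copy of K_conn is stored, a worker that cannot authenticate with the KMS cannot recover the Connection fields.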

KMS communication happens through KMSClients, which are implemented very
similarly to Connection types and Hooks, with a mapping from `KMS_type` to a
particular client. New clients can be added by the community (as with
Hooks and Connection types). The API for a KMSClient is simple: `encrypt`
and `decrypt`. The `encrypt` method would take in K_conn and the
configuration data, encrypt K_conn through the KMS, and return JSON to be
stored in the `KMS_extra` field. The `decrypt` method is passed this
`KMS_extra` JSON, decrypts K_conn through the KMS, and returns K_conn to be
used to decrypt the Connection data. After each operation, K_conn is
purged from memory.
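
One possible shape for the client interface and the `KMS_type` lookup is sketched below. This is an assumption-laden illustration, not a settled API: the class names, the `KMS_CLIENTS` registry, `get_kms_client`, and the `master_key_b64` / `wrapped_key` JSON keys are all hypothetical, and the configuration data is passed to the constructor rather than to `encrypt` purely for brevity. The in-memory client uses a fixed-key XOR, which is not secure and only stands in for a real KMS call.

```python
import abc
import base64
import json
from typing import Dict, Type


class KMSClient(abc.ABC):
    """Interface each KMS integration would implement (hypothetical)."""

    def __init__(self, config: Dict[str, str]) -> None:
        self.config = config  # authentication and metadata for this KMS

    @abc.abstractmethod
    def encrypt(self, k_conn: bytes) -> str:
        """Encrypt K_conn through the KMS; return JSON for KMS_extra."""

    @abc.abstractmethod
    def decrypt(self, kms_extra: str) -> bytes:
        """Take the KMS_extra JSON and return the plaintext K_conn."""


class InMemoryKMSClient(KMSClient):
    """Toy client for illustration only; fixed-key XOR is NOT secure."""

    def _xor(self, data: bytes) -> bytes:
        master = base64.b64decode(self.config["master_key_b64"])
        if len(master) < len(data):
            raise ValueError("master key must be at least as long as K_conn")
        return bytes(d ^ m for d, m in zip(data, master))

    def encrypt(self, k_conn: bytes) -> str:
        wrapped = self._xor(k_conn)
        return json.dumps(
            {"wrapped_key": base64.b64encode(wrapped).decode("ascii")}
        )

    def decrypt(self, kms_extra: str) -> bytes:
        wrapped = base64.b64decode(json.loads(kms_extra)["wrapped_key"])
        return self._xor(wrapped)


# Mapping from KMS_type to client class, analogous to conn_type -> Hook.
KMS_CLIENTS: Dict[str, Type[KMSClient]] = {"in_memory": InMemoryKMSClient}


def get_kms_client(kms_type: str, config: Dict[str, str]) -> KMSClient:
    """Look up and instantiate the client registered for a KMS_type."""
    try:
        client_cls = KMS_CLIENTS[kms_type]
    except KeyError:
        raise ValueError(f"Unknown KMS_type: {kms_type!r}") from None
    return client_cls(config)
```

A community-contributed client would subclass `KMSClient` and register itself under a new `KMS_type` key in the same way.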

Decryption would be implemented where the Connection is loaded from the
database or environment. This makes the presence of per-Connection
encryption transparent to any calling code, much like the existing fernet
encryption.

As mentioned, all feedback and criticism are welcome to help improve this
design.