Masking sensitive data
By default, Airflow masks Connection passwords, the values of sensitive Variables, and any key in a Connection's extra (JSON) field whose name contains one of the sensitive keywords, wherever those values appear in task logs, in the Variables UI, and in the Rendered fields view of the UI. Keys in the extra JSON whose names contain none of the sensitive keywords are not redacted automatically.
Masking works by searching for the literal secret value anywhere in your output. This means that if you have a connection with a password of a, then every occurrence of the letter a in your logs will be replaced with ***.
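For instance, here is a minimal sketch of this behavior, assuming the Task SDK's BaseHook and task exports; the connection id my_conn is hypothetical:

from airflow.sdk import BaseHook, task


@task
def show_password():
    # Accessing the connection registers its password with the secrets masker,
    # so the task log shows "password is ***" rather than the real value.
    conn = BaseHook.get_connection("my_conn")
    print(f"password is {conn.password}")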
To disable masking, set the hide_sensitive_var_conn_fields configuration option in the [core] section to False.
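For example, in airflow.cfg:

[core]
hide_sensitive_var_conn_fields = False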
Automatic masking is triggered by Connection or Variable access. This means that if you pass a sensitive value via XCom or any other side channel, it will not be masked when printed in a downstream task.
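A sketch of this pitfall, assuming the Task SDK's Variable and task exports; the Variable name my_api_key is hypothetical (it contains the keyword api_key, so it is masked in the task that reads it, but not downstream):

from airflow.sdk import Variable, task


@task
def fetch():
    # Reading the Variable registers its value with this task's masker,
    # so printing it here would show "***".
    return Variable.get("my_api_key")


@task
def use(value: str):
    # The value arrived via XCom; this task's masker has never seen it,
    # so it is printed unmasked.
    print(value)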
Sensitive field names
When masking is enabled, Airflow will always mask the password field of every Connection that is accessed by a task.
It will also mask the value of an Airflow Variable, an entry in a rendered template dictionary or XCom dictionary, or a field of a Connection's extra JSON blob, if the Variable name or field name contains any of the known-sensitive keywords.
Default Sensitive Keywords:
access_token, api_key, apikey, authorization, passphrase, passwd, password,
private_key, secret, token, keyfile_dict, service_account.
This list can also be extended with the sensitive_var_conn_names option in the [core] section:
[core]
sensitive_var_conn_names = comma,separated,sensitive,names
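Like any Airflow configuration option, it can also be set with the corresponding environment variable:

export AIRFLOW__CORE__SENSITIVE_VAR_CONN_NAMES="comma,separated,sensitive,names"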
Examples of Masking Behavior:

| Source | Key / Variable Name | Matching Keyword | Masking Scope |
|---|---|---|---|
| Connection Extra | google_keyfile_dict | keyfile_dict | Everywhere (Logs, Rendered Templates, UI) |
| Connection Extra | hello | None | Not Masked |
| Variable | service_account | service_account | Everywhere (Logs, Rendered Templates, UI) |
| Variable | test_keyfile_dict | keyfile_dict | Variables UI Only |
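Matching works the same way for dictionaries pushed to XCom: each key is checked against the keyword list. A minimal sketch (the task and key names are hypothetical):

from airflow.sdk import task


@task
def produce_config():
    # "api_key" contains a sensitive keyword, so its value is masked where
    # the XCom is displayed; "endpoint" matches nothing and is shown as-is.
    return {"api_key": "secret-value", "endpoint": "https://example.com"}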
Adding your own masks
If you want to mask an additional secret that is not already masked by one of the above methods, you can do so in your Dag file or in an operator's execute method using the mask_secret function. For example:
from airflow.sdk import task


@task
def my_func():
    from airflow.sdk.log import mask_secret

    mask_secret("custom_value")
    ...
or
from airflow.sdk import BaseOperator


class MyOperator(BaseOperator):
    def execute(self, context):
        from airflow.sdk.log import mask_secret

        mask_secret("custom_value")
        ...
The mask must be registered before any log line or other output containing the secret is produced; output emitted earlier is not retroactively masked.
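A sketch of this ordering pitfall (the value is illustrative):

from airflow.sdk import task


@task
def too_late():
    from airflow.sdk.log import mask_secret

    print("custom_value")        # already written to the log, appears unmasked
    mask_secret("custom_value")  # only output produced after this call is masked
    print("custom_value")        # rendered as ***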
NOT masking when using environment variables
When you are using some operators, for example airflow.providers.cncf.kubernetes.operators.pod.KubernetesPodOperator, you might be tempted to pass secrets via environment variables. This is very bad practice: environment variables are visible to anyone who can inspect the environment of the process, and secrets passed this way will NOT be masked by Airflow.
If you need to pass secrets to the KubernetesPodOperator, use native Kubernetes secrets, or retrieve the secrets dynamically through Airflow Connections or Variables.
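For example, here is a minimal sketch using the Kubernetes provider's Secret helper; the secret name my-k8s-secret, its key api-key, and the command are hypothetical. The value is injected into the pod by Kubernetes itself and never passes through Airflow:

from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator
from airflow.providers.cncf.kubernetes.secret import Secret

# Expose the Kubernetes secret to the pod as the environment variable API_KEY.
api_key = Secret(
    deploy_type="env",
    deploy_target="API_KEY",
    secret="my-k8s-secret",
    key="api-key",
)

use_secret = KubernetesPodOperator(
    task_id="use_secret",
    name="use-secret",
    image="alpine:3.20",
    cmds=["sh", "-c", 'my-command --key "$API_KEY"'],
    secrets=[api_key],
)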