Masking sensitive data

By default, Airflow masks Connection passwords, sensitive Variable values, and sensitive keys from a Connection's extra (JSON) field wherever they appear: in task logs, and in the Variable and Rendered Template views of the UI.

It does this by looking for the specific value appearing anywhere in your output. This means that if you have a connection with a password of a, then every instance of the letter a in your logs will be replaced with ***.
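This is plain substring replacement over each line of output, which is why a one-character password is so destructive. A minimal sketch of the idea (illustrative only; the real filter lives in airflow.utils.log.secrets_masker and also redacts values inside dicts and tuples):

```python
def redact(line: str, secrets: set[str]) -> str:
    """Replace every occurrence of each known secret value with ***."""
    for secret in secrets:
        if secret:  # never replace the empty string
            line = line.replace(secret, "***")
    return line

# A connection password of "a" masks every letter "a" in the output:
print(redact("a quick example", {"a"}))   # -> *** quick ex***mple
```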

To disable masking entirely, you can set hide_sensitive_var_conn_fields to False.
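This is a [core] configuration option:

[core]
hide_sensitive_var_conn_fields = False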

The automatic masking is triggered by Connection or Variable access. This means that if you pass a sensitive value via XCom or any other side-channel it will not be masked when printed in the downstream task.
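Under the hood, the log filter only redacts values that have been registered with it, and registration happens as a side effect of Connection or Variable access (or an explicit mask_secret call). A rough simulation of that behaviour, with all names invented for illustration:

```python
_registered: set[str] = set()

def register(secret: str) -> None:
    """What Connection/Variable access (or mask_secret) effectively does."""
    _registered.add(secret)

def filter_line(line: str) -> str:
    """What the log filter applies to every emitted line."""
    for secret in _registered:
        line = line.replace(secret, "***")
    return line

# A secret that arrived via XCom was never registered, so it leaks:
print(filter_line("calling API with key=s3cr3t"))   # key printed as-is
register("s3cr3t")                                   # e.g. mask_secret("s3cr3t")
print(filter_line("calling API with key=s3cr3t"))   # -> calling API with key=***
```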

Sensitive field names

When masking is enabled, Airflow will always mask the password field of every Connection that is accessed by a task.

It will also mask the value of a Variable, a key in a rendered template or XCom dictionary, or a field of a Connection's extra JSON blob, if the name contains any of the words in ('access_token', 'api_key', 'apikey', 'authorization', 'passphrase', 'passwd', 'password', 'private_key', 'secret', 'token'). This list can also be extended:

[core]
sensitive_var_conn_names = comma,separated,sensitive,names
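The name test is a case-insensitive substring match against that word list. A sketch under the assumption that it behaves roughly like Airflow's should_hide_value_for_key helper (everything except the default word list is illustrative):

```python
DEFAULT_SENSITIVE_WORDS = {
    "access_token", "api_key", "apikey", "authorization", "passphrase",
    "passwd", "password", "private_key", "secret", "token",
}

def is_sensitive(name: str, extra_words=frozenset()) -> bool:
    """True if any configured sensitive word is a substring of the name."""
    lowered = name.strip().lower()
    return any(word in lowered for word in DEFAULT_SENSITIVE_WORDS | set(extra_words))

print(is_sensitive("MY_API_KEY"))          # True: contains "api_key"
print(is_sensitive("hostname"))            # False: no word matches
print(is_sensitive("db_conn", {"db"}))     # True with an extended word list
```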

Adding your own masks

If you want to mask an additional secret that is not already masked by one of the above methods, you can do it in your DAG file or operator’s execute function using the mask_secret function. For example:

from airflow.decorators import task


@task
def my_func():
    from airflow.utils.log.secrets_masker import mask_secret

    mask_secret("custom_value")

    ...

or

from airflow.models import BaseOperator


class MyOperator(BaseOperator):
    def execute(self, context):
        from airflow.utils.log.secrets_masker import mask_secret

        mask_secret("custom_value")

        ...

mask_secret must be called before any log line or other output containing the value is produced; it does not retroactively mask output that has already been emitted.

NOT masking when using environment variables

When you are using some operators, for example airflow.providers.cncf.kubernetes.operators.pod.KubernetesPodOperator, you might be tempted to pass secrets via environment variables. This is very bad practice: environment variables are visible to anyone who can inspect the environment of the process, and secrets passed this way will NOT be masked by Airflow.

If you need to pass secrets to the KubernetesPodOperator, use native Kubernetes secrets, or use Airflow Connections or Variables to retrieve the secrets dynamically.
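As a sketch of the native-Kubernetes route (all names below are placeholders), a Secret can be created once in the cluster and then referenced from the pod; the cncf.kubernetes provider also ships a Secret helper class (airflow.providers.cncf.kubernetes.secret.Secret) that can inject such a secret via the operator's secrets= argument:

apiVersion: v1
kind: Secret
metadata:
  name: my-app-secret
type: Opaque
stringData:
  API_KEY: "<value-from-your-secret-store>"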
