airflow.models.dagbag

Module Contents

Classes

FileLoadStat

Information about single file.

DagBag

A dagbag is a collection of dags, parsed out of a folder tree and has high level configuration settings.

DagPriorityParsingRequest

Model to store the dag parsing requests that will be prioritized when parsing files.

Functions

generate_md5_hash(context)

class airflow.models.dagbag.FileLoadStat[source]

Bases: NamedTuple

Information about single file.

Parameters
  • file – Loaded file.

  • duration – Time spent on process file.

  • dag_num – Total number of DAGs loaded in this file.

  • task_num – Total number of Tasks loaded in this file.

  • dags – DAGs names loaded in this file.

  • warning_num – Total number of warnings captured from processing this file.

file: str[source]
duration: datetime.timedelta[source]
dag_num: int[source]
task_num: int[source]
dags: str[source]
warning_num: int[source]
class airflow.models.dagbag.DagBag(dag_folder=None, include_examples=NOTSET, safe_mode=NOTSET, read_dags_from_db=False, load_op_links=True, collect_dags=True)[source]

Bases: airflow.utils.log.logging_mixin.LoggingMixin

A dagbag is a collection of dags, parsed out of a folder tree and has high level configuration settings.

Some possible setting are database to use as a backend and what executor to use to fire off tasks. This makes it easier to run distinct environments for say production and development, tests, or for different teams or security profiles. What would have been system level settings are now dagbag level so that one system can run multiple, independent settings sets.

Parameters
  • dag_folder (str | pathlib.Path | None) – the folder to scan to find DAGs

  • include_examples (bool | airflow.utils.types.ArgNotSet) – whether to include the examples that ship with airflow or not

  • safe_mode (bool | airflow.utils.types.ArgNotSet) – when False, scans all python modules for dags. When True uses heuristics (files containing DAG and airflow strings) to filter python modules to scan for dags.

  • read_dags_from_db (bool) – Read DAGs from DB if True is passed. If False DAGs are read from python files.

  • load_op_links (bool) – Should the extra operator link be loaded via plugins when de-serializing the DAG? This flag is set to False in Scheduler so that Extra Operator links are not loaded to not run User code in Scheduler.

  • collect_dags (bool) – when True, collects dags during class initialization.

property dag_ids: list[str][source]

Get DAG ids.

Returns

a list of DAG IDs in this bag

Return type

list[str]

size()[source]
Returns

the amount of dags contained in this dagbag

Return type

int

get_dag(dag_id, session=None)[source]

Get the DAG out of the dictionary, and refreshes it if expired.

Parameters

dag_id – DAG ID

process_file(filepath, only_if_updated=True, safe_mode=True)[source]

Given a path to a python module or zip file, import the module and look for dag objects within.

bag_dag(dag)[source]

Add the DAG into the bag.

Raises

AirflowDagCycleException if a cycle is detected in this dag or its subdags.

Raises

AirflowDagDuplicatedIdException if this dag or its subdags already exists in the bag.

collect_dags(dag_folder=None, only_if_updated=True, include_examples=conf.getboolean('core', 'LOAD_EXAMPLES'), safe_mode=conf.getboolean('core', 'DAG_DISCOVERY_SAFE_MODE'))[source]

Look for python modules in a given path, import them, and add them to the dagbag collection.

Note that if a .airflowignore file is found while processing the directory, it will behave much like a .gitignore, ignoring files that match any of the patterns specified in the file.

Note: The patterns in .airflowignore are interpreted as either un-anchored regexes or gitignore-like glob expressions, depending on the DAG_IGNORE_FILE_SYNTAX configuration parameter.

collect_dags_from_db()[source]

Collect DAGs from database.

dagbag_report()[source]

Print a report around DagBag loading stats.

sync_to_db(processor_subdir=None, session=NEW_SESSION)[source]
airflow.models.dagbag.generate_md5_hash(context)[source]
class airflow.models.dagbag.DagPriorityParsingRequest(fileloc)[source]

Bases: airflow.models.base.Base

Model to store the dag parsing requests that will be prioritized when parsing files.

__tablename__ = 'dag_priority_parsing_request'[source]
id[source]
fileloc[source]
__repr__()[source]

Return repr(self).

Was this entry helpful?