airflow.providers.databricks.operators.databricks_workflow

Module Contents

Classes

WorkflowRunMetadata

Metadata for a Databricks workflow run.

DatabricksWorkflowTaskGroup

A task group that takes a list of tasks and creates a Databricks workflow.

class airflow.providers.databricks.operators.databricks_workflow.WorkflowRunMetadata[source]

Metadata for a Databricks workflow run.

Parameters
  • run_id – The ID of the Databricks workflow run.

  • job_id – The ID of the Databricks workflow job.

  • conn_id – The connection ID used to connect to Databricks.

conn_id: str[source]
job_id: str[source]
run_id: int[source]
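
A minimal sketch of constructing this metadata by hand (assuming the class behaves like a plain dataclass that accepts its attributes as keyword arguments; the connection name and IDs below are illustrative only, since real metadata is produced by the workflow operator at run time):

    from airflow.providers.databricks.operators.databricks_workflow import WorkflowRunMetadata

    # Illustrative values; in practice this object is emitted by the workflow run.
    metadata = WorkflowRunMetadata(
        conn_id="databricks_default",  # Airflow connection used to reach Databricks
        job_id="945823875432121",      # Databricks job ID (typed as str above)
        run_id=123456,                 # Databricks run ID (typed as int above)
    )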
class airflow.providers.databricks.operators.databricks_workflow.DatabricksWorkflowTaskGroup(databricks_conn_id, existing_clusters=None, extra_job_params=None, jar_params=None, job_clusters=None, max_concurrent_runs=1, notebook_packages=None, notebook_params=None, python_params=None, spark_submit_params=None, **kwargs)[source]

Bases: airflow.utils.task_group.TaskGroup

A task group that takes a list of tasks and creates a Databricks workflow.

The DatabricksWorkflowTaskGroup takes a list of tasks and creates a Databricks workflow based on the metadata produced by those tasks. For a task to be eligible for this TaskGroup, it must implement the _convert_to_databricks_workflow_task method. If any task does not implement this method, the TaskGroup will raise an error at parse time.

See also

For more information on how to use this operator, take a look at the guide: DatabricksWorkflowTaskGroup

Parameters
  • databricks_conn_id (str) – The name of the Databricks connection to use.

  • existing_clusters (list[str] | None) – A list of existing clusters to use for this workflow.

  • extra_job_params (dict[str, Any] | None) – A dictionary containing properties which will override the default Databricks Workflow Job definitions.

  • jar_params (list[str] | None) – A list of jar parameters to pass to the workflow. These parameters will be passed to all jar tasks in the workflow.

  • job_clusters (list[dict] | None) – A list of job clusters to use for this workflow.

  • max_concurrent_runs (int) – The maximum number of concurrent runs for this workflow.

  • notebook_packages (list[dict[str, Any]] | None) – A list of dictionaries describing Python packages to install. Packages defined at the workflow task group level are installed for every notebook task under it, while packages defined at the notebook task level are installed only for that notebook task.

  • notebook_params (dict | None) – A dictionary of notebook parameters to pass to the workflow. These parameters will be passed to all notebook tasks in the workflow.

  • python_params (list | None) – A list of Python parameters to pass to the workflow. These parameters will be passed to all Python tasks in the workflow.

  • spark_submit_params (list | None) – A list of Spark submit parameters to pass to the workflow. These parameters will be passed to all Spark submit tasks.
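
A minimal usage sketch (assuming an Airflow connection named databricks_default exists; the DAG id, notebook paths, and cluster spec are placeholders, and the notebook tasks use DatabricksNotebookOperator, which supports conversion into a Databricks workflow task):

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.databricks.operators.databricks import DatabricksNotebookOperator
    from airflow.providers.databricks.operators.databricks_workflow import DatabricksWorkflowTaskGroup

    # Placeholder job cluster spec; adjust spark_version/node_type_id for your workspace.
    job_clusters = [
        {
            "job_cluster_key": "shared_cluster",
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 1,
            },
        }
    ]

    with DAG(dag_id="example_databricks_workflow", start_date=datetime(2024, 1, 1), schedule=None) as dag:
        workflow = DatabricksWorkflowTaskGroup(
            group_id="databricks_workflow",
            databricks_conn_id="databricks_default",
            job_clusters=job_clusters,
            notebook_params={"run_date": "{{ ds }}"},  # passed to all notebook tasks
        )
        with workflow:
            first = DatabricksNotebookOperator(
                task_id="notebook_1",
                databricks_conn_id="databricks_default",
                notebook_path="/Shared/notebook_1",  # placeholder workspace path
                source="WORKSPACE",
                job_cluster_key="shared_cluster",
            )
            second = DatabricksNotebookOperator(
                task_id="notebook_2",
                databricks_conn_id="databricks_default",
                notebook_path="/Shared/notebook_2",  # placeholder workspace path
                source="WORKSPACE",
                job_cluster_key="shared_cluster",
            )
            first >> second

On exiting the with block, the tasks collected inside the group are assembled into a single Databricks workflow job (see __exit__ below).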

is_databricks = True[source]
__exit__(_type, _value, _tb)[source]

Exit the context manager and add tasks to a single _CreateDatabricksWorkflowOperator.
