airflow.providers.databricks.operators.databricks_repos
This module contains Databricks operators.
Module Contents
Classes
DatabricksReposCreateOperator | Creates, and optionally checks out, a Databricks Repo using the POST api/2.0/repos API endpoint.
DatabricksReposUpdateOperator | Updates specified repository to a given branch or tag using the PATCH api/2.0/repos API endpoint.
DatabricksReposDeleteOperator | Deletes specified repository using the DELETE api/2.0/repos API endpoint.
- class airflow.providers.databricks.operators.databricks_repos.DatabricksReposCreateOperator(*, git_url, git_provider=None, branch=None, tag=None, repo_path=None, ignore_existing_repo=False, databricks_conn_id='databricks_default', databricks_retry_limit=3, databricks_retry_delay=1, **kwargs)
Bases: airflow.models.BaseOperator
Creates, and optionally checks out, a Databricks Repo using the POST api/2.0/repos API endpoint.
- Parameters
git_url (str) – Required HTTPS URL of a Git repository.
git_provider (str | None) – Optional name of the Git provider. Must be provided if it cannot be inferred from the URL.
repo_path (str | None) – Optional path for a repository. Must be in the format /Repos/{folder}/{repo-name}. If not specified, the repository is created in the user's directory.
branch (str | None) – Optional name of the branch to check out.
tag (str | None) – Optional name of the tag to check out.
ignore_existing_repo (bool) – Do not raise an exception if a repository with the given path already exists.
databricks_conn_id (str) – Reference to the Databricks connection. By default and in the common case this will be databricks_default. To use token based authentication, provide the key token in the extra field for the connection and create the key host and leave the host field empty. (templated)
databricks_retry_limit (int) – Number of times to retry if the Databricks backend is unreachable. Its value must be greater than or equal to 1.
databricks_retry_delay (int) – Number of seconds to wait between retries (it might be a floating point number).
- template_fields: collections.abc.Sequence[str] = ('repo_path', 'tag', 'branch', 'databricks_conn_id')
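The repo_path format described above can be checked before a task is defined. The following is a minimal sketch only: the helper build_create_kwargs, the regular expression, the URL, and the path are all hypothetical and are not part of the provider; the provider's own validation may differ.

```python
import re

# Assumed pattern for the documented /Repos/{folder}/{repo-name} format;
# the provider's own check may be stricter or looser.
REPO_PATH_PATTERN = re.compile(r"^/Repos/[^/]+/[^/]+$")


def build_create_kwargs(git_url, repo_path=None, branch=None, tag=None):
    """Hypothetical helper: collect keyword arguments for the create operator."""
    if repo_path is not None and not REPO_PATH_PATTERN.match(repo_path):
        raise ValueError(
            f"repo_path must look like /Repos/{{folder}}/{{repo-name}}, got {repo_path!r}"
        )
    kwargs = {"git_url": git_url}
    # Only pass along the optional arguments that were actually set.
    for name, value in (("repo_path", repo_path), ("branch", branch), ("tag", tag)):
        if value is not None:
            kwargs[name] = value
    return kwargs


create_kwargs = build_create_kwargs(
    git_url="https://github.com/example-org/example-repo",  # placeholder URL
    repo_path="/Repos/etl/example-repo",  # placeholder path
    branch="main",
)
print(create_kwargs)
```

With the Databricks provider installed, the resulting dictionary could be passed on inside a DAG as DatabricksReposCreateOperator(task_id="create_repo", **create_kwargs).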
- class airflow.providers.databricks.operators.databricks_repos.DatabricksReposUpdateOperator(*, branch=None, tag=None, repo_id=None, repo_path=None, databricks_conn_id='databricks_default', databricks_retry_limit=3, databricks_retry_delay=1, **kwargs)
Bases: airflow.models.BaseOperator
Updates specified repository to a given branch or tag using the PATCH api/2.0/repos API endpoint.
See: https://docs.databricks.com/dev-tools/api/latest/repos.html#operation/update-repo
- Parameters
branch (str | None) – Optional name of the branch to update to. Should be specified if tag is omitted.
tag (str | None) – Optional name of the tag to update to. Should be specified if branch is omitted.
repo_id (str | None) – Optional ID of an existing repository. Should be specified if repo_path is omitted.
repo_path (str | None) – Optional path of an existing repository. Should be specified if repo_id is omitted.
databricks_conn_id (str) – Reference to the Databricks connection. By default and in the common case this will be databricks_default. To use token based authentication, provide the key token in the extra field for the connection and create the key host and leave the host field empty. (templated)
databricks_retry_limit (int) – Number of times to retry if the Databricks backend is unreachable. Its value must be greater than or equal to 1.
databricks_retry_delay (int) – Number of seconds to wait between retries (it might be a floating point number).
- template_fields: collections.abc.Sequence[str] = ('repo_path', 'tag', 'branch', 'databricks_conn_id')
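The either/or parameters of the update operator (branch or tag, repo_id or repo_path) can be sketched as a small pre-check. The helper build_update_kwargs and the sample values are hypothetical. Treating "both given" as an error is an assumption; this page only states that one should be specified when the other is omitted.

```python
def build_update_kwargs(branch=None, tag=None, repo_id=None, repo_path=None):
    """Hypothetical helper mirroring the documented either/or parameters."""
    # Assumption: exactly one of branch/tag identifies the target revision.
    if (branch is None) == (tag is None):
        raise ValueError("specify exactly one of branch or tag")
    # Assumption: exactly one of repo_id/repo_path identifies the repository.
    if (repo_id is None) == (repo_path is None):
        raise ValueError("specify exactly one of repo_id or repo_path")
    candidates = {
        "branch": branch,
        "tag": tag,
        "repo_id": repo_id,
        "repo_path": repo_path,
    }
    # Drop the unset arguments so the operator defaults apply.
    return {name: value for name, value in candidates.items() if value is not None}


update_kwargs = build_update_kwargs(
    tag="v1.2.0",  # placeholder tag
    repo_path="/Repos/etl/example-repo",  # placeholder path
)
print(update_kwargs)
```

With the provider installed, this could be used as DatabricksReposUpdateOperator(task_id="update_repo", **update_kwargs). Because repo_path, tag, and branch are listed in template_fields, their values may also be Jinja expressions rendered at run time.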
- class airflow.providers.databricks.operators.databricks_repos.DatabricksReposDeleteOperator(*, repo_id=None, repo_path=None, databricks_conn_id='databricks_default', databricks_retry_limit=3, databricks_retry_delay=1, **kwargs)
Bases: airflow.models.BaseOperator
Deletes specified repository using the DELETE api/2.0/repos API endpoint.
See: https://docs.databricks.com/dev-tools/api/latest/repos.html#operation/delete-repo
- Parameters
repo_id (str | None) – Optional ID of an existing repository. Should be specified if repo_path is omitted.
repo_path (str | None) – Optional path of an existing repository. Should be specified if repo_id is omitted.
databricks_conn_id (str) – Reference to the Databricks connection. By default and in the common case this will be databricks_default. To use token based authentication, provide the key token in the extra field for the connection and create the key host and leave the host field empty. (templated)
databricks_retry_limit (int) – Number of times to retry if the Databricks backend is unreachable. Its value must be greater than or equal to 1.
databricks_retry_delay (int) – Number of seconds to wait between retries (it might be a floating point number).
- template_fields: collections.abc.Sequence[str] = ('repo_path', 'databricks_conn_id')
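To make the meaning of databricks_retry_limit and databricks_retry_delay more concrete, here is an illustrative retry loop. It is not the provider's implementation: call_with_retries and flaky_delete are hypothetical names, the limit is assumed to count total attempts, and ConnectionError stands in for "backend unreachable".

```python
import time


def call_with_retries(request, retry_limit=3, retry_delay=1.0):
    """Illustrative retry loop; not the provider's implementation."""
    if retry_limit < 1:
        raise ValueError("retry_limit must be greater than or equal to 1")
    for attempt in range(1, retry_limit + 1):
        try:
            return request()
        except ConnectionError:
            # Give up once the last allowed attempt has failed.
            if attempt == retry_limit:
                raise
            # The delay may be a float, e.g. 0.5 seconds.
            time.sleep(retry_delay)


attempts = []


def flaky_delete():
    """Stand-in for a delete call that fails twice, then succeeds."""
    attempts.append(len(attempts) + 1)
    if len(attempts) < 3:
        raise ConnectionError("backend unreachable")
    return "deleted"


# retry_delay=0 keeps the demonstration instant.
result = call_with_retries(flaky_delete, retry_limit=3, retry_delay=0)
print(result, attempts)
```

In this sketch a limit of 3 allows the third attempt to succeed; with a limit of 2 the ConnectionError would propagate to the caller.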