airflow.providers.google.cloud.hooks.dataproc_metastore
This module contains a Google Cloud Dataproc Metastore hook.
Module Contents
Classes
DataprocMetastoreHook – Hook for Google Cloud Dataproc Metastore APIs.
- class airflow.providers.google.cloud.hooks.dataproc_metastore.DataprocMetastoreHook(gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.common.hooks.base_google.GoogleBaseHook
Hook for Google Cloud Dataproc Metastore APIs.
- create_backup(project_id, region, service_id, backup, backup_id, request_id=None, retry=DEFAULT, timeout=None, metadata=())
Create a new backup in a given project and location.
- Parameters
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.
region (str) – Required. The ID of the Google Cloud region that the service belongs to.
service_id (str) – Required. The ID of the metastore service, which is used as the final component of the metastore service’s name. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens. This corresponds to the service_id field on the request instance; if request is provided, this should not be set.
backup (dict[Any, Any] | google.cloud.metastore_v1.types.Backup) – Required. The backup to create. The name field is ignored. The ID of the created backup must be provided in the request’s backup_id field. This corresponds to the backup field on the request instance; if request is provided, this should not be set.
backup_id (str) – Required. The ID of the backup, which is used as the final component of the backup’s name. This value must be between 1 and 64 characters long, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens. This corresponds to the backup_id field on the request instance; if request is provided, this should not be set.
request_id (str | None) – Optional. A unique id used to identify the request.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.
timeout (float | None) – The timeout for this request.
metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
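The ID rules quoted above (length bounds, a leading letter, a trailing letter or number, and only alphanumeric ASCII characters or hyphens) can be checked locally before calling the hook. The following is a minimal sketch; is_valid_resource_id is a hypothetical helper, not part of the provider, and the API itself remains the final authority on what it accepts.

```python
import string

_LETTERS = set(string.ascii_letters)
_ALNUM = _LETTERS | set(string.digits)


def is_valid_resource_id(value: str, min_len: int, max_len: int) -> bool:
    """Hypothetical helper: check an ID against the documented naming rules."""
    # Reject empty strings and anything outside the documented length bounds.
    if not value or not (min_len <= len(value) <= max_len):
        return False
    # Must begin with a letter and end with a letter or number.
    if value[0] not in _LETTERS or value[-1] not in _ALNUM:
        return False
    # Only alphanumeric ASCII characters or hyphens are allowed.
    return all(ch in _ALNUM or ch == "-" for ch in value)


# service_id: 2 to 63 characters; backup_id: 1 to 64 characters.
print(is_valid_resource_id("my-metastore", 2, 63))
print(is_valid_resource_id("b", 1, 64))
print(is_valid_resource_id("1-bad-id", 2, 63))
```

Passing the bounds as arguments lets the same check cover service_id, backup_id, and metadata_import_id, which differ only in their length limits.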
- create_metadata_import(project_id, region, service_id, metadata_import, metadata_import_id, request_id=None, retry=DEFAULT, timeout=None, metadata=())
Create a new MetadataImport in a given project and location.
- Parameters
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.
region (str) – Required. The ID of the Google Cloud region that the service belongs to.
service_id (str) – Required. The ID of the metastore service, which is used as the final component of the metastore service’s name. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens. This corresponds to the service_id field on the request instance; if request is provided, this should not be set.
metadata_import (dict | google.cloud.metastore_v1.types.MetadataImport) – Required. The metadata import to create. The name field is ignored. The ID of the created metadata import must be provided in the request’s metadata_import_id field. This corresponds to the metadata_import field on the request instance; if request is provided, this should not be set.
metadata_import_id (str) – Required. The ID of the metadata import, which is used as the final component of the metadata import’s name. This value must be between 1 and 64 characters long, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens. This corresponds to the metadata_import_id field on the request instance; if request is provided, this should not be set.
request_id (str | None) – Optional. A unique id used to identify the request.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.
timeout (float | None) – The timeout for this request.
metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
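Because metadata_import accepts a plain dict, the request body can be assembled without importing the client types. The sketch below assumes the field names description, database_dump, gcs_uri, and database_type from google.cloud.metastore_v1.types.MetadataImport; build_metadata_import is a hypothetical helper, and the bucket path is a placeholder.

```python
def build_metadata_import(gcs_uri: str, description: str = "") -> dict:
    """Hypothetical helper: assemble a dict-form MetadataImport body."""
    # The dump location is expected to be a Cloud Storage URI.
    if not gcs_uri.startswith("gs://"):
        raise ValueError(f"expected a gs:// URI, got {gcs_uri!r}")
    return {
        "description": description,
        # Field names below are assumptions taken from the MetadataImport type.
        "database_dump": {"gcs_uri": gcs_uri, "database_type": "MYSQL"},
    }


body = build_metadata_import("gs://example-bucket/dumps/hive.sql", "nightly import")
print(body)
```

The resulting dict would be passed as metadata_import, with the ID supplied separately through metadata_import_id.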
- create_service(region, project_id, service, service_id, request_id=None, retry=DEFAULT, timeout=None, metadata=())
Create a metastore service in a project and location.
- Parameters
region (str) – Required. The ID of the Google Cloud region that the service belongs to.
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.
service (dict | google.cloud.metastore_v1.types.Service) – Required. The metastore service to create. The name field is ignored. The ID of the created metastore service must be provided in the request’s service_id field. This corresponds to the service field on the request instance; if request is provided, this should not be set.
service_id (str) – Required. The ID of the metastore service, which is used as the final component of the metastore service’s name. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens. This corresponds to the service_id field on the request instance; if request is provided, this should not be set.
request_id (str | None) – Optional. A unique id used to identify the request.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.
timeout (float | None) – The timeout for this request.
metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
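Note that create_service lists region before project_id, unlike most other methods of this hook, so keyword arguments are the safer calling style. The sketch below takes the hook as a parameter so it stays runnable without Airflow installed; create_metastore_service is a hypothetical wrapper, and the project, region, and labels values in the commented usage are placeholders.

```python
from typing import Any


def create_metastore_service(
    hook: Any, project_id: str, region: str, service_id: str, service: dict
) -> Any:
    """Hypothetical wrapper: call create_service with keyword arguments only."""
    # Keywords make the call independent of the positional order in the signature.
    return hook.create_service(
        region=region,
        project_id=project_id,
        service=service,
        service_id=service_id,
    )


# Usage with the real hook (requires the Google provider and valid credentials):
# from airflow.providers.google.cloud.hooks.dataproc_metastore import DataprocMetastoreHook
# hook = DataprocMetastoreHook(gcp_conn_id="google_cloud_default")
# create_metastore_service(hook, "my-project", "us-central1", "my-metastore", {"labels": {"env": "dev"}})
```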
- delete_backup(project_id, region, service_id, backup_id, request_id=None, retry=DEFAULT, timeout=None, metadata=())
Delete a single backup.
- Parameters
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.
region (str) – Required. The ID of the Google Cloud region that the service belongs to.
service_id (str) – Required. The ID of the metastore service, which is used as the final component of the metastore service’s name. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens. This corresponds to the service_id field on the request instance; if request is provided, this should not be set.
backup_id (str) – Required. The ID of the backup, which is used as the final component of the backup’s name. This value must be between 1 and 64 characters long, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens. This corresponds to the backup_id field on the request instance; if request is provided, this should not be set.
request_id (str | None) – Optional. A unique id used to identify the request.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.
timeout (float | None) – The timeout for this request.
metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
- delete_service(project_id, region, service_id, request_id=None, retry=DEFAULT, timeout=None, metadata=())
Delete a single service.
- Parameters
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.
region (str) – Required. The ID of the Google Cloud region that the service belongs to.
service_id (str) – Required. The ID of the metastore service, which is used as the final component of the metastore service’s name. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens. This corresponds to the service_id field on the request instance; if request is provided, this should not be set.
request_id (str | None) – Optional. A unique id used to identify the request.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.
timeout (float | None) – The timeout for this request.
metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
- export_metadata(destination_gcs_folder, project_id, region, service_id, request_id=None, database_dump_type=None, retry=DEFAULT, timeout=None, metadata=())
Export metadata from a service.
- Parameters
destination_gcs_folder (str) – A Cloud Storage URI of a folder, in the format gs://<bucket_name>/<path_inside_bucket>. A sub-folder <export_folder> containing exported files will be created below it.
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.
region (str) – Required. The ID of the Google Cloud region that the service belongs to.
service_id (str) – Required. The ID of the metastore service, which is used as the final component of the metastore service’s name. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens. This corresponds to the service_id field on the request instance; if request is provided, this should not be set.
request_id (str | None) – Optional. A unique id used to identify the request.
database_dump_type (google.cloud.metastore_v1.types.metastore.DatabaseDumpSpec | None) – Optional. The type of the database dump. If unspecified, defaults to MYSQL.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.
timeout (float | None) – The timeout for this request.
metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
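A malformed destination_gcs_folder is easier to catch before the export request is sent. The following sketch checks only the gs://<bucket_name>/... shape described above using the standard library; is_gcs_folder_uri is a hypothetical helper and does not verify that the bucket exists or is writable.

```python
from urllib.parse import urlsplit


def is_gcs_folder_uri(uri: str) -> bool:
    """Hypothetical helper: check that a URI has the gs://<bucket_name>/... shape."""
    parts = urlsplit(uri)
    # The scheme must be "gs" and the bucket name (network location) must be present.
    return parts.scheme == "gs" and bool(parts.netloc)


print(is_gcs_folder_uri("gs://example-bucket/metastore/exports"))
print(is_gcs_folder_uri("example-bucket/metastore/exports"))
```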
- get_service(project_id, region, service_id, retry=DEFAULT, timeout=None, metadata=())
Get the details of a single service.
- Parameters
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.
region (str) – Required. The ID of the Google Cloud region that the service belongs to.
service_id (str) – Required. The ID of the metastore service, which is used as the final component of the metastore service’s name. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens. This corresponds to the service_id field on the request instance; if request is provided, this should not be set.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.
timeout (float | None) – The timeout for this request.
metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
- get_backup(project_id, region, service_id, backup_id, retry=DEFAULT, timeout=None, metadata=())
Get backup from a service.
- Parameters
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.
region (str) – Required. The ID of the Google Cloud region that the service belongs to.
service_id (str) – Required. The ID of the metastore service, which is used as the final component of the metastore service’s name. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens. This corresponds to the service_id field on the request instance; if request is provided, this should not be set.
backup_id (str) – Required. The ID of the metastore service backup to retrieve.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.
timeout (float | None) – The timeout for this request.
metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
- list_backups(project_id, region, service_id, page_size=None, page_token=None, filter=None, order_by=None, retry=DEFAULT, timeout=None, metadata=())
List backups in a service.
- Parameters
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.
region (str) – Required. The ID of the Google Cloud region that the service belongs to.
service_id (str) – Required. The ID of the metastore service, which is used as the final component of the metastore service’s name. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens. This corresponds to the service_id field on the request instance; if request is provided, this should not be set.
page_size (int | None) – Optional. The maximum number of backups to return. The response may contain fewer than the maximum number. If unspecified, no more than 500 backups are returned. The maximum value is 1000; values above 1000 are changed to 1000.
page_token (str | None) – Optional. A page token, received from a previous DataprocMetastore.ListBackups call. Provide this token to retrieve the subsequent page. To retrieve the first page, supply an empty page token. When paginating, other parameters provided to DataprocMetastore.ListBackups must match the call that provided the page token.
filter (str | None) – Optional. The filter to apply to list results.
order_by (str | None) – Optional. Specify the ordering of results as described in Sorting Order. If not specified, the results will be sorted in the default order.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.
timeout (float | None) – The timeout for this request.
metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
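The page_size rules above (500 when unspecified, capped at 1000) can be mirrored client-side to predict how many backups a single page may contain. This is a minimal sketch of the documented behavior only; effective_page_size is a hypothetical helper, None stands for "unspecified", and the handling of zero or negative values is not modeled because the text above does not describe it.

```python
from typing import Optional

DEFAULT_PAGE_SIZE = 500  # used when page_size is unspecified
MAX_PAGE_SIZE = 1000  # larger values are changed to this


def effective_page_size(page_size: Optional[int]) -> int:
    """Hypothetical helper: the upper bound on backups returned in one page."""
    if page_size is None:
        return DEFAULT_PAGE_SIZE
    # Values above the maximum are reduced to the maximum.
    return min(page_size, MAX_PAGE_SIZE)


print(effective_page_size(None))
print(effective_page_size(50))
print(effective_page_size(5000))
```

Since a response may contain fewer items than this bound, a short page is not by itself a sign that the listing is complete.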
- restore_service(project_id, region, service_id, backup_project_id, backup_region, backup_service_id, backup_id, restore_type=None, request_id=None, retry=DEFAULT, timeout=None, metadata=())
Restore a service from a backup.
- Parameters
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.
region (str) – Required. The ID of the Google Cloud region that the service belongs to.
service_id (str) – Required. The ID of the metastore service, which is used as the final component of the metastore service’s name. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens. This corresponds to the service_id field on the request instance; if request is provided, this should not be set.
backup_project_id (str) – Required. The ID of the Google Cloud project that the metastore service backup to restore from belongs to.
backup_region (str) – Required. The ID of the Google Cloud region that the metastore service backup to restore from belongs to.
backup_service_id (str) – Required. The ID of the metastore service that owns the backup to restore from, which is used as the final component of the metastore service’s name. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens.
backup_id (str) – Required. The ID of the metastore service backup to restore from.
restore_type (google.cloud.metastore_v1.types.metastore.Restore | None) – Optional. The type of restore. If unspecified, defaults to METADATA_ONLY.
request_id (str | None) – Optional. A unique id used to identify the request.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.
timeout (float | None) – The timeout for this request.
metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
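The four backup_* parameters together identify one backup, which may live in a different project, region, or service than the one being restored. The sketch below shows how they could combine into a single resource name, assuming the usual projects/.../locations/.../services/.../backups/... layout of the Dataproc Metastore API; backup_resource_name is a hypothetical helper and the values are placeholders.

```python
def backup_resource_name(
    backup_project_id: str, backup_region: str, backup_service_id: str, backup_id: str
) -> str:
    """Hypothetical helper: join the four backup identifiers into one resource name."""
    # Assumed layout: project, then location, then service, then backup.
    return (
        f"projects/{backup_project_id}"
        f"/locations/{backup_region}"
        f"/services/{backup_service_id}"
        f"/backups/{backup_id}"
    )


print(backup_resource_name("example-project", "us-central1", "source-metastore", "nightly-1"))
```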
- update_service(project_id, region, service_id, service, update_mask, request_id=None, retry=DEFAULT, timeout=None, metadata=())
Update the parameters of a single service.
- Parameters
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.
region (str) – Required. The ID of the Google Cloud region that the service belongs to.
service_id (str) – Required. The ID of the metastore service, which is used as the final component of the metastore service’s name. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens. This corresponds to the service_id field on the request instance; if request is provided, this should not be set.
service (dict | google.cloud.metastore_v1.types.Service) – Required. The metastore service to update. The server only merges fields in the service if they are specified in update_mask. The metastore service’s name field is used to identify the metastore service to be updated. This corresponds to the service field on the request instance; if request is provided, this should not be set.
update_mask (google.protobuf.field_mask_pb2.FieldMask) – Required. A field mask used to specify the fields to be overwritten in the metastore service resource by the update. Fields specified in the update_mask are relative to the resource (not to the full request). A field is overwritten if it is in the mask. This corresponds to the update_mask field on the request instance; if request is provided, this should not be set.
request_id (str | None) – Optional. A unique id used to identify the request.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.
timeout (float | None) – The timeout for this request.
metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
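Because only fields named in update_mask are overwritten, the mask paths and the service body need to stay in sync. One way to keep them aligned is to derive the paths from the dict of changes, as in this sketch. changed_paths and build_update_mask are hypothetical helpers, the labels and port fields are example values, and only top-level fields are handled; the protobuf import is deferred so the path logic runs without the package installed.

```python
def changed_paths(changes: dict) -> list:
    """Hypothetical helper: top-level field names to list in the update mask."""
    return sorted(changes)


def build_update_mask(changes: dict):
    """Wrap the changed field names in a FieldMask (requires the protobuf package)."""
    from google.protobuf.field_mask_pb2 import FieldMask  # imported lazily

    return FieldMask(paths=changed_paths(changes))


changes = {"port": 9084, "labels": {"env": "dev"}}
print(changed_paths(changes))
```

The same changes dict can then be passed as service and the derived mask as update_mask, so a field is never sent without also being named in the mask.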
- list_hive_partitions(project_id, service_id, region, table, partition_names=None)
List Hive partitions.
- Parameters
project_id (str) – Optional. The ID of the Google Cloud project that the service belongs to.
service_id (str) – Required. The ID of the Dataproc Metastore service.
region (str) – Required. The ID of the Google Cloud region that the service belongs to.
table (str) – Required. The name of the partitioned table.
partition_names (list[str] | None) – Optional. List of table partitions to wait for. A partition name should look like “ds=1”, or “a=1/b=2” in the case of multiple partitions. Note that you cannot use logical or comparison operators as in HivePartitionSensor. If not specified, the sensor will wait for at least one partition regardless of its name.
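The partition name format described above (“ds=1”, or “a=1/b=2” for multiple partition keys) can be produced from a mapping of partition keys to values. This is a minimal sketch; partition_name is a hypothetical helper, and it relies on the dict preserving the order in which the keys were inserted.

```python
def partition_name(spec: dict) -> str:
    """Hypothetical helper: join key=value pairs with "/" in insertion order."""
    return "/".join(f"{key}={value}" for key, value in spec.items())


print(partition_name({"ds": 1}))
print(partition_name({"a": 1, "b": 2}))
```

The resulting strings are what partition_names expects; only exact names are matched, not expressions with logical or comparison operators.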