airflow.providers.google.cloud.hooks.dataproc_metastore

This module contains a Google Cloud Dataproc Metastore hook.

Module Contents

Classes

DataprocMetastoreHook

Hook for Google Cloud Dataproc Metastore APIs.

class airflow.providers.google.cloud.hooks.dataproc_metastore.DataprocMetastoreHook(gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)

Bases: airflow.providers.google.common.hooks.base_google.GoogleBaseHook

Hook for Google Cloud Dataproc Metastore APIs.

get_dataproc_metastore_client()

Return DataprocMetastoreClient.

get_dataproc_metastore_client_v1beta()

Return DataprocMetastoreClient (from v1 beta).

wait_for_operation(timeout, operation)

Wait for a long-running operation to complete.
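Note that wait_for_operation takes the timeout before the operation. The pattern it wraps, blocking on a long-running operation with an upper time limit, can be sketched as follows. This is a minimal illustration, not the hook's implementation: FakeOperation and wait_for are hypothetical stand-ins, and the sketch assumes only that the operation exposes a result(timeout=...) method, as google.api_core operation futures do.

```python
import time
from typing import Any, Optional


class FakeOperation:
    """Hypothetical stand-in for a long-running operation that finishes after a delay."""

    def __init__(self, seconds_to_finish: float, value: Any = None) -> None:
        self._done_at = time.monotonic() + seconds_to_finish
        self._value = value

    def done(self) -> bool:
        return time.monotonic() >= self._done_at

    def result(self, timeout: Optional[float] = None) -> Any:
        # Poll until the operation is done, giving up once the timeout has elapsed.
        deadline = None if timeout is None else time.monotonic() + timeout
        while not self.done():
            if deadline is not None and time.monotonic() >= deadline:
                raise TimeoutError("operation did not finish within the timeout")
            time.sleep(0.01)
        return self._value


def wait_for(timeout: Optional[float], operation: FakeOperation) -> Any:
    """Block until `operation` completes; arguments follow the hook's (timeout, operation) order."""
    return operation.result(timeout=timeout)


print(wait_for(1.0, FakeOperation(0.05, value="backup created")))  # backup created
```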

create_backup(project_id, region, service_id, backup, backup_id, request_id=None, retry=DEFAULT, timeout=None, metadata=())

Create a new backup in a given project and location.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • service_id (str) –

    Required. The ID of the metastore service, which is used as the final component of the metastore service’s name. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens.

    The hook uses this value to build the underlying API request.

  • backup (dict[Any, Any] | google.cloud.metastore_v1.types.Backup) –

    Required. The backup to create. The name field is ignored. The ID of the created backup must be provided in the request’s backup_id field.

    The hook uses this value to build the underlying API request.

  • backup_id (str) –

    Required. The ID of the backup, which is used as the final component of the backup’s name. This value must be between 1 and 64 characters long, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens.

    The hook uses this value to build the underlying API request.

  • request_id (str | None) – Optional. A unique id used to identify the request.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
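The ID rules above can be checked on the client before a request is sent. A minimal sketch, assuming the documented rules are the complete set (the service remains the final authority); is_valid_service_id and is_valid_backup_id are hypothetical helpers, not part of the hook.

```python
import re

# Derived from the documented rules: begins with a letter, ends with a letter or
# number, and contains only ASCII letters, digits, or hyphens.
_ID_PATTERN = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def _is_valid_id(value: str, min_len: int, max_len: int) -> bool:
    return min_len <= len(value) <= max_len and _ID_PATTERN.fullmatch(value) is not None


def is_valid_service_id(service_id: str) -> bool:
    """Service IDs must be 2 to 63 characters long inclusive."""
    return _is_valid_id(service_id, 2, 63)


def is_valid_backup_id(backup_id: str) -> bool:
    """Backup IDs must be 1 to 64 characters long."""
    return _is_valid_id(backup_id, 1, 64)


print(is_valid_service_id("my-metastore"))   # True
print(is_valid_service_id("m"))              # False: shorter than 2 characters
print(is_valid_backup_id("1st-backup"))      # False: must begin with a letter
print(is_valid_backup_id("backup-"))         # False: must end with a letter or number
```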

create_metadata_import(project_id, region, service_id, metadata_import, metadata_import_id, request_id=None, retry=DEFAULT, timeout=None, metadata=())

Create a new MetadataImport in a given project and location.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • service_id (str) –

    Required. The ID of the metastore service, which is used as the final component of the metastore service’s name. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens.

    The hook uses this value to build the underlying API request.

  • metadata_import (dict | google.cloud.metastore_v1.types.MetadataImport) –

    Required. The metadata import to create. The name field is ignored. The ID of the created metadata import must be provided in the request’s metadata_import_id field.

    The hook uses this value to build the underlying API request.

  • metadata_import_id (str) –

    Required. The ID of the metadata import, which is used as the final component of the metadata import’s name. This value must be between 1 and 64 characters long, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens.

    The hook uses this value to build the underlying API request.

  • request_id (str | None) – Optional. A unique id used to identify the request.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
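The ID arguments end up as the final component of the created resource's name. A sketch of that composition, assuming the projects/{project}/locations/{region}/services/{service_id} naming used by the Dataproc Metastore API, with metadata imports nested under metadataImports/; service_name and metadata_import_name are hypothetical helpers that only build strings.

```python
def service_name(project_id: str, region: str, service_id: str) -> str:
    """Resource name of a metastore service; it is also the parent of its imports."""
    return f"projects/{project_id}/locations/{region}/services/{service_id}"


def metadata_import_name(
    project_id: str, region: str, service_id: str, metadata_import_id: str
) -> str:
    """metadata_import_id becomes the final component of the import's resource name."""
    return f"{service_name(project_id, region, service_id)}/metadataImports/{metadata_import_id}"


print(metadata_import_name("my-project", "us-central1", "my-metastore", "import-2024-01"))
# projects/my-project/locations/us-central1/services/my-metastore/metadataImports/import-2024-01
```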

create_service(region, project_id, service, service_id, request_id=None, retry=DEFAULT, timeout=None, metadata=())

Create a metastore service in a project and location.

Parameters
  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • service (dict | google.cloud.metastore_v1.types.Service) –

    Required. The Metastore service to create. The name field is ignored. The ID of the created metastore service must be provided in the request’s service_id field.

    The hook uses this value to build the underlying API request.

  • service_id (str) –

    Required. The ID of the metastore service, which is used as the final component of the metastore service’s name. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens.

    The hook uses this value to build the underlying API request.

  • request_id (str | None) – Optional. A unique id used to identify the request.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
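request_id is what makes retries of mutating calls such as create_service safe. Google's Dataproc Metastore API reference describes it as a UUID that lets the server ignore a duplicate of a request it has already completed, and it does not accept the all-zero UUID; treat that as an assumption to confirm for the API version you use. A minimal sketch with a hypothetical helper:

```python
import uuid


def new_request_id() -> str:
    """Return a random version-4 UUID string to use as request_id.

    uuid4() sets the version bits, so it never returns the all-zero UUID.
    """
    return str(uuid.uuid4())


request_id = new_request_id()
# Generate the ID once per logical create_service call and reuse it on every retry, e.g.
# hook.create_service(region=..., project_id=..., service=..., service_id=..., request_id=request_id)
print(uuid.UUID(request_id).version)  # 4
```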

delete_backup(project_id, region, service_id, backup_id, request_id=None, retry=DEFAULT, timeout=None, metadata=())

Delete a single backup.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • service_id (str) –

    Required. The ID of the metastore service, which is used as the final component of the metastore service’s name. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens.

    The hook uses this value to build the underlying API request.

  • backup_id (str) –

    Required. The ID of the backup, which is used as the final component of the backup’s name. This value must be between 1 and 64 characters long, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens.

    The hook uses this value to build the underlying API request.

  • request_id (str | None) – Optional. A unique id used to identify the request.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.

delete_service(project_id, region, service_id, request_id=None, retry=DEFAULT, timeout=None, metadata=())

Delete a single service.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • service_id (str) –

    Required. The ID of the metastore service, which is used as the final component of the metastore service’s name. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens.

    The hook uses this value to build the underlying API request.

  • request_id (str | None) – Optional. A unique id used to identify the request.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.

export_metadata(destination_gcs_folder, project_id, region, service_id, request_id=None, database_dump_type=None, retry=DEFAULT, timeout=None, metadata=())

Export metadata from a service.

Parameters
  • destination_gcs_folder (str) – A Cloud Storage URI of a folder, in the format gs://<bucket_name>/<path_inside_bucket>. A sub-folder <export_folder> containing exported files will be created below it.

  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • service_id (str) –

    Required. The ID of the metastore service, which is used as the final component of the metastore service’s name. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens.

    The hook uses this value to build the underlying API request.

  • request_id (str | None) – Optional. A unique id used to identify the request.

  • database_dump_type (google.cloud.metastore_v1.types.metastore.DatabaseDumpSpec | None) – Optional. The type of the database dump. If unspecified, defaults to MYSQL.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
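export_metadata expects destination_gcs_folder in the form gs://<bucket_name>/<path_inside_bucket>. A pre-flight check like the one below can catch a malformed value before a long-running export starts. split_gcs_folder is a hypothetical helper that only parses the string; it does not check that the bucket exists or that the service can write to it.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_gcs_folder(uri: str) -> Tuple[str, str]:
    """Split gs://<bucket_name>/<path_inside_bucket> into (bucket_name, path_inside_bucket)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"expected gs://<bucket_name>/<path_inside_bucket>, got {uri!r}")
    # Strip surrounding slashes so "exports/" and "exports" yield the same path.
    return parsed.netloc, parsed.path.strip("/")


print(split_gcs_folder("gs://my-bucket/metastore/exports"))  # ('my-bucket', 'metastore/exports')
```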

get_service(project_id, region, service_id, retry=DEFAULT, timeout=None, metadata=())

Get the details of a single service.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • service_id (str) –

    Required. The ID of the metastore service, which is used as the final component of the metastore service’s name. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens.

    The hook uses this value to build the underlying API request.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.

get_backup(project_id, region, service_id, backup_id, retry=DEFAULT, timeout=None, metadata=())

Get a single backup from a service.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • service_id (str) –

    Required. The ID of the metastore service, which is used as the final component of the metastore service’s name. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens.

    The hook uses this value to build the underlying API request.

  • backup_id (str) – Required. The ID of the backup to retrieve.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
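get_backup takes the bare backup_id, while a Backup resource (for example one returned by list_backups) carries its full resource name. A sketch that recovers the IDs from such a name, assuming the projects/{project}/locations/{region}/services/{service_id}/backups/{backup_id} format used by the Dataproc Metastore API; parse_backup_name is hypothetical, and the project segment of a name returned by the API may be a project number rather than a project ID.

```python
import re
from typing import Dict

_BACKUP_NAME = re.compile(
    r"projects/(?P<project>[^/]+)/locations/(?P<region>[^/]+)"
    r"/services/(?P<service_id>[^/]+)/backups/(?P<backup_id>[^/]+)"
)


def parse_backup_name(name: str) -> Dict[str, str]:
    """Split a backup resource name into project, region, service_id and backup_id."""
    match = _BACKUP_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a backup resource name: {name!r}")
    return match.groupdict()


parts = parse_backup_name(
    "projects/my-project/locations/us-central1/services/my-metastore/backups/nightly-1"
)
print(parts["backup_id"])  # nightly-1
```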

list_backups(project_id, region, service_id, page_size=None, page_token=None, filter=None, order_by=None, retry=DEFAULT, timeout=None, metadata=())

List backups in a service.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • service_id (str) –

    Required. The ID of the metastore service, which is used as the final component of the metastore service’s name. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens.

    The hook uses this value to build the underlying API request.

  • page_size (int | None) – Optional. The maximum number of backups to return. The response may contain fewer than the maximum number. If unspecified, no more than 500 backups are returned. The maximum value is 1000; values above 1000 are changed to 1000.

  • page_token (str | None) – Optional. A page token, received from a previous DataprocMetastore.ListBackups call. Provide this token to retrieve the subsequent page. To retrieve the first page, supply an empty page token. When paginating, the other parameters provided to DataprocMetastore.ListBackups must match the call that provided the page token.

  • filter (str | None) – Optional. The filter to apply to list results.

  • order_by (str | None) – Optional. Specify the ordering of results as described in Sorting Order. If not specified, the results will be sorted in the default order.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
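The page_size limits and page_token flow described above can be sketched in plain Python. max_backups_per_page encodes the documented upper bound per page (500 when unspecified, capped at 1000), and iterate_all follows page tokens, starting from an empty token, until no further token is returned. fetch_page is a hypothetical callable standing in for one ListBackups call; the GAPIC client usually wraps this loop in a pager object, so you rarely need to write it yourself.

```python
from typing import Callable, Iterator, List, Optional, Tuple

DEFAULT_MAX_PAGE_SIZE = 500  # documented upper bound when page_size is unspecified
PAGE_SIZE_CAP = 1000  # documented: larger page_size values are changed to 1000


def max_backups_per_page(page_size: Optional[int]) -> int:
    """Upper bound on how many backups a single ListBackups page can hold."""
    if page_size is None:
        return DEFAULT_MAX_PAGE_SIZE
    return min(page_size, PAGE_SIZE_CAP)


def iterate_all(fetch_page: Callable[[str], Tuple[List[str], str]]) -> Iterator[str]:
    """Yield items from every page, starting with an empty token for the first page."""
    token = ""
    while True:
        items, token = fetch_page(token)
        yield from items
        if not token:  # an empty next-page token means there are no more pages
            break


# Hypothetical two-page response, keyed by the page token that requests each page.
pages = {"": (["backup-1", "backup-2"], "token-2"), "token-2": (["backup-3"], "")}
print(max_backups_per_page(5000))  # 1000
print(list(iterate_all(lambda token: pages[token])))  # ['backup-1', 'backup-2', 'backup-3']
```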

restore_service(project_id, region, service_id, backup_project_id, backup_region, backup_service_id, backup_id, restore_type=None, request_id=None, retry=DEFAULT, timeout=None, metadata=())

Restore a service from a backup.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • service_id (str) –

    Required. The ID of the metastore service, which is used as the final component of the metastore service’s name. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens.

    The hook uses this value to build the underlying API request.

  • backup_project_id (str) – Required. The ID of the Google Cloud project that contains the metastore service backup to restore from.

  • backup_region (str) – Required. The ID of the Google Cloud region that contains the metastore service backup to restore from.

  • backup_service_id (str) – Required. The ID of the metastore service that holds the backup to restore from. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens.

  • backup_id (str) – Required. The ID of the backup to restore from.

  • restore_type (google.cloud.metastore_v1.types.metastore.Restore | None) – Optional. The type of restore. If unspecified, defaults to METADATA_ONLY.

  • request_id (str | None) – Optional. A unique id used to identify the request.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
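restore_service names the source backup with its own four arguments, independent of the project_id, region, and service_id of the service being restored. A sketch of how those four values combine into a backup resource name, assuming the Dataproc Metastore naming scheme projects/{project}/locations/{region}/services/{service_id}/backups/{backup_id}; restore_source_backup is a hypothetical helper, not part of the hook.

```python
def restore_source_backup(
    backup_project_id: str, backup_region: str, backup_service_id: str, backup_id: str
) -> str:
    """Compose the resource name of the backup that a restore reads from."""
    return (
        f"projects/{backup_project_id}/locations/{backup_region}"
        f"/services/{backup_service_id}/backups/{backup_id}"
    )


# Hypothetical values for the source backup.
print(restore_source_backup("prod-project", "europe-west1", "prod-metastore", "nightly-1"))
# projects/prod-project/locations/europe-west1/services/prod-metastore/backups/nightly-1
```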

update_service(project_id, region, service_id, service, update_mask, request_id=None, retry=DEFAULT, timeout=None, metadata=())

Update the parameters of a single service.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • service_id (str) –

    Required. The ID of the metastore service, which is used as the final component of the metastore service’s name. This value must be between 2 and 63 characters long inclusive, begin with a letter, end with a letter or number, and consist of alphanumeric ASCII characters or hyphens.

    The hook uses this value to build the underlying API request.

  • service (dict | google.cloud.metastore_v1.types.Service) –

    Required. The metastore service to update. The server only merges fields in the service if they are specified in update_mask.

    The metastore service’s name field is used to identify the metastore service to be updated.

    The hook uses this value to build the underlying API request.

  • update_mask (google.protobuf.field_mask_pb2.FieldMask) –

    Required. A field mask used to specify the fields to be overwritten in the metastore service resource by the update. Fields specified in the update_mask are relative to the resource (not to the full request). A field is overwritten if it is in the mask.

    The hook uses this value to build the underlying API request.

  • request_id (str | None) – Optional. A unique id used to identify the request.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.

  • timeout (float | None) – The timeout for this request.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
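update_service only overwrites the fields listed in update_mask. The sketch below mimics that merge on plain dictionaries to show which fields change. apply_update_mask is hypothetical and models only the documented rule that a field in the mask is overwritten (a masked field that the update omits is treated here as cleared); it is not the server's implementation. With the real hook you would pass a google.protobuf.field_mask_pb2.FieldMask, for example FieldMask(paths=["labels"]).

```python
import copy
from typing import Any, Dict, List


def apply_update_mask(
    current: Dict[str, Any], update: Dict[str, Any], paths: List[str]
) -> Dict[str, Any]:
    """Return a copy of `current` where only the fields named in `paths` come from `update`.

    Paths use dots for nested fields, e.g. "hive_metastore_config.version".
    A masked field that `update` omits is removed, i.e. overwritten with "unset".
    """
    result = copy.deepcopy(current)
    for path in paths:
        *parents, leaf = path.split(".")
        src: Dict[str, Any] = update
        dst: Dict[str, Any] = result
        for key in parents:
            src = src.get(key, {})
            dst = dst.setdefault(key, {})
        if leaf in src:
            dst[leaf] = copy.deepcopy(src[leaf])
        else:
            dst.pop(leaf, None)
    return result


current = {"port": 9083, "labels": {"env": "dev"}}
update = {"port": 9090, "labels": {"env": "prod"}}
print(apply_update_mask(current, update, ["labels"]))
# {'port': 9083, 'labels': {'env': 'prod'}}
```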

list_hive_partitions(project_id, service_id, region, table, partition_names=None)

List Hive partitions.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.

  • service_id (str) – Required. The ID of the Dataproc Metastore service.

  • region (str) – Required. The ID of the Google Cloud region that the service belongs to.

  • table (str) – Required. Name of the partitioned table.

  • partition_names (list[str] | None) – Optional. List of table partitions to look up. A partition name should look like “ds=1”, or “a=1/b=2” for a table with multiple partition keys. Logical or comparison operators, as used in HivePartitionSensor, are not supported. If not specified, partitions are not filtered by name.
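Partition names follow Hive's key=value convention, with multiple partition keys joined by "/". A small sketch for building the partition_names list from ordered key/value mappings; hive_partition_name and partition_names_for are hypothetical helpers, and they do not apply the escaping Hive uses for special characters in partition values.

```python
from typing import Iterable, List, Mapping


def hive_partition_name(spec: Mapping[str, object]) -> str:
    """Join ordered partition key/value pairs into a name such as "a=1/b=2"."""
    return "/".join(f"{key}={value}" for key, value in spec.items())


def partition_names_for(specs: Iterable[Mapping[str, object]]) -> List[str]:
    """Build partition names and drop duplicates while keeping first-seen order."""
    names = (hive_partition_name(spec) for spec in specs)
    return list(dict.fromkeys(names))


print(partition_names_for([{"ds": "2024-01-01"}, {"a": 1, "b": 2}, {"ds": "2024-01-01"}]))
# ['ds=2024-01-01', 'a=1/b=2']
```

The resulting list can then be passed as partition_names to list_hive_partitions.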
