airflow.providers.google.cloud.operators.translate
This module contains Google Translate operators.
Module Contents
Classes
Translate a string or list of strings.
Translate moderate amounts of text content; for larger volumes, use the TranslateTextBatchOperator.
Translate large volumes of text content from the provided inputs.
Create a Google Cloud Translate dataset.
Get a list of native Google Cloud Translation datasets in a project.
Import data to the translation dataset.
Delete a translation dataset and all of its contents.
Create a Google Cloud Translate model.
Get a list of native Google Cloud Translation models in a project.
Delete a translation model and all of its contents.
Translate the provided document.
Translate documents provided via input and output configurations.
Create a Google Cloud Translation glossary.
Update a glossary item with the values provided.
Get a list of translation glossaries in a project.
Delete a Google Cloud Translation glossary.
- class airflow.providers.google.cloud.operators.translate.CloudTranslateTextOperator(*, values, target_language, format_, source_language, model, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Translate a string or list of strings.
See also
For more information on how to use this operator, take a look at the guide: CloudTranslateTextOperator
See https://cloud.google.com/translate/docs/translating-text
The execute method returns a dict or a list of dicts, with one dictionary per queried value. Each dictionary typically contains up to four keys (not all are present in every case):
detectedSourceLanguage: The detected language (as an ISO 639-1 language code) of the text.
translatedText: The translation of the text into the target language.
input: The corresponding input value.
model: The model used to translate the text.
If only a single value is passed, only a single dictionary is set as the XCom return value.
- Parameters
values (list[str] | str) – String or list of strings to translate.
target_language (str) – The language to translate results into. This is required by the API.
format_ – (Optional) One of text or html, to specify whether the input text is plain text or HTML.
source_language (str | None) – (Optional) The language of the text to be translated.
model (str) – (Optional) The model used to translate the text, such as 'base' or 'nmt'.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account from the list granting this role to the originating account (templated).
- template_fields: collections.abc.Sequence[str] = ('values', 'target_language', 'format_', 'source_language', 'model', 'gcp_conn_id',...
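Because the XCom return value is a single dictionary for one input value and a list of dictionaries otherwise, downstream code usually normalizes it before use. The following is a minimal sketch, assuming the key names listed above; translated_texts and translation_map are hypothetical helpers, not part of the Google provider.

```python
from typing import Any, Dict, List, Union

# Shape of the CloudTranslateTextOperator XCom value: one dict for a single
# input value, or a list of dicts (one per input value) otherwise.
XComValue = Union[Dict[str, Any], List[Dict[str, Any]]]


def _as_list(xcom_value: XComValue) -> List[Dict[str, Any]]:
    """Wrap a single result dict in a list so both shapes are handled alike."""
    return [xcom_value] if isinstance(xcom_value, dict) else list(xcom_value)


def translated_texts(xcom_value: XComValue) -> List[str]:
    """Hypothetical helper: return every translatedText, always as a list."""
    # Not every key is guaranteed to be present, so fall back to "".
    return [item.get("translatedText", "") for item in _as_list(xcom_value)]


def translation_map(xcom_value: XComValue) -> Dict[str, str]:
    """Hypothetical helper: map each input value to its translation."""
    return {
        item["input"]: item.get("translatedText", "")
        for item in _as_list(xcom_value)
        if "input" in item
    }
```

In a DAG, a downstream task could pass the value returned by ti.xcom_pull(task_ids=...) for the translation task to either helper; the task ID is whatever you assigned to the operator.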
- class airflow.providers.google.cloud.operators.translate.TranslateTextOperator(*, contents, source_language_code=None, target_language_code, mime_type=None, location=None, project_id=PROVIDE_PROJECT_ID, model=None, transliteration_config=None, glossary_config=None, labels=None, timeout=DEFAULT, retry=DEFAULT, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Translate moderate amounts of text content; for larger volumes, use the TranslateTextBatchOperator.
Wraps the Google Cloud Translate Text (Advanced) functionality. See https://cloud.google.com/translate/docs/advanced/translating-text-v3
- For more information on how to use this operator, take a look at the guide:
- Parameters
project_id (str) – Optional. The ID of the Google Cloud project that the service belongs to. If not provided, the default project_id is used.
location (str | None) – Optional. The ID of the Google Cloud location that the service belongs to. If not specified, ‘global’ is used. A non-global location is required for requests using AutoML models or custom glossaries.
contents (collections.abc.Sequence[str]) – Required. The sequence of content strings to be translated. Limited to 1024 items; a total of no more than 30,000 codepoints is recommended.
mime_type (str | None) – Optional. The format of the source text. If left blank, the MIME type defaults to “text/html”.
source_language_code (str | None) – Optional. The ISO-639 language code of the input text, if known. If not specified, the API attempts to detect it automatically.
target_language_code (str) – Required. The ISO-639 language code to use for translation of the input text.
model (str | None) – Optional. The model type requested for this translation. If not provided, the default Google model (NMT) is used. The format depends on the model type:
AutoML Translation models: projects/{project-number-or-id}/locations/{location-id}/models/{model-id}
General (built-in) models: projects/{project-number-or-id}/locations/{location-id}/models/general/nmt
Translation LLM models: projects/{project-number-or-id}/locations/{location-id}/models/general/translation-llm
For global (non-region) requests, use ‘global’ as the location-id.
glossary_config (google.cloud.translate_v3.types.TranslateTextGlossaryConfig | None) – Optional. Glossary to be applied.
transliteration_config (google.cloud.translate_v3.types.TransliterationConfig | None) – Optional. Transliteration to be applied.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault | None) – Designation of what errors, if any, should be retried.
timeout (float | google.api_core.gapic_v1.method._MethodDefault) – The timeout for this request.
metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
gcp_conn_id (str) – The connection ID to use when connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account from the list granting this role to the originating account (templated).
- template_fields: collections.abc.Sequence[str] = ('contents', 'target_language_code', 'mime_type', 'source_language_code', 'model',...
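The contents limits above (at most 1024 items, with roughly 30,000 codepoints in total recommended) can be checked before the task runs. This is a minimal sketch; check_contents is a hypothetical helper, and treating the 30,000-codepoint figure as a warning rather than an error is an assumption based on the word "recommended".

```python
import warnings
from typing import Sequence

MAX_ITEMS = 1024  # documented limit on the number of strings
RECOMMENDED_CODEPOINTS = 30_000  # documented recommended total


def check_contents(contents: Sequence[str]) -> int:
    """Hypothetical pre-flight check for TranslateTextOperator `contents`.

    Raises ValueError above the item limit, only warns when the recommended
    codepoint total is exceeded, and returns the total codepoint count.
    """
    if isinstance(contents, str):
        # A bare str is itself a sequence of characters; reject it explicitly.
        raise TypeError("contents must be a sequence of strings, not one string")
    if len(contents) > MAX_ITEMS:
        raise ValueError(f"{len(contents)} items exceeds the limit of {MAX_ITEMS}")
    # len() of a Python str counts Unicode codepoints, not bytes.
    total = sum(len(text) for text in contents)
    if total > RECOMMENDED_CODEPOINTS:
        warnings.warn(
            f"{total} codepoints exceeds the recommended {RECOMMENDED_CODEPOINTS}; "
            "consider TranslateTextBatchOperator"
        )
    return total
```

For example, check_contents(["Hello", "world"]) returns 10.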
- class airflow.providers.google.cloud.operators.translate.TranslateTextBatchOperator(*, project_id=PROVIDE_PROJECT_ID, location, target_language_codes, source_language_code, input_configs, output_config, models=None, glossaries=None, labels=None, metadata=(), timeout=DEFAULT, retry=DEFAULT, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Translate large volumes of text content from the provided inputs.
Wraps the Google Cloud Translate Text (Advanced) functionality. See https://cloud.google.com/translate/docs/advanced/batch-translation
For more information on how to use this operator, take a look at the guide: TranslateTextBatchOperator.
- Parameters
project_id (str) – Optional. The ID of the Google Cloud project that the service belongs to. If not specified, the hook’s project_id is used.
location (str) – Required. The ID of the (non-global) Google Cloud location that the service belongs to.
source_language_code (str) – Required. Source language code.
target_language_codes (collections.abc.MutableSequence[str]) – Required. The target language codes; up to 10 are allowed.
input_configs (collections.abc.MutableSequence[google.cloud.translate_v3.types.InputConfig | dict]) – Required. Input configurations. The total number of files matched should be <=100. The total content size should be <= 100M Unicode codepoints. The files must use UTF-8 encoding.
models (str | None) – Optional. The models to use for translation. The map’s key is the target language code and its value is the model name. The value can be a built-in general model or an AutoML Translation model. The value format depends on the model type:
AutoML Translation models: projects/{project-number-or-id}/locations/{location-id}/models/{model-id}
General (built-in) models: projects/{project-number-or-id}/locations/{location-id}/models/general/nmt
If the map is empty or a specific model is not requested for a language pair, the default Google model (NMT) is used.
output_config (google.cloud.translate_v3.types.OutputConfig | dict) – Required. Output configuration.
glossaries (collections.abc.MutableMapping[str, google.cloud.translate_v3.types.TranslateTextGlossaryConfig] | None) – Optional. Glossaries to be applied for translation. It’s keyed by target language code.
labels (collections.abc.MutableMapping[str, str] | None) – Optional. The labels with user-defined metadata. See https://cloud.google.com/translate/docs/advanced/labels for more information.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault | None) – Designation of what errors, if any, should be retried.
timeout (float | google.api_core.gapic_v1.method._MethodDefault) – The timeout for this request.
metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
gcp_conn_id (str) – The connection ID to use when connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account from the list granting this role to the originating account (templated).
- template_fields: collections.abc.Sequence[str] = ('input_configs', 'target_language_codes', 'source_language_code', 'models', 'glossaries',...
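The models and glossaries maps are keyed by target language code, and any target language without an entry falls back to the default Google model (NMT). The sketch below builds such a models map using the general-model resource format documented above and enforces the documented 10-language limit; build_models_map and general_nmt_model are hypothetical helpers.

```python
from typing import Dict, Iterable, Mapping, Optional

MAX_TARGET_LANGUAGES = 10  # documented limit for target_language_codes


def general_nmt_model(project_id: str, location: str) -> str:
    """Resource name of the built-in general NMT model, in the documented format."""
    return f"projects/{project_id}/locations/{location}/models/general/nmt"


def build_models_map(
    target_language_codes: Iterable[str],
    custom_models: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical helper: build a models map keyed by target language code.

    Only languages with an explicit model get an entry; every other target
    language falls back to the default Google model (NMT).
    """
    targets = list(target_language_codes)
    if not 1 <= len(targets) <= MAX_TARGET_LANGUAGES:
        raise ValueError(
            f"expected 1 to {MAX_TARGET_LANGUAGES} target languages, got {len(targets)}"
        )
    requested = dict(custom_models or {})
    # A key that is not a target language would never be used; flag it early.
    unknown = sorted(set(requested) - set(targets))
    if unknown:
        raise ValueError(f"models given for non-target languages: {unknown}")
    return {lang: requested[lang] for lang in targets if lang in requested}
```

For example, build_models_map(["de", "fr"], {"fr": general_nmt_model("my-project", "us-central1")}) returns a map with a single "fr" entry, so "de" uses the default model; my-project and us-central1 are placeholders.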
- class airflow.providers.google.cloud.operators.translate.TranslateCreateDatasetOperator(*, project_id=PROVIDE_PROJECT_ID, location, dataset, metadata=(), timeout=DEFAULT, retry=DEFAULT, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Create a Google Cloud Translate dataset.
Creates a native translation dataset, using API V3. For more information on how to use this operator, take a look at the guide: TranslateCreateDatasetOperator.
- Parameters
dataset (dict | google.cloud.translate_v3.types.automl_translation.Dataset) – The dataset to create. If a dict is provided, it must correspond to the automl_translation.Dataset type.
project_id (str) – ID of the Google Cloud project where the dataset is located. If not provided, the default project_id is used.
location (str) – The location of the project.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault | None) – Designation of what errors, if any, should be retried.
timeout (float | google.api_core.gapic_v1.method._MethodDefault) – The timeout for this request.
metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
gcp_conn_id (str) – The connection ID to use when connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account from the list granting this role to the originating account (templated).
- template_fields: collections.abc.Sequence[str] = ('dataset', 'location', 'project_id', 'gcp_conn_id', 'impersonation_chain')
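If the dataset argument is passed as a dict, its keys must correspond to automl_translation.Dataset fields. A minimal sketch, assuming the display_name, source_language_code, and target_language_code fields of that message; dataset_body is a hypothetical helper, and its checks are illustrative sanity checks rather than documented API rules.

```python
from typing import Dict


def dataset_body(
    display_name: str, source_language_code: str, target_language_code: str
) -> Dict[str, str]:
    """Hypothetical helper: build the `dataset` dict for TranslateCreateDatasetOperator."""
    fields = {
        "display_name": display_name,
        "source_language_code": source_language_code,
        "target_language_code": target_language_code,
    }
    # Sanity checks for this sketch only; the API may enforce other rules.
    missing = [name for name, value in fields.items() if not value]
    if missing:
        raise ValueError(f"empty dataset fields: {missing}")
    if source_language_code == target_language_code:
        raise ValueError("source and target language codes should differ")
    return fields
```

The returned dict would be passed as dataset=dataset_body(...) alongside location and project_id.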
- class airflow.providers.google.cloud.operators.translate.TranslateDatasetsListOperator(*, project_id=PROVIDE_PROJECT_ID, location, metadata=(), timeout=DEFAULT, retry=DEFAULT, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Get a list of native Google Cloud Translation datasets in a project.
Get the project’s list of native translation datasets using API V3. For more information on how to use this operator, take a look at the guide: TranslateDatasetsListOperator.
- Parameters
project_id (str) – ID of the Google Cloud project where the datasets are located. If not provided, the default project_id is used.
location (str) – The location of the project.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.
timeout (float | google.api_core.gapic_v1.method._MethodDefault) – The timeout for this request.
metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
gcp_conn_id (str) – The connection ID to use when connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account from the list granting this role to the originating account (templated).
- template_fields: collections.abc.Sequence[str] = ('location', 'project_id', 'gcp_conn_id', 'impersonation_chain')
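Native Translation resources are identified by full resource names, while TranslateImportDataOperator, TranslateDeleteDatasetOperator, and TranslateDeleteModelOperator take only the trailing dataset_id or model_id. A minimal sketch of splitting that ID off, assuming names of the form projects/{project}/locations/{location}/datasets/{dataset-id} (and the analogous .../models/{model-id}); resource_id is a hypothetical helper.

```python
def resource_id(resource_name: str, collection: str) -> str:
    """Hypothetical helper: return the trailing ID of a '.../{collection}/{id}' name."""
    parts = resource_name.strip("/").split("/")
    # The segment before the ID must name the expected collection.
    if len(parts) < 2 or parts[-2] != collection or not parts[-1]:
        raise ValueError(f"{resource_name!r} is not a '{collection}/<id>' resource name")
    return parts[-1]
```

For example, resource_id("projects/my-project/locations/us-central1/datasets/abc123", "datasets") returns "abc123"; the project, location, and ID are placeholders.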
- class airflow.providers.google.cloud.operators.translate.TranslateImportDataOperator(*, dataset_id, location, input_config, project_id=PROVIDE_PROJECT_ID, metadata=(), timeout=None, retry=DEFAULT, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Import data to the translation dataset.
Loads data to the translation dataset, using API V3. For more information on how to use this operator, take a look at the guide: TranslateImportDataOperator.
- Parameters
dataset_id (str) – The dataset_id of the target native dataset to import data into.
input_config (dict | google.cloud.translate_v3.types.DatasetInputConfig) – The desired input location of the translation language-pairs file. If a dict is provided, it must follow the structure of the DatasetInputConfig protobuf message.
project_id (str) – ID of the Google Cloud project where the dataset is located. If not provided, the default project_id is used.
location (str) – The location of the project.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.
timeout (float | None) – The timeout for this request.
metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
gcp_conn_id (str) – The connection ID to use when connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account from the list granting this role to the originating account (templated).
- template_fields: collections.abc.Sequence[str] = ('dataset_id', 'input_config', 'location', 'project_id', 'gcp_conn_id', 'impersonation_chain')
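A sketch of an input_config dict in the DatasetInputConfig shape. It assumes the input_files, usage, and gcs_source.input_uri fields of the Cloud Translation v3 API and the usage values listed in the code; verify both against the API reference. dataset_input_config is a hypothetical helper.

```python
from typing import Any, Dict, Iterable

# Usage values assumed from the Cloud Translation v3 DatasetInputConfig docs;
# UNASSIGNED is assumed to let the service split the data automatically.
VALID_USAGES = {"TRAIN", "VALIDATION", "TEST", "UNASSIGNED"}


def dataset_input_config(uris: Iterable[str], usage: str = "UNASSIGNED") -> Dict[str, Any]:
    """Hypothetical helper: build a DatasetInputConfig-shaped dict for input_config."""
    if usage not in VALID_USAGES:
        raise ValueError(f"unknown usage {usage!r}")
    input_files = []
    for uri in uris:
        # Files are read from Cloud Storage, so require gs:// URIs.
        if not uri.startswith("gs://"):
            raise ValueError(f"expected a gs:// URI, got {uri!r}")
        input_files.append({"usage": usage, "gcs_source": {"input_uri": uri}})
    if not input_files:
        raise ValueError("at least one input file is required")
    return {"input_files": input_files}
```

The result would be passed as input_config together with the dataset_id of an existing native dataset.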
- class airflow.providers.google.cloud.operators.translate.TranslateDeleteDatasetOperator(*, dataset_id, location, project_id=PROVIDE_PROJECT_ID, metadata=(), timeout=None, retry=DEFAULT, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Delete a translation dataset and all of its contents.
Deletes the translation dataset and its data using API V3. For more information on how to use this operator, take a look at the guide: TranslateDeleteDatasetOperator.
- Parameters
dataset_id (str) – The dataset_id of the target native dataset to be deleted.
location (str) – The location of the project.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.
timeout (float | None) – The timeout for this request.
metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
gcp_conn_id (str) – The connection ID to use when connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account from the list granting this role to the originating account (templated).
- template_fields: collections.abc.Sequence[str] = ('dataset_id', 'location', 'project_id', 'gcp_conn_id', 'impersonation_chain')
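The impersonation_chain rules repeated in each operator's parameters amount to a set of IAM grants: with a chain [a, b, c], a must grant the Service Account Token Creator role to the originating account, b to a, and c to b, and c is the account impersonated in the request. A minimal sketch that lists those grants; required_token_creator_grants is a hypothetical helper that only computes the pairs and makes no IAM calls.

```python
from typing import List, Optional, Sequence, Tuple, Union


def required_token_creator_grants(
    originating_account: str,
    impersonation_chain: Optional[Union[str, Sequence[str]]],
) -> List[Tuple[str, str]]:
    """Hypothetical helper: list (grantor, grantee) Token Creator grants.

    Each pair means the grantor must grant the Service Account Token Creator
    IAM role to the grantee. No IAM API is called.
    """
    if impersonation_chain is None:
        return []
    # A plain string is treated as a one-element chain.
    if isinstance(impersonation_chain, str):
        chain = [impersonation_chain]
    else:
        chain = list(impersonation_chain)
    # Each account grants the role to the identity directly before it;
    # the first account grants it to the originating account.
    grantees = [originating_account] + chain[:-1]
    return list(zip(chain, grantees))
```

For a single string, the result is one pair in which that account grants the role to the originating account, matching the string case described above.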
- class airflow.providers.google.cloud.operators.translate.TranslateCreateModelOperator(*, project_id=PROVIDE_PROJECT_ID, location, dataset_id, display_name, timeout=None, retry=DEFAULT, gcp_conn_id='google_cloud_default', metadata=(), impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Create a Google Cloud Translate model.
Creates a native translation model, using API V3. For more information on how to use this operator, take a look at the guide: TranslateCreateModelOperator.
- Parameters
dataset_id (str) – The dataset id used for model training.
project_id (str) – ID of the Google Cloud project where the dataset is located. If not provided, the default project_id is used.
location (str) – The location of the project.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.
timeout (float | None) – The timeout for this request.
metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
gcp_conn_id (str) – The connection ID to use when connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account from the list granting this role to the originating account (templated).
- template_fields: collections.abc.Sequence[str] = ('dataset_id', 'location', 'project_id', 'gcp_conn_id', 'impersonation_chain')
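Once trained, a native model is referenced from TranslateTextOperator or TranslateDocumentOperator through the model parameter, using the AutoML resource-name format documented for those operators; requests using AutoML models need a non-global location. A minimal sketch, with automl_model_name as a hypothetical helper:

```python
def automl_model_name(project_id: str, location: str, model_id: str) -> str:
    """Hypothetical helper: AutoML model resource name for the `model` parameter."""
    if not location or location == "global":
        # Non-global locations are required for requests using AutoML models.
        raise ValueError("AutoML models require a non-global location")
    if not project_id or not model_id:
        raise ValueError("project_id and model_id must be non-empty")
    return f"projects/{project_id}/locations/{location}/models/{model_id}"
```

For example, automl_model_name("my-project", "us-central1", "m123") returns "projects/my-project/locations/us-central1/models/m123"; all three values are placeholders.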
- class airflow.providers.google.cloud.operators.translate.TranslateModelsListOperator(*, project_id=PROVIDE_PROJECT_ID, location, metadata=(), timeout=DEFAULT, retry=DEFAULT, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Get a list of native Google Cloud Translation models in a project.
Get the project’s list of native translation models using API V3. For more information on how to use this operator, take a look at the guide: TranslateModelsListOperator.
- Parameters
project_id (str) – ID of the Google Cloud project where the models are located. If not provided, the default project_id is used.
location (str) – The location of the project.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.
timeout (float | google.api_core.gapic_v1.method._MethodDefault) – The timeout for this request.
metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
gcp_conn_id (str) – The connection ID to use when connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account from the list granting this role to the originating account (templated).
- template_fields: collections.abc.Sequence[str] = ('location', 'project_id', 'gcp_conn_id', 'impersonation_chain')
- class airflow.providers.google.cloud.operators.translate.TranslateDeleteModelOperator(*, model_id, location, project_id=PROVIDE_PROJECT_ID, metadata=(), timeout=None, retry=DEFAULT, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Delete a translation model and all of its contents.
Deletes the translation model and its data using API V3. For more information on how to use this operator, take a look at the guide: TranslateDeleteModelOperator.
- Parameters
model_id (str) – The model_id of the target native model to be deleted.
location (str) – The location of the project.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.
timeout (float | None) – The timeout for this request.
metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
gcp_conn_id (str) – The connection ID to use when connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account from the list granting this role to the originating account (templated).
- template_fields: collections.abc.Sequence[str] = ('model_id', 'location', 'project_id', 'gcp_conn_id', 'impersonation_chain')
- class airflow.providers.google.cloud.operators.translate.TranslateDocumentOperator(*, location=None, project_id=PROVIDE_PROJECT_ID, source_language_code=None, target_language_code, document_input_config, document_output_config, customized_attribution=None, is_translate_native_pdf_only=False, enable_shadow_removal_native_pdf=False, enable_rotation_correction=False, model=None, glossary_config=None, labels=None, timeout=DEFAULT, retry=DEFAULT, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Translate the provided document.
Wraps the Google Cloud Translate Text (Advanced) functionality. Supports a wide range of input/output file types; see https://cloud.google.com/translate/docs/advanced/translate-documents for more details.
- For more information on how to use this operator, take a look at the guide:
- Parameters
project_id (str) – Optional. The ID of the Google Cloud project that the service belongs to. If not specified, the hook’s project_id is used.
source_language_code (str | None) – Optional. The ISO-639 language code of the input document text if known. If the source language isn’t specified, the API attempts to identify the source language automatically and returns the source language within the response.
target_language_code (str) – Required. The ISO-639 language code to use for translation of the input document text.
location (str | None) – Optional. Project or location to make a call. Must refer to a caller’s project. If not specified, ‘global’ is used. Non-global location is required for requests using AutoML models or custom glossaries. Models and glossaries must be within the same region (have the same location-id).
document_input_config (google.cloud.translate_v3.types.DocumentInputConfig | dict) – A document translation request input config.
document_output_config (google.cloud.translate_v3.types.DocumentOutputConfig | dict | None) – Optional. A document translation request output config. If not provided, the translated file is only returned through a byte stream, and its output MIME type is the same as the input file’s MIME type.
customized_attribution (str | None) – Optional. Supports a user-customized attribution. If not provided, the default is “Machine Translated by Google”. Customized attribution should follow the rules in https://cloud.google.com/translate/attribution#attribution_and_logos
is_translate_native_pdf_only (bool) – Optional. Param for external customers. If true, the page limit of online native PDF translation is 300 and only native PDF pages will be translated.
enable_shadow_removal_native_pdf (bool) – Optional. If true, use the text removal server to remove the shadow text on the background image for native PDF translation. Shadow removal can only be enabled when both is_translate_native_pdf_only and pdf_native_only are False.
enable_rotation_correction (bool) – Optional. If true, enable auto rotation correction in DVS.
model (str | None) – Optional. The model type requested for this translation. If not provided, the default Google model (NMT) is used. The format depends on the model type:
AutoML Translation models: projects/{project-number-or-id}/locations/{location-id}/models/{model-id}
General (built-in) models: projects/{project-number-or-id}/locations/{location-id}/models/general/nmt
glossary_config (google.cloud.translate_v3.types.TranslateTextGlossaryConfig | None) – Optional. Glossary to be applied.
transliteration_config – Optional. Transliteration to be applied.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault | None) – Designation of what errors, if any, should be retried.
timeout (float | google.api_core.gapic_v1.method._MethodDefault) – The timeout for this request.
metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
gcp_conn_id (str) – The connection ID to use when connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account from the list granting this role to the originating account (templated).
- template_fields: collections.abc.Sequence[str] = ('source_language_code', 'target_language_code', 'document_input_config',...
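A sketch of keyword arguments for TranslateDocumentOperator, using dict forms of DocumentInputConfig and DocumentOutputConfig (assuming their gcs_source.input_uri, mime_type, and gcs_destination.output_uri_prefix fields) and enforcing the shadow-removal constraint described above. Only is_translate_native_pdf_only is checked, since pdf_native_only is not a parameter of this operator; document_translation_kwargs is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def document_translation_kwargs(
    input_uri: str,
    mime_type: str,
    target_language_code: str,
    output_uri_prefix: Optional[str] = None,
    is_translate_native_pdf_only: bool = False,
    enable_shadow_removal_native_pdf: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: build keyword arguments for TranslateDocumentOperator."""
    # Documented constraint: shadow removal needs is_translate_native_pdf_only=False.
    if enable_shadow_removal_native_pdf and is_translate_native_pdf_only:
        raise ValueError(
            "enable_shadow_removal_native_pdf requires is_translate_native_pdf_only=False"
        )
    output_config = None
    if output_uri_prefix is not None:
        output_config = {"gcs_destination": {"output_uri_prefix": output_uri_prefix}}
    return {
        "target_language_code": target_language_code,
        # Dict form of DocumentInputConfig.
        "document_input_config": {"gcs_source": {"input_uri": input_uri}, "mime_type": mime_type},
        # None: the translation is only returned through a byte stream.
        "document_output_config": output_config,
        "is_translate_native_pdf_only": is_translate_native_pdf_only,
        "enable_shadow_removal_native_pdf": enable_shadow_removal_native_pdf,
    }
```

The returned dict could then be unpacked into TranslateDocumentOperator(task_id=..., **kwargs), with location, model, and the other parameters passed as needed.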
- class airflow.providers.google.cloud.operators.translate.TranslateDocumentBatchOperator(*, project_id=PROVIDE_PROJECT_ID, source_language_code, target_language_codes=None, location=None, input_configs, output_config, customized_attribution=None, format_conversions=None, enable_shadow_removal_native_pdf=False, enable_rotation_correction=False, models=None, glossaries=None, metadata=(), timeout=DEFAULT, retry=DEFAULT, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Translate documents provided via input and output configurations.
Supports up to 10 target languages per operation. Wraps the Google Cloud Translate Text (Advanced) functionality. See https://cloud.google.com/translate/docs/advanced/batch-translation.
For more information on how to use this operator, take a look at the guide: TranslateDocumentBatchOperator.
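A minimal sketch of input_configs, output_config, and format_conversions for this operator, assuming the gcs_source.input_uri and gcs_destination.output_uri_prefix fields of BatchDocumentInputConfig and BatchDocumentOutputConfig; batch_document_kwargs is a hypothetical helper. Exact duplicate input paths are dropped up front, since duplicates produce no extra output anyway.

```python
from typing import Any, Dict, Iterable, List

PDF = "application/pdf"
DOCX = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
MAX_TARGET_LANGUAGES = 10  # documented limit per operation


def batch_document_kwargs(
    input_uris: Iterable[str],
    target_language_codes: Iterable[str],
    output_uri_prefix: str,
    pdf_to_docx: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: build keyword arguments for TranslateDocumentBatchOperator."""
    targets: List[str] = list(target_language_codes)
    if not 1 <= len(targets) <= MAX_TARGET_LANGUAGES:
        raise ValueError(
            f"expected 1 to {MAX_TARGET_LANGUAGES} target languages, got {len(targets)}"
        )
    # dict.fromkeys keeps first-seen order while dropping exact duplicates;
    # overlapping prefixes (a folder and a file inside it) are not detected.
    unique_uris = list(dict.fromkeys(input_uris))
    kwargs: Dict[str, Any] = {
        "target_language_codes": targets,
        "input_configs": [{"gcs_source": {"input_uri": uri}} for uri in unique_uris],
        "output_config": {"gcs_destination": {"output_uri_prefix": output_uri_prefix}},
    }
    if pdf_to_docx:
        # The PDF-to-DOCX conversion listed under format_conversions.
        kwargs["format_conversions"] = {PDF: DOCX}
    return kwargs
```

Remaining arguments such as source_language_code and location would be passed to the operator separately.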
- Parameters
project_id (str) – Required. The ID of the Google Cloud project that the service belongs to.
source_language_code (str) – Optional. The ISO-639 language code of the input text if known. If the source language isn’t specified, the API attempts to identify the source language automatically and returns the source language within the response.
target_language_codes (collections.abc.MutableSequence[str] | None) – Required. The ISO-639 language code to use for translation of the input document. Specify up to 10 language codes here.
location (str | None) – Optional. Project or location to make a call. Must refer to a caller’s project. If not specified, ‘global’ is used. Non-global location is required for requests using AutoML models or custom glossaries. Models and glossaries must be within the same region (have the same location-id).
input_configs (collections.abc.MutableSequence[google.cloud.translate_v3.types.BatchDocumentInputConfig | dict]) – Input configurations. The total number of files matched should be <= 100. The total content size to translate should be <= 100M Unicode codepoints. The files must use UTF-8 encoding.
output_config (google.cloud.translate_v3.types.BatchDocumentOutputConfig | dict) – Output configuration. If two input configs match the same file (that is, the same input path), no output is generated for the duplicate inputs.
format_conversions (collections.abc.MutableMapping[str, str] | None) – Optional. The file format conversion map that is applied to all input files. The map key is the original mime_type and the map value is the target mime_type of the translated documents. Supported file format conversions include application/pdf to application/vnd.openxmlformats-officedocument.wordprocessingml.document. If nothing is specified, output files will be in the same format as the original file.
customized_attribution (str | None) – Optional. Supports user-customized attribution. If not provided, the default is "Machine Translated by Google". Customized attribution should follow the rules in https://cloud.google.com/translate/attribution#attribution_and_logos
enable_shadow_removal_native_pdf (bool) – Optional. If true, use the text removal server to remove the shadow text on the background image for native PDF translation. Shadow removal can only be enabled when both is_translate_native_pdf_only and pdf_native_only are False.
enable_rotation_correction (bool) – Optional. If true, enable auto rotation correction in DVS.
models (collections.abc.MutableMapping[str, str] | None) –
Optional. The models to use for translation. The map’s key is the target language code and the map’s value is the model name. The value can be a built-in general model or an AutoML Translation model, and its format depends on the model type:
AutoML Translation models: projects/{project-number-or-id}/locations/{location-id}/models/{model-id}
General (built-in) models: projects/{project-number-or-id}/locations/{location-id}/models/general/nmt
If the map is empty or no model is requested for a language pair, the default Google model (NMT) is used.
glossaries (collections.abc.MutableMapping[str, google.cloud.translate_v3.types.TranslateTextGlossaryConfig] | None) – Glossaries to apply, keyed by target language code.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault | None) – Designation of what errors, if any, should be retried.
timeout (float | google.api_core.gapic_v1.method._MethodDefault) – The timeout for this request.
metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- template_fields: collections.abc.Sequence[str] = ('input_configs', 'output_config', 'target_language_codes', 'source_language_code', 'models',...
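A minimal sketch, not part of the provider: the hypothetical helper `build_document_batch_kwargs` below assembles keyword arguments for TranslateDocumentBatchOperator as plain dicts and checks two of the documented constraints (at most 10 target languages, and a non-global location when AutoML models or custom glossaries are involved). The dict shapes assume the `gcs_source.input_uri` and `gcs_destination.output_uri_prefix` fields of `BatchDocumentInputConfig` and `BatchDocumentOutputConfig`; the project, bucket paths and the `use_glossary_or_automl` flag are illustrative placeholders.

```python
# Hypothetical helper; it only builds dicts and does not call Google Cloud.
MAX_TARGET_LANGUAGES = 10  # documented per-operation limit

PDF = "application/pdf"
DOCX = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def general_nmt_model(project_id: str, location: str) -> str:
    """Resource name of the built-in general NMT model."""
    return f"projects/{project_id}/locations/{location}/models/general/nmt"


def build_document_batch_kwargs(
    project_id: str,
    location: str,
    input_uris: list,
    output_uri_prefix: str,
    target_language_codes: list,
    pdf_to_docx: bool = False,
    use_glossary_or_automl: bool = False,
) -> dict:
    if not 1 <= len(target_language_codes) <= MAX_TARGET_LANGUAGES:
        raise ValueError("specify between 1 and 10 target language codes")
    # AutoML models and custom glossaries require a regional (non-global) location.
    if use_glossary_or_automl and location == "global":
        raise ValueError("a non-global location is required")
    kwargs = {
        "project_id": project_id,
        "location": location,
        "target_language_codes": target_language_codes,
        # One BatchDocumentInputConfig-shaped dict per Cloud Storage source.
        "input_configs": [{"gcs_source": {"input_uri": uri}} for uri in input_uris],
        "output_config": {"gcs_destination": {"output_uri_prefix": output_uri_prefix}},
        # Pin every target language explicitly to the general NMT model.
        "models": {code: general_nmt_model(project_id, location) for code in target_language_codes},
    }
    if pdf_to_docx:
        # Map key: original mime_type; map value: mime_type of the translated output.
        kwargs["format_conversions"] = {PDF: DOCX}
    return kwargs
```

The result could then be unpacked into the operator, for example `TranslateDocumentBatchOperator(task_id="translate_docs", **kwargs)`. The 100-file and 100M-codepoint limits depend on which files the input URIs actually match, so this sketch does not check them.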
- class airflow.providers.google.cloud.operators.translate.TranslateCreateGlossaryOperator(*, project_id=PROVIDE_PROJECT_ID, location, glossary_id, input_config, language_pair=None, language_codes_set=None, timeout=None, retry=DEFAULT, gcp_conn_id='google_cloud_default', metadata=(), impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Creates a Google Cloud Translation Glossary.
Creates a translation glossary, using API V3. For more information on how to use this operator, take a look at the guide: TranslateCreateGlossaryOperator.
- Parameters
glossary_id (str) – User-specified ID used to build the glossary resource name.
input_config (google.cloud.translate_v3.types.translation_service.GlossaryInputConfig | dict) – The input configuration of examples to build the glossary from. The total glossary must not exceed 10M Unicode codepoints. Do not include headers in the input file table, because the languages are specified with the language_pair or language_codes_set params.
language_pair (google.cloud.translate_v3.types.translation_service.Glossary.LanguageCodePair | dict | None) – Pair of language codes used to build a unidirectional glossary. If specified, language_codes_set should be empty.
language_codes_set (google.cloud.translate_v3.types.translation_service.Glossary.LanguageCodesSet | collections.abc.MutableSequence[str] | None) – Set of language codes used to create an equivalent term sets glossary, which maps terms across multiple languages. If specified, language_pair should be empty.
project_id (str) – ID of the Google Cloud project where the glossary is located. If not provided, the default project_id is used.
location (str) – The location of the project.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.
timeout (float | None) – The timeout for this request.
metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- template_fields: collections.abc.Sequence[str] = ('glossary_id', 'location', 'project_id', 'gcp_conn_id', 'impersonation_chain')
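As a sketch of the mutually exclusive `language_pair` and `language_codes_set` arguments described above, the hypothetical helper below builds the glossary-related kwargs for TranslateCreateGlossaryOperator and refuses to set both or neither. It assumes the dict forms of `GlossaryInputConfig` (`gcs_source.input_uri`), `Glossary.LanguageCodePair` (`source_language_code`, `target_language_code`) and `Glossary.LanguageCodesSet` (`language_codes`); the glossary ID, location and bucket path are placeholders.

```python
from typing import Optional


def build_glossary_kwargs(
    glossary_id: str,
    location: str,
    input_uri: str,
    language_pair: Optional[tuple] = None,
    language_codes: Optional[list] = None,
) -> dict:
    """Return kwargs for either a unidirectional or an equivalent term sets glossary."""
    # Exactly one of the two language specifications must be supplied.
    if (language_pair is None) == (language_codes is None):
        raise ValueError("pass exactly one of language_pair or language_codes")
    kwargs = {
        "glossary_id": glossary_id,
        "location": location,
        # The source table must not contain a header row; languages come from below.
        "input_config": {"gcs_source": {"input_uri": input_uri}},
    }
    if language_pair is not None:
        source, target = language_pair
        # Unidirectional glossary: one source and one target language.
        kwargs["language_pair"] = {"source_language_code": source, "target_language_code": target}
    else:
        # Equivalent term sets glossary: terms mapped across several languages.
        kwargs["language_codes_set"] = {"language_codes": list(language_codes)}
    return kwargs
```

For example, `build_glossary_kwargs("my_glossary", "us-central1", "gs://my-bucket/glossary.csv", language_pair=("en", "es"))` could be unpacked into `TranslateCreateGlossaryOperator(task_id="create_glossary", **kwargs)`. A regional location is used because custom glossaries need a non-global location.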
- class airflow.providers.google.cloud.operators.translate.TranslateUpdateGlossaryOperator(*, project_id=PROVIDE_PROJECT_ID, location, glossary_id, new_display_name, new_input_config=None, timeout=None, retry=DEFAULT, gcp_conn_id='google_cloud_default', metadata=(), impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Update a glossary with the values provided.
Updates the translation glossary, using translation API V3. Only the display_name and input_config fields can be updated. For more information on how to use this operator, take a look at the guide: TranslateUpdateGlossaryOperator.
- Parameters
glossary_id (str) – User-specified ID used to build the glossary resource name.
input_config – The input configuration of examples to build the glossary from. The total glossary must not exceed 10M Unicode codepoints. Do not include headers in the input file table, because the languages are specified with the language_pair or language_codes_set params.
language_pair – Pair of language codes used to build a unidirectional glossary. If specified, language_codes_set should be empty.
language_codes_set – Set of language codes used to create an equivalent term sets glossary, which maps terms across multiple languages. If specified, language_pair should be empty.
project_id (str) – ID of the Google Cloud project where the glossary is located. If not provided, the default project_id is used.
location (str) – The location of the project.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.
timeout (float | None) – The timeout for this request.
metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- template_fields: collections.abc.Sequence[str] = ('glossary_id', 'location', 'project_id', 'gcp_conn_id', 'impersonation_chain')
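To make the update restriction concrete, the sketch below checks a dict of intended changes against the two updatable glossary fields and returns the ones that would actually change. The function name `glossary_update_paths` and the change-dict convention are hypothetical illustrations, not part of the provider or the Translation client library.

```python
# Only these glossary fields may be updated, per the description above.
UPDATABLE_FIELDS = ("display_name", "input_config")


def glossary_update_paths(changes: dict) -> list:
    """Return updatable field names that carry a value, rejecting any other field."""
    disallowed = set(changes) - set(UPDATABLE_FIELDS)
    if disallowed:
        raise ValueError(f"cannot update glossary fields: {sorted(disallowed)}")
    # A field set to None is treated as "leave unchanged".
    return [name for name in UPDATABLE_FIELDS if changes.get(name) is not None]
```

In the operator itself, these two fields correspond to the `new_display_name` and `new_input_config` arguments shown in the signature.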
- class airflow.providers.google.cloud.operators.translate.TranslateListGlossariesOperator(*, project_id=PROVIDE_PROJECT_ID, location, page_size=None, page_token=None, filter_str=None, timeout=None, retry=DEFAULT, gcp_conn_id='google_cloud_default', metadata=(), impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Get a list of translation glossaries in a project.
List the translation glossaries, using translation API V3.
For more information on how to use this operator, take a look at the guide: TranslateListGlossariesOperator.
- Parameters
project_id (str) – ID of the Google Cloud project where the glossary is located. If not provided, the default project_id is used.
page_size (int | None) – Requested page size. If not set, the server uses an appropriate default.
page_token (str | None) – A token identifying a page of results the server should return. The first page is returned if page_token is empty or missing.
filter_str (str | None) – Filter specifying constraints of a list operation. Specify the constraint in the format “key=value”, where key must be src or tgt and the value must be a valid language code. For multiple restrictions, concatenate them with “AND” (uppercase only), for example: src=en-US AND tgt=zh-CN. Note that exact matching is used, so ‘en-US’ and ‘en’ can lead to different results, depending on the language code used when the glossary was created. For unidirectional glossaries, src and tgt restrict the source and target language codes separately. For equivalent term set glossaries, src and/or tgt restrict the term set. For example, src=en-US AND tgt=zh-CN picks only the unidirectional glossaries whose source language code is exactly en-US and whose target language code is exactly zh-CN, but picks all equivalent term set glossaries whose language set contains both en-US and zh-CN. If missing, no filtering is performed.
location (str) – The location of the project.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.
timeout (float | None) – The timeout for this request.
metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- template_fields: collections.abc.Sequence[str] = ('location', 'project_id', 'gcp_conn_id', 'impersonation_chain')
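To illustrate the `filter_str` syntax and the exact-match rules described above, the sketch below builds a filter string and emulates the documented matching locally. Both function names are hypothetical, the glossary dicts only mimic the `language_pair` and `language_codes_set` shapes, and the real filtering is performed server-side by the Translation API, not by this code.

```python
from typing import Optional


def build_glossary_filter(src: Optional[str] = None, tgt: Optional[str] = None) -> Optional[str]:
    """Join src/tgt constraints with an uppercase AND; None means no filtering."""
    parts = [f"{key}={code}" for key, code in (("src", src), ("tgt", tgt)) if code]
    return " AND ".join(parts) or None


def glossary_matches(glossary: dict, src: Optional[str] = None, tgt: Optional[str] = None) -> bool:
    """Local emulation of the documented rules, using exact string comparison."""
    if "language_pair" in glossary:
        pair = glossary["language_pair"]
        # Unidirectional glossary: src and tgt constrain each side separately.
        return (src is None or pair["source_language_code"] == src) and (
            tgt is None or pair["target_language_code"] == tgt
        )
    # Equivalent term set glossary: every requested code must be in the language set.
    codes = set(glossary["language_codes_set"]["language_codes"])
    return all(code in codes for code in (src, tgt) if code is not None)
```

A string built this way could be passed as `filter_str=build_glossary_filter("en-US", "zh-CN")`. Because matching is exact, a unidirectional glossary created with the source code `en` is not returned by a `src=en-US` filter.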
- class airflow.providers.google.cloud.operators.translate.TranslateDeleteGlossaryOperator(*, project_id=PROVIDE_PROJECT_ID, location, glossary_id, timeout=None, retry=DEFAULT, gcp_conn_id='google_cloud_default', metadata=(), impersonation_chain=None, **kwargs)
Bases: airflow.providers.google.cloud.operators.cloud_base.GoogleCloudBaseOperator
Delete a Google Cloud Translation Glossary.
Deletes a translation glossary, using API V3. For more information on how to use this operator, take a look at the guide: TranslateDeleteGlossaryOperator.
- Parameters
glossary_id (str) – User-specified ID of the glossary resource to delete.
project_id (str) – ID of the Google Cloud project where the glossary is located. If not provided, the default project_id is used.
location (str) – The location of the project.
retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – Designation of what errors, if any, should be retried.
timeout (float | None) – The timeout for this request.
metadata (collections.abc.Sequence[tuple[str, str]]) – Strings which should be sent along with the request as metadata.
gcp_conn_id (str) – The connection ID to use connecting to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- template_fields: collections.abc.Sequence[str] = ('glossary_id', 'location', 'project_id', 'gcp_conn_id', 'impersonation_chain')
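Every operator in this module shares the same `impersonation_chain` rule. As a sketch, the hypothetical helper below lists which IAM grant each account in the chain needs: each identity grants the Service Account Token Creator role to the identity directly before it, and the first one grants it to the originating account. The account names are placeholders, and the function only computes the list; it does not call IAM.

```python
from typing import List, Sequence, Tuple, Union


def required_token_creator_grants(
    originating: str, chain: Union[str, Sequence[str]]
) -> List[Tuple[str, str]]:
    """Return (service_account, member) pairs: the account must grant Token Creator to member."""
    # A plain string is a one-element chain that must trust the originating account.
    accounts = [chain] if isinstance(chain, str) else list(chain)
    grants = []
    previous = originating
    for account in accounts:
        # Each account grants the Service Account Token Creator role to the identity before it.
        grants.append((account, previous))
        previous = account
    # The last account in the chain is the one impersonated in the request.
    return grants
```

For a chain `["a", "b", "c"]` started from the Airflow worker’s identity, `c` ends up impersonated, and three grants must be in place before any of these operators can run.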