airflow.providers.google.cloud.hooks.dataprep

This module contains Google Dataprep hook.

Classes

JobGroupStatuses

Types of job group run statuses.

GoogleDataprepHook

Hook for connection with Dataprep API.

Module Contents

class airflow.providers.google.cloud.hooks.dataprep.JobGroupStatuses[source]

Bases: str, enum.Enum

Types of job group run statuses.

CREATED = 'Created'[source]
UNDEFINED = 'undefined'[source]
IN_PROGRESS = 'InProgress'[source]
COMPLETE = 'Complete'[source]
FAILED = 'Failed'[source]
CANCELED = 'Canceled'[source]
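
Because JobGroupStatuses derives from both str and enum.Enum, its members compare equal to their plain string values. The sketch below uses a local copy of the enum so that it runs without Airflow installed. TERMINAL_STATUSES and is_terminal are hypothetical helpers, and treating Complete, Failed, and Canceled as terminal is an assumption rather than something this module defines.

```python
from enum import Enum


class JobGroupStatuses(str, Enum):
    """Local copy of the documented enum, for illustration only."""

    CREATED = "Created"
    UNDEFINED = "undefined"
    IN_PROGRESS = "InProgress"
    COMPLETE = "Complete"
    FAILED = "Failed"
    CANCELED = "Canceled"


# Assumption: these three statuses mean the job group will not change state again.
TERMINAL_STATUSES = frozenset(
    {JobGroupStatuses.COMPLETE, JobGroupStatuses.FAILED, JobGroupStatuses.CANCELED}
)


def is_terminal(status: str) -> bool:
    """Return True if a raw status string denotes a finished job group."""
    # Looking a member up by value raises ValueError for unknown strings.
    return JobGroupStatuses(status) in TERMINAL_STATUSES
```

With the real class, the same comparison style should apply, since the members are also str instances.
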
class airflow.providers.google.cloud.hooks.dataprep.GoogleDataprepHook(dataprep_conn_id=default_conn_name, api_version='v4', **kwargs)[source]

Bases: airflow.hooks.base.BaseHook

Hook for connection with Dataprep API.

To connect Airflow to Dataprep, you need a Dataprep access token. For details, see:

https://clouddataprep.com/documentation/api#section/Authentication

The token should be added to the Airflow Connection in JSON format.
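
A minimal sketch of preparing that JSON. The extra keys token and base_url, the default URL, and the helper name build_dataprep_extra are assumptions that this reference does not state; check the provider's connection documentation for the field names your version expects.

```python
import json


def build_dataprep_extra(token: str, base_url: str = "https://api.clouddataprep.com") -> str:
    """Serialize the (assumed) extra fields of a Dataprep connection to JSON."""
    if not token:
        raise ValueError("A Dataprep token is required")
    # Assumption: the hook reads "token" and "base_url" from the connection extras.
    return json.dumps({"token": token, "base_url": base_url})


extra_json = build_dataprep_extra("MY_DATAPREP_TOKEN")  # placeholder token
```

The resulting string would go in the Extra field of the connection whose id matches dataprep_conn_id (google_cloud_dataprep_default unless overridden).
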

conn_name_attr = 'dataprep_conn_id'[source]
default_conn_name = 'google_cloud_dataprep_default'[source]
conn_type = 'dataprep'[source]
hook_name = 'Google Dataprep'[source]
dataprep_conn_id = 'google_cloud_dataprep_default'[source]
api_version = 'v4'[source]
get_jobs_for_job_group(job_id)[source]

Get information about the batch jobs within a Cloud Dataprep job.

Parameters:

job_id (int) – The ID of the job that will be fetched

get_job_group(job_group_id, embed, include_deleted)[source]

Get the specified job group.

A job group is a job that is executed from a specific node in a flow.

Parameters:
  • job_group_id (int) – The ID of the job group that will be fetched

  • embed (str) – Comma-separated list of objects to pull in as part of the response

  • include_deleted (bool) – If True, deleted objects are included
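
The embed argument is a single comma-separated string, so a list of object names has to be joined before the call. A small sketch of that step follows; build_embed is a hypothetical helper, and the object names are placeholders rather than values the API is known to accept.

```python
from typing import Iterable


def build_embed(objects: Iterable[str]) -> str:
    """Join object names into the comma-separated form expected by ``embed``."""
    # Strip surrounding whitespace and drop empty entries before joining.
    cleaned = [name.strip() for name in objects if name.strip()]
    return ",".join(cleaned)


embed = build_embed(["jobs", " wrangledDataset "])
# With Airflow installed, the value could then be passed along, for example:
# hook.get_job_group(job_group_id=123, embed=embed, include_deleted=False)
```
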

run_job_group(body_request)[source]

Create a jobGroup, which launches the specified job as the authenticated user.

This performs the same action as clicking on the Run Job button in the application.

To get recipe_id please follow the Dataprep API documentation https://clouddataprep.com/documentation/api#operation/runJobGroup.

Parameters:

body_request (dict) – Body of the POST request to be sent, containing the identifier of the recipe you would like to run.
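
A sketch of what body_request might look like. The wrangledDataset key reflects the runJobGroup operation in the Dataprep API documentation linked above as understood here; treat the exact shape as an assumption and confirm it there. build_run_job_group_body is a hypothetical helper.

```python
def build_run_job_group_body(recipe_id: int) -> dict:
    """Build a request body that refers to the recipe to run."""
    if recipe_id <= 0:
        raise ValueError("recipe_id must be a positive integer")
    # Assumption: the API identifies the recipe through wrangledDataset.id.
    return {"wrangledDataset": {"id": recipe_id}}


body_request = build_run_job_group_body(7)
# With Airflow installed: hook.run_job_group(body_request=body_request)
```
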

create_flow(*, body_request)[source]

Create flow.

Parameters:

body_request (dict) – Body of the POST request to be sent. For more details check https://clouddataprep.com/documentation/api#operation/createFlow
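
As an illustration, a minimal request body could be assembled as below. The name and description fields are assumptions based on the createFlow operation linked above, and build_create_flow_body is a hypothetical helper.

```python
from typing import Optional


def build_create_flow_body(name: str, description: Optional[str] = None) -> dict:
    """Build a minimal body for creating a flow."""
    if not name:
        raise ValueError("A flow needs a name")
    body = {"name": name}
    # Only include the description when one was given.
    if description:
        body["description"] = description
    return body


body_request = build_create_flow_body("nightly_sales_flow", "Created from Airflow")
# With Airflow installed: hook.create_flow(body_request=body_request)
```
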

copy_flow(*, flow_id, name='', description='', copy_datasources=False)[source]

Create a copy of the flow with the provided id, including all recipes it contains.

Parameters:
  • flow_id (int) – ID of the flow to be copied

  • name (str) – Name for the copy of the flow

  • description (str) – Description of the copy of the flow

  • copy_datasources (bool) – Whether copies of the data inputs should also be made.

delete_flow(*, flow_id)[source]

Delete the flow with the provided id.

Parameters:

flow_id (int) – ID of the flow to be deleted

run_flow(*, flow_id, body_request)[source]

Run the flow with the provided id.

Parameters:
  • flow_id (int) – ID of the flow to be run

  • body_request (dict) – Body of the POST request to be sent.

get_job_group_status(*, job_group_id)[source]

Check the status of a Dataprep job group, for example to determine whether it has finished.

Parameters:

job_group_id (int) – ID of the job group to check
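
One way to wait for a job group to finish is to poll this method. The sketch below takes the status getter as a callable so that it runs without Airflow; wait_for_job_group, its parameters, and the choice of terminal statuses are assumptions. It also assumes the returned status compares equal to the string values listed under JobGroupStatuses.

```python
import time
from typing import Callable

# Assumption: these status values mean the job group has stopped running.
TERMINAL = ("Complete", "Failed", "Canceled")


def wait_for_job_group(
    get_status: Callable[..., str],
    job_group_id: int,
    poll_seconds: float = 10.0,
    max_polls: int = 60,
) -> str:
    """Poll ``get_status`` until a terminal status is seen or polls run out."""
    for attempt in range(max_polls):
        status = get_status(job_group_id=job_group_id)
        if status in TERMINAL:
            return status
        # Sleep between polls, but not after the final attempt.
        if attempt < max_polls - 1:
            time.sleep(poll_seconds)
    raise TimeoutError(f"Job group {job_group_id} did not finish in {max_polls} polls")
```

With a hook instance, the getter would be hook.get_job_group_status, for example wait_for_job_group(hook.get_job_group_status, job_group_id=123). Inside a DAG, a sensor is usually a better fit than a blocking loop.
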

create_imported_dataset(*, body_request)[source]

Create imported dataset.

Parameters:

body_request (dict) – Body of the POST request to be sent. For more details check https://clouddataprep.com/documentation/api#operation/createImportedDataset

create_wrangled_dataset(*, body_request)[source]

Create wrangled dataset.

Parameters:

body_request (dict) – Body of the POST request to be sent. For more details check https://clouddataprep.com/documentation/api#operation/createWrangledDataset

create_output_object(*, body_request)[source]

Create output object.

Parameters:

body_request (dict) – Body of the POST request to be sent. For more details check https://clouddataprep.com/documentation/api#operation/createOutputObject

create_write_settings(*, body_request)[source]

Create write settings.

Parameters:

body_request (dict) – Body of the POST request to be sent. For more details check https://clouddataprep.com/documentation/api#tag/createWriteSetting

delete_imported_dataset(*, dataset_id)[source]

Delete imported dataset.

Parameters:

dataset_id (int) – ID of the imported dataset for removal.
