airflow.providers.amazon.aws.operators.bedrock

Classes

`BedrockInvokeModelOperator`	Invoke the specified Bedrock model to run inference using the input provided.
`BedrockCustomizeModelOperator`	Create a fine-tuning job to customize a base model.
`BedrockCreateProvisionedModelThroughputOperator`	Create a provisioned throughput for a base or custom model.
`BedrockCreateKnowledgeBaseOperator`	Create a knowledge base that contains data sources used by Amazon Bedrock LLMs and Agents.
`BedrockCreateDataSourceOperator`	Set up an Amazon Bedrock Data Source to be added to an Amazon Bedrock Knowledge Base.
`BedrockIngestDataOperator`	Begin an ingestion job, in which an Amazon Bedrock data source is added to an Amazon Bedrock knowledge base.
`BedrockRaGOperator`	Query a knowledge base and generate responses based on the retrieved results with source citations.
`BedrockRetrieveOperator`	Query a knowledge base and retrieve results with source citations.
`BedrockBatchInferenceOperator`	Create a batch inference job to invoke a model on multiple prompts.

Module Contents

class airflow.providers.amazon.aws.operators.bedrock.BedrockInvokeModelOperator(model_id, input_data, content_type=None, accept_type=None, **kwargs)[source]¶

Bases: airflow.providers.amazon.aws.operators.base_aws.AwsBaseOperator[airflow.providers.amazon.aws.hooks.bedrock.BedrockRuntimeHook]

Invoke the specified Bedrock model to run inference using the input provided.

Use InvokeModel to run inference for text models, image models, and embedding models. The format and content of the input_data field differ between models; refer to the Inference parameters documentation for the model you are invoking.

See also

For more information on how to use this operator, take a look at the guide: Invoke an existing Amazon Bedrock Model

Parameters:

model_id (str) – The ID of the Bedrock model. (templated)
input_data (dict[str, Any]) – Input data in the format specified in the content-type request header. (templated)
content_type (str | None) – The MIME type of the input data in the request. (templated) Default: application/json
accept_type (str | None) – The desired MIME type of the inference body in the response. (templated) Default: application/json
aws_conn_id – The Airflow connection used for AWS credentials. If this is None or empty then the default boto3 behaviour is used. If running Airflow in a distributed manner and aws_conn_id is None or empty, then default boto3 configuration would be used (and must be maintained on each worker node).
region_name – AWS region_name. If not specified then the default boto3 behaviour is used.
verify – Whether or not to verify SSL certificates. See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html
botocore_config – Configuration dictionary (key-values) for botocore client. See: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html
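
As a rough illustration of the parameters above, the following sketch assembles keyword arguments for the operator. The task ID, the model ID `amazon.titan-text-express-v1`, and the `inputText` request key are assumptions chosen for the example; the request body differs per model, so check the Inference parameters documentation for the model in use. The Airflow import is deferred into a helper (`build_invoke_task`, a hypothetical name) so the snippet itself runs without Airflow installed.

```python
import json

# Assumed values for illustration: the model ID and the "inputText" key follow
# the Amazon Titan Text request format; other models expect different bodies.
INVOKE_KWARGS = {
    "task_id": "invoke_model",  # hypothetical task name
    "model_id": "amazon.titan-text-express-v1",
    "input_data": {"inputText": "Describe Apache Airflow in one sentence."},
    "content_type": "application/json",
    "accept_type": "application/json",
}


def build_invoke_task():
    """Create the operator; requires the Amazon provider package."""
    from airflow.providers.amazon.aws.operators.bedrock import (
        BedrockInvokeModelOperator,
    )

    return BedrockInvokeModelOperator(**INVOKE_KWARGS)


# input_data is a dict that must serialize to the declared content type.
request_body = json.dumps(INVOKE_KWARGS["input_data"])
```

Because model_id and input_data are templated, either value could also be a Jinja expression that is rendered at run time.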

aws_hook_class[source]¶

template_fields: collections.abc.Sequence[str][source]¶

model_id[source]¶

input_data[source]¶

content_type = None[source]¶

accept_type = None[source]¶

execute(context)[source]¶

Derive when creating an operator.

The main method to execute the task. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.amazon.aws.operators.bedrock.BedrockCustomizeModelOperator(job_name, custom_model_name, role_arn, base_model_id, training_data_uri, output_data_uri, hyperparameters, ensure_unique_job_name=True, customization_job_kwargs=None, wait_for_completion=True, waiter_delay=120, waiter_max_attempts=75, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)[source]¶

Bases: airflow.providers.amazon.aws.operators.base_aws.AwsBaseOperator[airflow.providers.amazon.aws.hooks.bedrock.BedrockHook]

Create a fine-tuning job to customize a base model.

See also

For more information on how to use this operator, take a look at the guide: Customize an existing Amazon Bedrock Model

Parameters:

job_name (str) – A unique name for the fine-tuning job.
custom_model_name (str) – A name for the custom model being created.
role_arn (str) – The Amazon Resource Name (ARN) of an IAM role that Amazon Bedrock can assume to perform tasks on your behalf.
base_model_id (str) – Name of the base model.
training_data_uri (str) – The S3 URI where the training data is stored.
output_data_uri (str) – The S3 URI where the output data is stored.
hyperparameters (dict[str, str]) – Parameters related to tuning the model.
ensure_unique_job_name (bool) – If True, the operator checks whether a model customization job with this name already exists and appends the current timestamp to the name if there is a conflict. (default: True)
customization_job_kwargs (dict[str, Any] | None) – Any optional parameters to pass to the API.
wait_for_completion (bool) – Whether to wait for the customization job to complete. (default: True)
waiter_delay (int) – Time in seconds to wait between status checks. (default: 120)
waiter_max_attempts (int) – Maximum number of attempts to check for job completion. (default: 75)
deferrable (bool) – If True, the operator will wait asynchronously for the customization job to complete. This implies waiting for completion. This mode requires the aiobotocore module to be installed. (default: False)
aws_conn_id – The Airflow connection used for AWS credentials. If this is None or empty then the default boto3 behaviour is used. If running Airflow in a distributed manner and aws_conn_id is None or empty, then default boto3 configuration would be used (and must be maintained on each worker node).
region_name – AWS region_name. If not specified then the default boto3 behaviour is used.
verify – Whether or not to verify SSL certificates. See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html
botocore_config – Configuration dictionary (key-values) for botocore client. See: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html
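
The sketch below illustrates two details of the parameter list: hyperparameters is a dict[str, str], so numeric settings are passed as strings, and the time spent polling is bounded by roughly waiter_delay multiplied by waiter_max_attempts. Every name, ARN, S3 URI, the base model ID, and the hyperparameter keys are placeholders or assumptions; `max_wait_seconds` is a hypothetical helper, not part of the provider.

```python
# All names, ARNs, and S3 URIs below are placeholders for illustration.
CUSTOMIZE_KWARGS = {
    "task_id": "customize_model",
    "job_name": "example-fine-tune-job",
    "custom_model_name": "example-custom-model",
    "role_arn": "arn:aws:iam::123456789012:role/example-bedrock-role",
    "base_model_id": "amazon.titan-text-express-v1",
    "training_data_uri": "s3://example-bucket/train.jsonl",
    "output_data_uri": "s3://example-bucket/output/",
    # dict[str, str]: numeric settings are passed as strings.
    # The key names are assumed; valid keys depend on the base model.
    "hyperparameters": {
        "epochCount": "1",
        "batchSize": "1",
        "learningRate": "0.00001",
    },
    "waiter_delay": 120,
    "waiter_max_attempts": 75,
}


def max_wait_seconds(delay: int, max_attempts: int) -> int:
    """Approximate upper bound on the time spent polling for completion."""
    return delay * max_attempts


budget = max_wait_seconds(
    CUSTOMIZE_KWARGS["waiter_delay"], CUSTOMIZE_KWARGS["waiter_max_attempts"]
)
print(f"Polls for up to about {budget / 3600:.1f} hours")  # 9000 s with the defaults
```

With the documented defaults the task can block for about two and a half hours, which is one reason to consider deferrable mode for long customization jobs.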

aws_hook_class[source]¶

template_fields: collections.abc.Sequence[str][source]¶

wait_for_completion = True[source]¶

waiter_delay = 120[source]¶

waiter_max_attempts = 75[source]¶

deferrable = True[source]¶

job_name[source]¶

custom_model_name[source]¶

role_arn[source]¶

base_model_id[source]¶

training_data_config[source]¶

output_data_config[source]¶

hyperparameters[source]¶

ensure_unique_job_name = True[source]¶

customization_job_kwargs[source]¶

valid_action_if_job_exists: set[str][source]¶

execute_complete(context, event=None)[source]¶

execute(context)[source]¶

Derive when creating an operator.

The main method to execute the task. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.amazon.aws.operators.bedrock.BedrockCreateProvisionedModelThroughputOperator(model_units, provisioned_model_name, model_id, create_throughput_kwargs=None, wait_for_completion=True, waiter_delay=60, waiter_max_attempts=20, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)[source]¶

Bases: airflow.providers.amazon.aws.operators.base_aws.AwsBaseOperator[airflow.providers.amazon.aws.hooks.bedrock.BedrockHook]

Create a provisioned throughput for a base or custom model.

See also

For more information on how to use this operator, take a look at the guide: Provision Throughput for an existing Amazon Bedrock Model

Parameters:

model_units (int) – Number of model units to allocate. (templated)
provisioned_model_name (str) – Unique name for this provisioned throughput. (templated)
model_id (str) – Name or ARN of the model to associate with this provisioned throughput. (templated)
create_throughput_kwargs (dict[str, Any] | None) – Any optional parameters to pass to the API.
wait_for_completion (bool) – Whether to wait for the provisioning to complete. (default: True)
waiter_delay (int) – Time in seconds to wait between status checks. (default: 60)
waiter_max_attempts (int) – Maximum number of attempts to check for job completion. (default: 20)
deferrable (bool) – If True, the operator will wait asynchronously for the provisioning to complete. This implies waiting for completion. This mode requires the aiobotocore module to be installed. (default: False)
aws_conn_id – The Airflow connection used for AWS credentials. If this is None or empty then the default boto3 behaviour is used. If running Airflow in a distributed manner and aws_conn_id is None or empty, then default boto3 configuration would be used (and must be maintained on each worker node).
region_name – AWS region_name. If not specified then the default boto3 behaviour is used.
verify – Whether or not to verify SSL certificates. See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html
botocore_config – Configuration dictionary (key-values) for botocore client. See: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html

aws_hook_class[source]¶

template_fields: collections.abc.Sequence[str][source]¶

model_units[source]¶

provisioned_model_name[source]¶

model_id[source]¶

create_throughput_kwargs[source]¶

wait_for_completion = True[source]¶

waiter_delay = 60[source]¶

waiter_max_attempts = 20[source]¶

deferrable = True[source]¶

execute(context)[source]¶

Derive when creating an operator.

The main method to execute the task. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

execute_complete(context, event=None)[source]¶

class airflow.providers.amazon.aws.operators.bedrock.BedrockCreateKnowledgeBaseOperator(name, embedding_model_arn, role_arn, storage_config, create_knowledge_base_kwargs=None, wait_for_indexing=True, indexing_error_retry_delay=5, indexing_error_max_attempts=20, wait_for_completion=True, waiter_delay=60, waiter_max_attempts=20, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)[source]¶

Bases: airflow.providers.amazon.aws.operators.base_aws.AwsBaseOperator[airflow.providers.amazon.aws.hooks.bedrock.BedrockAgentHook]

Create a knowledge base that contains data sources used by Amazon Bedrock LLMs and Agents.

To create a knowledge base, you must first set up your data sources and configure a supported vector store.

See also

For more information on how to use this operator, take a look at the guide: Create an Amazon Bedrock Knowledge Base

Parameters:

name (str) – The name of the knowledge base. (templated)
embedding_model_arn (str) – ARN of the model used to create vector embeddings for the knowledge base. (templated)
role_arn (str) – The ARN of the IAM role with permissions to create the knowledge base. (templated)
storage_config (dict[str, Any]) – Configuration details of the vector database used for the knowledge base. (templated)
wait_for_indexing (bool) – Vector indexing can take some time and there is no apparent way to check the state before trying to create the Knowledge Base. If this is True, and creation fails due to the index not being available, the operator will wait and retry. (default: True) (templated)
indexing_error_retry_delay (int) – Seconds between retries if an index error is encountered. (default: 5) (templated)
indexing_error_max_attempts (int) – Maximum number of times to retry when encountering an index error. (default: 20) (templated)
create_knowledge_base_kwargs (dict[str, Any] | None) – Any additional optional parameters to pass to the API call. (templated)
wait_for_completion (bool) – Whether to wait for the knowledge base creation to complete. (default: True)
waiter_delay (int) – Time in seconds to wait between status checks. (default: 60)
waiter_max_attempts (int) – Maximum number of attempts to check for job completion. (default: 20)
deferrable (bool) – If True, the operator will wait asynchronously for the knowledge base creation to complete. This implies waiting for completion. This mode requires the aiobotocore module to be installed. (default: False)
aws_conn_id – The Airflow connection used for AWS credentials. If this is None or empty then the default boto3 behaviour is used. If running Airflow in a distributed manner and aws_conn_id is None or empty, then default boto3 configuration would be used (and must be maintained on each worker node).
region_name – AWS region_name. If not specified then the default boto3 behaviour is used.
verify – Whether or not to verify SSL certificates. See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html
botocore_config – Configuration dictionary (key-values) for botocore client. See: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html
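
The wait_for_indexing behaviour described above amounts to a bounded retry loop. The following sketch shows that pattern in isolation, using a stand-in `create` callable and a hypothetical `IndexNotReady` exception. It is not the operator's actual implementation; `create_with_retry` and `fake_create` exist only for this illustration.

```python
import time


class IndexNotReady(Exception):
    """Hypothetical stand-in for the error seen while the index is unavailable."""


def create_with_retry(create, retry_delay=5, max_attempts=20, sleep=time.sleep):
    """Call create() until it succeeds or max_attempts retries are used up."""
    attempts_left = max_attempts
    while True:
        try:
            return create()
        except IndexNotReady:
            if attempts_left <= 0:
                raise
            attempts_left -= 1
            sleep(retry_delay)


# Simulated API call that fails twice before the index becomes available.
calls = {"count": 0}


def fake_create():
    calls["count"] += 1
    if calls["count"] < 3:
        raise IndexNotReady()
    return "kb-example-id"


# sleep is replaced with a no-op so the demonstration finishes instantly.
result = create_with_retry(fake_create, sleep=lambda seconds: None)
```

The defaults in the sketch mirror indexing_error_retry_delay and indexing_error_max_attempts, so a vector index that never becomes available is given up on after roughly 100 seconds of waiting.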

aws_hook_class[source]¶

template_fields: collections.abc.Sequence[str][source]¶

name[source]¶

role_arn[source]¶

storage_config[source]¶

create_knowledge_base_kwargs[source]¶

embedding_model_arn[source]¶

knowledge_base_config[source]¶

wait_for_indexing = True[source]¶

indexing_error_retry_delay = 5[source]¶

indexing_error_max_attempts = 20[source]¶

wait_for_completion = True[source]¶

waiter_delay = 60[source]¶

waiter_max_attempts = 20[source]¶

deferrable = True[source]¶

execute_complete(context, event=None)[source]¶

execute(context)[source]¶

Derive when creating an operator.

The main method to execute the task. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.amazon.aws.operators.bedrock.BedrockCreateDataSourceOperator(name, knowledge_base_id, bucket_name=None, create_data_source_kwargs=None, **kwargs)[source]¶

Bases: airflow.providers.amazon.aws.operators.base_aws.AwsBaseOperator[airflow.providers.amazon.aws.hooks.bedrock.BedrockAgentHook]

Set up an Amazon Bedrock Data Source to be added to an Amazon Bedrock Knowledge Base.

See also

For more information on how to use this operator, take a look at the guide: Create an Amazon Bedrock Data Source

Parameters:

name (str) – Name for the Amazon Bedrock Data Source being created. (templated)
bucket_name (str | None) – The name of the Amazon S3 bucket to use for data source storage. (templated)
knowledge_base_id (str) – The unique identifier of the knowledge base to which to add the data source. (templated)
create_data_source_kwargs (dict[str, Any] | None) – Any additional optional parameters to pass to the API call. (templated)
aws_conn_id – The Airflow connection used for AWS credentials. If this is None or empty then the default boto3 behaviour is used. If running Airflow in a distributed manner and aws_conn_id is None or empty, then default boto3 configuration would be used (and must be maintained on each worker node).
region_name – AWS region_name. If not specified then the default boto3 behaviour is used.
verify – Whether or not to verify SSL certificates. See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html
botocore_config – Configuration dictionary (key-values) for botocore client. See: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html

aws_hook_class[source]¶

template_fields: collections.abc.Sequence[str][source]¶

name[source]¶

knowledge_base_id[source]¶

bucket_name = None[source]¶

create_data_source_kwargs[source]¶

execute(context)[source]¶

Derive when creating an operator.

The main method to execute the task. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.amazon.aws.operators.bedrock.BedrockIngestDataOperator(knowledge_base_id, data_source_id, ingest_data_kwargs=None, wait_for_completion=True, waiter_delay=60, waiter_max_attempts=10, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)[source]¶

Bases: airflow.providers.amazon.aws.operators.base_aws.AwsBaseOperator[airflow.providers.amazon.aws.hooks.bedrock.BedrockAgentHook]

Begin an ingestion job, in which an Amazon Bedrock data source is added to an Amazon Bedrock knowledge base.

See also

For more information on how to use this operator, take a look at the guide: Ingest data into an Amazon Bedrock Data Source

Parameters:

knowledge_base_id (str) – The unique identifier of the knowledge base to which to add the data source. (templated)
data_source_id (str) – The unique identifier of the data source to ingest. (templated)
ingest_data_kwargs (dict[str, Any] | None) – Any additional optional parameters to pass to the API call. (templated)
wait_for_completion (bool) – Whether to wait for the ingestion job to complete. (default: True)
waiter_delay (int) – Time in seconds to wait between status checks. (default: 60)
waiter_max_attempts (int) – Maximum number of attempts to check for job completion. (default: 10)
deferrable (bool) – If True, the operator will wait asynchronously for the ingestion job to complete. This implies waiting for completion. This mode requires the aiobotocore module to be installed. (default: False)
aws_conn_id – The Airflow connection used for AWS credentials. If this is None or empty then the default boto3 behaviour is used. If running Airflow in a distributed manner and aws_conn_id is None or empty, then default boto3 configuration would be used (and must be maintained on each worker node).
region_name – AWS region_name. If not specified then the default boto3 behaviour is used.
verify – Whether or not to verify SSL certificates. See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html
botocore_config – Configuration dictionary (key-values) for botocore client. See: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html
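
One way to connect BedrockCreateDataSourceOperator and BedrockIngestDataOperator is through the templated data_source_id field. The sketch below assumes that the create-data-source task pushes the new data source ID as its XCom return value, which this page does not state. The task IDs, knowledge base ID, and bucket name are placeholders, and `build_tasks` is a hypothetical helper.

```python
# Placeholder identifier for illustration only.
KNOWLEDGE_BASE_ID = "EXAMPLEKBID"

DATA_SOURCE_KWARGS = {
    "task_id": "create_data_source",
    "name": "example-data-source",
    "knowledge_base_id": KNOWLEDGE_BASE_ID,
    "bucket_name": "example-bucket",
}

INGEST_KWARGS = {
    "task_id": "ingest_data",
    "knowledge_base_id": KNOWLEDGE_BASE_ID,
    # Assumption: the upstream task returns the data source ID via XCom.
    "data_source_id": "{{ ti.xcom_pull(task_ids='create_data_source') }}",
    "waiter_delay": 60,
    "waiter_max_attempts": 10,
}


def build_tasks():
    """Create and chain both operators.

    Call this inside a ``with DAG(...):`` block; it requires the Amazon
    provider package to be installed.
    """
    from airflow.providers.amazon.aws.operators.bedrock import (
        BedrockCreateDataSourceOperator,
        BedrockIngestDataOperator,
    )

    create = BedrockCreateDataSourceOperator(**DATA_SOURCE_KWARGS)
    ingest = BedrockIngestDataOperator(**INGEST_KWARGS)
    create >> ingest
    return create, ingest
```

Both tasks must point at the same knowledge base, which is why the sketch shares a single `KNOWLEDGE_BASE_ID` constant between the two argument sets.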

aws_hook_class[source]¶

template_fields: collections.abc.Sequence[str][source]¶

knowledge_base_id[source]¶

data_source_id[source]¶

ingest_data_kwargs[source]¶

wait_for_completion = True[source]¶

waiter_delay = 60[source]¶

waiter_max_attempts = 10[source]¶

deferrable = True[source]¶

execute_complete(context, event=None)[source]¶

execute(context)[source]¶

Derive when creating an operator.

The main method to execute the task. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.amazon.aws.operators.bedrock.BedrockRaGOperator(input, source_type, model_arn, prompt_template=None, knowledge_base_id=None, vector_search_config=None, sources=None, rag_kwargs=None, **kwargs)[source]¶

Bases: airflow.providers.amazon.aws.operators.base_aws.AwsBaseOperator[airflow.providers.amazon.aws.hooks.bedrock.BedrockAgentRuntimeHook]

Query a knowledge base and generate responses based on the retrieved results with source citations.

NOTE: Support for EXTERNAL_SOURCES was added in botocore 1.34.90.

See also

For more information on how to use this operator, take a look at the guide: Amazon Bedrock Retrieve and Generate (RaG)

Parameters:

input (str) – The query to be made to the knowledge base. (templated)
source_type (str) – The type of resource that is queried by the request. (templated) Must be one of 'KNOWLEDGE_BASE' or 'EXTERNAL_SOURCES', and the appropriate config values must also be provided. If set to 'KNOWLEDGE_BASE', then knowledge_base_id must be provided and vector_search_config may be. If set to 'EXTERNAL_SOURCES', then sources must be provided. NOTE: Support for EXTERNAL_SOURCES was added in botocore 1.34.90.
model_arn (str) – The ARN of the foundation model used to generate a response. (templated)
prompt_template (str | None) – The template for the prompt that’s sent to the model for response generation. You can include prompt placeholders, which are replaced before the prompt is sent to the model to provide instructions and context to the model. In addition, you can include XML tags to delineate meaningful sections of the prompt template. (templated)
knowledge_base_id (str | None) – The unique identifier of the knowledge base that is queried. (templated) Can only be specified if source_type='KNOWLEDGE_BASE'.
vector_search_config (dict[str, Any] | None) – How the results from the vector search should be returned. (templated) Can only be specified if source_type='KNOWLEDGE_BASE'. For more information, see https://docs.aws.amazon.com/bedrock/latest/userguide/kb-test-config.html.
sources (list[dict[str, Any]] | None) – The documents used as reference for the response. (templated) Can only be specified if source_type='EXTERNAL_SOURCES'. NOTE: Support for EXTERNAL_SOURCES was added in botocore 1.34.90.
rag_kwargs (dict[str, Any] | None) – Additional keyword arguments to pass to the API call. (templated)
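
The constraints between source_type, knowledge_base_id, vector_search_config, and sources can be summarized as a small check. The function below is a sketch derived only from the parameter descriptions above; `check_rag_inputs` is a hypothetical name and may not match what the operator's validate_inputs() method actually does.

```python
VALID_SOURCE_TYPES = {"KNOWLEDGE_BASE", "EXTERNAL_SOURCES"}


def check_rag_inputs(
    source_type,
    knowledge_base_id=None,
    vector_search_config=None,
    sources=None,
):
    """Return a list of problems with the given argument combination."""
    problems = []
    if source_type not in VALID_SOURCE_TYPES:
        problems.append(f"source_type must be one of {sorted(VALID_SOURCE_TYPES)}")
        return problems
    if source_type == "KNOWLEDGE_BASE":
        if knowledge_base_id is None:
            problems.append("knowledge_base_id is required for KNOWLEDGE_BASE")
        if sources is not None:
            problems.append("sources is only valid for EXTERNAL_SOURCES")
    else:  # EXTERNAL_SOURCES
        if sources is None:
            problems.append("sources is required for EXTERNAL_SOURCES")
        if knowledge_base_id is not None:
            problems.append("knowledge_base_id is only valid for KNOWLEDGE_BASE")
        if vector_search_config is not None:
            problems.append("vector_search_config is only valid for KNOWLEDGE_BASE")
    return problems


# A valid knowledge-base query and an invalid external-sources query.
ok = check_rag_inputs("KNOWLEDGE_BASE", knowledge_base_id="EXAMPLEKBID")
bad = check_rag_inputs("EXTERNAL_SOURCES", knowledge_base_id="EXAMPLEKBID")
```

Here `ok` is an empty list, while `bad` reports both the missing sources and the misplaced knowledge_base_id.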

aws_hook_class[source]¶

template_fields: collections.abc.Sequence[str][source]¶

input[source]¶

prompt_template = None[source]¶

source_type[source]¶

knowledge_base_id = None[source]¶

model_arn[source]¶

vector_search_config = None[source]¶

sources = None[source]¶

rag_kwargs[source]¶

validate_inputs()[source]¶

build_rag_config()[source]¶

execute(context)[source]¶

Derive when creating an operator.

The main method to execute the task. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.amazon.aws.operators.bedrock.BedrockRetrieveOperator(retrieval_query, knowledge_base_id, vector_search_config=None, retrieve_kwargs=None, **kwargs)[source]¶

Bases: airflow.providers.amazon.aws.operators.base_aws.AwsBaseOperator[airflow.providers.amazon.aws.hooks.bedrock.BedrockAgentRuntimeHook]

Query a knowledge base and retrieve results with source citations.

See also

For more information on how to use this operator, take a look at the guide: Amazon Bedrock Retrieve

Parameters:

retrieval_query (str) – The query to be made to the knowledge base. (templated)
knowledge_base_id (str) – The unique identifier of the knowledge base that is queried. (templated)
vector_search_config (dict[str, Any] | None) – How the results from the vector search should be returned. (templated) For more information, see https://docs.aws.amazon.com/bedrock/latest/userguide/kb-test-config.html.
retrieve_kwargs (dict[str, Any] | None) – Additional keyword arguments to pass to the API call. (templated)
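
Since retrieval_query is templated, the query text can be supplied at run time. The sketch below assumes a DAG-level param named `question` and assumes that vector_search_config accepts a `numberOfResults` key, following the vector search configuration of the Retrieve API; neither is stated on this page. The task ID and knowledge base ID are placeholders, and `build_retrieve_task` is a hypothetical helper.

```python
RETRIEVE_KWARGS = {
    "task_id": "retrieve_passages",  # hypothetical task name
    # Rendered by Jinja at run time; assumes the DAG defines params.question.
    "retrieval_query": "{{ params.question }}",
    "knowledge_base_id": "EXAMPLEKBID",  # placeholder
    # Assumed key name, taken from the Retrieve API's vector search settings.
    "vector_search_config": {"numberOfResults": 3},
}


def build_retrieve_task():
    """Create the operator; requires the Amazon provider package."""
    from airflow.providers.amazon.aws.operators.bedrock import (
        BedrockRetrieveOperator,
    )

    return BedrockRetrieveOperator(**RETRIEVE_KWARGS)
```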

aws_hook_class[source]¶

template_fields: collections.abc.Sequence[str][source]¶

retrieval_query[source]¶

knowledge_base_id[source]¶

vector_search_config = None[source]¶

retrieve_kwargs[source]¶

execute(context)[source]¶

Derive when creating an operator.

The main method to execute the task. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

class airflow.providers.amazon.aws.operators.bedrock.BedrockBatchInferenceOperator(job_name, role_arn, model_id, input_uri, output_uri, invoke_kwargs=None, wait_for_completion=True, waiter_delay=60, waiter_max_attempts=20, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)[source]¶

Bases: airflow.providers.amazon.aws.operators.base_aws.AwsBaseOperator[airflow.providers.amazon.aws.hooks.bedrock.BedrockHook]

Create a batch inference job to invoke a model on multiple prompts.

See also

For more information on how to use this operator, take a look at the guide: Create an Amazon Bedrock Batch Inference Job

Parameters:

job_name (str) – A name to give the batch inference job. (templated)
role_arn (str) – The ARN of the IAM role with permissions to run the batch inference job. (templated)
model_id (str) – Name or ARN of the model to invoke in the batch inference job. (templated)
input_uri (str) – The S3 location of the input data. (templated)
output_uri (str) – The S3 location of the output data. (templated)
invoke_kwargs (dict[str, Any] | None) – Additional keyword arguments to pass to the API call. (templated)
wait_for_completion (bool) – Whether to wait for the batch inference job to complete. (default: True) NOTE: Batch inference jobs are added to a queue and processed "eventually", so deferrable mode is much more practical than blocking with wait_for_completion.
waiter_delay (int) – Time in seconds to wait between status checks. (default: 60)
waiter_max_attempts (int) – Maximum number of attempts to check for job completion. (default: 20)
deferrable (bool) – If True, the operator will wait asynchronously for the batch inference job to complete. This implies waiting for completion. This mode requires the aiobotocore module to be installed. (default: False)
aws_conn_id – The Airflow connection used for AWS credentials. If this is None or empty then the default boto3 behaviour is used. If running Airflow in a distributed manner and aws_conn_id is None or empty, then default boto3 configuration would be used (and must be maintained on each worker node).
region_name – AWS region_name. If not specified then the default boto3 behaviour is used.
verify – Whether or not to verify SSL certificates. See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html
botocore_config – Configuration dictionary (key-values) for botocore client. See: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html
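
The following sketch assembles arguments for a batch inference task in deferrable mode, as the note on wait_for_completion suggests, and checks that both locations look like S3 URIs before they are handed to the operator. The job name, role ARN, model ID, and bucket paths are placeholders; `is_s3_uri` is a hypothetical helper, not part of the provider.

```python
from urllib.parse import urlparse


def is_s3_uri(uri: str) -> bool:
    """Return True when uri has the s3 scheme and names a bucket."""
    parsed = urlparse(uri)
    return parsed.scheme == "s3" and bool(parsed.netloc)


# Placeholder names, ARN, model ID, and S3 locations for illustration.
BATCH_KWARGS = {
    "task_id": "batch_inference",
    "job_name": "example-batch-job",
    "role_arn": "arn:aws:iam::123456789012:role/example-bedrock-role",
    "model_id": "amazon.titan-text-express-v1",
    "input_uri": "s3://example-bucket/batch/input.jsonl",
    "output_uri": "s3://example-bucket/batch/output/",
    # Jobs are queued, so wait asynchronously instead of blocking the task.
    "deferrable": True,
}

uris_ok = all(is_s3_uri(BATCH_KWARGS[key]) for key in ("input_uri", "output_uri"))
```

A check like this catches a local path or an HTTPS URL at DAG-parsing time rather than after the job request has been submitted.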