Amazon SageMaker Unified Studio¶
Amazon SageMaker Unified Studio is a unified development experience that brings together AWS data, analytics, artificial intelligence (AI), and machine learning (ML) services. It provides a place to build, deploy, execute, and monitor end-to-end workflows from a single interface. This helps drive collaboration across teams and facilitate agile development.
Airflow provides operators to orchestrate Notebooks, Querybooks, and Visual ETL jobs within SageMaker Unified Studio Workflows.
Prerequisite Tasks¶
To use these operators, you must do a few things:
Create a SageMaker Unified Studio domain and project, following the instructions in the AWS documentation.
Within your project:
* Navigate to the “Compute > Workflow environments” tab, and click “Create” to create a new MWAA environment.
* Create a Notebook, Querybook, or Visual ETL job and save it to your project.
Operators¶
Create an Amazon SageMaker Unified Studio Workflow¶
To create an Amazon SageMaker Unified Studio workflow that orchestrates your notebook, querybook, and visual ETL runs, use SageMakerNotebookOperator.
from airflow.providers.amazon.aws.operators.sagemaker_unified_studio import (
    SageMakerNotebookOperator,
)

# Run notebook using the legacy env-var-based resolution path (MWAA-style).
# notebook_path and mock_mwaa_environment_params are defined elsewhere in the DAG file.
run_notebook = SageMakerNotebookOperator(
    task_id="run-notebook",
    input_config={"input_path": notebook_path, "input_params": {}},
    output_config={"output_formats": ["NOTEBOOK"]},  # optional
    compute={
        "instance_type": "ml.m5.large",
        "volume_size_in_gb": 30,
    },  # optional
    termination_condition={"max_runtime_in_seconds": 600},  # optional
    tags={},  # optional
    wait_for_completion=True,  # optional
    waiter_delay=5,  # optional
    deferrable=False,  # optional
    executor_config={  # optional
        "overrides": {
            "containerOverrides": [
                {
                    # One {"name", "value"} entry per environment variable.
                    "environment": [
                        {"name": key, "value": value}
                        for key, value in mock_mwaa_environment_params.items()
                    ],
                    "name": "ECSExecutorContainer",  # Necessary parameter
                }
            ]
        }
    },
)
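The `executor_config` argument in the example above turns a mapping of environment variables into the `containerOverrides` structure. As a minimal sketch, that transformation could be factored into a standalone helper. The helper name `build_executor_config` and the sample variable names are hypothetical, chosen only for illustration, and are not part of the Airflow provider.

```python
from typing import Any, Dict


def build_executor_config(
    env: Dict[str, str], container_name: str = "ECSExecutorContainer"
) -> Dict[str, Any]:
    """Build an executor_config dict that injects `env` into one container.

    Hypothetical helper: it only mirrors the dict shape used in the example above.
    """
    return {
        "overrides": {
            "containerOverrides": [
                {
                    # One {"name", "value"} entry per environment variable.
                    "environment": [
                        {"name": key, "value": value} for key, value in env.items()
                    ],
                    "name": container_name,
                }
            ]
        }
    }


# Placeholder variable names and values (assumptions, not real settings).
sample_env = {"EXAMPLE_VAR_A": "value-a", "EXAMPLE_VAR_B": "value-b"}
executor_config = build_executor_config(sample_env)
```

The resulting dictionary can then be passed as `executor_config=executor_config` when constructing the operator.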
The following example passes the domain ID, project ID, and domain region explicitly as operator parameters.
# Run notebook with domain_id/project_id/domain_region passed explicitly as operator parameters.
# No environment variables needed; the SDK resolves the S3 path and region from these params.
# Requires sagemaker-studio>=1.0.25.
# NOTE: this task runs BEFORE env vars are set intentionally, to prove that explicit params
# work without any MWAA-style environment variables present.
run_notebook_explicit_params = SageMakerNotebookOperator(
    task_id="run-notebook-explicit",
    domain_id=domain_id,
    project_id=project_id,
    domain_region=region_name,
    input_config={"input_path": notebook_path, "input_params": {}},
    output_config={"output_formats": ["NOTEBOOK"]},  # optional
    compute={
        "instance_type": "ml.m5.large",
        "volume_size_in_gb": 30,
    },  # optional
    termination_condition={"max_runtime_in_seconds": 600},  # optional
    tags={},  # optional
    wait_for_completion=True,  # optional
    waiter_delay=5,  # optional
    deferrable=False,  # optional
)
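The two examples differ only in whether `domain_id`, `project_id`, and `domain_region` are passed. One way to keep both call styles consistent is to build the keyword arguments in a single place, sketched below. The helper `notebook_operator_kwargs` is hypothetical, and the rule that the three identifiers must be supplied together is an assumption made for this sketch, not documented operator behavior.

```python
from typing import Any, Dict, Optional


def notebook_operator_kwargs(
    notebook_path: str,
    domain_id: Optional[str] = None,
    project_id: Optional[str] = None,
    domain_region: Optional[str] = None,
    max_runtime_in_seconds: int = 600,
) -> Dict[str, Any]:
    """Return keyword arguments for either the env-var path or the explicit path."""
    explicit = {
        "domain_id": domain_id,
        "project_id": project_id,
        "domain_region": domain_region,
    }
    provided = {name: value for name, value in explicit.items() if value is not None}

    # Assumption for this sketch: pass all three identifiers or none of them.
    if provided and len(provided) != len(explicit):
        missing = sorted(set(explicit) - set(provided))
        raise ValueError(f"Missing explicit parameters: {', '.join(missing)}")

    kwargs: Dict[str, Any] = {
        "input_config": {"input_path": notebook_path, "input_params": {}},
        "termination_condition": {"max_runtime_in_seconds": max_runtime_in_seconds},
        "wait_for_completion": True,
    }
    kwargs.update(provided)
    return kwargs
```

With this helper, the explicit-parameter task could be written as `SageMakerNotebookOperator(task_id="run-notebook-explicit", **notebook_operator_kwargs(notebook_path, domain_id, project_id, region_name))`.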