SQL Data Frames Integration

The DbApiHook integrates with popular data analysis frameworks, so you can query a database and retrieve the results directly as a Pandas or Polars DataFrame. This removes the manual conversion step between SQL query results and DataFrames.

Pandas Integration

Pandas is a widely used data analysis and manipulation library. The SQL hook allows you to directly retrieve query results as Pandas DataFrames, which is particularly useful for further data transformation, analysis, or visualization within your Airflow tasks.

# `hook` is an instance of a DbApiHook subclass for your database.
# Get the complete DataFrame in a single operation
df = hook.get_df(
    sql="SELECT * FROM my_table WHERE date_column >= %s",
    parameters=["2023-01-01"],
    df_type="pandas",
)

# Get DataFrame in chunks for memory-efficient processing of large results
for chunk_df in hook.get_df_by_chunks(sql="SELECT * FROM large_table", chunksize=10000, df_type="pandas"):
    process_chunk(chunk_df)
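
The chunked loop above leaves `process_chunk` undefined. The following is a minimal sketch of one way to write it, assuming you only need running aggregates. It does not use Airflow: an in-memory SQLite table and `pandas.read_sql_query` with `chunksize` stand in for the hook, and the table contents and the `totals` dictionary are hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical demo data standing in for "large_table".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE large_table (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO large_table VALUES (?, ?)",
    [(i, float(i)) for i in range(1, 11)],
)
conn.commit()

# Running aggregates, updated one chunk at a time.
totals = {"rows": 0, "amount": 0.0}


def process_chunk(chunk_df: pd.DataFrame) -> None:
    """Fold one chunk into the running totals; the chunk itself is not kept."""
    totals["rows"] += len(chunk_df)
    totals["amount"] += float(chunk_df["amount"].sum())


# With chunksize set, pandas yields DataFrames of at most that many rows.
for chunk_df in pd.read_sql_query("SELECT * FROM large_table", conn, chunksize=4):
    process_chunk(chunk_df)

conn.close()
print(totals)  # {'rows': 10, 'amount': 55.0}
```

Because each chunk is reduced to a few numbers before the next one is fetched, peak memory is bounded by the chunk size rather than by the size of the full result.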

To use this feature, install the pandas extra when installing this provider package. For installation instructions, see the provider's index page.

Polars Integration

Polars is a modern, high-performance DataFrame library implemented in Rust with Python bindings. It’s designed for speed and efficiency when working with large datasets. The SQL hook supports retrieving data directly as Polars DataFrames, which can be particularly beneficial for performance-critical data processing tasks.

# Get complete DataFrame in a single operation
df = hook.get_df(
    # A named placeholder is used here because parameters is a dict
    sql="SELECT * FROM my_table WHERE date_column >= %(date_column)s",
    parameters={"date_column": "2023-01-01"},
    df_type="polars",
)
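
How placeholders pair with `parameters` depends on the database driver's DB-API parameter style, not on the DataFrame library. As a hedged illustration that does not involve Airflow, the sketch below uses the standard-library `sqlite3` driver, whose placeholders are `?` and `:name`; drivers such as psycopg2 use `%s` and `%(name)s` instead. The table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, date_column TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(1, "2022-12-31"), (2, "2023-01-01"), (3, "2023-06-15")],
)

# The driver reports its default placeholder style through `paramstyle`.
print(sqlite3.paramstyle)  # qmark

# Positional placeholders pair with a sequence of values.
positional = conn.execute(
    "SELECT id FROM my_table WHERE date_column >= ? ORDER BY id",
    ["2023-01-01"],
).fetchall()

# Named placeholders pair with a mapping keyed by placeholder name.
named = conn.execute(
    "SELECT id FROM my_table WHERE date_column >= :start ORDER BY id",
    {"start": "2023-01-01"},
).fetchall()

conn.close()
print(positional, named)  # [(2,), (3,)] [(2,), (3,)]
```

Check the documentation of the driver behind your hook to confirm which placeholder styles it accepts.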

# Get DataFrame in chunks for memory-efficient processing of large results
for chunk_df in hook.get_df_by_chunks(sql="SELECT * FROM large_table", chunksize=10000, df_type="polars"):
    process_chunk(chunk_df)

To use this feature, install the polars extra when installing this provider package. For installation instructions, see the provider's index page.
