airflow.providers.apache.hive.transfers.mysql_to_hive

This module contains an operator to move data from MySQL to Hive.

Module Contents

Classes

MySqlToHiveOperator

Moves data from MySQL to Hive.

class airflow.providers.apache.hive.transfers.mysql_to_hive.MySqlToHiveOperator(*, sql, hive_table, create=True, recreate=False, partition=None, delimiter=chr(1), quoting=None, quotechar='"', escapechar=None, mysql_conn_id='mysql_default', hive_cli_conn_id='hive_cli_default', hive_auth=None, tblproperties=None, **kwargs)

Bases: airflow.models.BaseOperator

Moves data from MySQL to Hive.

The operator runs your query against MySQL and stores the result in a local file before loading it into a Hive table.
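The extract-and-stage step can be sketched with the standard library alone. This is a minimal illustration, not the operator's code: an in-memory sqlite3 database stands in for the MySQL connection, and dump_query_to_file is a hypothetical helper name. Column names are read from cursor.description, the DB-API cursor metadata.

```python
import csv
import sqlite3
import tempfile


def dump_query_to_file(conn, sql, delimiter=chr(1)):
    """Run ``sql`` and write every row to a temporary delimited file.

    Hypothetical helper: returns the file path and the column names taken
    from the DB-API cursor metadata (``cursor.description``).
    """
    cursor = conn.cursor()
    cursor.execute(sql)
    columns = [col[0] for col in cursor.description]
    # delete=False keeps the file on disk so a later step can load it.
    with tempfile.NamedTemporaryFile(
        "w", newline="", encoding="utf-8", suffix=".tsv", delete=False
    ) as f:
        writer = csv.writer(f, delimiter=delimiter, lineterminator="\n")
        writer.writerows(cursor)  # the cursor yields one tuple per row
    cursor.close()
    return f.name, columns


# Usage: an in-memory SQLite database stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "linus")])
path, columns = dump_query_to_file(conn, "SELECT id, name FROM users ORDER BY id")
```

The default delimiter here is chr(1), matching the operator's signature, so the staged file uses the same field separator the Hive table is declared with.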

If create or recreate is set to True, the operator generates the corresponding CREATE TABLE statement, plus a DROP TABLE statement when recreate is True. Hive data types are inferred from the cursor’s metadata. Note that the generated Hive table uses STORED AS textfile, which isn’t the most efficient serialization format. If you load a large amount of data, or if the table is queried heavily, you may want to use this operator only to stage the data into a temporary table, and then load it into its final destination with a HiveOperator.
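The generated HiveQL can be pictured as a small string builder. build_load_hql is a hypothetical helper modelled on the description above, not the provider's implementation. Two choices are assumptions of this sketch: the load always replaces existing data (OVERWRITE), and partition columns are declared as STRING.

```python
def build_load_hql(table, field_types, filepath, *, create=True, recreate=False,
                   partition=None, delimiter=chr(1), tblproperties=None):
    """Assemble HiveQL that (re)creates a textfile table and loads a local file.

    Hypothetical sketch: field_types maps column names to Hive types,
    partition maps partition columns to values.
    """
    hql = ""
    if recreate:
        # recreate drops the table so it is rebuilt on every run
        hql += f"DROP TABLE IF EXISTS {table};\n"
    if create or recreate:
        fields = ",\n    ".join(f"`{name}` {htype}" for name, htype in field_types.items())
        hql += f"CREATE TABLE IF NOT EXISTS {table} (\n    {fields})\n"
        if partition:
            pfields = ", ".join(f"{col} STRING" for col in partition)
            hql += f"PARTITIONED BY ({pfields})\n"
        hql += "ROW FORMAT DELIMITED\n"
        hql += f"FIELDS TERMINATED BY '{delimiter}'\n"
        hql += "STORED AS textfile"  # plain text storage, as noted above
        if tblproperties:
            props = ", ".join(f"'{k}'='{v}'" for k, v in tblproperties.items())
            hql += f"\nTBLPROPERTIES ({props})"
        hql += ";\n"
    hql += f"LOAD DATA LOCAL INPATH '{filepath}' OVERWRITE INTO TABLE {table}"
    if partition:
        pvals = ", ".join(f"{col}='{val}'" for col, val in partition.items())
        hql += f" PARTITION ({pvals})"
    return hql + ";"
```

Staging into a temporary table, as suggested above, simply means pointing table at a scratch name and promoting the data afterwards with a separate Hive query.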

Parameters
  • sql (str) – SQL query to execute against the MySQL database. (templated)

  • hive_table (str) – target Hive table; use dot notation (database.table) to target a specific database. (templated)

  • create (bool) – whether to create the table if it doesn’t exist

  • recreate (bool) – whether to drop and recreate the table at every execution

  • partition (dict | None) – target partition as a dict of partition columns and values. (templated)

  • delimiter (str) – field delimiter in the file; defaults to chr(1) (Ctrl-A), Hive’s default field separator

  • quoting (int | None) – controls when quotes should be generated by the csv writer. It can take any of the csv.QUOTE_* constants.

  • quotechar (str) – one-character string used to quote fields containing special characters.

  • escapechar (str | None) – one-character string used by the csv writer to escape the delimiter or quotechar.

  • mysql_conn_id (str) – source MySQL connection id

  • hive_cli_conn_id (str) – Reference to the Hive CLI connection id.

  • hive_auth (str | None) – optional authentication option passed for the Hive connection

  • tblproperties (dict | None) – TBLPROPERTIES of the Hive table being created
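The delimiter, quoting, quotechar, and escapechar parameters correspond to the formatting options of Python's csv.writer. The sketch below shows how they shape one output row. render_row is a hypothetical helper, and treating quoting=None as csv.QUOTE_MINIMAL (the csv module's own default) is an assumption of this sketch, not documented operator behaviour.

```python
import csv
import io


def render_row(row, delimiter=chr(1), quoting=None, quotechar='"', escapechar=None):
    """Write one row with csv.writer and return the resulting text.

    Hypothetical helper: quoting=None falls back to csv.QUOTE_MINIMAL here.
    """
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter=delimiter,
        quoting=csv.QUOTE_MINIMAL if quoting is None else quoting,
        quotechar=quotechar,
        # With csv.QUOTE_NONE, special characters are escaped instead of quoted.
        escapechar=escapechar,
        lineterminator="\n",
    )
    writer.writerow(row)
    return buf.getvalue()
```

With the default chr(1) delimiter, commas inside values need no quoting, while a value containing the quote character is quoted and the inner quotes are doubled. With csv.QUOTE_NONE, an escapechar must be supplied whenever a value contains the delimiter.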

template_fields: collections.abc.Sequence[str] = ('sql', 'partition', 'hive_table')
template_ext: collections.abc.Sequence[str] = ('.sql',)
template_fields_renderers
ui_color = '#a0e08c'
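Because template_ext includes '.sql', a sql value ending in .sql is treated as a template file to read rather than as a literal query. The sketch below covers only that file-resolution rule. load_sql and its search_dir argument are hypothetical; in Airflow the file contents are then rendered with Jinja using the task context, a step omitted here.

```python
import tempfile
from pathlib import Path

# Mirrors the template_ext attribute: values with these suffixes are file paths.
TEMPLATE_EXT = (".sql",)


def load_sql(value: str, search_dir: str) -> str:
    """Return the query text, reading it from search_dir if it names a .sql file.

    Hypothetical helper: Jinja rendering of the loaded text is not shown.
    """
    if value.endswith(TEMPLATE_EXT):
        return (Path(search_dir) / value).read_text(encoding="utf-8")
    return value


# Usage: a temporary directory plays the role of the template search path.
search_dir = tempfile.mkdtemp()
Path(search_dir, "extract_users.sql").write_text("SELECT id, name FROM users", encoding="utf-8")
from_file = load_sql("extract_users.sql", search_dir)
inline = load_sql("SELECT 1", search_dir)
```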
classmethod type_map(mysql_type)

Map MySQL type to Hive type.
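The mapping idea can be sketched as a lookup table with a STRING fallback for anything unrecognised. Both the table and the use of type names as keys are illustrative assumptions: the real classmethod works from the MySQL driver's type metadata, and its exact pairings may differ, so check the provider source before relying on any specific entry. infer_field_types is a hypothetical helper showing how cursor-style metadata could become a column-to-Hive-type dict.

```python
# Illustrative subset only; keys are MySQL type names, not driver type codes.
MYSQL_TO_HIVE = {
    "BIT": "INT",
    "TINY": "SMALLINT",
    "SHORT": "INT",
    "INT24": "INT",
    "YEAR": "INT",
    "LONG": "BIGINT",
    "LONGLONG": "DECIMAL(38,0)",
    "FLOAT": "DOUBLE",
    "DOUBLE": "DOUBLE",
    "DECIMAL": "DOUBLE",
    "NEWDECIMAL": "DOUBLE",
    "TIMESTAMP": "TIMESTAMP",
}


def type_map(mysql_type: str) -> str:
    """Map a MySQL type name to a Hive type, falling back to STRING."""
    return MYSQL_TO_HIVE.get(mysql_type.upper(), "STRING")


def infer_field_types(description):
    """Build {column: hive_type} from DB-API style (name, type, ...) tuples."""
    return {col[0]: type_map(col[1]) for col in description}
```

Falling back to STRING keeps the load from failing on unmapped types, at the cost of losing type information for those columns.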

execute(context)

This is the main method to derive when creating an operator.

The context is the same dictionary used when rendering Jinja templates.

Refer to get_template_context for more context.
