airflow.providers.pinecone.operators.pinecone

Module Contents

Classes

PineconeIngestOperator

Ingest vector embeddings into Pinecone.

CreatePodIndexOperator

Create a pod-based index in Pinecone.

CreateServerlessIndexOperator

Create a serverless index in Pinecone.

class airflow.providers.pinecone.operators.pinecone.PineconeIngestOperator(*, conn_id=PineconeHook.default_conn_name, index_name, input_vectors, namespace='', batch_size=None, upsert_kwargs=None, **kwargs)[source]

Bases: airflow.models.BaseOperator

Ingest vector embeddings into Pinecone.

See also

For more information on how to use this operator, take a look at the guide: Ingest data into a Pinecone index

Parameters
  • conn_id (str) – The connection id to use when connecting to Pinecone.

  • index_name (str) – Name of the Pinecone index.

  • input_vectors (list[tuple]) – Data to be ingested, in the form of a list of tuples where each tuple contains (id, vector_embedding, metadata).

  • namespace (str) – The namespace to write to. If not specified, the default namespace is used.

  • batch_size (int | None) – The number of vectors to upsert in each batch.

  • upsert_kwargs (dict | None) – Additional keyword arguments to pass through to the underlying Pinecone upsert call.

template_fields: Sequence[str] = ('index_name', 'input_vectors', 'namespace')[source]
hook()[source]

Return an instance of the PineconeHook.

execute(context)[source]

Ingest data into Pinecone using the PineconeHook.
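
A minimal usage sketch for this operator inside a DAG. The connection id pinecone_default, the index name example-index, and the namespace are hypothetical values; the embedding length must match the dimension of the target index (3 in this sketch):

    import pendulum

    from airflow import DAG
    from airflow.providers.pinecone.operators.pinecone import PineconeIngestOperator

    with DAG(
        dag_id="pinecone_ingest_example",
        start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
        schedule=None,
        catchup=False,
    ):
        # Each tuple is (id, vector_embedding, metadata), matching the
        # input_vectors format documented above.
        vectors = [
            ("item-1", [0.1, 0.2, 0.3], {"genre": "comedy"}),
            ("item-2", [0.4, 0.5, 0.6], {"genre": "drama"}),
        ]

        PineconeIngestOperator(
            task_id="pinecone_ingest",
            conn_id="pinecone_default",   # assumes a configured Pinecone connection
            index_name="example-index",   # hypothetical, pre-existing index
            input_vectors=vectors,
            namespace="example-namespace",
            batch_size=100,
        )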

class airflow.providers.pinecone.operators.pinecone.CreatePodIndexOperator(*, conn_id=PineconeHook.default_conn_name, index_name, dimension, environment=None, replicas=None, shards=None, pods=None, pod_type='p1.x1', metadata_config=None, source_collection=None, metric='cosine', timeout=None, **kwargs)[source]

Bases: airflow.models.BaseOperator

Create a pod-based index in Pinecone.

See also

For more information on how to use this operator, take a look at the guide: Create a Pod based Index

Parameters
  • conn_id (str) – The connection id to use when connecting to Pinecone.

  • index_name (str) – Name of the Pinecone index.

  • dimension (int) – The dimension of the vectors to be indexed.

  • environment (str | None) – The environment to use when creating the index.

  • replicas (int | None) – The number of replicas to use.

  • shards (int | None) – The number of shards to use.

  • pods (int | None) – The number of pods to use.

  • pod_type (str) – The type of pod to use. Defaults to p1.x1.

  • metadata_config (dict | None) – The metadata configuration to use.

  • source_collection (str | None) – The source collection to use.

  • metric (str) – The metric to use. Defaults to cosine.

  • timeout (int | None) – The timeout to use.

hook()[source]

Return an instance of the PineconeHook.

execute(context)[source]

Create the pod-based index in Pinecone using the PineconeHook.
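
A minimal sketch of creating a pod-based index, placed inside a DAG definition like the one above. The connection id, index name, and environment are assumptions and must match your Pinecone project:

    from airflow.providers.pinecone.operators.pinecone import CreatePodIndexOperator

    create_pod_index = CreatePodIndexOperator(
        task_id="create_pod_index",
        conn_id="pinecone_default",      # assumes a configured Pinecone connection
        index_name="example-pod-index",  # hypothetical index name
        dimension=3,                     # must match the embeddings to be ingested
        environment="us-east1-gcp",      # hypothetical pod environment
        replicas=1,
        pod_type="p1.x1",                # the default pod type
        metric="cosine",                 # the default metric
    )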

class airflow.providers.pinecone.operators.pinecone.CreateServerlessIndexOperator(*, conn_id=PineconeHook.default_conn_name, index_name, dimension, cloud, region=None, metric=None, timeout=None, **kwargs)[source]

Bases: airflow.models.BaseOperator

Create a serverless index in Pinecone.

See also

For more information on how to use this operator, take a look at the guide: Create a Serverless Index

Parameters
  • conn_id (str) – The connection id to use when connecting to Pinecone.

  • index_name (str) – Name of the Pinecone index.

  • dimension (int) – The dimension of the vectors to be indexed.

  • cloud (str) – The cloud to use when creating the index.

  • region (str | None) – The region to use when creating the index.

  • metric (str | None) – The metric to use.

  • timeout (int | None) – The timeout to use.

hook()[source]

Return an instance of the PineconeHook.

execute(context)[source]

Create the serverless index in Pinecone using the PineconeHook.
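
A minimal sketch of creating a serverless index, again inside a DAG definition. The connection id, index name, cloud, and region are assumptions and must match what your Pinecone project supports:

    from airflow.providers.pinecone.operators.pinecone import CreateServerlessIndexOperator

    create_serverless_index = CreateServerlessIndexOperator(
        task_id="create_serverless_index",
        conn_id="pinecone_default",             # assumes a configured Pinecone connection
        index_name="example-serverless-index",  # hypothetical index name
        dimension=3,
        cloud="aws",         # assumed cloud provider
        region="us-east-1",  # assumed region
        metric="cosine",
        timeout=60,          # seconds to wait for the index to become ready
    )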
