airflow.providers.pinecone.hooks.pinecone

Hook for Pinecone.

Module Contents

Classes

PineconeHook

Interact with Pinecone. This hook uses the Pinecone conn_id.

class airflow.providers.pinecone.hooks.pinecone.PineconeHook(conn_id=default_conn_name, environment=None, region=None)[source]

Bases: airflow.hooks.base.BaseHook

Interact with Pinecone. This hook uses the Pinecone conn_id.

Parameters

conn_id (str) – The connection id to use when connecting to Pinecone. Optional; defaults to pinecone_default.

property api_key: str[source]
conn_name_attr = 'conn_id'[source]
default_conn_name = 'pinecone_default'[source]
conn_type = 'pinecone'[source]
hook_name = 'Pinecone'[source]
classmethod get_connection_form_widgets()[source]

Return connection widgets to add to connection form.

classmethod get_ui_field_behaviour()[source]

Return custom field behaviour.

environment()[source]
region()[source]
pinecone_client()[source]

Pinecone object to interact with Pinecone.

conn()[source]
test_connection()[source]
list_indexes()[source]

Retrieve a list of all indexes in your project.

upsert(index_name, vectors, namespace='', batch_size=None, show_progress=True, **kwargs)[source]

Write vectors into a namespace.

If a new value is upserted for an existing vector id, it will overwrite the previous value.

To upsert in parallel follow

Parameters
  • index_name (str) – The name of the index to upsert into.

  • vectors (list[Any]) – A list of vectors to upsert.

  • namespace (str) – The namespace to write to. If not specified, the default namespace ("") is used.

  • batch_size (int | None) – The number of vectors to upsert in each batch.

  • show_progress (bool) – Whether to show a progress bar using tqdm. Applied only if batch_size is provided.
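A minimal sketch of a batched upsert, following the parameters above. The helper name `upsert_embeddings` and the record layout are hypothetical, and passing each vector as an `(id, values)` tuple is an assumption about the format the Pinecone client accepts. The hook is passed in as an argument so the snippet does not need Airflow at import time.

```python
from typing import Any, Iterable, List, Optional, Tuple


def upsert_embeddings(
    hook: Any,
    index_name: str,
    records: Iterable[Tuple[str, List[float]]],
    namespace: str = "",
    batch_size: Optional[int] = 100,
) -> Any:
    """Upsert (id, values) pairs through a PineconeHook-like object (hypothetical helper)."""
    # Assumption: each vector is passed as an (id, values) tuple.
    vectors = [(vector_id, list(values)) for vector_id, values in records]
    if not vectors:
        return None  # nothing to write, so skip the API call
    return hook.upsert(
        index_name=index_name,
        vectors=vectors,
        namespace=namespace,
        batch_size=batch_size,
        # show_progress is applied only when batch_size is provided.
        show_progress=False,
    )
```

In a DAG task, `hook` would typically be `PineconeHook(conn_id="pinecone_default")`. Because an upsert overwrites the value stored for an existing id, rerunning the task with the same ids is safe.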

get_pod_spec_obj(*, replicas=None, shards=None, pods=None, pod_type='p1.x1', metadata_config=None, source_collection=None, environment=None)[source]

Get a PodSpec object.

Parameters
  • replicas (int | None) – The number of replicas.

  • shards (int | None) – The number of shards.

  • pods (int | None) – The number of pods.

  • pod_type (str | None) – The type of pod.

  • metadata_config (dict | None) – The metadata configuration.

  • source_collection (str | None) – The source collection.

  • environment (str | None) – The environment to use when creating the index.
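As a rough illustration of the keyword-only signature above, the sketch below wraps `get_pod_spec_obj` with basic argument checks. The helper name `build_pod_spec` and the validation rules are assumptions added for the example; they are not part of the hook.

```python
from typing import Any


def build_pod_spec(
    hook: Any,
    environment: str,
    replicas: int = 1,
    shards: int = 1,
    pod_type: str = "p1.x1",
) -> Any:
    """Return a PodSpec built by a PineconeHook-like object (hypothetical helper)."""
    # Assumption: counts below 1 are rejected here, before reaching the API.
    if replicas < 1 or shards < 1:
        raise ValueError("replicas and shards must be at least 1")
    # get_pod_spec_obj accepts keyword arguments only.
    return hook.get_pod_spec_obj(
        replicas=replicas,
        shards=shards,
        pod_type=pod_type,
        environment=environment,
    )
```

A call might look like `build_pod_spec(hook, environment="us-west1-gcp", replicas=2)`, where the environment name is a placeholder for the one used by your project.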

get_serverless_spec_obj(*, cloud, region=None)[source]

Get a ServerlessSpec object.

Parameters
  • cloud (str) – The cloud provider.

  • region (str | None) – The region to use when creating the index.

create_index(index_name, dimension, spec, metric='cosine', timeout=None)[source]

Create a new index.

Parameters
  • index_name (str) – The name of the index.

  • dimension (int) – The dimension of the vectors to be indexed.

  • spec (pinecone.ServerlessSpec | pinecone.PodSpec) – Pass a ServerlessSpec object to create a serverless index or a PodSpec object to create a pod index. get_serverless_spec_obj and get_pod_spec_obj can be used to create the Spec objects.

  • metric (str | None) – The metric to use. Defaults to cosine.

  • timeout (int | None) – The timeout to use.
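The two steps described above (build a spec, then pass it to `create_index`) can be sketched as follows. The helper name `create_serverless_index` and the default cloud and region values are placeholders chosen for the example.

```python
from typing import Any, Optional


def create_serverless_index(
    hook: Any,
    index_name: str,
    dimension: int,
    cloud: str = "aws",
    region: Optional[str] = "us-east-1",
    metric: str = "cosine",
) -> Any:
    """Create a serverless index through a PineconeHook-like object (hypothetical helper)."""
    # Step 1: build the ServerlessSpec (cloud is a required keyword argument).
    spec = hook.get_serverless_spec_obj(cloud=cloud, region=region)
    # Step 2: create the index with that spec.
    return hook.create_index(
        index_name=index_name,
        dimension=dimension,
        spec=spec,
        metric=metric,
    )
```

For a pod-based index, the same pattern would apply with `get_pod_spec_obj` supplying the `spec` argument instead.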

describe_index(index_name)[source]

Retrieve information about a specific index.

Parameters

index_name (str) – The name of the index to describe.

delete_index(index_name, timeout=None)[source]

Delete a specific index.

Parameters
  • index_name (str) – The name of the index to delete.

  • timeout (int | None) – The timeout to use while waiting for the operation to complete.

configure_index(index_name, replicas=None, pod_type='')[source]

Change the current configuration of the index.

Parameters
  • index_name (str) – The name of the index to configure.

  • replicas (int | None) – The new number of replicas.

  • pod_type (str | None) – The new pod_type for the index.
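A small sketch of changing only the replica count with `configure_index`. The helper name `scale_replicas` and the lower bound on `replicas` are assumptions made for the example; `pod_type` is left at its default so that only the replica count is passed.

```python
from typing import Any


def scale_replicas(hook: Any, index_name: str, replicas: int) -> Any:
    """Set the replica count of an index via a PineconeHook-like object (hypothetical helper)."""
    # Assumption: a replica count below 1 is treated as a caller error.
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return hook.configure_index(index_name=index_name, replicas=replicas)
```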

create_collection(collection_name, index_name)[source]

Create a new collection from a specified index.

Parameters
  • collection_name (str) – The name of the collection to create.

  • index_name (str) – The name of the source index.

delete_collection(collection_name)[source]

Delete a specific collection.

Parameters

collection_name (str) – The name of the collection to delete.

describe_collection(collection_name)[source]

Retrieve information about a specific collection.

Parameters

collection_name (str) – The name of the collection to describe.

list_collections()[source]

Retrieve a list of all collections in the current project.
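The collection methods above can be combined into a simple snapshot step, sketched below. The helper name `snapshot_index` and the `<index>-<suffix>` naming scheme are hypothetical.

```python
from typing import Any


def snapshot_index(hook: Any, index_name: str, suffix: str) -> Any:
    """Create a collection from an index and return its description (hypothetical helper)."""
    # Hypothetical naming scheme: "<index>-<suffix>".
    collection_name = f"{index_name}-{suffix}"
    hook.create_collection(collection_name=collection_name, index_name=index_name)
    # Describe the new collection so the caller can inspect it.
    return hook.describe_collection(collection_name=collection_name)
```

A later cleanup task could pass the same collection name to `delete_collection`.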

query_vector(index_name, vector, query_id=None, top_k=10, namespace=None, query_filter=None, include_values=None, include_metadata=None, sparse_vector=None)[source]

Search a namespace using query vector.

It retrieves the ids of the most similar items in a namespace, along with their similarity scores. API reference: https://docs.pinecone.io/reference/query

Parameters
  • index_name (str) – The name of the index to query.

  • vector (list[Any]) – The query vector.

  • query_id (str | None) – The unique ID of the vector to be used as a query vector.

  • top_k (int) – The number of results to return.

  • namespace (str | None) – The namespace to fetch vectors from. If not specified, the default namespace is used.

  • query_filter (dict[str, str | float | int | bool | list[Any] | dict[Any, Any]] | None) – The filter to apply. See https://www.pinecone.io/docs/metadata-filtering/

  • include_values (bool | None) – Whether to include the vector values in the result.

  • include_metadata (bool | None) – Indicates whether metadata is included in the response as well as the ids.

  • sparse_vector (pinecone.core.client.model.sparse_values.SparseValues | dict[str, list[float] | list[int]] | None) – Sparse values of the query vector. Expected to be either a SparseValues object or a dict of the form {'indices': List[int], 'values': List[float]}, where both lists have the same length.
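A hedged sketch of a filtered similarity query using the parameters above. The helper name `find_similar` and the `genre` metadata field are hypothetical; the `$eq` operator follows Pinecone's metadata filtering syntax.

```python
from typing import Any, List, Optional


def find_similar(
    hook: Any,
    index_name: str,
    vector: List[float],
    top_k: int = 5,
    namespace: Optional[str] = None,
    genre: Optional[str] = None,
) -> Any:
    """Query an index through a PineconeHook-like object (hypothetical helper)."""
    # Build a metadata filter only when a genre is requested.
    # "genre" is a hypothetical metadata field used for illustration.
    query_filter = {"genre": {"$eq": genre}} if genre is not None else None
    return hook.query_vector(
        index_name=index_name,
        vector=vector,
        top_k=top_k,
        namespace=namespace,
        query_filter=query_filter,
        include_metadata=True,  # return metadata alongside ids and scores
    )
```

The return value is passed through unchanged, so its shape is whatever the underlying Pinecone client returns for a query.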

upsert_data_async(index_name, data, async_req=False, pool_threads=None)[source]

Upsert (insert or update) data into the Pinecone index.

Parameters
  • index_name (str) – Name of the index.

  • data (list[tuple[Any]]) – List of tuples to be upserted. Each tuple is of form (id, vector, metadata). Metadata is optional.

  • async_req (bool) – If True, upsert operations will be asynchronous.

  • pool_threads (int | None) – Number of threads for parallel upserting. If async_req is True, this must be provided.
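The constraint between `async_req` and `pool_threads` described above can be checked before the call, as in this sketch. The helper name `upsert_async_checked` is hypothetical, and the data layout follows the parameter description rather than any verified behaviour of the hook.

```python
from typing import Any, List, Optional, Tuple


def upsert_async_checked(
    hook: Any,
    index_name: str,
    data: List[Tuple[Any, ...]],
    async_req: bool = False,
    pool_threads: Optional[int] = None,
) -> Any:
    """Call upsert_data_async after validating its arguments (hypothetical helper)."""
    # Per the parameter docs, pool_threads must be provided when async_req is True.
    if async_req and pool_threads is None:
        raise ValueError("pool_threads must be provided when async_req is True")
    return hook.upsert_data_async(
        index_name=index_name,
        data=data,
        async_req=async_req,
        pool_threads=pool_threads,
    )
```

An example `data` value under these assumptions would be `[("id-1", [0.1, 0.2], {"source": "a"}), ("id-2", [0.3, 0.4])]`, with metadata omitted from the second tuple.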

describe_index_stats(index_name, stats_filter=None, **kwargs)[source]

Describe the index statistics.

Returns statistics about the index's contents, for example the vector count per namespace and the number of dimensions. API reference: https://docs.pinecone.io/reference/describe_index_stats_post

Parameters
