
UpsertPinecone

Description

Publishes vectors, including metadata, and optionally text, to a Pinecone index.

Tags

chatbot, embeddings, gen ai, genai, generative ai, llm, metadata, pinecone, publish, text, upsert, vector

Properties

In the table below, required properties are marked with an asterisk (*); all other properties are optional. The table also lists any default values and notes whether a property supports the NiFi Expression Language.

Display Name | API Name | Default Value | Allowable Values | Description
Pinecone API Key * | Pinecone API Key | | | The API key for the Pinecone service.
Pinecone Index * | Pinecone Index | | | The name of the Pinecone index to use. Supports Expression Language, using FlowFile attributes and Environment variables.
Pinecone Namespace * | Pinecone Namespace | default | | The name of the Pinecone namespace to use. Supports Expression Language, using FlowFile attributes and Environment variables.
Record Reader * | Record Reader | | Controller Service: RecordReaderFactory. Implementations: AvroReader, CEFReader, CSVReader, ExcelReader, GrokReader, JsonPathReader, JsonTreeReader, ReaderLookup, ScriptedReader, Syslog5424Reader, SyslogReader, WindowsEventLogReader, XMLReader, YamlTreeReader | The Record Reader to use for reading the FlowFile.
Vector Record Path * | Vector Record Path | /embeddings | | The path to the vector field in the record. Supports Expression Language, using FlowFile attributes and Environment variables.
Text Record Path | Text Record Path | | | The path to the field in the record that contains the text associated with the vectors. If specified, the text will be inserted into the metadata when publishing to Pinecone. If not specified, the text will not be sent to Pinecone. Supports Expression Language, using FlowFile attributes and Environment variables.
Text Field Name * | Text Field Name | text | | The name of the field in the metadata to use for storing the text associated with the vectors. This property is only considered if the property Text Record Path has a value specified.
Metadata Record Path | Metadata Record Path | | | The path to the metadata field in the record. Supports Expression Language, using FlowFile attributes and Environment variables.
ID Record Path | ID Record Path | | | The path to the ID field in the record. Supports Expression Language, using FlowFile attributes and Environment variables.
Sparse Vector Indices Path | Sparse Vector Indices Path | | | If sparse vectors are to be provided, this RecordPath points to the indices of the sparse data to use. Supports Expression Language, using FlowFile attributes and Environment variables.
Sparse Vector Values Path * | Sparse Vector Values Path | | | If sparse vectors are to be provided, this RecordPath points to the values of the sparse data to use. Supports Expression Language, using FlowFile attributes and Environment variables. This property is only considered if the property Sparse Vector Indices Path has a value specified.
Max Batch Size * | Max Batch Size | 100 | | If the number of Records in a FlowFile is large, creating a single request to Pinecone can consume significant amounts of NiFi heap. To avoid this, Max Batch Size limits the number of Records sent in a single request. If the number of Records exceeds this value, multiple requests will be sent to Pinecone.
Web Client Service * | Web Client Service | | Controller Service: WebClientServiceProvider. Implementations: StandardWebClientServiceProvider | The Web Client Service to use for communicating with Pinecone.
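
The effect of Max Batch Size can be pictured with a short sketch. The Python below is illustrative only and is not the processor's implementation; the helper name split_into_batches and the sample records are hypothetical. It shows how a FlowFile holding 250 records would be divided into three requests when the batch size is left at its default of 100.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def split_into_batches(records: Sequence[T], max_batch_size: int = 100) -> Iterator[List[T]]:
    """Yield consecutive slices holding at most max_batch_size records each."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    for start in range(0, len(records), max_batch_size):
        yield list(records[start:start + max_batch_size])


# Hypothetical FlowFile content: 250 records, each with an ID and a tiny vector.
records = [{"id": f"doc-{i}", "embeddings": [0.0, 1.0]} for i in range(250)]

# One request per batch: two full batches and a final partial one.
batch_sizes = [len(batch) for batch in split_into_batches(records)]
print(batch_sizes)
```

A smaller batch size means more requests, each carrying fewer records, so less data has to be held in memory for any single request.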

Dynamic Properties

This component does not support dynamic properties.

Relationships

Name | Description
failure | FlowFiles that cannot be sent to Pinecone, and for which a retry is not expected to be successful, are routed to this relationship
retry | FlowFiles that fail to be sent to Pinecone, but for which a retry may help, are routed to this relationship
success | FlowFiles that are successfully sent to Pinecone are routed to this relationship

Reads Attributes

This processor does not read attributes.

Writes Attributes

This processor does not write attributes.

State Management

This component does not store state.

Restricted

This component is not restricted.

Input Requirement

This component requires an incoming relationship.

Example Use Cases Involving Other Components

Multiprocessor Use Case 1

Create embeddings for raw text data, or text that exists in a Record field such as JSON, using OpenAI's embeddings model and publish the vectors to Pinecone.

Components Involved

  • CreateOpenAiEmbeddings
    1. Set "OpenAI API Key" to the API key for your OpenAI account.
    2. Set "Model" to the name of the OpenAI model to use for creating embeddings.
    3. Set "Record Writer" to a Record Writer that writes out data in the desired format, typically JSON.
    4. Set "Web Client Service" to a WebClientService that can be used to make requests to the OpenAI API.
    5. If the incoming data is in a structured format such as JSON, set "Record Reader" to a Record Reader that can read the format of the incoming data and set "Text Record Path" to the path to the field that contains the text to be embedded.
    6. Otherwise, if the incoming data is raw text, leave the "Text Record Path" and "Record Reader" properties unset.
    7. Connect the 'success' Relationship to UpsertPinecone.
  • UpsertPinecone
    1. Set "Pinecone API Key" to the API key for your Pinecone account.
    2. Set "Pinecone Index" to the name of the Pinecone index to publish the vectors to.
    3. Set "Pinecone Namespace" to the namespace to use when publishing the vectors.
    4. Set "Record Reader" to a Record Reader that can read the format of the data produced by the CreateOpenAiEmbeddings processor.
    5. Set "Vector Record Path" to /embeddings.
    6. Set "Metadata Record Path" to /metadata.
    7. To include the text of the document in Pinecone, set "Text Record Path" to /text and set "Text Field Name" to text.
    8. Otherwise, leave both of these properties unset.
    9. Set "Web Client Service" to the same WebClientService that was used in the CreateOpenAiEmbeddings processor.
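
To make the record paths in this use case concrete, the sketch below shows one plausible way a record with /embeddings, /metadata, and /text fields could be turned into a vector entry. It is an illustration under stated assumptions, not the processor's code: the function to_vector_entry and the sample record are hypothetical, only top-level fields are handled, and vector IDs are left out. The values and metadata keys follow the shape of a vector in Pinecone's upsert request.

```python
from typing import Any, Dict, Optional


def to_vector_entry(
    record: Dict[str, Any],
    vector_field: str = "embeddings",            # "Vector Record Path" = /embeddings
    metadata_field: Optional[str] = "metadata",  # "Metadata Record Path" = /metadata
    text_field: Optional[str] = "text",          # "Text Record Path" = /text
    text_field_name: str = "text",               # "Text Field Name" = text
) -> Dict[str, Any]:
    """Build one vector entry from a record (hypothetical helper, top-level fields only)."""
    metadata: Dict[str, Any] = {}
    if metadata_field is not None:
        metadata.update(record.get(metadata_field) or {})
    if text_field is not None and record.get(text_field) is not None:
        # The text travels inside the metadata, stored under the configured field name.
        metadata[text_field_name] = record[text_field]
    entry: Dict[str, Any] = {"values": list(record[vector_field])}
    if metadata:
        entry["metadata"] = metadata
    return entry


# Hypothetical record, shaped like the fields named in the steps above.
record = {
    "text": "NiFi moves data.",
    "embeddings": [0.12, -0.07, 0.33],
    "metadata": {"source": "report.pdf"},
}
entry = to_vector_entry(record)
print(entry)
```

Passing text_field=None in this sketch corresponds to leaving "Text Record Path" unset: the metadata is still sent, but the text is not.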

Multiprocessor Use Case 2

Add embeddings for a document to a Pinecone index, replacing any embeddings that already exist for the document.

This use case assumes that the incoming content is a set of embeddings; embeddings can be created using the appropriate Processor, such as CreateOpenAiEmbeddings.

Components Involved

  • DeletePinecone
    1. Set the 'Pinecone API Key' property to the API key for the Pinecone service.
    2. Set the 'Pinecone Index' property to the name of the Pinecone index from which to delete vectors.
    3. Set the 'Namespace' property to the name of the namespace that the document lives in, or leave it blank if the namespace is unknown.
    4. Set the 'ID Prefix' property to ${filename:substringBeforeLast('-')}.
    5. Set the 'Web Client Service' property to a WebClientService that can be used to make requests to the Pinecone API.
  • UpsertPinecone
    1. Set "Pinecone API Key" to the API key for your Pinecone account.
    2. Set "Pinecone Index" to the name of the Pinecone index to publish the vectors to.
    3. Set "Pinecone Namespace" to the namespace to use when publishing the vectors.
    4. Set "Record Reader" to a Record Reader that can read the format of the data produced by the CreateOpenAiEmbeddings processor.
    5. Set "Vector Record Path" to /embeddings.
    6. Set "Metadata Record Path" to /metadata.
    7. Leave the "ID Record Path" property unset.
    8. To include the text of the document in Pinecone, set "Text Record Path" to /text and set "Text Field Name" to text.
    9. Otherwise, leave both of these properties unset.
    10. Set "Web Client Service" to the same WebClientService that was used in the CreateOpenAiEmbeddings processor.
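
The 'ID Prefix' expression in the DeletePinecone step keeps everything before the last hyphen of the filename attribute. The sketch below mimics that in Python and shows how such a prefix could select the existing vectors to remove before the new ones are upserted. The function names, the filename, and the vector IDs are hypothetical, and the ID scheme (document name, hyphen, counter) is an assumption made purely for illustration; this is not how NiFi evaluates Expression Language internally.

```python
from typing import Iterable, List


def substring_before_last(subject: str, delimiter: str) -> str:
    """Rough Python analogue of ${attr:substringBeforeLast('-')}.

    Assumed behavior: when the delimiter does not occur, the subject is returned unchanged.
    """
    head, found, _tail = subject.rpartition(delimiter)
    return head if found else subject


def ids_with_prefix(vector_ids: Iterable[str], prefix: str) -> List[str]:
    """Select the IDs that a prefix-based delete would target (hypothetical helper)."""
    return [vector_id for vector_id in vector_ids if vector_id.startswith(prefix)]


# Hypothetical filename attribute for one chunk of a document.
filename = "annual-report.pdf-3"
prefix = substring_before_last(filename, "-")

# Hypothetical IDs already stored in the index.
existing_ids = ["annual-report.pdf-0", "annual-report.pdf-1", "other.pdf-0"]
to_delete = ids_with_prefix(existing_ids, prefix)
print(prefix, to_delete)
```

Only the last hyphen matters, so hyphens inside the document name itself stay part of the prefix.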

System Resource Considerations

This component does not specify system resource considerations.

See Also

CreateOpenAiEmbeddings, DeletePinecone