PutVectaraDocument
Description
Generate and upload a JSON document to Vectara's upload endpoint. The input text can be in JSON Object, JSON Array, or JSON Lines (JSONL) format.
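As a rough illustration of the two sectioned input formats, the sketch below serializes the same records as JSON Lines and as a JSON Array. The record contents and the helper `parse_sections` are hypothetical; the `text` and `metadata` keys are chosen only to match the default Section Text JSON Path (`$.text`) and Section Metadata JSON Path (`$.metadata`). This is not the processor's implementation.

```python
import json

# Hypothetical records; "text" and "metadata" mirror the default JSON paths.
records = [
    {"text": "First chunk of the document.", "metadata": {"page": 1}},
    {"text": "Second chunk of the document.", "metadata": {"page": 2}},
]

# JSON Lines: one JSON object per line, one section per line.
json_lines = "\n".join(json.dumps(r) for r in records)

# JSON Array: a single array, one section per element.
json_array = json.dumps(records)

# JSON Object is not shown: in that mode the FlowFile content is loaded
# directly as the payload, so its shape is whatever the endpoint expects.


def parse_sections(content: str, input_format: str) -> list:
    """Return one record per section for the two sectioned formats."""
    if input_format == "JSON Lines":
        return [json.loads(line) for line in content.splitlines() if line.strip()]
    if input_format == "JSON Array":
        return json.loads(content)
    raise ValueError(f"no per-section parsing for {input_format!r}")
```

Both serializations carry the same sections; which one to use depends on what the upstream processor emits.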
Tags
ai, llm, rag, vectara
Properties
In the table below, required properties are marked with an asterisk (*); all other properties are optional. The table also lists any default values and indicates whether a property supports the NiFi Expression Language.
Display Name | API Name | Default Value | Allowable Values | Description |
---|---|---|---|---|
Vectara Client * | Vectara Client | | Controller Service: VectaraClientService. Implementations: StandardVectaraClientService | Vectara Client Service. |
Corpus ID * | Corpus ID | | | Identifier of the Vectara corpus. |
Index Input Format * | Index Input Format | JSON Lines | | Input format for indexing service. JSON Object: Load FlowFile content directly as JSON payload. JSON Lines: Create a new section for each line of JSON. JSON Array: Load FlowFile content as a JSON array and create a new section for each element in the JSON array. |
Document ID * | Document ID | | | A unique identifier for the document, constructed either from the source path of the document or from a hash of the document's content. Supports Expression Language, using FlowFile attributes and Environment variables. This property is only considered if: |
Document Title | Document Title | | | Document title. Supports Expression Language, using FlowFile attributes and Environment variables. This property is only considered if: |
Document Source URL | Document Source URL | | | Source URL for the document. Supports Expression Language, using FlowFile attributes and Environment variables. This property is only considered if: |
Document Author | Document Author | | | Author of the document. Supports Expression Language, using FlowFile attributes and Environment variables. This property is only considered if: |
Document Description | Document Description | | | Description of the document. Supports Expression Language, using FlowFile attributes and Environment variables. This property is only considered if: |
Document Date | Document Date | | | Date of document creation. Supports Expression Language, using FlowFile attributes and Environment variables. This property is only considered if: |
Document Creation Time | Document Creation Time | | | Timestamp, in epoch seconds, of when the document was created. Supports Expression Language, using FlowFile attributes and Environment variables. This property is only considered if: |
Document Attributes | Document Attributes | | | A comma-delimited list of NiFi attributes which, if present, will be included in the document metadata. This property is only considered if: |
Section Metadata JSON Path * | Section Metadata JSON Path | $.metadata | | A JSON Path expression pointing to a metadata JSON Object. The JSON Object needs to contain the list of metadata fields. These fields will be included in the section metadata. This property is only considered if: |
Section Text JSON Path * | Section Text JSON Path | $.text | | A JSON Path expression pointing to the text field. This property is only considered if: |
Section ID Attribute | Section ID Attribute | | | The field used to set the section ID, which is populated if the field is present in the metadata path. This property is only considered if: |
Section Title Attribute | Section Title Attribute | | | The field used to set the section title, which is populated if the field is present in the metadata path. This property is only considered if: |
Section Filter Attributes | Section Filter Attributes | | | A comma-delimited list of metadata fields which, if present in the metadata path, will be included as section metadata filters. This property is only considered if: |
Section Metadata Attributes | Section Metadata Attributes | | | A comma-delimited list of metadata fields which, if present in the metadata path, will be included in the section metadata. This property is only considered if: |
Section Custom Dimensions | Section Custom Dimensions | | | A comma-delimited list of metadata fields which, if present in the metadata path, will be included as the section's custom dimensions. The values for custom dimensions must be valid numbers. This property is only considered if: |
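To make the section properties more concrete, here is a minimal sketch of how one JSON Lines record might be mapped to a section using the default paths `$.text` and `$.metadata`. Everything in it is an assumption for illustration: the helpers `resolve` and `build_section`, the metadata field names (`id`, `title`, `page`, `score`), and the shape of the returned dictionary are hypothetical and do not represent the processor's code or Vectara's request format. The path resolver handles only simple dotted paths, not full JSON Path.

```python
import json


def resolve(record: dict, path: str):
    """Resolve a simple dotted path such as "$.text" (not full JSON Path)."""
    value = record
    for key in path.removeprefix("$.").split("."):
        value = value[key]
    return value


def build_section(line: str,
                  text_path: str = "$.text",
                  metadata_path: str = "$.metadata",
                  id_attr: str = "id",
                  title_attr: str = "title",
                  metadata_attrs=("page",),
                  custom_dims=("score",)) -> dict:
    """Map one JSON line to a hypothetical section dictionary."""
    record = json.loads(line)
    metadata = resolve(record, metadata_path)
    section = {"text": resolve(record, text_path)}
    # The ID and title are populated only if present in the metadata object.
    if id_attr in metadata:
        section["id"] = metadata[id_attr]
    if title_attr in metadata:
        section["title"] = metadata[title_attr]
    # Only the listed metadata fields are carried over.
    section["metadata"] = {k: metadata[k] for k in metadata_attrs if k in metadata}
    # Custom dimension values must be valid numbers; float() raises otherwise.
    section["custom_dims"] = {k: float(metadata[k]) for k in custom_dims if k in metadata}
    return section


line = json.dumps({
    "text": "Hello",
    "metadata": {"id": "s1", "title": "Intro", "page": 3, "score": "0.5", "extra": "x"},
})
section = build_section(line)
```

In this sketch the unlisted `extra` field is dropped, and a non-numeric `score` would raise a `ValueError`, mirroring the requirement that custom dimensions be valid numbers.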
Dynamic Properties
This component does not support dynamic properties.
Relationships
Name | Description |
---|---|
failure | Vectara failure relationship |
original | Original relationship |
success | Vectara success relationship |
Reads Attributes
This processor does not read attributes.
Writes Attributes
This processor does not write attributes.
State Management
This component does not store state.
Restricted
This component is not restricted.
Input Requirement
Input requirements are not specified for this component.
Example Use Cases Involving Other Components
Multiprocessor Use Case 1
Publish a PDF file to a Vectara corpus.
Components Involved
- FetchS3Object
  - Set "Bucket" to the desired S3 bucket name.
  - Set "Object Key" to the S3 path of the PDF file.
  - Set "Region" to the AWS region of the S3 bucket.
  - Set "AWS Credentials Provider Service" to a configured AWS credentials provider.
- ParseDocument
  - Set "Input Format" to PDF.
- ChunkDocument
  - Leave the properties at their default values.
- PutVectaraDocument
  - Set "Vectara Client" to a StandardVectaraClientService that can be used to make requests to Vectara's IndexService API.
  - Set "Corpus ID" to the corpus to load data into.
  - Set "Index Input Format" to JSON Lines.
  - Set "Document ID" to ${filename}.
  - Set "Document Source URL" to s3://${s3.bucket}/${filename}.
  - Set "Document Creation Time" to ${s3.lastModified:divide(1000)}.
  - Set "Document Attributes" to s3.bucket, s3.lastModified, s3.length, mime.type.
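The Expression Language values in the PutVectaraDocument step can be mimicked in plain Python to see what they would evaluate to. This is only a sketch: the function `document_properties` and the sample attribute values are hypothetical, and it assumes that `s3.lastModified` holds epoch milliseconds, which is what dividing by 1000 to obtain epoch seconds implies.

```python
def document_properties(attrs: dict) -> dict:
    """Evaluate the use case's property values from FlowFile attributes."""
    wanted = ["s3.bucket", "s3.lastModified", "s3.length", "mime.type"]
    return {
        # Equivalent of ${filename}
        "Document ID": attrs["filename"],
        # Bucket and filename joined into an s3:// URL
        "Document Source URL": f"s3://{attrs['s3.bucket']}/{attrs['filename']}",
        # Equivalent of ${s3.lastModified:divide(1000)}; attributes are strings,
        # and the value is assumed to be epoch milliseconds.
        "Document Creation Time": int(attrs["s3.lastModified"]) // 1000,
        # Only attributes that are actually present are included.
        "Document Attributes": {k: attrs[k] for k in wanted if k in attrs},
    }


# Hypothetical attribute values for illustration only.
attrs = {
    "filename": "report.pdf",
    "s3.bucket": "my-bucket",
    "s3.lastModified": "1700000000123",
    "s3.length": "2048",
    "mime.type": "application/pdf",
}
props = document_properties(attrs)
```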
System Resource Considerations
This component does not specify system resource considerations.