
PutVectaraDocument

Description

Generate a JSON document and upload it to Vectara's upload endpoint. The input text can be in JSON Object, JSON Array, or JSON Lines (JSONL) format.
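As a rough illustration of the JSON Lines and JSON Array inputs, the sketch below serializes the same two chunks both ways. The chunk contents are made up, and the `text` and `metadata` keys are chosen only because they match the default Section Text JSON Path ($.text) and Section Metadata JSON Path ($.metadata). The helper names are hypothetical and not part of the processor. A JSON Object input is not shown because, per the property description, it is passed through as the payload unchanged.

```python
import json

# Two hypothetical chunks. The "text" and "metadata" keys line up with the
# default JSON Path properties ($.text and $.metadata).
chunks = [
    {"text": "First chunk of the document.", "metadata": {"page": 1}},
    {"text": "Second chunk of the document.", "metadata": {"page": 2}},
]

# JSON Lines: one JSON object per line of FlowFile content.
json_lines = "\n".join(json.dumps(chunk) for chunk in chunks)

# JSON Array: a single JSON array holding every chunk.
json_array = json.dumps(chunks)


def parse_json_lines(content):
    """Parse JSON Lines content into a list of records, skipping blank lines."""
    return [json.loads(line) for line in content.splitlines() if line.strip()]


def parse_json_array(content):
    """Parse JSON Array content into a list of records."""
    return json.loads(content)


if __name__ == "__main__":
    print(json_lines)
    print(json_array)
```

Either form yields the same list of records once parsed, so the choice mostly depends on what the upstream processor emits.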

Tags

ai, llm, rag, vectara

Properties

In the list below, required properties are marked with an asterisk (*). All other properties are optional. Each entry also indicates any default value and whether the property supports the NiFi Expression Language.

Vectara Client *
  API Name: Vectara Client
  Controller Service: VectaraClientService
  Implementations: StandardVectaraClientService
  Description: Vectara Client Service.

Corpus ID *
  API Name: Corpus ID
  Description: Identifier of the Vectara corpus.

Index Input Format *
  API Name: Index Input Format
  Default Value: JSON Lines
  Allowable Values:
    • JSON Object
    • JSON Lines
    • JSON Array
  Description: Input format for the indexing service.
    • JSON Object: Load the FlowFile content directly as the JSON payload.
    • JSON Lines: Create a new section for each line of JSON.
    • JSON Array: Load the FlowFile content as a JSON array and create a new section for each element of the array.

Document ID *
  API Name: Document ID
  Description: A unique identifier for the document, constructed either from the source path of the document or from a hash of the document's content.
  Expression Language: Supported, using FlowFile attributes and environment variables.
  Considered only if: Index Input Format has a value of JSON_LINES or JSON_ARRAY.

Document Title
  API Name: Document Title
  Description: Title of the document.
  Expression Language: Supported, using FlowFile attributes and environment variables.
  Considered only if: Index Input Format has a value of JSON_LINES or JSON_ARRAY.

Document Source URL
  API Name: Document Source URL
  Description: Source URL for the document.
  Expression Language: Supported, using FlowFile attributes and environment variables.
  Considered only if: Index Input Format has a value of JSON_LINES or JSON_ARRAY.

Document Author
  API Name: Document Author
  Description: Author of the document.
  Expression Language: Supported, using FlowFile attributes and environment variables.
  Considered only if: Index Input Format has a value of JSON_LINES or JSON_ARRAY.

Document Description
  API Name: Document Description
  Description: Description of the document.
  Expression Language: Supported, using FlowFile attributes and environment variables.
  Considered only if: Index Input Format has a value of JSON_LINES or JSON_ARRAY.

Document Date
  API Name: Document Date
  Description: Date of document creation.
  Expression Language: Supported, using FlowFile attributes and environment variables.
  Considered only if: Index Input Format has a value of JSON_LINES or JSON_ARRAY.

Document Creation Time
  API Name: Document Creation Time
  Description: Timestamp, in epoch seconds, when the document was created.
  Expression Language: Supported, using FlowFile attributes and environment variables.
  Considered only if: Index Input Format has a value of JSON_LINES or JSON_ARRAY.

Document Attributes
  API Name: Document Attributes
  Description: A comma-delimited list of NiFi attribute names. Each attribute that is present is included in the document metadata.
  Considered only if: Index Input Format has a value of JSON_LINES or JSON_ARRAY.

Section Metadata JSON Path *
  API Name: Section Metadata JSON Path
  Default Value: $.metadata
  Description: A JSON Path expression to a metadata JSON Object. The JSON Object needs to contain the list of metadata fields. These fields are included in the section metadata.
  Considered only if: Index Input Format has a value of JSON_LINES or JSON_ARRAY.

Section Text JSON Path *
  API Name: Section Text JSON Path
  Default Value: $.text
  Description: A JSON Path expression to the text field.
  Considered only if: Index Input Format has a value of JSON_LINES or JSON_ARRAY.

Section ID Attribute
  API Name: Section ID Attribute
  Description: The field used to set the section ID, which is populated if the field is present in the metadata path.
  Considered only if: Index Input Format has a value of JSON_LINES or JSON_ARRAY.

Section Title Attribute
  API Name: Section Title Attribute
  Description: The field used to set the section title, which is populated if the field is present in the metadata path.
  Considered only if: Index Input Format has a value of JSON_LINES or JSON_ARRAY.

Section Filter Attributes
  API Name: Section Filter Attributes
  Description: A comma-delimited list of metadata fields. Each field that is present in the metadata path is included as a section metadata filter.
  Considered only if: Index Input Format has a value of JSON_LINES or JSON_ARRAY.

Section Metadata Attributes
  API Name: Section Metadata Attributes
  Description: A comma-delimited list of metadata fields. Each field that is present in the metadata path is included in the section metadata.
  Considered only if: Index Input Format has a value of JSON_LINES or JSON_ARRAY.

Section Custom Dimensions
  API Name: Section Custom Dimensions
  Description: A comma-delimited list of metadata fields. Each field that is present in the metadata path is included as a custom dimension of the section. The values for custom dimensions must be valid numbers.
  Considered only if: Index Input Format has a value of JSON_LINES or JSON_ARRAY.
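To make the Section properties above more concrete, here is a minimal sketch of how one JSON record might be mapped to a section. It is an interpretation of the property descriptions, not the processor's actual implementation. The function names and the output keys (`id`, `title`, `metadata`, `custom_dimensions`) are hypothetical, plain dictionary lookups stand in for the default JSON Path expressions, and Section Filter Attributes are left out.

```python
def split_names(value):
    """Split a comma-delimited property value into trimmed, non-empty names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def build_section(record, id_attr="", title_attr="", metadata_attrs="", custom_dims=""):
    """Map one JSON record to a section-like dict (hypothetical output shape)."""
    # Plain lookups stand in for the defaults $.text and $.metadata.
    metadata = record.get("metadata", {})
    section = {"text": record["text"]}

    # The ID and title are only populated when the named field is present.
    if id_attr and id_attr in metadata:
        section["id"] = metadata[id_attr]
    if title_attr and title_attr in metadata:
        section["title"] = metadata[title_attr]

    # Only the listed metadata fields that are present are carried over.
    section["metadata"] = {
        name: metadata[name]
        for name in split_names(metadata_attrs)
        if name in metadata
    }

    # Custom dimensions must be numeric; float() raises ValueError or
    # TypeError when a value cannot be interpreted as a number.
    section["custom_dimensions"] = {
        name: float(metadata[name])
        for name in split_names(custom_dims)
        if name in metadata
    }
    return section


if __name__ == "__main__":
    record = {
        "text": "Hello",
        "metadata": {"chunk_id": "c1", "heading": "Intro", "page": 3, "score": "0.5"},
    }
    print(build_section(record, "chunk_id", "heading", "page, missing", "score"))
```

Fields named in a property but missing from the record's metadata are skipped silently, which mirrors the "if present" wording of the descriptions.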

Dynamic Properties

This component does not support dynamic properties.

Relationships

Name      Description
failure   Vectara failure relationship
original  Original relationship
success   Vectara success relationship

Reads Attributes

This processor does not read attributes.

Writes Attributes

This processor does not write attributes.

State Management

This component does not store state.

Restricted

This component is not restricted.

Input Requirement

Input requirements are not specified for this component.

Example Use Cases Involving Other Components

Multiprocessor Use Case 1

Publish a PDF file to a Vectara corpus.

Components Involved

  • FetchS3Object
    1. Set "Bucket" to the desired S3 Bucket name.
    2. Set "Object Key" to the S3 path of the PDF file.
    3. Set "Region" to the AWS S3 Bucket region.
    4. Set "AWS Credentials Provider Service" to a configured AWS Credentials provider.
  • ParseDocument
    1. Set "Input Format" to PDF.
  • ChunkDocument
    1. Leave the properties at their default values.
  • PutVectaraDocument
    1. Set "Vectara Client" to a StandardVectaraClientService that can be used to make requests to Vectara's IndexService API.
    2. Set "Corpus ID" to the desired corpus to load data to.
    3. Set "Index Input Format" to JSON Lines.
    4. Set "Document ID" to ${filename}.
    5. Set "Document Source URL" to s3://${s3.bucket}/${filename}.
    6. Set "Document Creation Time" to ${s3.lastModified:divide(1000)}.
    7. Set "Document Attributes" to s3.bucket, s3.lastModified, s3.length, mime.type.
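The PutVectaraDocument settings above derive document fields from FlowFile attributes. The sketch below mimics that derivation in Python to show the intended values. The attribute values and helper names are invented for illustration, and it assumes that s3.lastModified holds epoch milliseconds and that the Expression Language divide function performs integer division on whole numbers.

```python
def to_epoch_seconds(epoch_millis):
    """Convert an epoch-milliseconds string to whole epoch seconds."""
    return int(epoch_millis) // 1000


def select_attributes(attributes, names):
    """Keep only the listed attributes that are actually present."""
    wanted = [name.strip() for name in names.split(",") if name.strip()]
    return {name: attributes[name] for name in wanted if name in attributes}


# Hypothetical FlowFile attributes; s3.length is deliberately absent.
attributes = {
    "filename": "report.pdf",
    "s3.bucket": "example-bucket",
    "s3.lastModified": "1700000000123",
    "mime.type": "application/pdf",
}

document = {
    # Document ID: the filename attribute.
    "document_id": attributes["filename"],
    # Document Source URL: bucket and filename joined into an s3:// URL.
    "source_url": "s3://{}/{}".format(attributes["s3.bucket"], attributes["filename"]),
    # Document Creation Time: epoch milliseconds reduced to epoch seconds.
    "creation_time": to_epoch_seconds(attributes["s3.lastModified"]),
    # Document Attributes: only the listed attributes that exist are included.
    "metadata": select_attributes(
        attributes, "s3.bucket, s3.lastModified, s3.length, mime.type"
    ),
}

if __name__ == "__main__":
    print(document)
```

The division by 1000 matters because Document Creation Time expects epoch seconds, not milliseconds.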

System Resource Considerations

This component does not specify system resource considerations.

See Also

PutVectaraFile