CreateOpenAiEmbeddings

Description

Uses OpenAI to create embeddings for text. The input text can be provided either as the plain-text content of a FlowFile or as a field in each record of a record-oriented FlowFile.

Tags

chatbot, embeddings, gen ai, generative ai, llm, nlp, openai, text

Properties

In the table below, required properties are marked with an asterisk (*). All other properties are optional. The table also indicates any default values and whether a property supports the NiFi Expression Language.

| Display Name | API Name | Default Value | Allowable Values | Description |
|---|---|---|---|---|
| OpenAI API Key * | OpenAI API Key | | | The API Key for authenticating to OpenAI |
| Embeddings Model * | Embeddings Model | text-embedding-3-small | | The model to use for embeddings |
| OpenAI Organization | OpenAI Organization | | | The organization to use for OpenAI |
| Dimensions | Dimensions | | | The number of dimensions that the resulting output embeddings should have. This is only supported in text-embedding-3 and later models. |
| User | User | | | An identifier for the remote user on whose behalf the request is being made; OpenAI uses this to detect and prevent abuse. Supports Expression Language, using FlowFile attributes and Environment variables. |
| Record Reader | Record Reader | | Controller Service: RecordReaderFactory. Implementations: AvroReader, CEFReader, CSVReader, ExcelReader, GrokReader, JsonPathReader, JsonTreeReader, ReaderLookup, ScriptedReader, Syslog5424Reader, SyslogReader, WindowsEventLogReader, XMLReader, YamlTreeReader | The record reader to use for reading record-oriented data. If the incoming data is to be treated as plaintext, this property should be left unset. |
| Record Writer * | Record Writer | | Controller Service: RecordSetWriterFactory. Implementations: AvroRecordSetWriter, CSVRecordSetWriter, FreeFormTextRecordSetWriter, JsonRecordSetWriter, RecordSetWriterLookup, ScriptedRecordSetWriter, XMLRecordSetWriter | The Record Writer to use for writing the output |
| Text Record Path * | Text Record Path | /text | | The path to the field in the record that contains the text to be embedded. If the incoming data is to be treated as plaintext, this property should be left unset. Supports Expression Language, using FlowFile attributes and Environment variables. This property is only considered if the property Record Reader has a value specified. |
| Embeddings Record Path * | Embeddings Record Path | /embeddings | | The path to the field in the record where the embeddings are to be written. Supports Expression Language, using FlowFile attributes and Environment variables. This property is only considered if the property Record Reader has a value specified. |
| Max Batch Size * | Max Batch Size | 100 | | The maximum number of records to include in each batch sent to OpenAI |
| Web Client Service * | Web Client Service | | Controller Service: WebClientServiceProvider. Implementations: StandardWebClientServiceProvider | The Web Client Service to use for communicating with OpenAI |
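
To make the properties above more concrete, here is a minimal Python sketch of how they could map onto a request to OpenAI's public embeddings API. This page does not show how the processor builds its requests, so treat this as an assumption: the JSON field names (`model`, `input`, `dimensions`, `user`) and the `Authorization` and `OpenAI-Organization` headers come from OpenAI's API, while the function names `batches`, `build_headers`, and `build_request_body` are hypothetical and exist only for illustration.

```python
from typing import Dict, Iterator, List, Optional


def batches(texts: List[str], max_batch_size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of at most max_batch_size texts (cf. Max Batch Size)."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    for start in range(0, len(texts), max_batch_size):
        yield texts[start:start + max_batch_size]


def build_headers(api_key: str, organization: Optional[str] = None) -> Dict[str, str]:
    """Build HTTP headers from the API key and the optional organization."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if organization:
        headers["OpenAI-Organization"] = organization
    return headers


def build_request_body(
    texts: List[str],
    model: str = "text-embedding-3-small",
    dimensions: Optional[int] = None,
    user: Optional[str] = None,
) -> dict:
    """Build the JSON body for one batch; optional fields are omitted when unset."""
    body: dict = {"model": model, "input": texts}
    if dimensions is not None:
        body["dimensions"] = dimensions
    if user:
        body["user"] = user
    return body


# 250 texts with the default Max Batch Size of 100 give three request bodies.
texts = [f"document {i}" for i in range(250)]
bodies = [build_request_body(batch, dimensions=256) for batch in batches(texts)]
```

Leaving `dimensions` and `user` out of the body when they are unset mirrors the fact that both properties are optional.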

Dynamic Properties

This component does not support dynamic properties.

Relationships

| Name | Description |
|---|---|
| failure | The original FlowFile will be routed to this relationship if the embeddings could not be created |
| success | The embeddings will be routed to this relationship |

Reads Attributes

This processor does not read attributes.

Writes Attributes

| Name | Description |
|---|---|
| mime.type | The MIME type of the output data, based on the chosen Record Writer |
| record.count | The number of records written to the output |

State Management

This component does not store state.

Restricted

This component is not restricted.

Input Requirement

Input requirements are not specified for this component.

Example Use Cases

Use Case 1

Create embeddings for text using OpenAI's embeddings API.

Configuration

Set "OpenAI API Key" to the API key for your OpenAI account.
Set "Embeddings Model" to the name of the OpenAI model to use for creating embeddings.
Set "Record Writer" to a Record Writer that writes out data in the desired format, typically JSON.
Set "Web Client Service" to a WebClientServiceProvider that can be used to make requests to the OpenAI API.

If the incoming data is in a structured format such as JSON, set "Record Reader" to a Record Reader that can read the format of the incoming data and set "Text Record Path" to the path to the field that contains the text to be embedded.
Otherwise, if the incoming data is raw text, leave the "Text Record Path" and "Record Reader" properties unset.
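
The record-oriented case can be pictured with a small Python sketch. It is an illustration under stated assumptions, not the processor's implementation: records are modeled as plain dictionaries, the field names `text` and `embeddings` mirror the default record paths `/text` and `/embeddings`, and `fake_embed` is a hypothetical stand-in for the call to OpenAI so that the example runs offline.

```python
from typing import Callable, Dict, List

Vector = List[float]


def fake_embed(texts: List[str]) -> List[Vector]:
    """Hypothetical stand-in for the OpenAI call: one tiny vector per text."""
    return [[float(len(t))] for t in texts]


def embed_records(
    records: List[Dict],
    embed: Callable[[List[str]], List[Vector]],
    text_field: str = "text",
    embeddings_field: str = "embeddings",
) -> List[Dict]:
    """Copy each record and add its embedding under embeddings_field."""
    # Read the text to embed from every record (cf. Text Record Path).
    texts = [record[text_field] for record in records]
    vectors = embed(texts)
    # Write each vector back next to its source record (cf. Embeddings Record Path).
    return [
        {**record, embeddings_field: vector}
        for record, vector in zip(records, vectors)
    ]


records = [{"id": 1, "text": "hello"}, {"id": 2, "text": "NiFi"}]
enriched = embed_records(records, fake_embed)
```

In this sketch each output record keeps its original fields and gains one extra field holding the vector, and the input records are left unmodified.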

System Resource Considerations

This component does not specify system resource considerations.

See Also

CreateAzureOpenAiEmbeddings, PromptOpenAI