CreateOpenAiEmbeddings
Description
Uses OpenAI to create embeddings for text. The input text can be provided as a single FlowFile or as a record-oriented FlowFile.
Tags
chatbot, embeddings, gen ai, generative ai, llm, nlp, openai, text
Properties
In the list below, required properties are shown with an asterisk (*). All other properties are optional. The table also indicates any default values and whether a property supports the NiFi Expression Language.
Display Name | API Name | Default Value | Allowable Values | Description |
---|---|---|---|---|
OpenAI API Key * | OpenAI API Key | | | The API Key for authenticating to OpenAI |
Embeddings Model * | Embeddings Model | text-embedding-3-small | | The model to use for embeddings |
OpenAI Organization | OpenAI Organization | | | The organization to use for OpenAI |
Dimensions | Dimensions | | | The number of dimensions that the resulting output embeddings should have. This is only supported in text-embedding-3 and later models. |
User | User | | | An identifier for the remote user on whose behalf the request is being made; OpenAI uses this to detect and prevent abuse. Supports Expression Language, using FlowFile attributes and Environment variables. |
Record Reader | Record Reader | | Controller Service: RecordReaderFactory. Implementations: AvroReader, CEFReader, CSVReader, ExcelReader, GrokReader, JsonPathReader, JsonTreeReader, ReaderLookup, ScriptedReader, Syslog5424Reader, SyslogReader, WindowsEventLogReader, XMLReader, YamlTreeReader | The record reader to use for reading record-oriented data. If the incoming data is to be treated as plaintext, this property should be left unset. |
Record Writer * | Record Writer | | Controller Service: RecordSetWriterFactory. Implementations: AvroRecordSetWriter, CSVRecordSetWriter, FreeFormTextRecordSetWriter, JsonRecordSetWriter, RecordSetWriterLookup, ScriptedRecordSetWriter, XMLRecordSetWriter | The Record Writer to use for writing the output |
Text Record Path * | Text Record Path | /text | | The path to the field in the record that contains the text to be embedded. If the incoming data is to be treated as plaintext, this property should be left unset. Supports Expression Language, using FlowFile attributes and Environment variables. This property is only considered if: |
Embeddings Record Path * | Embeddings Record Path | /embeddings | | The path to the field in the record where the embeddings are to be written. Supports Expression Language, using FlowFile attributes and Environment variables. This property is only considered if: |
Max Batch Size * | Max Batch Size | 100 | | The maximum number of records to include in each batch sent to OpenAI |
Web Client Service * | Web Client Service | | Controller Service: WebClientServiceProvider. Implementations: StandardWebClientServiceProvider | The Web Client Service to use for communicating with OpenAI |
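The properties above map fairly directly onto the fields of a request to OpenAI's embeddings endpoint (`POST /v1/embeddings`). The sketch below is an illustration only and not the processor's actual implementation: the helper names `batches`, `build_request`, and `build_headers` are hypothetical, and the assumption that the processor splits records into consecutive batches of at most "Max Batch Size" is inferred from the property description.

```python
from typing import Iterator, Optional


def batches(texts: list, max_batch_size: int = 100) -> Iterator[list]:
    """Yield consecutive slices of at most max_batch_size texts.

    Assumption: mirrors how "Max Batch Size" might group records per request.
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    for start in range(0, len(texts), max_batch_size):
        yield texts[start:start + max_batch_size]


def build_request(texts: list,
                  model: str = "text-embedding-3-small",
                  dimensions: Optional[int] = None,
                  user: Optional[str] = None) -> dict:
    """Build the JSON body for one embeddings request.

    Optional fields are omitted when unset, matching the optional
    "Dimensions" and "User" properties.
    """
    body = {"model": model, "input": texts}
    if dimensions is not None:
        body["dimensions"] = dimensions  # text-embedding-3 and later only
    if user is not None:
        body["user"] = user
    return body


def build_headers(api_key: str, organization: Optional[str] = None) -> dict:
    """Build HTTP headers; the organization header is sent only if configured."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if organization:
        headers["OpenAI-Organization"] = organization
    return headers
```

With the default "Max Batch Size" of 100, an input of 250 records would, under this assumption, produce three requests carrying 100, 100, and 50 texts.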
Dynamic Properties
This component does not support dynamic properties.
Relationships
Name | Description |
---|---|
failure | The original FlowFile will be routed to this relationship if the embeddings could not be created |
success | The embeddings will be routed to this relationship |
Reads Attributes
This processor does not read attributes.
Writes Attributes
Name | Description |
---|---|
mime.type | The MIME type of the output data, based on the chosen Record Writer |
record.count | The number of records written to the output |
State Management
This component does not store state.
Restricted
This component is not restricted.
Input Requirement
Input requirements are not specified for this component.
Example Use Cases
Use Case 1
Create embeddings for text using OpenAI's embeddings API.
Configuration
Set "OpenAI API Key" to the API key for your OpenAI account.
Set "Embeddings Model" to the name of the OpenAI model to use for creating embeddings.
Set "Record Writer" to a Record Writer that writes out data in the desired format, typically JSON.
Set "Web Client Service" to a Web Client Service Provider (for example, StandardWebClientServiceProvider) that can be used to make requests to the OpenAI API.
If the incoming data is in a structured format such as JSON, set "Record Reader" to a Record Reader that can read the format of the incoming data and set "Text Record Path" to the path to the field that contains the text to be embedded.
Otherwise, if the incoming data is raw text, leave the "Text Record Path" and "Record Reader" properties unset.
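To make the record-oriented case concrete, the following sketch shows how a record might look before and after processing when "Text Record Path" is `/text` and "Embeddings Record Path" is `/embeddings`. It is a simplified illustration under stated assumptions: `embed_records` and `fake_embed` are hypothetical names, only top-level fields are handled (real RecordPath expressions can address nested fields), and `fake_embed` is a stand-in that returns toy two-element vectors instead of calling OpenAI.

```python
from typing import Callable


def fake_embed(texts: list) -> list:
    """Stand-in for the OpenAI call: one toy vector per input text."""
    return [[float(len(t)), float(t.count(" "))] for t in texts]


def embed_records(records: list,
                  embed: Callable[[list], list],
                  text_field: str = "text",
                  embeddings_field: str = "embeddings") -> list:
    """Read text from text_field and write its vector to embeddings_field.

    Other fields are copied through unchanged; the input records are not mutated.
    """
    texts = [record[text_field] for record in records]
    vectors = embed(texts)
    return [
        {**record, embeddings_field: vector}
        for record, vector in zip(records, vectors)
    ]


incoming = [{"id": 1, "text": "hello world"}]
outgoing = embed_records(incoming, fake_embed)
# outgoing == [{"id": 1, "text": "hello world", "embeddings": [11.0, 1.0]}]
```

In an actual flow the vector length depends on the chosen model and, where supported, on the "Dimensions" property.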
System Resource Considerations
This component does not specify system resource considerations.