ParseTableImage
Description
Extracts the text from an image of a table and writes it to the FlowFile content in CSV format.
Properties
In the list below, required properties are marked with an asterisk (*); all other properties are optional. The table also shows any default values and whether a property supports the NiFi Expression Language.
Display Name | API Name | Default Value | Allowable Values | Description |
---|---|---|---|---|
Service Location Strategy * | Service Location Strategy | Default | | Determines how Service Locations are configured within this processor for the Table Structure Recognition Service. |
Custom Table Structure Recognition Service URL * | Custom Table Structure Recognition Service URL | | | The Custom URL of the Datavolo Table Structure Recognition Service. This property is only considered if: |
OCR Service * | OCR Service | | Controller Service: OCRService; Implementations: StandardOCRService | An OCR Service for reading files to output text. |
Communication Timeout * | Communication Timeout | 60 sec | | The amount of time to wait for a response from the microservices before timing out. |
OCR Confidence Threshold * | OCR Confidence Threshold | 10 | | The minimum confidence level required for a text block to be included in the output. Text blocks with a confidence level below this value will be excluded. Supports Expression Language, using FlowFile attributes and Environment variables. |
MIME Type * | MIME Type | ${mime.type} | | The MIME Type of the image file. Supports Expression Language, using FlowFile attributes and Environment variables. |
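The effect of the OCR Confidence Threshold property can be illustrated with a small sketch. This is not the processor's implementation: the `confidence` field name, the block layout, and the 0 to 100 scale are assumptions made for illustration. Only the rule itself (blocks below the threshold are excluded) comes from the property description.

```python
from typing import Dict, List


def filter_by_confidence(blocks: List[Dict], threshold: float = 10.0) -> List[Dict]:
    """Keep only text blocks whose confidence is at or above the threshold.

    The "confidence" key and its scale are hypothetical; the default of 10
    mirrors the property's documented default value.
    """
    return [b for b in blocks if b["confidence"] >= threshold]


# Hypothetical OCR output: three text blocks with differing confidence.
blocks = [
    {"text": "Total", "confidence": 96.5},
    {"text": "~~", "confidence": 4.2},      # below the threshold, dropped
    {"text": "42.00", "confidence": 10.0},  # exactly at the threshold, kept
]

kept = filter_by_confidence(blocks)
print([b["text"] for b in kept])
```

A block exactly at the threshold is kept here, because the description excludes only blocks below the configured value.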
Dynamic Properties
This component does not support dynamic properties.
Relationships
Name | Description |
---|---|
comms.failure | If the processor is unable to communicate with one of the necessary services, the input FlowFile will be routed to this relationship. |
failure | If a FlowFile cannot be converted to CSV, the input FlowFile will be routed to this relationship. |
success | When the table text has been successfully extracted, the CSV representation of the text will be routed to this relationship. |
table.not.found | If the processor determines that an input FlowFile does not contain a table, the original FlowFile will be routed to this relationship. |
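The four relationships above can be read as a simple decision sequence. The sketch below is only an illustration of that reading: the function name, its parameters, and the order in which the conditions are checked are assumptions, not the processor's actual logic.

```python
from typing import Optional


def choose_relationship(services_reachable: bool,
                        table_found: bool,
                        csv: Optional[str]) -> str:
    """Map a processing outcome to a relationship name (hypothetical helper)."""
    if not services_reachable:
        # A required service could not be contacted.
        return "comms.failure"
    if not table_found:
        # The input image was judged not to contain a table.
        return "table.not.found"
    if csv is None:
        # A table was found, but it could not be converted to CSV.
        return "failure"
    return "success"


print(choose_relationship(True, True, "a,b\n1,2\n"))
```

In a flow, `comms.failure` is a natural candidate for a retry loop, while `failure` and `table.not.found` usually call for separate handling because retrying will not change the outcome.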
Reads Attributes
Name | Description |
---|---|
table.text.json | If present, the processor will use this JSON to extract the table text instead of performing OCR. The format expected is an array of JSON objects, each containing a 'text' field as well as an 'x', 'y', 'width', and 'height' field, all of which are floating-point numbers. |
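As a minimal sketch of the `table.text.json` format described above, the following builds an array of objects with `text`, `x`, `y`, `width`, and `height` fields. The helper name, the sample values, and the coordinate units and origin are assumptions; the documentation specifies only the field names and that the numeric fields are floating-point.

```python
import json
from typing import Dict, Union


def make_block(text: str, x: float, y: float,
               width: float, height: float) -> Dict[str, Union[str, float]]:
    """Build one text block (hypothetical helper); numeric fields are floats."""
    return {
        "text": text,
        "x": float(x),
        "y": float(y),
        "width": float(width),
        "height": float(height),
    }


# Sample values are illustrative only; units and origin are not documented here.
blocks = [
    make_block("Item", 10, 12, 80, 20),
    make_block("Price", 120, 12, 60, 20),
    make_block("Widget", 10, 40, 80, 20),
    make_block("4.99", 120, 40, 60, 20),
]

# The resulting string is what would be stored in the table.text.json attribute.
table_text_json = json.dumps(blocks)
print(table_text_json)
```

Setting this attribute upstream lets a flow reuse text positions it already has, since the processor then skips its own OCR step.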
Writes Attributes
Name | Description |
---|---|
filename | The filename of the FlowFile. |
mime.type | The MIME type of the FlowFile. |
table.text.json | If the processor successfully extracts the table text, or if it is determined that the FlowFile does not contain a table, this attribute will be removed. |
State Management
This component does not store state.
Restricted
This component is not restricted.
Input Requirement
This component requires an incoming relationship.
System Resource Considerations
This component does not specify system resource considerations.