ParseTableImage

Description

Extracts the text from an image of a table and writes it to the FlowFile content in CSV format.

Properties

In the list below, required properties are marked with an asterisk (*); all other properties are optional. Each entry also gives any default value and indicates whether the property supports the NiFi Expression Language.

Service Location Strategy *
  • API Name: Service Location Strategy
  • Default Value: Default
  • Allowable Values: Default, Custom
  • Description: Determines how Service Locations are configured within this processor for the Table Structure Recognition Service.

Custom Table Structure Recognition Service URL *
  • API Name: Custom Table Structure Recognition Service URL
  • Description: The custom URL of the Datavolo Table Structure Recognition Service. This property is only considered if the property Service Location Strategy has a value of Custom.

OCR Service *
  • API Name: OCR Service
  • Controller Service: OCRService
  • Implementations: StandardOCRService
  • Description: An OCR Service for reading files to output text.

Communication Timeout *
  • API Name: Communication Timeout
  • Default Value: 60 sec
  • Description: The amount of time to wait for a response from the microservices before timing out.

OCR Confidence Threshold *
  • API Name: OCR Confidence Threshold
  • Default Value: 10
  • Description: The minimum confidence level required for a text block to be included in the output. Text blocks with a confidence level below this value will be excluded. Supports Expression Language, using FlowFile attributes and Environment variables.

MIME Type *
  • API Name: MIME Type
  • Default Value: ${mime.type}
  • Description: The MIME Type of the image file. Supports Expression Language, using FlowFile attributes and Environment variables.
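
The effect of the OCR Confidence Threshold property can be pictured as a simple filter over OCR text blocks. The following Python sketch is illustrative only and is not the processor's implementation: the `confidence` field name, the sample values, and the assumption that confidence is reported on the same numeric scale as the threshold are all hypothetical.

```python
def filter_by_confidence(blocks, threshold=10.0):
    """Keep only text blocks whose confidence is at or above the threshold.

    Blocks with a confidence below the threshold are excluded, mirroring
    the property description. The 'confidence' key is a hypothetical name.
    """
    return [block for block in blocks if block["confidence"] >= threshold]


# Hypothetical OCR output; the values are made up for illustration.
blocks = [
    {"text": "Total", "confidence": 96.5},
    {"text": "~", "confidence": 4.2},    # below the threshold, dropped
    {"text": "42", "confidence": 10.0},  # exactly at the threshold, kept
]

kept = filter_by_confidence(blocks)
print([block["text"] for block in kept])
```

A block whose confidence equals the threshold is kept here, because the description excludes only blocks below the value.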

Dynamic Properties

This component does not support dynamic properties.

Relationships

  • comms.failure: If the processor is unable to communicate with one of the necessary services, the input FlowFile will be routed to this relationship.
  • failure: If a FlowFile cannot be converted into a CSV, the input FlowFile will be routed to this relationship.
  • success: When the table text has been successfully extracted, the CSV representation of the text will be routed to this relationship.
  • table.not.found: If the processor determines that an input FlowFile does not contain a table, the original FlowFile will be routed to this relationship.

Reads Attributes

  • table.text.json: If present, the processor will use this JSON to extract the table text instead of performing OCR. The format expected is an array of JSON objects, each containing a 'text' field as well as an 'x', 'y', 'width', and 'height' field, all of which are floating-point numbers.
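
As a rough illustration of the table.text.json format described above, the following Python sketch builds an array of objects with 'text', 'x', 'y', 'width', and 'height' fields and serializes it to a JSON string. The helper name `make_block` and the sample values are hypothetical, and the coordinate units and origin are assumptions, since they are not specified here.

```python
import json


def make_block(text, x, y, width, height):
    """Build one text block with the five fields named in the description.

    The numeric fields are coerced to float because the description says
    they are floating-point numbers. Units and origin are not specified.
    """
    return {
        "text": text,
        "x": float(x),
        "y": float(y),
        "width": float(width),
        "height": float(height),
    }


# Hypothetical blocks for a two-column header row; values are made up.
blocks = [
    make_block("Item", 10, 12, 80, 20),
    make_block("Price", 100, 12, 60, 20),
]

# The serialized string is what would be placed in the table.text.json attribute.
table_text_json = json.dumps(blocks)
print(table_text_json)
```

When the attribute is present, the processor uses this JSON instead of performing OCR, so a flow that already has text positions from an upstream step could supply them this way.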

Writes Attributes

  • filename: The filename of the FlowFile.
  • mime.type: The MIME type of the FlowFile.
  • table.text.json: If the processor successfully extracts the table text, or if it is determined that the FlowFile does not contain a table, this attribute will be removed.

State Management

This component does not store state.

Restricted

This component is not restricted.

Input Requirement

This component requires an incoming relationship.

System Resource Considerations

This component does not specify system resource considerations.

See Also

ParsePdfDocument