SplitAvro

Description

Splits a binary-encoded Avro datafile into smaller files based on the configured Output Size. The Output Strategy determines whether the smaller files are Avro datafiles or bare Avro records with metadata stored in the FlowFile attributes. The output is always binary encoded.

Tags

avro, split

Properties

In the list below, required properties are marked with an asterisk (*). All other properties are optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

| Display Name | API Name | Default Value | Allowable Values | Description |
| --- | --- | --- | --- | --- |
| Split Strategy * | Split Strategy | Record | Record | The strategy for splitting the incoming datafile. The Record strategy reads the incoming datafile by deserializing each record. |
| Output Size * | Output Size | 1 | | The number of Avro records to include per split file. If the incoming file has fewer records than the Output Size, or the total number of records is not evenly divisible by the Output Size, a split file may contain fewer records than the Output Size. |
| Output Strategy * | Output Strategy | Datafile | Datafile, Bare Record | Determines the format of the output: either an Avro datafile or a bare record. Bare record output is intended only for systems that already require it and should not be needed for normal use. |
| Transfer Metadata * | Transfer Metadata | true | true, false | Whether to transfer metadata from the parent datafile to the children. If the Output Strategy is Bare Record, the metadata is stored as FlowFile attributes; otherwise it is stored in the datafile header. |
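
The effect of Output Size on the number and size of split files can be illustrated with a short sketch. This is not the processor's implementation; it only mirrors the record-count arithmetic described above, using plain Python lists in place of Avro records. The names `split_records` and `expected_split_count` are hypothetical helpers introduced for illustration.

```python
from math import ceil
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def split_records(records: Sequence[T], output_size: int) -> Iterator[List[T]]:
    """Yield consecutive groups of at most `output_size` records.

    Hypothetical helper: stands in for the Record split strategy,
    with a plain sequence in place of a deserialized Avro datafile.
    """
    if output_size < 1:
        raise ValueError("output_size must be a positive integer")
    for start in range(0, len(records), output_size):
        yield list(records[start:start + output_size])


def expected_split_count(record_count: int, output_size: int) -> int:
    """Number of groups produced for a non-empty input (ceiling division)."""
    return ceil(record_count / output_size)


# Seven records with an Output Size of 3: the last split holds the remainder.
records = [{"id": i} for i in range(7)]
splits = list(split_records(records, 3))
print([len(s) for s in splits])  # [3, 3, 1]
```

With the default Output Size of 1, every record ends up in its own split, so the number of splits equals the number of records.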

Dynamic Properties

This component does not support dynamic properties.

Relationships

| Name | Description |
| --- | --- |
| failure | If a FlowFile fails processing for any reason (for example, the FlowFile is not valid Avro), it will be routed to this relationship. |
| original | The original FlowFile that was split. If the FlowFile fails processing, nothing will be sent to this relationship. |
| split | All new files split from the original FlowFile will be routed to this relationship. |
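
One common reason for a FlowFile to reach the failure relationship is content that is not an Avro datafile at all. The sketch below shows a cheap pre-check that could be run before data reaches the processor. It relies on the fact that Avro object container files begin with the four bytes `Obj` followed by the byte 1. It does not reproduce the processor's own validation, which is not described here, and a file that passes this check may still fail later if its header or data blocks are corrupt. The function name `looks_like_avro_datafile` is hypothetical.

```python
# Magic bytes at the start of every Avro object container file.
AVRO_MAGIC = b"Obj\x01"


def looks_like_avro_datafile(content: bytes) -> bool:
    """Return True if `content` starts with the Avro datafile magic bytes.

    Hypothetical pre-check only: it inspects the first four bytes and
    does not validate the schema, metadata, or data blocks.
    """
    return content[:4] == AVRO_MAGIC


print(looks_like_avro_datafile(b"Obj\x01remaining header bytes"))  # True
print(looks_like_avro_datafile(b'{"not": "avro"}'))                # False
```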

Reads Attributes

This processor does not read attributes.

Writes Attributes

| Name | Description |
| --- | --- |
| fragment.count | The number of split FlowFiles generated from the parent FlowFile |
| fragment.identifier | All split FlowFiles produced from the same parent FlowFile will have the same randomly generated UUID added for this attribute |
| fragment.index | A one-up number that indicates the ordering of the split FlowFiles that were created from a single parent FlowFile |
| segment.original.filename | The filename of the parent FlowFile |
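
As a rough illustration of how a downstream consumer might use these attributes, the sketch below groups split FlowFiles by `fragment.identifier`, orders each group by `fragment.index`, and compares the group size with `fragment.count` to see whether every split has arrived. FlowFiles are modelled as plain dictionaries of string attributes, and the helper names, the identifier values, and the filename are made up for the example. The sketch does not depend on which number `fragment.index` starts from.

```python
from collections import defaultdict
from typing import Dict, List

Attributes = Dict[str, str]


def group_fragments(flowfiles: List[Attributes]) -> Dict[str, List[Attributes]]:
    """Group attribute maps by fragment.identifier, ordered by fragment.index."""
    groups: Dict[str, List[Attributes]] = defaultdict(list)
    for attrs in flowfiles:
        groups[attrs["fragment.identifier"]].append(attrs)
    for fragments in groups.values():
        # Attribute values are strings, so compare the index numerically.
        fragments.sort(key=lambda a: int(a["fragment.index"]))
    return dict(groups)


def is_complete(fragments: List[Attributes]) -> bool:
    """True when the group holds as many splits as fragment.count announces."""
    return len(fragments) == int(fragments[0]["fragment.count"])


# Made-up sample: three splits of one parent arriving out of order,
# plus one split of a second parent whose sibling has not arrived yet.
received = [
    {"fragment.identifier": "parent-a", "fragment.index": "2",
     "fragment.count": "3", "segment.original.filename": "users.avro"},
    {"fragment.identifier": "parent-a", "fragment.index": "0",
     "fragment.count": "3", "segment.original.filename": "users.avro"},
    {"fragment.identifier": "parent-b", "fragment.index": "1",
     "fragment.count": "2", "segment.original.filename": "orders.avro"},
    {"fragment.identifier": "parent-a", "fragment.index": "1",
     "fragment.count": "3", "segment.original.filename": "users.avro"},
]

for identifier, fragments in group_fragments(received).items():
    order = [f["fragment.index"] for f in fragments]
    print(identifier, order, "complete" if is_complete(fragments) else "incomplete")
```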

State Management

This component does not store state.

Restricted

This component is not restricted.

Input Requirement

This component requires an incoming relationship.

System Resource Considerations

| Scope | Description |
| --- | --- |
| MEMORY | An instance of this component can cause high usage of this system resource. Multiple instances or high concurrency settings may result in degraded performance. |

See Also