
ExtractAvroMetadata

Description

Extracts metadata from the header of an Avro datafile.

Tags

avro, metadata, schema

Properties

In the list below, required properties are marked with an asterisk (*). All other properties are optional. The list also indicates any default values and whether a property supports the NiFi Expression Language.

Fingerprint Algorithm *
  API Name: Fingerprint Algorithm
  Default Value: CRC-64-AVRO
  Allowable Values:
    • CRC-64-AVRO
    • MD5
    • SHA-256
  Description: The algorithm used to generate the schema fingerprint. The available choices follow the Avro recommended practices for fingerprint generation.

Metadata Keys
  API Name: Metadata Keys
  Description: A comma-separated list of keys indicating key/value pairs to extract from the Avro file header. The key 'avro.schema' can be used to extract the full schema in JSON format, and 'avro.codec' can be used to extract the codec name if one exists.

Count Items *
  API Name: Count Items
  Default Value: false
  Allowable Values:
    • true
    • false
  Description: If true, the number of items in the datafile is counted and stored in the FlowFile attribute 'item.count'. Counting reads each block and sums the item count recorded for that block, which avoids deserializing the items themselves. The items counted are the top-level items in the datafile. For example, with a schema of type record the items are the records, and with a schema of type array the items are the arrays (not the entries in each array).
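The Metadata Keys and Count Items behavior described above can be illustrated with a minimal sketch that reads an Avro object container file using only the Python standard library. It is not the processor's implementation. The function names (`read_long`, `read_header`, `count_items`, `extract_metadata`) are hypothetical, and skipping keys that are absent from the header is an assumption of this sketch rather than documented behavior.

```python
import io

MAGIC = b"Obj\x01"   # magic bytes at the start of an Avro object container file
SYNC_SIZE = 16       # length of the sync marker that closes the header and each block


def read_long(stream):
    """Decode one Avro long (zigzag, variable-length) from the stream."""
    shift = 0
    accum = 0
    while True:
        raw = stream.read(1)
        if not raw:
            raise EOFError("unexpected end of data inside a long")
        accum |= (raw[0] & 0x7F) << shift
        if not raw[0] & 0x80:
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)


def read_bytes(stream):
    """Decode a length-prefixed Avro bytes or string value."""
    return stream.read(read_long(stream))


def read_header(stream):
    """Return (metadata dict, sync marker) read from the file header."""
    if stream.read(len(MAGIC)) != MAGIC:
        raise ValueError("not an Avro datafile")
    metadata = {}
    while True:
        count = read_long(stream)
        if count == 0:
            break
        if count < 0:
            # A negative count means the byte size of the map block follows.
            count = -count
            read_long(stream)
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            metadata[key] = read_bytes(stream)
    return metadata, stream.read(SYNC_SIZE)


def count_items(stream, sync):
    """Sum the per-block item counts without decoding the block data."""
    total = 0
    while True:
        if not stream.read(1):
            return total
        stream.seek(-1, io.SEEK_CUR)
        block_items = read_long(stream)
        block_size = read_long(stream)
        stream.seek(block_size, io.SEEK_CUR)  # skip the serialized items
        if stream.read(SYNC_SIZE) != sync:
            raise ValueError("sync marker mismatch")
        total += block_items


def extract_metadata(data, keys=(), count=False):
    """Build an attribute map from the requested header keys."""
    stream = io.BytesIO(data)
    metadata, sync = read_header(stream)
    # Assumption: keys missing from the header are skipped.
    attributes = {k: metadata[k].decode("utf-8") for k in keys if k in metadata}
    if count:
        attributes["item.count"] = str(count_items(stream, sync))
    return attributes
```

Given the bytes of a datafile in a variable `data`, a call such as `extract_metadata(data, keys=["avro.codec"], count=True)` would return the codec name, if present, together with the summed block counts. Because only the two longs at the start of each block are decoded, the cost of counting does not depend on the schema's complexity.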

Dynamic Properties

This component does not support dynamic properties.

Relationships

Name | Description
failure | A FlowFile is routed to this relationship if it cannot be parsed as Avro or metadata cannot be extracted for any reason.
success | A FlowFile is routed to this relationship after metadata has been extracted.

Reads Attributes

This processor does not read attributes.

Writes Attributes

Name | Description
item.count | The total number of items in the datafile. Only written if Count Items is set to true.
schema.fingerprint | The result of the Fingerprint Algorithm as a hex string.
schema.name | Contains the name when the type is a record, enum or fixed; otherwise contains the name of the primitive type.
schema.type | The type of the schema (e.g. record, enum).
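As a rough illustration of how the schema attributes above could be derived, the following sketch implements the CRC-64-AVRO fingerprint from the Avro specification, along with MD5 and SHA-256 through `hashlib`. It rests on several assumptions. The input must already be the schema's Parsing Canonical Form, which this sketch does not compute. The CRC-64-AVRO bytes are rendered least-significant byte first, which may not match the processor's output. The name lookup ignores namespaces and falls back to the type name for unnamed types. The function names are hypothetical.

```python
import hashlib
import json

# Seed value from the CRC-64-AVRO definition in the Avro specification.
EMPTY = 0xC15D213AA4D7A795


def _build_table():
    """Precompute the 256-entry lookup table used by the fingerprint."""
    table = []
    for i in range(256):
        fp = i
        for _ in range(8):
            # Shift right by one bit; XOR with the seed if the low bit was set.
            fp = (fp >> 1) ^ (EMPTY if fp & 1 else 0)
        table.append(fp)
    return table


FP_TABLE = _build_table()


def crc64_avro(data: bytes) -> int:
    """Return the 64-bit CRC-64-AVRO fingerprint of the given bytes."""
    fp = EMPTY
    for byte in data:
        fp = (fp >> 8) ^ FP_TABLE[(fp ^ byte) & 0xFF]
    return fp


def fingerprint_hex(canonical_schema: str, algorithm: str = "CRC-64-AVRO") -> str:
    """Fingerprint a schema given in Parsing Canonical Form, as a hex string."""
    data = canonical_schema.encode("utf-8")
    if algorithm == "CRC-64-AVRO":
        # Assumption: the 8 bytes are emitted least-significant byte first.
        return crc64_avro(data).to_bytes(8, "little").hex()
    if algorithm == "MD5":
        return hashlib.md5(data).hexdigest()
    if algorithm == "SHA-256":
        return hashlib.sha256(data).hexdigest()
    raise ValueError(f"unsupported algorithm: {algorithm}")


def schema_type_and_name(schema_json: str):
    """Return (type, name) for a schema given as JSON text."""
    schema = json.loads(schema_json)
    if isinstance(schema, str):
        # A primitive written in simple form, such as "string".
        return schema, schema
    if isinstance(schema, list):
        # Assumption: a union is reported under the name "union".
        return "union", "union"
    schema_type = schema["type"]
    # Named types (record, enum, fixed) carry a name; others fall back to the type.
    return schema_type, schema.get("name", schema_type)
```

For a primitive schema the canonical form is just the quoted type name, so `fingerprint_hex('"string"')` yields a 16-character hex string, while the MD5 and SHA-256 options yield 32 and 64 characters respectively.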

State Management

This component does not store state.

Restricted

This component is not restricted.

Input Requirement

This component requires an incoming relationship.

System Resource Considerations

This component does not specify system resource considerations.

See Also