GrokReader

Description

Provides a mechanism for reading unstructured text data, such as log files, and structuring the data so that it can be processed. The service is configured using Grok patterns. The service reads from a stream of data and splits each message that it finds into a separate Record, each containing the fields that are configured. If a line in the input does not match the expected message pattern, the line of text is either considered to be part of the previous message or is skipped, depending on the configuration, with the exception of stack traces. A stack trace that is found at the end of a log message is considered to be part of the previous message but is added to the 'stackTrace' field of the Record. If a record has no stack trace, it will have a NULL value for the stackTrace field (assuming that the schema does in fact include a stackTrace field of type String). Assuming that the schema includes a '_raw' field of type String, the raw message will be included in the Record.
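
The grouping behavior described above can be illustrated with a short sketch. This is not NiFi's implementation: it uses Python's `re` module in place of a Grok expression, and the regular expressions, the function name `read_records`, the `on_no_match` parameter, and the stack-trace heuristic are all assumptions made for illustration.

```python
import re

# Hypothetical stand-in for a Grok expression such as
# "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}".
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) (?P<message>.*)$"
)

# Assumed heuristic for Java-style stack-trace lines: "  at ...",
# "Caused by: ...", "... 12 more", or an exception/error class name.
STACK_TRACE_LINE = re.compile(
    r"^(?:\s+at .+|Caused by: .+|\s*\.\.\. \d+ more|"
    r"[\w.$]+(?:Exception|Error)(?:: .*)?)$"
)


def read_records(lines, on_no_match="append"):
    """Group lines into records; on_no_match is "append" or "skip"."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        match = LINE_PATTERN.match(line)
        if match:
            # A matching line starts a new record.
            record = match.groupdict()
            record["stackTrace"] = None  # stays None when there is no trace
            record["_raw"] = line
            records.append(record)
            continue
        if not records:
            continue  # no previous message to attach the line to
        current = records[-1]
        if STACK_TRACE_LINE.match(line):
            # Stack traces always go to the previous message's stackTrace field.
            if current["stackTrace"] is None:
                current["stackTrace"] = line
            else:
                current["stackTrace"] += "\n" + line
            current["_raw"] += "\n" + line
        elif on_no_match == "append":
            current["message"] += "\n" + line
            current["_raw"] += "\n" + line
        # on_no_match == "skip": the line is dropped
    return records
```

Calling `read_records(open("app.log"))` on a log file (the file name is hypothetical) would yield one dictionary per log message, each with `stackTrace` and `_raw` keys alongside the captured fields.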

Tags

grok, logfiles, logs, logstash, parse, pattern, reader, record, regex, text, unstructured

Properties

In the list below, required properties are shown with an asterisk (*). Other properties are considered optional. The list also indicates any default values and whether a property supports the NiFi Expression Language.

Each property below lists its Display Name, API Name, Default Value, Allowable Values, and Description.
Schema Access Strategy *
API Name: schema-access-strategy
Default Value: Use String Fields From Grok Expression
Allowable Values:
  • Use String Fields From Grok Expression
  • Use 'Schema Name' Property
  • Use 'Schema Text' Property
  • Schema Reference Reader
Specifies how to obtain the schema that is to be used for interpreting the data.
Schema Registry
API Name: schema-registry
Controller Service: SchemaRegistry

Implementations:
AmazonGlueSchemaRegistry
ApicurioSchemaRegistry
AvroSchemaRegistry
ConfluentSchemaRegistry
Specifies the Controller Service to use for the Schema Registry

This property is only considered if:
  • the property Schema Access Strategy has a value of schema-reference-reader or schema-name
Schema Name
API Name: schema-name
Default Value: ${schema.name}
Specifies the name of the schema to look up in the Schema Registry property.

Supports Expression Language, using FlowFile attributes and Environment variables.

This property is only considered if:
  • the property Schema Access Strategy has a value of schema-name
Schema Version
API Name: schema-version
Specifies the version of the schema to look up in the Schema Registry. If not specified, the latest version of the schema will be retrieved.

Supports Expression Language, using FlowFile attributes and Environment variables.

This property is only considered if:
  • the property Schema Access Strategy has a value of schema-name
Schema Branch
API Name: schema-branch
Specifies the name of the branch to use when looking up the schema in the Schema Registry property. If the chosen Schema Registry does not support branching, this value will be ignored.

Supports Expression Language, using FlowFile attributes and Environment variables.

This property is only considered if:
  • the property Schema Access Strategy has a value of schema-name
Schema Text
API Name: schema-text
Default Value: ${avro.schema}
The text of an Avro-formatted Schema

Supports Expression Language, using FlowFile attributes and Environment variables.

This property is only considered if:
  • the property Schema Access Strategy has a value of schema-text-property
Schema Reference Reader *
API Name: schema-reference-reader
Controller Service: SchemaReferenceReader

Implementations:
ConfluentEncodedSchemaReferenceReader
Service implementation responsible for reading FlowFile attributes or content to determine the Schema Reference Identifier

This property is only considered if:
  • the property Schema Access Strategy has a value of schema-reference-reader
Grok Patterns
API Name: Grok Pattern File
Grok Patterns to use for parsing logs. If not specified, a built-in default Pattern file will be used. If specified, all patterns specified will override the default patterns. See the Controller Service's Additional Details for a list of pre-defined patterns.

Supports Expression Language, using Environment variables.
Grok Expressions *
API Name: Grok Expression
Specifies the format of a log line in Grok format. This allows the Record Reader to understand how to parse each log line. The property supports one or more Grok expressions. The Reader attempts to parse input lines according to the configured order of the expressions. If a line in the log file does not match any expression, it is handled according to the No Match Behavior property; by default, the line is assumed to belong to the previous log message. If other Grok patterns are referenced by this expression, they need to be supplied in the Grok Pattern File property.
No Match Behavior *
API Name: no-match-behavior
Default Value: Append to Previous Message
Allowable Values:
  • Append to Previous Message
  • Skip Line
  • Raw Line
If a line of text is encountered and it does not match the given Grok Expression, and it is not part of a stack trace, this property specifies how the text should be processed.
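
To illustrate how one or more Grok expressions are tried in their configured order, here is a minimal sketch that expands `%{PATTERN:field}` references into Python named groups. The small `PATTERNS` dictionary and the helper names `compile_grok` and `parse_line` are hypothetical; the dictionary is far simpler than a real pattern file, nested pattern references are not handled, and none of this is NiFi's implementation.

```python
import re

# Hypothetical, heavily simplified pattern dictionary.
PATTERNS = {
    "INT": r"[+-]?\d+",
    "WORD": r"\w+",
    "LOGLEVEL": r"(?:TRACE|DEBUG|INFO|WARN|ERROR|FATAL)",
    "GREEDYDATA": r".*",
}

# Matches %{NAME} and %{NAME:field}.
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def compile_grok(expression, patterns=PATTERNS):
    """Turn a Grok-style expression into a compiled regular expression."""
    def expand(ref):
        name, field = ref.group(1), ref.group(2)
        body = patterns[name]  # raises KeyError for an unknown pattern name
        # Named references become named groups; unnamed ones are not captured.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"

    return re.compile("^" + GROK_REF.sub(expand, expression) + "$")


def parse_line(line, compiled_expressions):
    """Return the fields of the first expression that matches, else None."""
    for regex in compiled_expressions:
        match = regex.match(line)
        if match:
            return match.groupdict()
    return None


expressions = [
    compile_grok("%{LOGLEVEL:level} %{INT:code} %{GREEDYDATA:message}"),
    compile_grok("%{LOGLEVEL:level} %{GREEDYDATA:message}"),
]
```

Because the expressions are tried in order, the more specific one is listed first; a line for which `parse_line` returns `None` is the case that the No Match Behavior property governs.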

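When the Schema Access Strategy is set to use the 'Schema Text' property, the schema is supplied as Avro-formatted text. The sketch below builds such a schema in Python so that it can be serialized as valid JSON. The record name `GrokRecord` and the `timestamp`, `level`, and `message` fields are hypothetical; the `stackTrace` and `_raw` fields are the optional String fields mentioned in the Description, declared here as nullable on the assumption that a record may lack a stack trace.

```python
import json

# Hypothetical Avro record schema for parsed log lines.
schema = {
    "type": "record",
    "name": "GrokRecord",
    "fields": [
        {"name": "timestamp", "type": "string"},
        {"name": "level", "type": "string"},
        {"name": "message", "type": "string"},
        # Nullable: None when the message has no stack trace.
        {"name": "stackTrace", "type": ["null", "string"], "default": None},
        # Holds the raw, unparsed message text.
        {"name": "_raw", "type": ["null", "string"], "default": None},
    ],
}

# The resulting text is what would be pasted into the Schema Text property.
schema_text = json.dumps(schema, indent=2)
print(schema_text)
```
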
State Management

This component does not store state.

Restricted

Required Permission: reference remote resources
Explanation: Patterns and Expressions can reference resources over HTTP

System Resource Considerations

This component does not specify system resource considerations.

See Also