GrokReader
Description
Reads unstructured text data, such as log files, and structures it so that it can be processed. The service is configured using Grok patterns. It reads from a stream of data and splits each message that it finds into a separate Record containing the configured fields. If a line in the input does not match the expected message pattern, the line is either treated as part of the previous message or skipped, depending on the configuration. Stack traces are the exception: a stack trace found at the end of a log message is considered part of the previous message but is added to the 'stackTrace' field of the Record. If a Record has no stack trace, its stackTrace field is NULL (assuming that the schema includes a stackTrace field of type String). If the schema includes a '_raw' field of type String, the raw message is included in the Record.
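The grouping behavior described above can be illustrated with a minimal Python sketch. It is not the service's implementation: a plain regular expression (`LINE_RE`) stands in for a compiled Grok expression, the stack-trace check (`STACK_RE`) is a rough heuristic, and the function name `read_records` and its `append_unmatched` flag are hypothetical names chosen for this example. The sketch also assumes that the '_raw' value covers every line of the message, including appended lines and the stack trace.

```python
import re

# Hypothetical stand-in for a compiled Grok expression: timestamp, level, message.
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<message>.*)$"
)
# Rough heuristic for stack-trace lines (an assumption, not the service's actual rule).
STACK_RE = re.compile(
    r"^(\s+at |\s+\.\.\. \d+ more|Caused by: |[\w.$]+(Exception|Error)\b)"
)


def read_records(lines, append_unmatched=True):
    """Group raw lines into records, one dict per log message."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        match = LINE_RE.match(line)
        if match:
            # A matching line starts a new record.
            record = match.groupdict()
            record["stackTrace"] = None  # stays None when no stack trace follows
            record["_raw"] = line
            records.append(record)
        elif not records:
            continue  # no previous message to attach this line to
        elif STACK_RE.match(line):
            # Stack-trace lines belong to the previous message but get their own field.
            current = records[-1]
            trace = current["stackTrace"]
            current["stackTrace"] = line if trace is None else trace + "\n" + line
            current["_raw"] += "\n" + line
        elif append_unmatched:
            # Unmatched line is treated as a continuation of the previous message.
            current = records[-1]
            current["message"] += "\n" + line
            current["_raw"] += "\n" + line
        # Otherwise the unmatched line is skipped.
    return records


sample = [
    "2024-01-01 10:00:00 INFO Service started",
    "  with 3 workers",
    "2024-01-01 10:00:01 ERROR Request failed",
    "java.lang.IllegalStateException: boom",
    "    at com.example.Handler.run(Handler.java:42)",
]

for record in read_records(sample):
    print(record["level"], repr(record["message"]), record["stackTrace"] is not None)
```

Calling `read_records(sample, append_unmatched=False)` drops the "with 3 workers" line instead of appending it, while the stack trace is still attached to the ERROR record in both modes.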
Tags
grok, logfiles, logs, logstash, parse, pattern, reader, record, regex, text, unstructured
Properties
In the table below, required properties are marked with an asterisk (*); all other properties are optional. The table also indicates any default values and whether a property supports the NiFi Expression Language.
Display Name | API Name | Default Value | Allowable Values | Description |
---|---|---|---|---|
Schema Access Strategy * | schema-access-strategy | Use String Fields From Grok Expression | | Specifies how to obtain the schema that is to be used for interpreting the data. |
Schema Registry | schema-registry | | Controller Service: SchemaRegistry. Implementations: AmazonGlueSchemaRegistry, ApicurioSchemaRegistry, AvroSchemaRegistry, ConfluentSchemaRegistry | Specifies the Controller Service to use for the Schema Registry. This property is only considered if: |
Schema Name | schema-name | ${schema.name} | | Specifies the name of the schema to look up in the Schema Registry property. Supports Expression Language, using FlowFile attributes and Environment variables. This property is only considered if: |
Schema Version | schema-version | | | Specifies the version of the schema to look up in the Schema Registry. If not specified, the latest version of the schema will be retrieved. Supports Expression Language, using FlowFile attributes and Environment variables. This property is only considered if: |
Schema Branch | schema-branch | | | Specifies the name of the branch to use when looking up the schema in the Schema Registry property. If the chosen Schema Registry does not support branching, this value will be ignored. Supports Expression Language, using FlowFile attributes and Environment variables. This property is only considered if: |
Schema Text | schema-text | ${avro.schema} | | The text of an Avro-formatted Schema. Supports Expression Language, using FlowFile attributes and Environment variables. This property is only considered if: |
Schema Reference Reader * | schema-reference-reader | | Controller Service: SchemaReferenceReader. Implementations: ConfluentEncodedSchemaReferenceReader | Service implementation responsible for reading FlowFile attributes or content to determine the Schema Reference Identifier. This property is only considered if: |
Grok Patterns | Grok Pattern File | | | Grok Patterns to use for parsing logs. If not specified, a built-in default Pattern file will be used. If specified, all patterns specified will override the default patterns. See the Controller Service's Additional Details for a list of pre-defined patterns. Supports Expression Language, using Environment variables. |
Grok Expressions * | Grok Expression | | | Specifies the format of a log line in Grok format. This allows the Record Reader to understand how to parse each log line. The property supports one or more Grok expressions. The Reader attempts to parse input lines according to the configured order of the expressions. If a line in the log file does not match any expressions, the line will be assumed to belong to the previous log message. If other Grok patterns are referenced by this expression, they need to be supplied in the Grok Pattern File property. |
No Match Behavior * | no-match-behavior | Append to Previous Message | | If a line of text is encountered and it does not match the given Grok Expression, and it is not part of a stack trace, this property specifies how the text should be processed. |
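To make the role of the Grok Expressions property more concrete, the following Python sketch shows one way a Grok expression of the form `%{PATTERN:field}` can be expanded into a regular expression with named groups. It is an illustration only, not how the service is implemented. The `PATTERNS` dictionary is a tiny hypothetical pattern library with simplified definitions, `grok_to_regex` is a name invented for this example, and nested pattern references are not handled. Every captured value is a string, which is in line with the default schema strategy of using string fields from the Grok expression.

```python
import re

# Tiny hypothetical pattern library; real Grok pattern files define many more
# patterns, with more thorough regular expressions than these.
PATTERNS = {
    "TIMESTAMP_ISO8601": r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?",
    "LOGLEVEL": r"(?:TRACE|DEBUG|INFO|WARN|ERROR|FATAL)",
    "GREEDYDATA": r".*",
}

# Matches %{NAME} or %{NAME:field}.
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def grok_to_regex(expression, patterns=PATTERNS):
    """Expand %{PATTERN:field} references into regex named groups."""
    def expand(ref):
        name, field = ref.group(1), ref.group(2)
        body = patterns[name]  # raises KeyError if the pattern is not defined
        # Named references become named groups; unnamed ones are not captured.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"

    return re.compile(GROK_REF.sub(expand, expression))


expression = "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}"
regex = grok_to_regex(expression)

match = regex.match("2024-01-01 10:00:01,123 ERROR Request failed")
print(match.groupdict())

# A line that does not fit the expression yields no match; what happens to such
# a line in the reader is governed by the No Match Behavior property.
print(regex.match("    at com.example.Handler.run(Handler.java:42)"))
```

The `KeyError` for an unknown pattern name mirrors the requirement that any pattern referenced by an expression must be available, either from the built-in defaults or from the Grok Pattern File property.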
State Management
This component does not store state.
Restricted
Required Permission | Explanation |
---|---|
reference remote resources | Patterns and Expressions can reference resources over HTTP |
System Resource Considerations
This component does not specify system resource considerations.