regex_parser transform

The regex_parser transform accepts log events and allows you to parse a log field's value with a Regular Expression.

Configuration

vector.toml
[transforms.my_transform_id]
# REQUIRED - General
type = "regex_parser" # example, must be: "regex_parser"
inputs = ["my-source-id"] # example
regex = "^(?P<timestamp>[\\w\\-:\\+]+) (?P<level>\\w+) (?P<message>.*)$" # example
# OPTIONAL - General
drop_field = true # default
field = "message" # default
# OPTIONAL - Types
[transforms.my_transform_id.types]
status = "int"

Options

common, bool, optional

drop_field

Whether the specified field should be dropped (removed) after parsing.

Default: true
common, string, optional

field

The log field to parse. See Failed Parsing for more info.

Default: "message"
common, string, required

regex

The Regular Expression to apply. Do not include the leading or trailing /. See Failed Parsing and Regex Debugger for more info.

No default
common, table, optional

types

Key/value pairs that map log field names to the types they should be coerced to. See Regex Syntax for more info.

common, string, enum, required

[field-name]

A definition of a log field type conversion. The key is the log field name and the value is the type. strptime specifiers are supported for the timestamp type.

No default
Enum, must be one of: "bool" "float" "int" "string" "timestamp"
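
As a rough illustration of the types table, the sketch below assigns one of each enum value to a field. The field names success, duration, and note are hypothetical and assume the regex defines named captures with those names. The timestamp format string is borrowed from the Output example in this document.

```toml
[transforms.my_transform_id.types]
status = "int"        # e.g. "201" becomes 201
duration = "float"    # hypothetical capture name
success = "bool"      # hypothetical capture name
note = "string"       # hypothetical capture name; captures are already strings
# For timestamps, a strptime format can follow the type after a "|".
timestamp = "timestamp|%d/%m/%Y:%H:%M:%S %z"
```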

Output

Given the following log line:

{
"message": "5.86.210.12 - zieme4647 5667 [19/06/2019:17:20:49 -0400] \"GET /embrace/supply-chains/dynamic/vertical\" 201 20574"
}

And the following configuration:

[transforms.<transform-id>]
type = "regex_parser"
field = "message"
regex = '^(?P<host>[\w\.]+) - (?P<user>[\w]+) (?P<bytes_in>[\d]+) \[(?P<timestamp>.*)\] "(?P<method>[\w]+) (?P<path>.*)" (?P<status>[\d]+) (?P<bytes_out>[\d]+)$'
[transforms.<transform-id>.types]
bytes_in = "int"
timestamp = "timestamp|%d/%m/%Y:%H:%M:%S %z"
status = "int"
bytes_out = "int"

A log event will be output with the following structure:

{
// ... existing fields
"bytes_in": 5667,
"host": "5.86.210.12",
"user": "zieme4647",
"timestamp": <19/06/2019:17:20:49 -0400>,
"method": "GET",
"path": "/embrace/supply-chains/dynamic/vertical",
"status": 201,
"bytes_out": 20574
}

Things to note about the output:

  1. The message field was dropped after parsing, since drop_field defaults to true.
  2. The bytes_in, timestamp, status, and bytes_out fields were coerced.

How It Works

Environment Variables

Environment variables are supported throughout Vector's configuration. Add ${MY_ENV_VAR} to your Vector configuration file and the variable will be replaced before the configuration is evaluated.

You can learn more in the Environment Variables section.
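
A minimal sketch of the substitution described above, applied to this transform. The variable name PARSE_FIELD is hypothetical and would need to be set in Vector's environment. The regex is the example from the Configuration section.

```toml
[transforms.my_transform_id]
type = "regex_parser"
inputs = ["my-source-id"]
# PARSE_FIELD is a hypothetical environment variable, e.g. PARSE_FIELD=message
field = "${PARSE_FIELD}"
regex = "^(?P<timestamp>[\\w\\-:\\+]+) (?P<level>\\w+) (?P<message>.*)$"
```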

Failed Parsing

If the field value fails to parse against the provided regex, an error will be logged and the event will be kept or discarded depending on the drop_failed value.

A failure includes any event that does not successfully parse against the provided regex. This includes bad values as well as events missing the specified field.
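
The sketch below shows how the drop_failed setting mentioned above might look in a configuration. Note that drop_failed is not listed under Options on this page, so its placement and its boolean type are assumptions here. Check the option reference for your Vector version before relying on it.

```toml
[transforms.my_transform_id]
type = "regex_parser"
inputs = ["my-source-id"]
field = "message"
regex = "^(?P<timestamp>[\\w\\-:\\+]+) (?P<level>\\w+) (?P<message>.*)$"
# Assumption: a boolean option; true would discard events that fail to parse.
drop_failed = true
```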

Performance

The regex_parser transform has been involved in the following performance tests:

Learn more in the Performance sections.

Regex Debugger

To test the validity of the regex option, we recommend the Rust Regex Tester. Note that you must use named captures in your regex to map the results to fields.

Regex Syntax

Vector follows the documented Rust Regex syntax, since Vector is written in Rust. This is a Perl-style regular expression syntax, but it lacks a few features such as look-around and backreferences.

Named Captures

You can name regex captures with the (?P<name>...) syntax. For example:

^(?P<timestamp>\w*) (?P<level>\w*) (?P<message>.*)$

This will capture timestamp, level, and message. All values are extracted as strings and must be coerced with the types table.

More info can be found in the Regex grouping and flags documentation.
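
As a sketch of how the pattern above could sit in a configuration file: the examples on this page write the regex both as a double-quoted TOML string (with doubled backslashes) and as a single-quoted one. The single-quoted (literal) form leaves backslashes untouched, which tends to keep patterns easier to read. The transform and source IDs are the placeholder names used earlier.

```toml
[transforms.my_transform_id]
type = "regex_parser"
inputs = ["my-source-id"]
field = "message"
# Literal (single-quoted) TOML string: backslashes are written once.
# In a double-quoted string each \w would have to be written as \\w.
regex = '^(?P<timestamp>\w*) (?P<level>\w*) (?P<message>.*)$'
```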

Flags

Regex flags can be toggled with the (?flags) syntax. The available flags are:

Flag  Description
i     case-insensitive: letters match both upper and lower case
m     multi-line mode: ^ and $ match begin/end of line
s     allow . to match \n
U     swap the meaning of x* and x*?
u     Unicode support (enabled by default)
x     ignore whitespace and allow line comments (starting with #)

For example, to enable the case-insensitive flag you can write:

(?i)Hello world

More info can be found in the Regex grouping and flags documentation.
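
A flag can be combined with named captures in the regex option. The following is a minimal sketch, assuming log lines that begin with a level keyword in arbitrary case. The level values info, warn, and error are illustrative only.

```toml
[transforms.my_transform_id]
type = "regex_parser"
inputs = ["my-source-id"]
field = "message"
# (?i) makes the whole pattern case-insensitive, so "INFO", "Info",
# and "info" would all match the level capture.
regex = '(?i)^(?P<level>info|warn|error) (?P<message>.*)$'
```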