Regex Parser Transform

The Vector regex_parser transform accepts and outputs log events, allowing you to parse a log field's value with a Regular Expression.

Configuration

vector.toml
[transforms.my_transform_id]
# General
type = "regex_parser" # required
inputs = ["my-source-or-transform-id"] # required
drop_field = true # optional, default
field = "message" # optional, default
patterns = ["^(?P<timestamp>[\\w\\-:\\+]+) (?P<level>\\w+) (?P<message>.*)$"] # required
# Types
types.status = "int" # example
types.duration = "float" # example
types.success = "bool" # example
types.timestamp = "timestamp|%F" # example
types.timestamp = "timestamp|%a %b %e %T %Y" # example
types.parent.child = "int" # example
  • drop_field (bool, common, optional)

    Whether the specified field should be dropped (removed) after parsing.

    • Default: true
  • field (string, common, optional)

    The log field to parse. See Failed Parsing and Field Notation Syntax for more info.

    • Default: "message"
  • overwrite_target (bool, optional)

    If target_field is set and the log already contains a field with the same name as the target, that field is overwritten only if this option is set to true.

    • Default: true
  • patterns (array of strings, common, required)

    The Regular Expressions to apply. Do not include the leading or trailing / in any of the expressions.

    • No default
  • target_field (string, optional)

    If this setting is present, the parsed fields will be inserted into the log as a sub-object with this name. If a field with the same name already exists and overwrite_target is false, the parser will fail and produce an error. See Field Notation Syntax for more info.

    • No default
  • types (table, common, optional)

    Key/value pairs mapping log field names to types. This is used to coerce log fields into their proper types. See Regex Syntax for more info.

    • [field-name] (string, enum, common, optional)

      A log field type conversion. The key is the log field name and the value is the type. strptime specifiers are supported for the timestamp type.

      • No default
      • Enum, must be one of: "bool" "float" "int" "string" "timestamp"
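
To illustrate how target_field and overwrite_target might be combined, here is a minimal sketch. The transform ID, the input ID, the pattern, and the "parsed" field name are placeholders chosen for illustration, not values Vector requires.

```toml
[transforms.my_regex_parser]            # hypothetical transform ID
type = "regex_parser"
inputs = ["my-source-or-transform-id"]
field = "message"
patterns = ['^(?P<level>\w+) (?P<text>.*)$']

# Nest the captures under "parsed" instead of the event root,
# so they appear as parsed.level and parsed.text.
target_field = "parsed"                 # hypothetical field name

# Replace an existing "parsed" field rather than failing.
overwrite_target = true
```

The pattern is written as a TOML literal string (single quotes), so backslashes reach the regex engine unescaped.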

Examples

Given the following log line:

log event
{
"message": "5.86.210.12 - zieme4647 5667 [19/06/2019:17:20:49 -0400] \"GET /embrace/supply-chains/dynamic/vertical\" 201 20574"
}

And the following configuration:

vector.toml
[transforms.<transform-id>]
type = "regex_parser"
field = "message"
patterns = ['^(?P<host>[\w\.]+) - (?P<user>[\w]+) (?P<bytes_in>[\d]+) \[(?P<timestamp>.*)\] "(?P<method>[\w]+) (?P<path>.*)" (?P<status>[\d]+) (?P<bytes_out>[\d]+)$']
[transforms.<transform-id>.types]
bytes_in = "int"
timestamp = "timestamp|%d/%m/%Y:%H:%M:%S %z"
status = "int"
bytes_out = "int"

A log event will be output with the following structure:

log event
{
// ... existing fields
"bytes_in": 5667,
"host": "5.86.210.12",
"user": "zieme4647",
"timestamp": <19/06/2019:17:20:49 -0400>,
"method": "GET",
"path": "/embrace/supply-chains/dynamic/vertical",
"status": 201,
"bytes_out": 20574
}

Things to note about the output:

  1. The message field was dropped, since drop_field defaults to true.
  2. The bytes_in, timestamp, status, and bytes_out fields were coerced.

How It Works

Complex Processing

If you encounter limitations with the regex_parser transform, we recommend using a runtime transform. These transforms are designed for complex processing and give you the power of a full programming runtime.

Environment Variables

Environment variables are supported throughout Vector's configuration. Add ${MY_ENV_VAR} to your Vector configuration file and the variable will be replaced with its value before the configuration is evaluated.

You can learn more in the Environment Variables section.
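
As a sketch of how this substitution could be used with this transform, the fragment below reads the field to parse from an environment variable. PARSE_FIELD is a hypothetical variable name, and the transform ID and pattern are placeholders.

```toml
[transforms.my_regex_parser]            # hypothetical transform ID
type = "regex_parser"
inputs = ["my-source-or-transform-id"]

# PARSE_FIELD is a hypothetical variable. Its value is substituted
# into the file before the configuration is evaluated.
field = "${PARSE_FIELD}"

patterns = ['^(?P<level>\w+) (?P<text>.*)$']
```

With PARSE_FIELD=message exported in Vector's environment, this would be read as field = "message".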

Failed Parsing

If the field value fails to parse against the provided regex then an error will be logged and the event will be kept or discarded depending on the drop_failed value.

A failure includes any event that does not successfully parse against the provided regex. This includes bad values as well as events missing the specified field.
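
The fragment below sketches how drop_failed might be set. The option is not described in the options list above, so its boolean type and the meaning of true (discard the event) are assumptions drawn from the paragraph above. The transform ID and pattern are placeholders.

```toml
[transforms.my_regex_parser]            # hypothetical transform ID
type = "regex_parser"
inputs = ["my-source-or-transform-id"]
field = "message"
patterns = ['^(?P<level>\w+) (?P<text>.*)$']

# Assumption: a boolean where true discards events that fail to parse
# (including events missing the "message" field) and false keeps them.
drop_failed = true
```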

Field Notation Syntax

The field and target_field options support Vector's field notation syntax, enabling access to root-level, nested, and array field values. For example:

vector.toml
[transforms.my_regex_parser_transform_id]
# ...
field = "message"
field = "parent.child"
# ...

You can learn more about Vector's field notation in the field notation reference.

Performance

The regex_parser transform has been involved in the following performance tests:

Learn more in the Performance sections.

Regex Debugger

To test the validity of the patterns option, we recommend the Rust Regex Tester. Note that you must use named captures in your regex to map the results to fields.

Regex Syntax

Vector follows the documented Rust Regex syntax, since Vector is written in Rust. This is a Perl-style regular expression syntax that lacks a few features, such as look-around and backreferences.

Named Captures

You can name Regex captures with the (?P<name>...) syntax. For example:

^(?P<timestamp>\w*) (?P<level>\w*) (?P<message>.*)$

This will capture timestamp, level, and message. All values are extracted as string values and must be coerced with the types table.

More info can be found in the Regex grouping and flags documentation.
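
As an illustration of named captures combined with the types table, here is a minimal sketch. The transform ID, the capture names (method, status, duration), and the pattern are hypothetical and chosen only for this example.

```toml
[transforms.my_regex_parser]            # hypothetical transform ID
type = "regex_parser"
inputs = ["my-source-or-transform-id"]
field = "message"

# Each named capture becomes a field: method, status, and duration.
patterns = ['^(?P<method>\w+) (?P<status>\d+) (?P<duration>[\d\.]+)$']

[transforms.my_regex_parser.types]
status = "int"                          # coerced from string to integer
duration = "float"                      # coerced from string to float
# method has no entry here, so it remains a string.
```

For a message such as GET 200 0.35, this configuration would be expected to yield a string method, an integer status, and a float duration.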

Flags

Regex flags can be toggled with the (?flags) syntax. The available flags are:

Flag  Description
i     case-insensitive: letters match both upper and lower case
m     multi-line mode: ^ and $ match begin/end of line
s     allow . to match \n
U     swap the meaning of x* and x*?
u     Unicode support (enabled by default)
x     ignore whitespace and allow line comments (starting with #)

For example, to enable the case-insensitive flag you can write:

(?i)Hello world

More info can be found in the Regex grouping and flags documentation.
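
A flag can be placed directly inside a pattern in the configuration. The following sketch uses a hypothetical transform ID and capture names, and applies the case-insensitive flag to the whole expression.

```toml
[transforms.my_regex_parser]            # hypothetical transform ID
type = "regex_parser"
inputs = ["my-source-or-transform-id"]
field = "message"

# The leading (?i) applies to the rest of the pattern, so the level
# alternation matches "ERROR", "Error", "error", and so on.
patterns = ['(?i)^(?P<level>error|warn|info) (?P<text>.*)$']
```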

Value Coercion

Values can be coerced upon extraction via the types.* options. This functions exactly like the coercer transform, except that it is built into this transform for convenience.

Timestamps

You can coerce values into timestamps via the timestamp type:

vector.toml
# ...
types.first_timestamp = "timestamp" # best effort parsing
types.second_timestamp = "timestamp|%Y-%m-%dT%H:%M:%S%z" # ISO8601
# ...

As noted above, if you do not specify a strptime format, Vector will make a best-effort attempt to parse the timestamp against the following common formats:

Format                Description

Without Timezone
%F %T                 YYYY-MM-DD HH:MM:SS
%v %T                 DD-Mmm-YYYY HH:MM:SS
%FT%T                 ISO 8601 / RFC 3339 without TZ
%m/%d/%Y:%T           US common date format
%a, %d %b %Y %T       RFC 822/2822 without TZ
%a %d %b %T %Y        date command output without TZ
%A %d %B %T %Y        date command output without TZ, long names
%a %b %e %T %Y        ctime format

With Timezone
%+                    ISO 8601 / RFC 3339
%a %d %b %T %Z %Y     date command output
%a %d %b %T %z %Y     date command output, numeric TZ
%a %d %b %T %#z %Y    date command output, numeric TZ

UTC Formats
%s                    UNIX timestamp
%FT%TZ                ISO 8601 / RFC 3339 UTC