LOG

tokenizer transform

The tokenizer transform accepts log events and lets you tokenize a field's value by splitting on whitespace, ignoring special wrapping characters, and zipping the tokens into ordered field names.

Configuration

vector.toml
[transforms.my_transform_id]
# REQUIRED - General
type = "tokenizer" # example, must be: "tokenizer"
inputs = ["my-source-id"] # example
field_names = ["timestamp", "level", "message"] # example
# OPTIONAL - General
drop_field = true # default
field = "message" # default
# OPTIONAL - Types
[transforms.my_transform_id.types]
status = "int"

Options

common, bool, optional

drop_field

If true, the field will be dropped after parsing.

Default: true
common, string, optional

field

The log field to tokenize.

Default: "message"
common, [string], required

field_names

The log field names assigned to the resulting tokens, in order.

No default
common, table, optional

types

Key/Value pairs representing mapped log field types.

common, string, enum, required

[field-name]

A definition of log field type conversions. The key is the log field name and the value is the type. strptime specifiers are supported for the timestamp type.

No default
Enum, must be one of: "bool" "float" "int" "string" "timestamp"
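
As a rough illustration of the types table, the fragment below coerces three tokens. The field names (success, duration, created_at) are hypothetical, and the "timestamp|<format>" value syntax is an assumption about how strptime specifiers are attached to the timestamp type, so check it against your Vector version before relying on it.

```toml
[transforms.my_transform_id.types]
success = "bool"    # hypothetical field, coerced to a boolean
duration = "float"  # hypothetical field, coerced to a float
# Assumed syntax: the type name, a pipe, then strptime specifiers.
# This format is meant to match a value such as 19/06/2019:17:20:49 -0400.
created_at = "timestamp|%d/%m/%Y:%H:%M:%S %z"
```

Fields that are not listed in the table are presumably left as strings.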

Output

Given the following log line:

{
"message": "5.86.210.12 - zieme4647 [19/06/2019:17:20:49 -0400] \"GET /embrace/supply-chains/dynamic/vertical\" 201 20574"
}

And the following configuration:

[transforms.<transform-id>]
type = "tokenizer"
field = "message"
field_names = ["remote_addr", "ident", "user_id", "timestamp", "message", "status", "bytes"]

A log event will be output with the following structure:

{
// ... existing fields
"remote_addr": "5.86.210.12",
"user_id": "zieme4647",
"timestamp": "19/06/2019:17:20:49 -0400",
"message": "GET /embrace/supply-chains/dynamic/vertical",
"status": "201",
"bytes": "20574"
}

A few things to note about the output:

  1. The message field was overwritten.
  2. The ident field was dropped since it contained a "-" value.
  3. All values are strings because this configuration does not set the types option.
  4. Special wrapper characters were dropped, such as wrapping [...] and "..." characters.
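
If numeric values are wanted for the example above, one possible variation is to add a types table to the same configuration. This is a sketch that assumes the types option documented under Options applies to these tokens; my_transform_id and my-source-id are the placeholder names used in the Configuration section.

```toml
[transforms.my_transform_id]
type = "tokenizer"
inputs = ["my-source-id"]
field = "message"
field_names = ["remote_addr", "ident", "user_id", "timestamp", "message", "status", "bytes"]

# Coerce the two numeric tokens; every other token stays a string.
[transforms.my_transform_id.types]
status = "int"
bytes = "int"
```

With this in place, status and bytes should be emitted as integers rather than as the strings "201" and "20574".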

How It Works

Blank Values

Both " " and "-" are considered blank values and their mapped field will be set to null.

Environment Variables

Environment variables are supported throughout Vector's configuration. Add ${MY_ENV_VAR} to your Vector configuration file and the variable will be replaced before the configuration is evaluated.

You can learn more in the Environment Variables section.
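
For instance, the field to tokenize could be taken from the environment. The variable name TOKENIZER_FIELD below is hypothetical and chosen only for illustration.

```toml
[transforms.my_transform_id]
type = "tokenizer"
inputs = ["my-source-id"]
# TOKENIZER_FIELD is a hypothetical variable; it is replaced before the
# configuration is evaluated.
field = "${TOKENIZER_FIELD}"
field_names = ["timestamp", "level", "message"]
```

Starting Vector with TOKENIZER_FIELD=message in the environment should make this equivalent to writing field = "message".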

Special Characters

In order to extract raw values and remove wrapping characters, we must treat certain characters as special. These characters will be discarded:

  • "..." - Quotes are used to wrap phrases. Spaces are preserved, but the wrapping quotes will be discarded.
  • [...] - Brackets are used to wrap phrases. Spaces are preserved, but the wrapping brackets will be discarded.
  • \ - Can be used to escape the above characters so that Vector treats them as literals.