Tokenizer Transform

The Vector tokenizer transform accepts log events and tokenizes a field's value by splitting it on whitespace, stripping special wrapping characters, and zipping the resulting tokens into an ordered list of field names.


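```toml
# "my_transform_id" is a placeholder component ID
[transforms.my_transform_id]
  # REQUIRED - General
  type = "tokenizer" # must be: "tokenizer"
  inputs = ["my-source-id"] # example
  field_names = ["timestamp", "level", "message"] # example

  # OPTIONAL - General
  drop_field = true # default
  field = "message" # default

  # OPTIONAL - Types
  [transforms.my_transform_id.types]
    level = "string" # example
```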




drop_field (bool, optional)

If true, the tokenized field is dropped after parsing.

Default: true
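To make drop_field concrete, here is a minimal Python sketch. It is not Vector's implementation: events are modeled as plain dicts, tokenizing uses a naive str.split() with no special-character handling, and the function name apply_tokenizer is hypothetical. It also assumes the source field is removed before tokens are written, which is one way to reproduce the "message field was overwritten" behavior in the example below; the actual ordering inside Vector is not documented on this page.

```python
def apply_tokenizer(event, field_names, field="message", drop_field=True):
    """Hypothetical sketch: tokenize event[field] and zip tokens into field_names."""
    value = event.get(field)
    if value is None:
        # Nothing to tokenize; leave the event untouched.
        return event
    out = dict(event)
    if drop_field:
        # Assumption: drop the source field before inserting tokens, so a
        # token mapped back onto the same name (e.g. "message") survives.
        del out[field]
    # Naive whitespace split; the real transform also honors "...", [...] and \.
    for name, token in zip(field_names, value.split()):
        out[name] = token
    return out


event = {"host": "web-1", "raw": "2019-06-19 INFO started"}
print(apply_tokenizer(event, ["date", "level", "msg"], field="raw"))
# {'host': 'web-1', 'date': '2019-06-19', 'level': 'INFO', 'msg': 'started'}
print(apply_tokenizer(event, ["date", "level", "msg"], field="raw", drop_field=False))
```

With drop_field set to false, the original raw field is kept alongside the new date, level, and msg fields.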


field (string, optional)

The log field to tokenize.

Default: "message"


field_names (array of strings, required)

The log field names assigned to the resulting tokens, in order.

No default
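Because tokens are zipped into field_names by position, a mismatch between the number of tokens and the number of names matters. The Python sketch below assumes zip semantics (pairing stops at the shorter list, so extra names stay unset and extra tokens are ignored); whether Vector handles mismatched lengths exactly this way is an assumption, and zip_tokens is a hypothetical helper.

```python
def zip_tokens(field_names, tokens):
    """Pair each field name with the token at the same position.

    Assumption: like Python's zip(), pairing stops at the shorter list, so
    extra names are left unset and extra tokens are ignored.
    """
    return dict(zip(field_names, tokens))


names = ["timestamp", "level", "message"]
print(zip_tokens(names, ["2019-06-19T17:20:49Z", "info", "started"]))
print(zip_tokens(names, ["2019-06-19T17:20:49Z", "info"]))
# {'timestamp': '2019-06-19T17:20:49Z', 'level': 'info'}
print(zip_tokens(names, ["2019-06-19T17:20:49Z", "info", "started", "extra"]))
# {'timestamp': '2019-06-19T17:20:49Z', 'level': 'info', 'message': 'started'}
```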


types (table, optional)

Key/value pairs that map log field names to types.

types.[field-name] (string)

A log field type conversion. The key is the log field name and the value is the type. strptime specifiers are supported for the timestamp type.

No default
Enum, must be one of: "bool", "float", "int", "string", "timestamp"
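The five supported types can be illustrated with a small Python conversion sketch. It is not Vector's coercion code: the helper name convert is hypothetical, booleans are assumed to be spelled "true" or "false" (case-insensitive), and because this page does not show how a strptime format is written in the config, the sketch takes the format as a separate argument.

```python
from datetime import datetime


def convert(value, type_name, fmt=None):
    """Hypothetical sketch of the bool/float/int/string/timestamp conversions.

    Assumptions: booleans are spelled "true"/"false" (case-insensitive) and a
    timestamp format is passed separately as a strptime string.
    """
    if type_name == "string":
        return value
    if type_name == "int":
        return int(value)
    if type_name == "float":
        return float(value)
    if type_name == "bool":
        lowered = value.lower()
        if lowered in ("true", "false"):
            return lowered == "true"
        raise ValueError(f"not a bool: {value!r}")
    if type_name == "timestamp":
        if fmt is None:
            raise ValueError("this sketch requires an explicit strptime format")
        return datetime.strptime(value, fmt)
    raise ValueError(f"unknown type: {type_name!r}")


types = {"status": "int", "bytes": "int"}
event = {"status": "201", "bytes": "20574", "user_id": "zieme4647"}
converted = {k: convert(v, types.get(k, "string")) for k, v in event.items()}
print(converted)
# {'status': 201, 'bytes': 20574, 'user_id': 'zieme4647'}
ts = convert("19/06/2019:17:20:49 -0400", "timestamp", "%d/%m/%Y:%H:%M:%S %z")
print(ts.isoformat())
# 2019-06-19T17:20:49-04:00
```

Fields without an entry in types fall back to "string" here, mirroring the all-strings output when no conversions are configured.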


Given the following log line:

```json
{
  "message": " - zieme4647 [19/06/2019:17:20:49 -0400] \"GET /embrace/supply-chains/dynamic/vertical\" 201 20574"
}
```

And the following configuration:

```toml
[transforms.my_transform_id]
  type = "tokenizer"
  inputs = ["my-source-id"]
  field = "message"
  field_names = ["remote_addr", "ident", "user_id", "timestamp", "message", "status", "bytes"]
```

A log event will be output with the following structure:

```javascript
{
  // ... existing fields
  "remote_addr": "",
  "user_id": "zieme4647",
  "timestamp": "19/06/2019:17:20:49 -0400",
  "message": "GET /embrace/supply-chains/dynamic/vertical",
  "status": "201",
  "bytes": "20574"
}
```

A few things to note about the output:

  1. The message field was overwritten.
  2. The ident field was dropped since it contained a "-" value.
  3. All values are strings. Type coercion is handled by the types option, which this example does not set.
  4. Special wrapping characters, such as [...] and "...", were dropped.

How It Works

Blank Values

Both " " and "-" are considered blank values and their mapped field will be set to null.
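As a quick illustration of the blank-value rule, the Python sketch below maps " " and "-" to None (standing in for null). It assumes a whole-token comparison, so values that merely contain a dash are kept; normalize_blank is a hypothetical helper, not part of Vector.

```python
def normalize_blank(token):
    """Map the blank markers " " and "-" to None; return other tokens unchanged."""
    # Assumption: whole-token comparison only, so "--" or "a-b" are kept.
    return None if token in (" ", "-") else token


tokens = ["1.2.3.4", "-", "zieme4647", " "]
print([normalize_blank(t) for t in tokens])
# ['1.2.3.4', None, 'zieme4647', None]
```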

Environment Variables

Environment variables are supported throughout Vector's configuration. Add ${MY_ENV_VAR} anywhere in your Vector configuration file and the variable is replaced before the configuration is evaluated.

You can learn more in the Environment Variables section.
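Because substitution happens on the raw text before evaluation, it behaves like a plain string replacement. The Python sketch below shows that idea for the ${NAME} form only; it is not Vector's loader, the variable name TOKENIZE_FIELD is hypothetical, and replacing an unset variable with an empty string is an assumption made for the sketch.

```python
import os
import re

# Matches ${NAME}; this sketch ignores other forms (e.g. defaults or $NAME).
_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate(config_text, env=None):
    """Replace ${NAME} references with values from env before parsing."""
    env = os.environ if env is None else env

    def lookup(match):
        name = match.group(1)
        # Assumption: an unset variable becomes an empty string.
        return env.get(name, "")

    return _ENV_REF.sub(lookup, config_text)


raw = 'field = "${TOKENIZE_FIELD}"\n'
print(interpolate(raw, {"TOKENIZE_FIELD": "message"}), end="")
# field = "message"
```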

Special Characters

In order to extract raw values and remove wrapping characters, we must treat certain characters as special. These characters will be discarded:

  • "..." - Quotes are used to wrap phrases. Spaces are preserved, but the wrapping quotes are discarded.
  • [...] - Brackets are used to wrap phrases. Spaces are preserved, but the wrapping brackets will be discarded.
  • \ - Escapes the characters above so that Vector treats them as literals.
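The special-character rules can be sketched as a small Python tokenizer. This follows the rules as stated on this page rather than Vector's actual parser, and two details are assumptions: any character after a backslash is taken literally (a slight generalization of escaping only the wrapper characters), and an unterminated wrapper runs to the end of the input. The function name tokenize is hypothetical.

```python
def tokenize(line):
    """Split on whitespace, honoring "..." and [...] wrappers and \\ escapes.

    A sketch of the rules described above, not Vector's actual parser.
    Assumption: an unterminated wrapper runs to the end of the input.
    """
    tokens = []
    i, n = 0, len(line)
    while i < n:
        if line[i].isspace():
            i += 1
            continue
        closer = {'"': '"', "[": "]"}.get(line[i])
        if closer:
            i += 1  # skip the opening wrapper character
        buf = []
        while i < n:
            ch = line[i]
            if ch == "\\" and i + 1 < n:
                buf.append(line[i + 1])  # escaped character is kept literally
                i += 2
                continue
            if closer and ch == closer:
                i += 1  # discard the closing wrapper character
                break
            if not closer and ch.isspace():
                break  # a bare token ends at whitespace
            buf.append(ch)
            i += 1
        tokens.append("".join(buf))
    return tokens


line = '- zieme4647 [19/06/2019:17:20:49 -0400] "GET /embrace/supply-chains/dynamic/vertical" 201 20574'
print(tokenize(line))
# ['-', 'zieme4647', '19/06/2019:17:20:49 -0400', 'GET /embrace/supply-chains/dynamic/vertical', '201', '20574']
```

Spaces inside [...] and "..." survive because a wrapped token only ends at its closing character, while bare tokens end at the first whitespace.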