Tokenizer Transform

The Vector tokenizer transform parses log events by splitting a field's value on whitespace, discarding special wrapping characters (quotes and brackets), and zipping the resulting tokens into ordered field names.

Configuration

[transforms.my_transform_id]
# General
type = "tokenizer" # required
inputs = ["my-source-or-transform-id"] # required
drop_field = true # optional, default
field = "message" # optional, default
field_names = ["timestamp", "level", "message", "parent.child"] # required
# Types
types.status = "int" # example
types.duration = "float" # example
types.success = "bool" # example
types.timestamp_iso8601 = "timestamp|%F" # example
types.timestamp_custom = "timestamp|%a %b %e %T %Y" # example
types.parent.child = "int" # example
  • common, optional, bool

    drop_field

    If true, the field will be dropped after parsing.

    • Default: true
  • common, optional, string

    field

    The log field to tokenize.

    • Default: "message"
  • common, required, [string]

    field_names

    The log field names assigned to the resulting tokens, in order.

  • common, optional, table

    types

    Key/value pairs representing mapped log field names and types. This is used to coerce log fields into their proper types.
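
As an illustration only (this is not Vector's implementation), the types table can be thought of as a map from field name to a coercion function. The sketch below is a hypothetical Python approximation; it omits nested fields such as parent.child and the bare "timestamp" spec's format inference:

```python
from datetime import datetime

# Hypothetical coercion table mirroring the `types` option; an
# approximation for illustration, not Vector's actual code.
COERCERS = {
    "int": int,
    "float": float,
    "bool": lambda s: s.lower() == "true",
    "string": str,
}

def coerce(value, type_spec):
    # Specs like "timestamp|%d/%m/%Y" carry a strftime format after "|".
    # A bare "timestamp" spec (common-format inference) is omitted here.
    if type_spec.startswith("timestamp"):
        _, _, fmt = type_spec.partition("|")
        return datetime.strptime(value, fmt)
    return COERCERS[type_spec](value)

coerce("201", "int")                     # 201
coerce("19/06/2019", "timestamp|%d/%m/%Y")  # datetime(2019, 6, 19, 0, 0)
```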

Output

Telemetry

This component provides the following metrics that can be retrieved through the internal_metrics source. See the metrics section in the monitoring page for more info.

  • counter

    processing_errors_total

    The total number of processing errors encountered by this component. This metric includes the following tags:

    • component_kind - The Vector component kind.

    • component_name - The Vector component ID.

    • component_type - The Vector component type.

    • error_type - The type of the error.

    • instance - The Vector instance identified by host and port.

    • job - The name of the job producing Vector metrics.

  • counter

    processed_events_total

    The total number of events processed by this component. This metric includes the following tags:

    • component_kind - The Vector component kind.

    • component_name - The Vector component ID.

    • component_type - The Vector component type.

    • file - The file that produced the error.

    • instance - The Vector instance identified by host and port.

    • job - The name of the job producing Vector metrics.

  • counter

    processed_bytes_total

    The total number of bytes processed by the component. This metric includes the following tags:

    • component_kind - The Vector component kind.

    • component_name - The Vector component ID.

    • component_type - The Vector component type.

    • instance - The Vector instance identified by host and port.

    • job - The name of the job producing Vector metrics.

Examples

Given the following Vector event:

{
  "log": {
    "message": "5.86.210.12 - zieme4647 [19/06/2019:17:20:49 -0400] \"GET /embrace/supply-chains/dynamic/vertical\" 201 20574"
  }
}

And the following configuration:

[transforms.tokenizer]
type = "tokenizer"
field = "message"
field_names = ["remote_addr", "ident", "user_id", "timestamp", "message", "status", "bytes"]
types.timestamp = "timestamp"
types.status = "int"
types.bytes = "int"

The following Vector log event will be output:

{
  "remote_addr": "5.86.210.12",
  "user_id": "zieme4647",
  "timestamp": "19/06/2019:17:20:49 -0400",
  "message": "GET /embrace/supply-chains/dynamic/vertical",
  "status": 201,
  "bytes": 20574
}
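
The transformation above can be approximated with a short, illustrative Python sketch. This mimics the documented behavior (whitespace splitting, wrapper discarding, blank-value handling, type coercion) but is not Vector's implementation; timestamp coercion is left out for brevity:

```python
import re

def tokenize(line):
    # Quoted or bracketed phrases become single tokens; the wrapping
    # characters themselves are discarded.
    pattern = r'"([^"]*)"|\[([^\]]*)\]|(\S+)'
    return [next(g for g in m.groups() if g is not None)
            for m in re.finditer(pattern, line)]

line = ('5.86.210.12 - zieme4647 [19/06/2019:17:20:49 -0400] '
        '"GET /embrace/supply-chains/dynamic/vertical" 201 20574')
field_names = ["remote_addr", "ident", "user_id", "timestamp",
               "message", "status", "bytes"]
types = {"status": int, "bytes": int}  # timestamp coercion omitted

event = {}
for name, token in zip(field_names, tokenize(line)):
    if token in ("-", " "):   # blank values are not emitted here
        continue
    event[name] = types.get(name, str)(token)

# event["status"] -> 201, event["bytes"] -> 20574; "ident" is absent
```

Note that "ident" is absent from the result because its token was the blank value "-".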

How It Works

Blank Values

Both " " and "-" are considered blank values and their mapped fields will be set to null.
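
A minimal sketch of this rule, assuming tokens have already been split out (illustrative only, not Vector's code):

```python
def normalize(token):
    # "-" and a lone space are treated as blank and mapped to null (None).
    return None if token in ("-", " ") else token

normalize("-")          # None
normalize("zieme4647")  # 'zieme4647'
```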

Special Characters

In order to extract raw values and remove wrapping characters, we must treat certain characters as special. These characters will be discarded:

  • "..." - Quotes are used to wrap phrases. Spaces are preserved, but the wrapping quotes will be discarded.
  • [...] - Brackets are used to wrap phrases. Spaces are preserved, but the wrapping brackets will be discarded.
  • \ - Can be used to escape the above characters; Vector will treat them as literal.
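
A per-token sketch of wrapper handling, assuming whitespace splitting has already grouped each phrase into one token (illustrative only, not Vector's implementation):

```python
def strip_wrappers(token):
    # Wrapping quotes or brackets are discarded; a leading backslash
    # escapes the wrapper, which is then kept as a literal character.
    if token.startswith("\\"):
        return token[1:]
    if len(token) >= 2 and token[0] + token[-1] in ('""', "[]"):
        return token[1:-1]
    return token

strip_wrappers('"hello world"')  # 'hello world'
strip_wrappers('[level]')        # 'level'
strip_wrappers('\\"literal')     # '"literal'
```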