Split Transform
The Vector `split` transform splits a string field on a defined separator.
Configuration
```toml
[transforms.my_transform_id]
# General
type = "split" # required
inputs = ["my-source-or-transform-id"] # required
drop_field = true # optional, default
field = "message" # optional, default
field_names = ["timestamp", "level", "message", "parent.child"] # required
separator = "[whitespace]" # optional, default

# Types
types.status = "int" # example
types.duration = "float" # example
types.success = "bool" # example
types.timestamp_iso8601 = "timestamp|%F" # example
types.timestamp_custom = "timestamp|%a %b %e %T %Y" # example
types.timestamp_unix = "timestamp|%F %T" # example
types.parent.child = "int" # example
```
- `drop_field` (optional, bool)
  If `true`, the `field` will be dropped after parsing.
  - Default: `true`
- `field` (optional, string)
  The field to apply the split on.
  - Default: `"message"`
- `field_names` (required, [string])
  The field names assigned to the resulting tokens, in order.
- `separator` (optional, string)
  The separator to split the field on. If no separator is given, the field is split on all whitespace. "Whitespace" is defined according to the terms of the Unicode Derived Core Property `White_Space`.
  - Default: `"[whitespace]"`
- `types` (optional, table)
  Key/value pairs representing mapped log field names and types. This is used to coerce log fields from strings into their proper types. The available types are listed in the Types list below.
  Timestamp coercions need to be prefaced with `timestamp|`, for example `"timestamp|%F"`. Timestamp specifiers can use either of the following:
  - One of the built-in formats listed in the Timestamp Formats table below.
  - The time format specifiers from Rust's `chrono` library.
Types
- `array`
- `bool`
- `bytes`
- `float`
- `int`
- `map`
- `null`
- `timestamp` (see the table below for formats)
Timestamp Formats
| Format | Description | Example |
| --- | --- | --- |
| `%F %T` | YYYY-MM-DD HH:MM:SS | 2020-12-01 02:37:54 |
| `%v %T` | DD-Mmm-YYYY HH:MM:SS | 01-Dec-2020 02:37:54 |
| `%FT%T` | ISO 8601/[RFC 3339](https://tools.ietf.org/html/rfc3339) format without time zone | 2020-12-01T02:37:54 |
| `%a, %d %b %Y %T` | RFC 822/2822 without time zone | Tue, 01 Dec 2020 02:37:54 |
| `%a %d %b %T %Y` | `date` command output without time zone | Tue 01 Dec 02:37:54 2020 |
| `%a %b %e %T %Y` | ctime format | Tue Dec 1 02:37:54 2020 |
| `%s` | UNIX timestamp | 1606790274 |
| `%FT%TZ` | ISO 8601/RFC 3339 UTC | 2020-12-01T09:37:54Z |
| `%+` | ISO 8601/RFC 3339 UTC with time zone | 2020-12-01T02:37:54-07:00 |
| `%a %d %b %T %Z %Y` | `date` command output with time zone | Tue 01 Dec 02:37:54 PST 2020 |
| `%a %d %b %T %z %Y` | `date` command output with numeric time zone | Tue 01 Dec 02:37:54 -0700 2020 |
| `%a %d %b %T %#z %Y` | `date` command output with numeric time zone (minutes can be missing or present) | Tue 01 Dec 02:37:54 -07 2020 |
Note: the examples in this table are for 54 seconds after 2:37 am on December 1st, 2020 in Pacific Standard Time.
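To make the `timestamp|` prefix convention concrete, the following Python sketch separates a `types` value into its type name and format, then parses a string with it. It is only an illustration, not Vector's implementation: `parse_type_spec`, `coerce_timestamp`, and `CHRONO_SHORTHANDS` are hypothetical names, and only the `%F` and `%T` shorthands are expanded because Python's `strptime` may not accept them directly.

```python
from datetime import datetime

# Assumption: only these two chrono shorthands are handled. This is not a
# full translation between chrono and Python format specifiers.
CHRONO_SHORTHANDS = {"%F": "%Y-%m-%d", "%T": "%H:%M:%S"}


def parse_type_spec(spec):
    """Split a types value such as "timestamp|%F %T" into (type, format)."""
    type_name, _, fmt = spec.partition("|")
    return type_name, (fmt or None)


def coerce_timestamp(value, spec):
    """Hypothetical helper: parse `value` using the format carried by `spec`."""
    type_name, fmt = parse_type_spec(spec)
    if type_name != "timestamp" or fmt is None:
        raise ValueError(f"not a timestamp spec with a format: {spec!r}")
    # Expand the shorthands into specifiers that strptime understands.
    for short, expanded in CHRONO_SHORTHANDS.items():
        fmt = fmt.replace(short, expanded)
    return datetime.strptime(value, fmt)


print(coerce_timestamp("2020-12-01 02:37:54", "timestamp|%F %T"))
```

A plain type such as `"int"` carries no format, so `parse_type_spec("int")` returns `("int", None)` in this sketch.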
Telemetry
This component provides the following metrics that can be retrieved through the `internal_metrics` source. See the metrics section in the monitoring page for more info.
- `processing_errors_total` (counter)
  The total number of processing errors encountered by this component. This metric includes the following tags:
  - `component_kind` - The Vector component kind.
  - `component_name` - The Vector component ID.
  - `component_type` - The Vector component type.
  - `error_type` - The type of the error.
  - `instance` - The Vector instance identified by host and port.
  - `job` - The name of the job producing Vector metrics.
- `processed_events_total` (counter)
  The total number of events processed by this component. This metric includes the following tags:
  - `component_kind` - The Vector component kind.
  - `component_name` - The Vector component ID.
  - `component_type` - The Vector component type.
  - `file` - The file that produced the error.
  - `instance` - The Vector instance identified by host and port.
  - `job` - The name of the job producing Vector metrics.
- `processed_bytes_total` (counter)
  The total number of bytes processed by the component. This metric includes the following tags:
  - `component_kind` - The Vector component kind.
  - `component_name` - The Vector component ID.
  - `component_type` - The Vector component type.
  - `instance` - The Vector instance identified by host and port.
  - `job` - The name of the job producing Vector metrics.
Examples
Given the following Vector event:
```json
{
  "log": {
    "message": "5.86.210.12,zieme4647,19/06/2019:17:20:49 -0400,GET /embrace/supply-chains/dynamic/vertical,201,20574"
  }
}
```
And the following configuration:
```toml
[transforms.split]
type = "split"
field = "message"
separator = ","
field_names = ["remote_addr", "user_id", "timestamp", "message", "status", "bytes"]
types.status = "int"
types.bytes = "int"
```
The following Vector log event will be output:
```json
{
  "remote_addr": "5.86.210.12",
  "user_id": "zieme4647",
  "timestamp": "19/06/2019:17:20:49 -0400",
  "message": "GET /embrace/supply-chains/dynamic/vertical",
  "status": 201,
  "bytes": 20574
}
```
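The behaviour in this example can be approximated in a few lines of Python. This is a rough sketch of the documented semantics rather than Vector's actual code: `split_event` is a hypothetical function, `separator=None` stands in for the `"[whitespace]"` default, only the `int` and `float` coercions are modelled, nested names such as `parent.child` are not expanded, and the handling of a token count that differs from the number of field names is an assumption.

```python
def split_event(event, field_names, field="message", separator=None,
                drop_field=True, types=None):
    """Illustrative approximation of the split transform on a flat event."""
    event = dict(event)  # work on a copy so the caller's event is untouched
    source = event[field]
    # With no separator, str.split() splits on runs of whitespace.
    tokens = source.split() if separator is None else source.split(separator)
    if drop_field:
        del event[field]
    casts = {"int": int, "float": float}  # assumption: subset of the Types list
    # Assumption: zip stops at the shorter sequence; Vector may behave differently.
    for name, token in zip(field_names, tokens):
        type_name = (types or {}).get(name)
        event[name] = casts[type_name](token) if type_name else token
    return event


event = {
    "message": "5.86.210.12,zieme4647,19/06/2019:17:20:49 -0400,"
               "GET /embrace/supply-chains/dynamic/vertical,201,20574"
}
result = split_event(
    event,
    field_names=["remote_addr", "user_id", "timestamp", "message", "status", "bytes"],
    separator=",",
    types={"status": "int", "bytes": "int"},
)
print(result)
```

Because `drop_field` defaults to true, the original `message` value is removed before the tokens are assigned, and the `message` key in the result holds only the fourth token.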