0.23 Upgrade Guide
An upgrade guide that addresses breaking changes in 0.23.0
Vector’s 0.23.0 release includes breaking changes:

- The .deb package no longer enables and starts the Vector systemd service
- VRL type definition updates
- “remove_empty” option dropped from VRL’s parse_grok and parse_groks
- VRL conditions are now checked for mutations at compile time
- syslog source and VRL’s parse_syslog structured data fields made consistent
- VRL VM beta runtime removed
- gcp_pubsub sink requires setting encoding option
- humio_metrics sink no longer has encoding option
- New framing and encoding options for sinks
- Support for older OSes dropped
- kubernetes_logs source now requires rights to list and watch nodes
- datadog_agent source metrics now contain a namespace parsed from the event name

and deprecations:

- Shorthand values for encoding options deprecated
- Sink encoding value ndjson is now json encoding + newline_delimited framing
We cover them below to help you upgrade quickly:
Upgrade guide
Breaking changes
The .deb package no longer enables and starts the Vector systemd service
The official .deb package
no longer automatically enables and starts the Vector systemd service.
This is in line with how the RPM package behaves.
To enable and start the service (after configuring it to your requirements),
you can use systemctl enable --now:
systemctl enable --now vector
To just start the service without enabling it to run at system startup, use systemctl start:
systemctl start vector
VRL type definition updates
There were many situations where VRL didn’t calculate the correct type definition. These are now fixed. In some cases this can cause compilation errors when upgrading if the code relied on the previous (incorrect) behavior.
This affects the following:
- the “merge” operator (| or |=) on objects that share keys with different types
- if statements
- nullability checking for most expressions (usually related to if statements)
- expressions that contain the abort expression
- the del function
- closure arguments
The best way to fix these issues is to let the compiler guide you through the problems; it will usually provide suggestions on how to fix each one. Please give us feedback if you think any error diagnostics could be improved; we are continually trying to improve them.
The most common error you will probably see is that the fallibility of a function changed because the type of one of its parameters changed.
For example, if you are trying to split a string but the input could now be null, the error would look like this:
error[E110]: invalid argument type
┌─ :1:7
│
1 │ split(msg, " ")
│ ^^^
│ │
│ this expression resolves to one of string or null
│ but the parameter "value" expects the exact type string
│
= try: ensuring an appropriate type at runtime
=
= msg = string!(msg)
= split(msg, " ")
=
= try: coercing to an appropriate type and specifying a default value as a fallback in case coercion fails
=
= msg = to_string(msg) ?? "default"
= split(msg, " ")
=
= see documentation about error handling at https://errors.vrl.dev/#handling
= learn more about error code 110 at https://errors.vrl.dev/110
= see language documentation at https://vrl.dev
= try your code in the VRL REPL, learn more at https://vrl.dev/examples
As suggested, you have a few options to solve errors like this.
- Abort if the arguments aren’t the right type by appending the function name with !, such as to_string!(msg)
- Force the type to be a string, using the string function. This function will error at runtime if the value isn’t the expected type. You can call it as string! to abort if it’s not the right type.
- Provide a default value if the function fails using the “error coalescing” operator (??), such as to_string(msg) ?? "default"
- Handle the error manually by capturing both the return value and possible error, such as result, err = to_string(msg)
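For instance, a minimal VRL sketch combining these approaches (the .message field and the "unknown" fallback are illustrative assumptions, not part of the release):

# abort if .message isn't a string
parts = split(string!(.message), " ")

# or coerce to a string, falling back to a default if coercion fails
parts = split(to_string(.message) ?? "unknown", " ")

# or capture the error and handle it manually
msg, err = to_string(.message)
if err == null {
    parts = split(msg, " ")
}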
“remove_empty” option dropped from VRL’s parse_grok and parse_groks
The “remove_empty” argument has been dropped from both the parse_grok and the
parse_groks functions. Previously, these functions would return empty strings
for non-matching pattern names; now those fields are omitted entirely. To preserve the
old behavior, you can do something like the following to merge in empty strings
for each unmatched group:
parsed = parse_grok!(.message, "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}")
expected = { "timestamp": "", "level": "", "message": ""}
parsed = merge(expected, parsed)
VRL conditions are now checked for mutations at compile time
VRL conditions, for example those used in the filter transform, are not supposed to mutate the event. Previously
the mutations would be silently ignored after a condition ran. Now the compiler has support for read-only values, and
will give a compile-time error if you try to mutate the event in a condition.
Example filter transform config
[transforms.filter]
type = "filter"
inputs = [ "input" ]
condition.type = "vrl"
condition.source = """
.foo = "bar"
true
"""
New error
error[E315]: mutation of read-only value
┌─ :1:1
│
1 │ .foo = "bar"
│ ^^^^^^ mutation of read-only value
│
= see language documentation at https://vrl.dev
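If you still need the mutation, one way to adapt (a sketch; the transform names are illustrative) is to move it into a remap transform ahead of the filter and keep the condition itself read-only:

[transforms.prepare]
type = "remap"
inputs = [ "input" ]
source = """
.foo = "bar"
"""

[transforms.filter]
type = "filter"
inputs = [ "prepare" ]
condition.type = "vrl"
condition.source = """
.foo == "bar"
"""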
syslog source and VRL’s parse_syslog structured data fields made consistent
Previously, the parse_syslog VRL function and the syslog source handled parsing the structured
data section of syslog messages differently:
- The syslog source inserted a field with the name of the structured data element, with the fields as keys in that map. It would create further nested maps if the structured data key names had .s in them.
- The parse_syslog function would instead prefix the structured data keys with the name of the structured data element they appeared in, but would insert this as a flat key/value structure rather than nesting (so that referencing keys would require quoting to escape the .s).
With this release the behavior of both is now to parse the structured data section as a flat map of string key / string value, and insert it into the target under a field with the name of the structured data element.
That is:
<1>1 2022-04-25T23:21:45.715740Z Gregorys-MacBook-Pro.local 2d4d9490-794a-4e60-814c-5597bd5b7b7d 79978 - [exampleSDID@32473 foo.baz="bar"] test message
Now returns (for both the syslog source and the parse_syslog function):
{
"appname": "2d4d9490-794a-4e60-814c-5597bd5b7b7d",
"exampleSDID@32473": {
"foo.baz": "bar"
},
"facility": "kern",
"hostname": "Gregorys-MacBook-Pro.local",
"message": "test message",
"procid": 79978,
"severity": "alert",
"timestamp": "2022-04-25T23:21:45.715740Z",
"version": 1
}
Where previously VRL’s parse_syslog function returned:
{
"appname": "2d4d9490-794a-4e60-814c-5597bd5b7b7d",
"exampleSDID@32473.foo.baz": "bar",
"facility": "kern",
"hostname": "Gregorys-MacBook-Pro.local",
"message": "test message",
"procid": 79978,
"severity": "alert",
"timestamp": "2022-04-25T23:21:45.715740Z",
"version": 1
}
And the syslog source returned:
{
"appname": "2d4d9490-794a-4e60-814c-5597bd5b7b7d",
"exampleSDID@32473": {
"foo": {
"baz": "bar"
}
},
"facility": "kern",
"hostname": "Gregorys-MacBook-Pro.local",
"message": "test message",
"procid": 79978,
"severity": "alert",
"timestamp": "2022-04-25T23:21:45.715740Z",
"version": 1
}
The previous parse_syslog behavior can be achieved by running the result through the flatten
function like:
flatten(parse_syslog!(s'<1>1 2022-04-25T23:21:45.715740Z Gregorys-MacBook-Pro.local 2d4d9490-794a-4e60-814c-5597bd5b7b7d 79978 - [exampleSDID@32473 foo.baz="bar"] test message'))
VRL VM beta runtime removed
The experimental VM runtime for VRL-based components has been removed. The
stable AST runtime remains in place, and is now nearly identical in performance
to the VM runtime. If you have runtime = "vm" configured in your config, you
need to remove it to prevent Vector from erroring on startup.
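For example, a remap transform previously configured like this (the transform name and source are illustrative) should simply drop the runtime line:

[transforms.my_remap]
type = "remap"
inputs = [ "input" ]
runtime = "vm" # remove this line
source = """
.foo = "bar"
"""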
gcp_pubsub sink requires setting encoding option
The gcp_pubsub sink now supports a variety of codecs. To encode your logs as JSON before
publishing them to Cloud Pub/Sub, add the following encoding option
encoding.codec = "json"
to the config of your gcp_pubsub sink.
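A complete sketch of such a sink (the sink name, project, and topic are placeholders):

[sinks.pubsub]
type = "gcp_pubsub"
inputs = [ "input" ]
project = "my-project"
topic = "my-topic"
encoding.codec = "json"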
humio_metrics sink no longer has encoding option
The humio_metrics sink configuration no longer expects an encoding option.
If you previously used the encoding option
encoding.codec = "json"
you need to remove the line from your humio_metrics config. Metrics are now
always sent to Humio using the JSON format.
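For example (the sink name and token value are placeholders), the updated config would look like:

[sinks.humio]
type = "humio_metrics"
inputs = [ "metrics_in" ]
token = "${HUMIO_TOKEN}"
# encoding.codec = "json" <- remove this line; metrics are now always sent as JSON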
New framing and encoding options for sinks
We streamlined the encoding configuration for our sinks, enabling all applicable sinks to select
from a variety of codecs: json, text, raw, logfmt, avro, native or native_json, e.g.
by setting
encoding.codec = "json"
in your sink configuration.
Additionally, some sinks now support configuring how encoded events should be separated within a
stream or batch: bytes, character_delimited, length_delimited or newline_delimited, e.g. by
setting
framing.method = "newline_delimited"
in your sink configuration.
The following sinks support setting an encoding codec: aws_cloudwatch_logs,
aws_kinesis_firehose, aws_kinesis_streams, aws_s3, aws_sqs, azure_blob, console, file,
gcp_cloud_storage, gcp_pubsub, http, humio_logs, kafka, loki, nats, papertrail,
pulsar, redis, socket, splunk_hec_logs and websocket.
Additionally, the following sinks support setting a framing method: aws_s3, azure_blob,
console, file, gcp_cloud_storage, http and socket.
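Putting both together, a sketch for one of the sinks that supports framing (the sink name, bucket, and region are placeholders):

[sinks.s3]
type = "aws_s3"
inputs = [ "input" ]
bucket = "my-bucket"
region = "us-east-1"
encoding.codec = "json"
framing.method = "newline_delimited"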
Support for older OSes dropped
Due to changes to the tool we use for cross-compiling Vector,
support for operating systems with old versions of libc and libstdc++ was dropped for the
x86_64-unknown-linux-gnu target. Vector now requires that the host system has libc >= 2.18 and
libstdc++ >= 3.4.21 with support for ABI version 1.3.8.
Known OSes that this affects:
- Amazon Linux 1
- Ubuntu 14.04
- CentOS 7
We will be looking at options to re-add support for these OSes in the future.
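To check whether a host meets these requirements, one option (the libstdc++ path varies by distribution; /usr/lib64 here is an assumption) is:

# glibc version; needs to be >= 2.18
ldd --version

# look for the GLIBCXX_3.4.21 and CXXABI_1.3.8 symbol versions
strings /usr/lib64/libstdc++.so.6 | grep -E 'GLIBCXX_3\.4\.21|CXXABI_1\.3\.8'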
kubernetes_logs source now requires rights to list and watch nodes
Logs from Kubernetes pods are now annotated with the labels of the node on which the pod is running.
- For official Helm chart users, upgrade the chart to version >= 0.11.0 before upgrading the Vector version in your cluster.
- For custom Vector installations, modify the ClusterRole assigned to the Vector service account to include the nodes resource. The result should look like the following snippet:
# Permissions to use Kubernetes API.
# Requires that RBAC authorization is enabled.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: vector
rules:
  - apiGroups:
      - ""
    resources:
      - namespaces
      - nodes
      - pods
    verbs:
      - list
      - watch
datadog_agent source metrics now contain a namespace parsed from the event name
Incoming events from the datadog_agent source contain a name that is . (period) delimited, where the first element is the namespace, e.g. “system.fs.inodes.total”. Before this release, the metric event emitted by the datadog_agent source contained an empty namespace, and the name contained the full unparsed name from the Datadog Agent. In this release, the namespace is parsed out of the name. Taking the prior example, the event name becomes “fs.inodes.total” and the namespace is “system”.
This was introduced in order to better handle metrics sent from the datadog_agent source to the datadog_metrics sink, where previously they would be lacking a namespace and so would have one added by the sink if default_namespace was set.
The result is that configurations with VRL expressions that expect the namespace to be in the name will need to be adapted to either remove the namespace from the name, or join the namespace and the name, for example:
full_metric_name = join!([.namespace, .name], ".")
Deprecations
Shorthand values for encoding options deprecated
We are deprecating setting encoding options via a shorthand string. For example, where your sink encoding used
encoding = "json"
it should now be replaced by explicitly setting the codec
encoding.codec = "json"
Sink encoding value ndjson is now json encoding + newline_delimited framing
The ndjson encoding value is being phased out, since the newline_delimited behavior is now either applied by default or
can be set explicitly via the dedicated framing option.
This affects all sink configurations that previously used
encoding.codec = "ndjson"
The http, aws_s3, gcp_cloud_storage and azure_blob sinks should be configured to use a combination of json
encoding and newline_delimited framing instead
framing.method = "newline_delimited"
encoding.codec = "json"
For all other sinks, simply set the codec to json to maintain the current behavior
encoding.codec = "json"