0.23 Upgrade Guide

An upgrade guide that addresses breaking changes in 0.23.0

Vector’s 0.23.0 release includes breaking changes:

  1. The .deb package no longer enables and starts the Vector systemd service
  2. VRL type definition updates
  3. “remove_empty” option dropped from VRL’s parse_grok and parse_groks
  4. VRL conditions are now checked for mutations at compile time
  5. syslog source and VRL’s parse_syslog structured data fields made consistent
  6. VRL VM beta runtime removed
  7. gcp_pubsub sink requires setting encoding option
  8. humio_metrics sink no longer has encoding option
  9. New framing and encoding options for sinks
  10. Support for older OSes dropped
  11. kubernetes_logs source now requires rights to list and watch nodes
  12. datadog_agent source metrics now contain a namespace parsed from the event name

and deprecations:

  1. Shorthand values for encoding options deprecated
  2. Sink encoding value ndjson is now json encoding + newline_delimited framing

We cover them below to help you upgrade quickly:

Upgrade guide

Breaking changes

The .deb package no longer enables and starts the Vector systemd service

The official .deb package no longer automatically enables and starts the Vector systemd service. This is in line with how the RPM package behaves.

To enable and start the service (after configuring it to your requirements), you can use systemctl enable --now:

systemctl enable --now vector

To just start the service without enabling it to run at system startup,

systemctl start vector

VRL type definition updates

There were many situations where VRL didn’t calculate the correct type definition. These are now fixed. In some cases this can cause compilation errors when upgrading if the code relied on the previous (incorrect) behavior.

This affects the following:

  • the “merge” operator (| or |=) on objects that share keys with different types
  • if statements
  • nullability checking for most expressions (usually related to if statements)
  • expressions that contain the abort expression
  • the del function
  • closure arguments

The best way to fix these issues is to let the compiler guide you through the problems, it will usually provide suggestions on how to fix the issue. Please give us feedback if you think any error diagnostics could be improved, we are continually trying to improve them.

The most common error you will probably see is the fallibility of a function changed because the type of one of the parameters changed.

For example, if you are trying to “split” a string, but the input could now be null, the error would look like this

error[E110]: invalid argument type
  ┌─ :1:7
  │
1 │ split(msg, " ")
  │       ^^^
  │       │
  │       this expression resolves to one of string or null
  │       but the parameter "value" expects the exact type string
  │
  = try: ensuring an appropriate type at runtime
  =
  =     msg = string!(msg)
  =     split(msg, " ")
  =
  = try: coercing to an appropriate type and specifying a default value as a fallback in case coercion fails
  =
  =     msg = to_string(msg) ?? "default"
  =     split(msg, " ")
  =
  = see documentation about error handling at https://errors.vrl.dev/#handling
  = learn more about error code 110 at https://errors.vrl.dev/110
  = see language documentation at https://vrl.dev
  = try your code in the VRL REPL, learn more at https://vrl.dev/examples

As suggested, you have a few options to solve errors like this.

  1. Abort if the arguments aren’t the right type by appending the function name with !, such as to_string!(msg)
  2. Force the type to be a string, using the string function. This function will error at runtime if the value isn’t the expected type. You can call it as string! to abort if it’s not the right type.
  3. Provide a default value if the function fails using the “error coalescing” operator (??), such as to_string(msg) ?? "default"
  4. Handle the error manually by capturing both the return value and possible error, such as result, err = to_string(msg)

“remove_empty” option dropped from VRL’s parse_grok and parse_groks

The “remove_empty” argument has been dropped from both the parse_grok and the parse_groks functions. Previously, these functions would return empty strings for non-matching pattern names, but now they are not returned. To preserve the old behavior, you can do something like the following to merge in empty strings for each unmatched group:

parsed = parse_grok!(.message, "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}")
expected = { "timestamp": "", "level": "", "message": ""}
parsed = merge(expected, parsed)

VRL conditions are now checked for mutations at compile time

VRL conditions, for example those used in the filter transform, are not supposed to mutate the event. Previously the mutations would be silently ignored after a condition ran. Now the compiler has support for read-only values, and will give a compile-time error if you try to mutate the event in a condition.

Example filter transform config

[transforms.filter]
type = "filter"
inputs = [ "input" ]
condition.type = "vrl"
condition.source = """
.foo = "bar"
true
"""

New error

error[E315]: mutation of read-only value
  ┌─ :1:1
  │
1 │ .foo = "bar"
  │ ^^^^^^ mutation of read-only value
  │
  = see language documentation at https://vrl.dev

syslog source and VRL’s parse_syslog structured data fields made consistent

Previously, the parse_syslog VRL function and the syslog source handled parsing the structured data section of syslog messages differently:

  • The syslog source inserted a field with the name of the structured data element, with the fields as keys in that map. It would create further nested maps if the structured data key names had .s in them.
  • The parse_syslog function would instead prefix the structured data keys with the name of the structured data element they appeared in, but would insert this as a flat key/value structure rather than nesting (so that referencing keys would require quoting to escape the .s).

With this release the behavior of both is now to parse the structured data section as a flat map of string key / string value, and insert it into the target under a field with the name of the structured data element.

That is:

<1>1 2022-04-25T23:21:45.715740Z Gregorys-MacBook-Pro.local 2d4d9490-794a-4e60-814c-5597bd5b7b7d 79978 - [exampleSDID@32473 foo.baz="bar"] test message

Now returns (for both the syslog source and the parse_syslog function):

{
  "appname": "2d4d9490-794a-4e60-814c-5597bd5b7b7d",
  "exampleSDID@32473": {
    "foo.baz": "bar"
  },
  "facility": "kern",
  "hostname": "Gregorys-MacBook-Pro.local",
  "message": "test message",
  "procid": 79978,
  "severity": "alert",
  "timestamp": "2022-04-25T23:21:45.715740Z",
  "version": 1
}

Where previously VRL’s parse_syslog function returned:

{
  "appname": "2d4d9490-794a-4e60-814c-5597bd5b7b7d",
  "exampleSDID@32473.foo.baz": "bar",
  "facility": "kern",
  "hostname": "Gregorys-MacBook-Pro.local",
  "message": "test message",
  "procid": 79978,
  "severity": "alert",
  "timestamp": "2022-04-25T23:21:45.715740Z",
  "version": 1
}

And the syslog source returned:

{
  "appname": "2d4d9490-794a-4e60-814c-5597bd5b7b7d",
  "exampleSDID@32473": {
    "foo": {
      "baz": "bar"
    }
  },
  "facility": "kern",
  "hostname": "Gregorys-MacBook-Pro.local",
  "message": "test message",
  "procid": 79978,
  "severity": "alert",
  "timestamp": "2022-04-25T23:21:45.715740Z",
  "version": 1
}

The previous parse_syslog behavior can be acheived by running the result through the flatten function like:

flatten(parse_syslog!(s'<1>1 2022-04-25T23:21:45.715740Z Gregorys-MacBook-Pro.local 2d4d9490-794a-4e60-814c-5597bd5b7b7d 79978 - [exampleSDID@32473 foo.baz="bar"] test message'))

VRL VM beta runtime removed

The experimental VM runtime for VRL-based components has been removed. The stable AST runtime remains in place, and is now nearly identical in performance to the VM runtime. If you have runtime = "vm" configured in your config, you need to remove it to avoid Vector from erroring on startup.

gcp_pubsub sink requires setting encoding option

The gcp_pubsub sink now supports a variety of codecs. To encode your logs as JSON before publishing them to Cloud Pub/Sub, add the following encoding option

encoding.codec = "json"

to the config of your gcp_pubsub sink.

humio_metrics sink no longer has encoding option

The humio_metrics sink configuration no longer expects an encoding option. If you previously used the encoding option

encoding.codec = "json"

you need to remove the line from your humio_metrics config. Metrics are now always sent to Humio using the JSON format.

New framing and encoding options for sinks

We streamlined the encoding configuration for our sinks, enabling all applicable sinks to select from a variety of codecs: json, text, raw, logfmt, avro, native or native_json, e.g. by setting

encoding.codec = "json"

in your sink configuration.

Additionally, some sinks now support configuring how encoded events should be separated within a stream or batch: bytes, character_delimited, length_delimited or newline_delimited, e.g. by setting

framing.method = "newline_delimited"

in your sink configuration.

The following sinks support setting an encoding codec: aws_cloudwatch_logs, aws_kinesis_firehose, aws_kinesis_streams, aws_s3, aws_sqs, azure_blob, console, file, gcp_cloud_storage, gcp_pubsub, http, humio_logs, kafka, loki, nats, papertrail, pulsar, redis, socket, splunk_hec_logs and websocket.

Additionally, the following sinks support setting a framing method: aws_s3, azure_blob, console, file, gcp_cloud_storage, http and socket.

Support for older OSes dropped

Due to changes to the tool we use for cross-compiling Vector, support for operating systems with old versions of libc and libstdc++ were dropped for the x86-uknown_linux-gnu target. Vector now requires that the host system has libc >= 2.18 and libstdc++ >= 3.4.21 with support for ABI version 1.3.8.

Known OSes that this affects:

  • Amazon Linux 1
  • Ubuntu 14.04
  • CentOS 7

We will be looking at options to re-add support for these OSes in the future.

kubernetes_logs source now requires rights to list and watch nodes

Logs from Kubernetes pods are now annotated with a node’s labels on which a pod is running.

  1. For official helm-chart users, upgrade the chart to the version >= 0.11.0 before upgrading the vector version in your cluster.
  2. For custom vector installations, modify the cluster role assigned to the vector service account to include nodes. The result should look like the following snippet:
# Permissions to use Kubernetes API.
# Requires that RBAC authorization is enabled.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: vector
rules:
  - apiGroups:
      - ""
    resources:
      - namespaces
      - nodes
      - pods
    verbs:
      - list
      - watch

datadog_agent source metrics now contain a namespace parsed from the event name

Incoming events from the datadog_agent source contain a name that is . (period) delimited where the first element is the namespace eg “system.fs.inodes.total”. Before this release, the metric event outputted by the datadog_agent source contained an empty namespace, and the name contained the full unparsed name from the Datadog Agent. In this release, the namespace is parsed out of the name. Taking the prior example, the event name becomes “fs.inodes.todal” and the namespace is “system”.

This was introduced in order to better handle metrics sent from the datadog_agent source to the datadog_metrics sink, where previously they would be lacking a namespace and so would have one added by the sink if default_namespace was set.

The result is that configurations with VRL expressions that expect the namespace to be in the name will need to be adapted to either remove the namespace from the name, or join the namespace and the name, for example:

full_metric_name = """
join([.namespace, .name], ".")
"""

Deprecations

Shorthand values for encoding options deprecated

We are deprecating setting encoding options by a shorthand string. E.g. when your sink encoding used

encoding = "json"

it should now be replaced by explicitly setting the codec

encoding.codec = "json"

Sink encoding value ndjson is now json encoding + newline_delimited framing

The ndjson encoding value will be phased out since the newline_delimited behavior may be either set by default or can be set explicitly via a dedicated framing option.

This affects all sink configurations that previously used

encoding.codec = "ndjson"

The aws_s3, gcp_cloud_storage and azure_blob sinks should be configured to use a combination of json encoding and newline_delimited framing instead

framing.method = "newline_delimited"
encoding.codec = "json"

For all other sinks, simply set the codec to json to maintain the current behavior

encoding.codec = "json"