Vector v0.22.0 release notes

The Vector team is pleased to announce version 0.22.0!

Be sure to check out the upgrade guide for breaking changes in this release.

Important: as part of this release, we have promoted the new implementation of disk buffers (buffer.type = "disk_v2") to the default implementation (buffer.type = "disk"). Any existing disk buffers (disk_v1 or disk) will be automatically migrated. We have rigorously tested this migration, but recommend making a backup of the disk buffers (in the configured data_dir, typically /var/lib/vector) so that you can roll back if necessary. Please see the release highlight for additional updates about this migration.

In addition to the new features, enhancements, and fixes listed below, this release adds:

  • Support for iteration has landed in VRL. Now you can dynamically map unknown key/value pairs in objects and items in arrays. This replaces some common use cases for the lua transform with the much more performant remap transform.
  • New native event codecs for Vector. We are still rolling out the new codec support to all sinks, but this will allow sending events (logs, metrics, and traces) between Vector instances via transports like kafka rather than being limited to the gRPC vector source and sink.
  • A new GCP PubSub (gcp_pubsub) source to consume events from GCP PubSub.
  • A new websocket sink was added to send events to a remote websocket listener.
  • New VRL functions for encrypting and decrypting data.
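
As a rough illustration of the iteration support, the sketch below uses a remap transform to rewrite every key in an event, a task that previously tended to call for the lua transform. The component names (dedot_keys, my_source) are placeholders, and the example assumes the map_keys function and closure syntax introduced with VRL iteration; check the VRL function reference for your version before relying on it.

```toml
# Sketch only: "dedot_keys" and "my_source" are hypothetical component names.
[transforms.dedot_keys]
type = "remap"
inputs = ["my_source"]
source = '''
# Walk every key in the event, including keys of nested objects,
# and replace dots with underscores.
. = map_keys(., recursive: true) -> |key| { replace(key, ".", "_") }
'''
```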

We also made additional performance improvements in this release, increasing average throughput by up to 50% for common topologies (see our soak test framework).

Upgrading Vector
When upgrading, we recommend stepping through minor versions as these can each contain breaking changes while Vector is pre-1.0. These breaking changes are noted in their respective upgrade guides.

Known issues

  • The journald source deadlocks almost immediately (#12966). Fixed in v0.22.1.
  • The kubernetes_logs source does not work with k3s/k3d (#12989). Fixed in v0.22.1.
  • Vector would panic when reloading configuration using the compression or concurrency options due to a deserialization failure (#12919). Fixed in v0.22.1.
  • When using a component that creates a unix socket, vector validate no longer creates the socket (#13018). This causes the default SystemD unit file to fail to start Vector since it runs vector validate before starting Vector. Fixed in v0.22.1.
  • VRL sometimes miscalculates type definitions when conditionals are used causing later usages of values assigned in conditionals to not require type coercion as they should (#12948). Fixed in v0.22.1.
  • Metrics from AWS components were tagged with an endpoint including the full path of the request. For the aws_s3 sink this caused cardinality issues since the AWS S3 key is included in the URL. Fixed in v0.22.3.
  • The gcp_pubsub source would log errors due to attempting to fetch too quickly when it has no acknowledgements to pass along. Fixed in v0.22.3.
  • Vector shuts down when a configured source codec (decoding.codec) receives invalid data. Fixed in v0.23.1.


13 enhancements

  • The journald source now processes data more efficiently by continuing to read new data while waiting for read data to be processed by Vector.
  • Vector’s configuration interpolation of environment variables has been enhanced to allow both setting default values and returning an error message if an expected environment variable is unset or empty. The syntax matches bash interpolation syntax:

    • ${VARIABLE:-default} evaluates to default if VARIABLE is unset or empty in the environment.
    • ${VARIABLE-default} evaluates to default only if VARIABLE is unset in the environment.
    • ${VARIABLE:?err} exits with an error message containing err if VARIABLE is unset or empty in the environment.
    • ${VARIABLE?err} exits with an error message containing err if VARIABLE is unset in the environment.
    Thanks to @hhromic for contributing this change!
  • The Datadog sinks now retry requests that failed due to an invalid API key. This avoids data loss in the case that an API key is revoked.
  • The kubernetes_logs source now tags emitted internal metrics with pod_namespace.
  • The datadog_metrics sink now supports sending aggregated summary metrics (typically scraped from a Prometheus exporter) to Datadog. Previously these metrics were dropped at the sink.
  • The RPM package now adds the created vector user to the systemd-journal-remote group to be able to consume journald events from a remote system. This matches the Debian package.
  • The kubernetes_logs source now allows configuration of extra_namespace_label_selector. If set, Vector uses it to select the pods whose logs it captures, based on labels attached to the pod namespace. This is similar to the extra_label_selector option, which applies to pod labels. Thanks to @anapsix for contributing this change!
  • The kubernetes_logs source now reads events in order whenever a pod log file rotates. Previously Vector could start reading the new file before it finished processing the previous one, resulting in the logs being out-of-order. Thanks to @sillent for contributing this change!
  • The parse_json function now takes an optional max_depth parameter to control how far it will recurse when deserializing the event. Once the depth limit is hit, the remainder of the fields is left as raw JSON in the deserialized event.

    For example:

    parse_json!("{\"1\": {\"2\": {\"3\": {\"4\": {\"5\": {\"6\": \"finish\"}}}}}}", max_depth: 5)

    returns:

    { "1": { "2": { "3": { "4": { "5": "{\"6\": \"finish\"}" } } } } }

    The default remains no max depth limit.

    Thanks to @nabokihms for contributing this change!
  • A new component_received_events_count histogram metric was added to record the sizes of event batches passed around in Vector’s internal topology. Note that this is different than sink-level batching. It is mostly useful for debugging performance issues in Vector due to low internal batching.
  • The http source now allows configuring the HTTP method it expects requests to use via the new method option. Previously it only allowed POST requests. Thanks to @r3b-fish for contributing this change!
  • All components now emit consistent metrics in accordance with Vector’s component specification.
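
To make the interpolation forms and the new http source method option above more concrete, here is a minimal configuration sketch. The component name (my_http) and the environment variable names are made up for illustration, and the auth settings are shown only to give the error-on-unset form something to guard.

```toml
# Sketch only: "my_http" and the environment variable names are hypothetical.
[sources.my_http]
type = "http"
# Falls back to 0.0.0.0:8080 if HTTP_LISTEN_ADDR is unset or empty.
address = "${HTTP_LISTEN_ADDR:-0.0.0.0:8080}"
# Expect PUT requests instead of the default POST.
method = "PUT"
auth.username = "vector"
# Vector exits with this message if HTTP_AUTH_PASSWORD is unset or empty.
auth.password = "${HTTP_AUTH_PASSWORD:?HTTP_AUTH_PASSWORD must be set}"
```

Dropping the colon (${VARIABLE-default} or ${VARIABLE?err}) applies the fallback or error only when the variable is unset, so an empty value is passed through as-is.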

7 new features

  • VRL now includes two new functions for encrypting and decrypting field values: encrypt and decrypt. A random_bytes function was added to make it easy to generate initialization vectors for the encrypt function.

    See the highlight for more details about this new functionality.

  • The socket and syslog sources now allow configuration of the permissions to use when creating a unix socket via socket_file_mode when mode = "unix" is used. Thanks to @Sh4d1 for contributing this change!
  • VRL now allows for a simple form of string templating via {{ some_variable }} syntax. We will be expanding support for templating over time. This does mean that any strings that had {{ }} in them already now need to be escaped. See the upgrade guide for details.
  • Vector has two new codecs that can be used on sources and sinks to encode as Vector’s native representation: native and native_json. This makes it easier to send events between Vector instances on transports like kafka. It also makes it possible to send metrics to Vector from an external process (such as when using the exec source) without needing to use the lua transform to convert logs to metrics. Previously, these generic sources (like exec or http) could only receive logs. See the release highlight for more about this new feature and how to use it.
  • A new gcp_pubsub source was added for consuming events from GCP PubSub.
  • A new websocket sink was added for sending events to a remote websocket listener. Thanks to @zshell31 for contributing this change!
  • A new is_json function was added to VRL. This allows more efficient checking of whether the incoming value is JSON vs. trying to parse it using parse_json and checking if there was an error. Thanks to @nabokihms for contributing this change!
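
As a sketch of how the new codecs might be used to receive metrics from an external process, the fragment below configures an exec source to decode its command's output as native_json. The component name and command path are placeholders, and the exact JSON shape the command must emit is described in the release highlight rather than here.

```toml
# Sketch only: "external_metrics" and the command path are hypothetical.
[sources.external_metrics]
type = "exec"
mode = "streaming"
command = ["/usr/local/bin/emit-metrics"]
# Decode the command's output as events in Vector's native JSON
# representation, so it can emit metrics as well as logs.
decoding.codec = "native_json"
```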

16 bug fixes

  • The splunk_hec source now correctly handles negative acknowledgements from sinks. Previously it would mark a request containing rejected events as delivered, which in Splunk’s acknowledgement protocol means returning true for the request’s ackID. It now correctly returns false, indicating the request is not acknowledged.
  • The gcp_stackdriver_metrics sink now requires configuration of labels at the top-level to match the gcp_stackdriver_logs sink. Previously these were nested under .labels.

    See the upgrade guide for more details.

  • The new_relic sink health check now considers any 200-level response a success. It used to require a 200 which did not match what New Relic actually returns: 202.
  • When using Vector’s ability to load configuration from a directory (--config-dir), Vector now ignores subdirectories whose names start with a dot (.).
  • The aws_s3 sink now only sets the x-amz-tagging header if tags are being applied. Specifying an empty value was incompatible with Ceph.
  • The VRL type definition for . and parse_xml was corrected to be a map of any field/value rather than specifically an empty map. The previous definition could cause false-positive type errors later during VRL compilation.
  • VRL now correctly updates the type definition of variables that are defined in one scope and mutated in another.

    For example:

    foo = 1
    { foo = "bar" }

    This would previously fail to compile because VRL thought foo was an integer when, in fact, it had been reassigned to a string.

  • The internal_metrics source now correctly tags emitted metrics with host and pid when host_key and pid_key, respectively, are configured.
  • The socket source now discards UDP frames greater than the configured max_length (when mode = "udp"). Previously these were truncated rather than discarded, which did not match the behavior when mode = "tcp". All socket source modes are now consistent with dropping messages greater than max_length.
  • The internal_logs source occasionally missed some events generated early in Vector’s start-up, before the component was initialized. This was remedied so that the internal_logs source more reliably captures start-up events.
  • The parse_ruby_hash VRL function can now parse hashes that contain a symbol as the value, such as { "key" => :foo }.
  • The log function in VRL no longer wraps logged string values in quotes. This was causing double quoting for sink encodings like json. Thanks to @nabokihms for contributing this change!
  • The aws_s3 source now handles S3 object keys that contain spaces. Previously Vector would encounter a 404 when querying for objects due to not decoding spaces correctly from the SQS object notification.
  • GCP sinks now correctly handle authentication token refreshing from the metadata service when the health check fails.
  • The http config provider now correctly repolls when an error is encountered. Thanks to @jorgebay for contributing this change!
  • The logfmt sink codec and the encode_logfmt function now correctly wrap values that contain quotes (") in quotes and escape the inner quotes. Thanks to @jalaziz for contributing this change!
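
For the socket source fix above, a minimal sketch of a UDP listener with an explicit max_length might look like the following. The component name and address are placeholders, and the value shown is only illustrative.

```toml
# Sketch only: "my_udp" and the listen address are hypothetical.
[sources.my_udp]
type = "socket"
mode = "udp"
address = "0.0.0.0:9000"
# UDP frames larger than this many bytes are now discarded rather than
# truncated, matching the behavior of mode = "tcp".
max_length = 102400
```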

What’s next

Removal of legacy buffers
With this release of v0.22.0, we’ve switched the default for disk buffers to the new v2 implementation. This means if you set type = "disk" you will get the new buffer implementation. In a future release, we will remove the legacy disk buffers. To continue using the v1 disk buffers, for now, set type = "disk_v1".
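
A minimal sketch of the relevant buffer settings, assuming a sink named my_sink (a placeholder; its other options are omitted) and an illustrative max_size:

```toml
# Sketch only: "my_sink" is a hypothetical sink name.
[sinks.my_sink.buffer]
type = "disk"         # selects the new v2 implementation as of v0.22.0
max_size = 268435488  # bytes; illustrative value, size to your workload
# To stay on the legacy implementation for now, use this instead:
# type = "disk_v1"
```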
