0.21 Upgrade Guide

An upgrade guide that addresses breaking changes in 0.21.0

Vector’s 0.21.0 release includes breaking changes:

  1. Syntax changes for non-VRL paths
  2. GraphQL API outputEventsByComponentIdPatterns subscription argument patterns changed to outputsPatterns
  3. GraphQL API EventNotification type changes
  4. Deprecated GraphQL API subscriptions have been removed
  5. The vector vrl timezone flag -tz is now -z
  6. The vector top human_metrics flag -h is now -H
  7. Remainder operator (%) in VRL is fallible
  8. AWS SDK Migration
  9. Route transform metric event_discarded_total removed
  10. buffer_discarded_events_total now includes received events
  11. kubernetes_logs source rewritten to use kube-rs
  12. Published docker images no longer create implicit volumes
  13. VRL now includes lexical scoping for blocks
  14. CLI options: delimiters, wildcards, and boolean options

And deprecations:

  1. The GraphQL API subscriptions: receivedEventsTotal, sentEventsTotal, sentEventsThroughput, receivedEventsThroughput have been deprecated
  2. End-to-end acknowledgement configuration moved to sinks

We cover them below to help you upgrade quickly:

Upgrade guide

Breaking changes

Syntax changes for non-VRL paths

Previously, there were two different ways to describe paths. VRL uses a newer syntax, while everything else in Vector still used an older syntax. This was a constant pain point for users, and we have taken some steps to migrate towards the VRL syntax. This is a breaking change that may require migration.

The old syntax was very lenient in the characters that were allowed in a field name. It also supported single character escapes. The new syntax only allows A-Z a-z 0-9 _ @. Any other character will require the field name to be quoted. Quotes around a field name replace single character escaping. This brings the old syntax in line with the newer (VRL) syntax. Note that VRL makes a distinction between a field starting with a “.” (event query) and without (variable query). Outside a VRL context, the “.” is optional and ignored.

Migration will be required for any paths used outside a VRL context. That is any transform (except remap and conditions), templating, and any source or sink referencing field names. There are no changes to the VRL syntax.

Here are some examples that require migrating

old syntaxnew syntaxcomment
foo\.bar.baz“foo.bar”.bazThe . field separator needs to be escaped if used as part of a field name. The old syntax allowed individual character escaping. The new syntax requires quotes around the field name.
headers.User-Agentheaders.“User-Agent”- requires quotes with the new syntax
foo with spaces“foo with spaces”Spaces also need to be quoted
foo\“bar“foo\“bar”Double quotes and backlashes must be escaped inside quotes

TOML transform example

Old syntax

[transforms.dedupe]
type = "dedupe"
inputs = ["input"]
fields.match = ["message.user-identifier"]

New syntax (the dash requires the field name to be quoted)

[transforms.dedupe]
type = "dedupe"
inputs = ["input"]
fields.match = ["message.\"user-identifier\""]

For more information on the new syntax, you can review the documentation here

GraphQL API outputEventsByComponentIdPatterns subscription argument patterns changed to outputsPatterns

To avoid confusion and align with the new inputsPatterns argument, we’ve renamed the original patterns argument to outputsPatterns. outputsPatterns allows you to specify patterns that will match against components (sources, transforms) and display their outflowing events. inputsPatterns allows you to specify patterns that match against components (transforms, sinks) and display their incoming events.

Note that using an input pattern to match a component is effectively a shorthand. It’s the same as using one or more output patterns to match against all the outputs flowing into a component.

Updating your subscriptions is as simple as renaming patterns to outputsPatterns.

- subscription {
-  outputEventsByComponentIdPatterns(patterns: [...])
+ subscription {
+  outputEventsByComponentIdPatterns(outputsPatterns: [...])

GraphQL API EventNotification type changes

As part of adding a new notification (InvalidMatch) to warn users against attempting invalid matches, we’ve reshaped the EventNotification type for easier querying and future extensibility.

Previously, the EventNotification type consisted simply of a pattern and plain enum describing the notification.

type EventNotification {
  pattern: String!
  notification: EventNotificationType!
}

While this worked well for simple notifications like Matched and NotMatched, it was awkward to extend to new notifications, like InvalidMatch, which may want to include more information than pattern. Thus, the new EventNotification type takes the following form:

type EventNotification {
  notification: Notification!
  message: String!
}

where Notification is a union of specific kinds of notifications:

union Notification = Matched | NotMatched | InvalidMatch

message is a new human-readable description of the notification while notification contains additional details specific to the kind of notification. All the same information is still available, and the following example shows how you might convert an existing query to the new schema.

 subscription {
-   outputEventsByComponentIdPatterns(patterns: [...]) {
+   outputEventsByComponentIdPatterns(outputsPatterns: [...]) {
     __typename
     ... on EventNotification {
-       pattern
-       notification
+       message
+       notification {
+         __typename
+         ... on Matched {
+           pattern
+         }
+         ... on NotMatched {
+           pattern
+         }
+         ... on InvalidMatch {
+           pattern
+           invalidMatches
+         }
+       }
      }
     }
   }
 }

Deprecated GraphQL API subscriptions have been removed

The following deprecated subscriptions have been removed in this release. Please use the listed alternatives.

  • eventsInTotal: use componentReceivedEventsTotals
  • eventsOutTotal: use componentSentEventsTotals
  • componentEventsInThroughputs: use componentReceivedEventsThroughputs
  • componentEventsInTotals: use componentReceivedEventsTotals
  • componentEventsOutThroughputs: use componentSentEventsThroughputs
  • componentEventsOutTotals: use componentSentEventsTotals
  • eventsInThroughput: use componentReceivedEventsThroughputs
  • eventsOutThroughput: use componentSentEventsThroughputs

The vector vrl timezone flag -tz is now -z

We upgraded the Vector CLI to use Clap v3, a popular Rust crate.

A breaking change in Clap v3 is that shortened CLI flags now use the char type, meaning they are restricted to single characters.

As a consequence, the shortened form of our vector vrl --timezone flag (previously --tz) has been updated to the more succinct -z.

The vector top human_metrics short flag -h is now -H

To avoid clashing and issues with our upgrade to Clap v3, the short -h from --help and -h from --human_metrics in the vector top command have been disambiguated. The shortened form for --human_metrics is now -H and -h is reserved for --help.

Remainder operator (%) in VRL is fallible

The remainder operator in VRL has become a fallible operation. This is because finding the remainder with a divisor of zero can raise an error that needs to be handled.

Before this VRL would compile:

.remainder = 50 % .value

If .value was zero, Vector would panic. This can be fixed by handling the error:

.remainder = 50 % .value ?? 0

AWS SDK Migration

We have migrated sources and sinks that utilize AWS to the official AWS SDK (from Rusoto). This comes with some benefits such as support for IMDSv2. This requires us to deprecate some config options.

The new AWS SDK lacks support for certain authentication configuration:

  • Vector now only supports IMDSv2 for authentication. If you were previously using IMDSv1, you will need to configure the host to allow IMDSv2. For EKS, this likely means increasing the metadata token response hop limit to 2 (see Amazon EKS now supports EC2 Instance Metadata Service v2 . We are discussing the possibility of re-adding support for IMDSv1 in #12376.
  • Vector AWS components’ auth.credential_file option was removed as the new SDK does not yet support it it. It is still possible to use a credentials file, but it should be placed in the default location (~/.aws/credentials on Linux, OS X, and Unix; %userprofile%\.aws\credentials on Microsoft Windows), or the location should be set with an environment variable (AWS_CONFIG_FILE or AWS_SHARED_CREDENTIALS_FILE).
  • Support for credential_process in an AWS profile was dropped as it is not yet supported by the new SDK.

Specifying a region is now also required. Make sure a region is specified in either the AWS config file, or the Vector config.

The assume_role config option was deprecated and moved to auth.assume_role previously. This deprecated option has now been removed.

This affects the following components:

  • AWS Cloudwatch Metrics Sink
  • AWS Cloudwatch Logs Sink
  • AWS SQS Source (this was migrated in a previous version)
  • AWS SQS Sink
  • AWS Kinesis Streams Sink
  • AWS Kinesis Firehose Sink
  • AWS S3 Sink
  • AWS S3 Source
  • Datadog Archives Sink (s3 config only)
  • Elasticsearch Sink

For more details on configuring auth, you can visit these links:

Route transform metric event_discarded_total removed

Until now, when using the route transform, if an event didn’t match any configured route, this event would be discarded and lost for the following transforms and sinks.

A new _unmatched route has now been introduced and the events are no longer discarded, making the event_discarded_total metric irrelevant so it has been dropped.

You can still get the total number of events that match no routes via component_events_sent_total with a tag of output=_unmatched.

buffer_discarded_events_total now includes received events

The buffer_discarded_events_total now includes all events flowing into a buffer, even those discarded when the buffer is full and drop_newest is configured as the when_full behavior.

This brings the metric inline with the component-level received and discarded metrics where events are counted as received before being discarded as well as leaving the door open for additional discard strategies like drop_oldest where events would live in the buffer before possibly being discarded.

kubernetes_logs source rewritten to use kube-rs

The kubernetes_logs source has had two breaking changes:

  • It now requires the list verb for Vector’s ClusterRole resource. If you are using the Helm chart, version 0.7.0 includes this change. Otherwise, make sure to add it to your manifest.
  • The proxy configuration was dropped. Instead, configure any needed proxy configuration in your kubeconfig.

See the highlight for more information.

Published docker images no longer create implicit volumes

Previously /var/lib/vector was defined as a volume in the Dockerfiles for the published Vector images. This led to the creation of the volume each time you ran a Vector container from these images whether you wanted it or not.

Instead, if you need a volume for the data directory, you should provide one when launching the container.

When migrating from an earlier version to 0.21.0 or later using Docker compose and implicit volumes, you need to use docker inspect to find out which volumes your container is mapped to so that you can map them to the upgraded container as well.

See vectordotdev/vector#11982 for additional rationale.

VRL now includes lexical scoping for blocks

In preparation for VRL iteration support landing in the next release, this release of Vector includes a breaking change to the way variable scoping works.

Specifically, variables defined in nested blocks cannot be accessed by parent blocks.

This is best explained with an example:

# top-level scope
count1 = 1

# nested block
{
  count2 = 1
  count1 = count1 + 1

  # nested block
  {
    count2 = count2 + 1
    count1 = count1 + 1
  }
}

count1 # returns ”3”
count2 # returns a compile-time error, because ”count2” was defined in a nested block

CLI options: delimiters, wildcards, and boolean options

When using CLI options that can take multiple values, the provided values must be comma separated. For example:

vector --config foo.toml,bar.toml

Additionally when passing values that contain wildcards (*), these values must be quoted. For example:

vector --config "*.toml"

The --watch-config option previously required a boolean value, which is no longer needed. For example, in earlier releases:

vector --watch-config=true

This should become:

vector --watch-config

Iteration Sneak Preview

The introduction of lexical scoping is important for when iteration support lands, which (as a sneak preview) allows you to do the following:

data = { foo: 1, bar: 2 }

map(data) -> |key, value| {
  new_key = upcase(key)

  [new_key, value + 1]
}

data # returns { ”FOO”: 2, ”BAR”: 3 }

new_key # returns a compile-time error, because ”new_key” is a variable scoped
        # to the enumeration closure block

Without lexical scoping, it would be ambiguous what new_key should return in the last example, but now it’s clear that the variable remains undefined outside of the closure block.

Deprecations

The GraphQL API subscriptions: receivedEventsTotal, sentEventsTotal, sentEventsThroughput, receivedEventsThroughput have been deprecated

While these subscriptions were intended to display aggregate metrics across all components, they currently only show a per-component metric and are made redundant by more informative subscriptions that include specific component information. To avoid misuse and confusion, we are deprecating them in favor of the following alternatives.

  • receivedEventsTotal: use componentReceivedEventsTotals
  • sentEventsTotal: use componentSentEventsTotals
  • sentEventsThroughput: use componentSentEventsThroughputs
  • receivedEventsThroughput: use componentReceivedEventsThroughputs

End-to-end acknowledgement configuration moved to sinks

Currently, end-to-end acknowledgements are opt-in at the source-level via the acknowledgements.enabled setting. This made sense initially since sources are the ones that are acknowledging back to clients, but makes it difficult to achieve durability. Durability, which is the primary goal of acknowledgements, is sink-dependent instead of source-dependent. That is, it is important to assert that all data going to a system of record is fully acknowledged, for example, for all the sources that it came from, this guaranteeing delivery to that destination.

To achieve this, an acknowledgements option has been added to sinks. When the configuration is loaded, all sources that are connected to a sink that has this option enabled will automatically be configured to wait for sinks to acknowledge before issuing their own acknowledgements (where possible).

The source configuration acknowledgements option will remain in this version, but is deprecated and will be removed in a future version.

See the documentation for end-to-end acknowledgements for more details on the acknowledgement process.