0.21 Upgrade Guide
An upgrade guide that addresses breaking changes in 0.21.0
Vector’s 0.21.0 release includes breaking changes:
- Syntax changes for non-VRL paths
- GraphQL API outputEventsByComponentIdPatterns subscription argument patterns changed to outputsPatterns
- GraphQL API EventNotification type changes
- Deprecated GraphQL API subscriptions have been removed
- The vector vrl timezone flag -tz is now -z
- The vector top human_metrics flag -h is now -H
- Remainder operator (%) in VRL is fallible
- AWS SDK Migration
- Route transform metric event_discarded_total removed
- buffer_discarded_events_total now includes received events
- kubernetes_logs source rewritten to use kube-rs
- Published docker images no longer create implicit volumes
- VRL now includes lexical scoping for blocks
- CLI options: delimiters, wildcards, and boolean options
And deprecations:
- The GraphQL API subscriptions: receivedEventsTotal, sentEventsTotal, sentEventsThroughput, receivedEventsThroughput have been deprecated
- End-to-end acknowledgement configuration moved to sinks
We cover them below to help you upgrade quickly:
Upgrade guide
Breaking changes
Syntax changes for non-VRL paths
Previously, there were two different ways to describe paths. VRL used a newer syntax, while everything else in Vector still used an older one. This was a constant pain point for users, so we have taken steps to migrate towards the VRL syntax. This is a breaking change that may require migration.
The old syntax was very lenient in the characters that were allowed in a field name. It also supported single character escapes.
The new syntax only allows A-Z a-z 0-9 _ @. Any other character will require the field name to be quoted.
Quotes around a field name replace single character escaping. This brings the old syntax in line with the newer (VRL) syntax. Note that VRL makes a distinction between a field starting with a “.” (event query) and without (variable query). Outside a VRL context, the “.” is optional and ignored.
Migration will be required for any paths used outside a VRL context. That is, any transform (except remap and conditions), templating, and any source or sink referencing field names. There are no changes to the VRL syntax.
Here are some examples that require migrating:
old syntax | new syntax | comment |
---|---|---|
foo\.bar.baz | "foo.bar".baz | The . field separator needs to be escaped if used as part of a field name. The old syntax allowed individual character escaping. The new syntax requires quotes around the field name. |
headers.User-Agent | headers."User-Agent" | - requires quotes with the new syntax |
foo with spaces | "foo with spaces" | Spaces also need to be quoted |
foo\"bar | "foo\"bar" | Double quotes and backslashes must be escaped inside quotes |
TOML transform example
Old syntax
[transforms.dedupe]
type = "dedupe"
inputs = ["input"]
fields.match = ["message.user-identifier"]
New syntax (the dash requires the field name to be quoted)
[transforms.dedupe]
type = "dedupe"
inputs = ["input"]
fields.match = ["message.\"user-identifier\""]
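If you generate paths programmatically, the quoting rule above can be sketched as a small helper. This is only an illustration in Python, not part of Vector: the names quote_segment and build_path are hypothetical, and the sketch assumes you already have the path split into individual field names (it does not handle array indexing).

```python
import re

# Characters allowed in an unquoted field name under the new syntax.
_BARE_NAME = re.compile(r"[A-Za-z0-9_@]+")


def quote_segment(name: str) -> str:
    """Return one field name, quoted only when the new syntax requires it."""
    if _BARE_NAME.fullmatch(name):
        return name
    # Inside quotes, backslashes and double quotes must be escaped.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_path(segments: list[str]) -> str:
    """Join already-split field names into a dotted path."""
    return ".".join(quote_segment(segment) for segment in segments)


print(build_path(["foo.bar", "baz"]))             # "foo.bar".baz
print(build_path(["headers", "User-Agent"]))      # headers."User-Agent"
print(build_path(["message", "user-identifier"])) # message."user-identifier"
```

Remember that when the resulting path is embedded in a TOML string, the quotes themselves need escaping again, as in the dedupe example above.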
For more information on the new syntax, you can review the documentation here
GraphQL API outputEventsByComponentIdPatterns subscription argument patterns changed to outputsPatterns
To avoid confusion and align with the new inputsPatterns argument, we’ve renamed the original patterns argument to outputsPatterns. outputsPatterns allows you to specify patterns that will match against components (sources, transforms) and display their outflowing events. inputsPatterns allows you to specify patterns that match against components (transforms, sinks) and display their incoming events.
Note that using an input pattern to match a component is effectively a shorthand. It’s the same as using one or more output patterns to match against all the outputs flowing into a component.
Updating your subscriptions is as simple as renaming patterns to outputsPatterns.
- subscription {
- outputEventsByComponentIdPatterns(patterns: [...])
+ subscription {
+ outputEventsByComponentIdPatterns(outputsPatterns: [...])
GraphQL API EventNotification type changes
As part of adding a new notification (InvalidMatch) to warn users against attempting invalid matches, we’ve reshaped the EventNotification type for easier querying and future extensibility.
Previously, the EventNotification type consisted simply of a pattern and a plain enum describing the notification.
type EventNotification {
pattern: String!
notification: EventNotificationType!
}
While this worked well for simple notifications like Matched and NotMatched, it was awkward to extend to new notifications, like InvalidMatch, which may want to include more information than pattern. Thus, the new EventNotification type takes the following form:
type EventNotification {
notification: Notification!
message: String!
}
where Notification is a union of specific kinds of notifications:
union Notification = Matched | NotMatched | InvalidMatch
message is a new human-readable description of the notification while notification contains additional details specific to the kind of notification.
All the same information is still available, and the following example shows how
you might convert an existing query to the new schema.
  subscription {
-   outputEventsByComponentIdPatterns(patterns: [...]) {
+   outputEventsByComponentIdPatterns(outputsPatterns: [...]) {
      __typename
      ... on EventNotification {
-       pattern
-       notification
+       message
+       notification {
+         __typename
+         ... on Matched {
+           pattern
+         }
+         ... on NotMatched {
+           pattern
+         }
+         ... on InvalidMatch {
+           pattern
+           invalidMatches
+         }
+       }
      }
    }
  }
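On the client side, code that used to read pattern directly from the notification now needs to look one level deeper. The following Python sketch shows one way this might look. It assumes the JSON payload mirrors the fields selected in the query above and that invalidMatches is a list of strings; the function name and the sample payload are hypothetical.

```python
def summarize_notification(event: dict) -> str:
    """Build a one-line summary from a new-style EventNotification payload."""
    detail = event["notification"]
    kind = detail["__typename"]
    summary = kind + ": " + detail["pattern"]
    # Only InvalidMatch carries the extra invalidMatches field.
    if kind == "InvalidMatch":
        # Assumption: invalidMatches is a list of strings.
        summary += " (invalid: " + ", ".join(detail["invalidMatches"]) + ")"
    return summary


# Hypothetical payload, shaped like the fields selected in the query above.
sample = {
    "__typename": "EventNotification",
    "message": "example message",
    "notification": {"__typename": "Matched", "pattern": "my_source*"},
}
print(summarize_notification(sample))  # Matched: my_source*
```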
Deprecated GraphQL API subscriptions have been removed
The following deprecated subscriptions have been removed in this release. Please use the listed alternatives.
- eventsInTotal: use componentReceivedEventsTotals
- eventsOutTotal: use componentSentEventsTotals
- componentEventsInThroughputs: use componentReceivedEventsThroughputs
- componentEventsInTotals: use componentReceivedEventsTotals
- componentEventsOutThroughputs: use componentSentEventsThroughputs
- componentEventsOutTotals: use componentSentEventsTotals
- eventsInThroughput: use componentReceivedEventsThroughputs
- eventsOutThroughput: use componentSentEventsThroughputs
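If you maintain a number of saved queries, a quick scan can tell you which ones still reference a removed subscription. The Python sketch below encodes the list above as a lookup table; the function name find_removed is hypothetical. It only reports names, since the replacement subscriptions may return differently shaped data and the queries may need adjusting by hand.

```python
import re

# Removed subscription -> suggested replacement, taken from the list above.
REPLACEMENTS = {
    "eventsInTotal": "componentReceivedEventsTotals",
    "eventsOutTotal": "componentSentEventsTotals",
    "componentEventsInThroughputs": "componentReceivedEventsThroughputs",
    "componentEventsInTotals": "componentReceivedEventsTotals",
    "componentEventsOutThroughputs": "componentSentEventsThroughputs",
    "componentEventsOutTotals": "componentSentEventsTotals",
    "eventsInThroughput": "componentReceivedEventsThroughputs",
    "eventsOutThroughput": "componentSentEventsThroughputs",
}


def find_removed(query: str) -> dict[str, str]:
    """Return the removed subscriptions used in a query, with replacements."""
    # Split into whole identifiers so partial matches are not reported.
    names = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", query)
    return {name: REPLACEMENTS[name] for name in names if name in REPLACEMENTS}


print(find_removed("subscription { eventsInTotal }"))
# {'eventsInTotal': 'componentReceivedEventsTotals'}
```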
The vector vrl timezone flag -tz is now -z
We upgraded the Vector CLI to use Clap v3, a popular Rust crate.
A breaking change in Clap v3 is that shortened CLI flags now use the char
type, meaning they are restricted to single characters.
As a consequence, the shortened form of our vector vrl --timezone flag (previously -tz) has been updated to the more succinct -z.
The vector top human_metrics short flag -h is now -H
To avoid clashing and issues with our upgrade to Clap v3, the short -h from --help and -h from --human_metrics in the vector top command have been disambiguated. The shortened form for --human_metrics is now -H and -h is reserved for --help.
Remainder operator (%) in VRL is fallible
The remainder operator in VRL has become a fallible operation. This is because finding the remainder with a divisor of zero can raise an error that needs to be handled.
Before this release, the following VRL would compile:
.remainder = 50 % .value
If .value was zero, Vector would panic. This can be fixed by handling the error:
.remainder = 50 % .value ?? 0
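For readers less familiar with VRL's error coalescing, the fix above behaves roughly like the following Python analogue: attempt the operation and fall back to a default if it fails. This is an illustration of the idea only, not how VRL is implemented, and remainder_or is a hypothetical name.

```python
def remainder_or(dividend: int, divisor: int, fallback: int = 0) -> int:
    """Return dividend % divisor, or the fallback when the divisor is zero."""
    try:
        return dividend % divisor
    except ZeroDivisionError:
        # Plays the role of the right-hand side of `??` in the VRL example.
        return fallback


print(remainder_or(50, 7))  # 1
print(remainder_or(50, 0))  # 0
```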
AWS SDK Migration
We have migrated sources and sinks that utilize AWS to the official AWS SDK (from Rusoto). This comes with some benefits such as support for IMDSv2. This requires us to deprecate some config options.
The new AWS SDK lacks support for certain authentication configuration:
- Vector now only supports IMDSv2 for authentication. If you were previously using IMDSv1, you will need to configure the host to allow IMDSv2. For EKS, this likely means increasing the metadata token response hop limit to 2 (see Amazon EKS now supports EC2 Instance Metadata Service v2). We are discussing the possibility of re-adding support for IMDSv1 in #12376.
- Vector AWS components’ auth.credential_file option was removed as the new SDK does not yet support it. It is still possible to use a credentials file, but it should be placed in the default location (~/.aws/credentials on Linux, OS X, and Unix; %userprofile%\.aws\credentials on Microsoft Windows), or the location should be set with an environment variable (AWS_CONFIG_FILE or AWS_SHARED_CREDENTIALS_FILE).
- Support for credential_process in an AWS profile was dropped as it is not yet supported by the new SDK.
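Before upgrading, it can help to check where a credentials file is expected to live on a given host. The Python sketch below follows the lookup described above for AWS_SHARED_CREDENTIALS_FILE and the default location only. It does not inspect AWS_CONFIG_FILE or profiles, and it is not how Vector or the AWS SDK resolve credentials internally; the function name is hypothetical.

```python
from pathlib import Path
from typing import Mapping


def shared_credentials_path(env: Mapping[str, str], home: Path) -> Path:
    """Return the credentials file location implied by the environment."""
    override = env.get("AWS_SHARED_CREDENTIALS_FILE")
    if override:
        return Path(override)
    # Default location: ~/.aws/credentials (or %userprofile%\.aws\credentials).
    return home / ".aws" / "credentials"


if __name__ == "__main__":
    import os

    path = shared_credentials_path(os.environ, Path.home())
    print(path, "exists" if path.is_file() else "is missing")
```

If the file you previously referenced through auth.credential_file lives elsewhere, either move it to the reported location or set the environment variable for the Vector process.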
Specifying a region is now also required. Make sure a region is specified in either the AWS config file, or the Vector config.
The assume_role config option was deprecated and moved to auth.assume_role previously. This deprecated option has now been removed.
This affects the following components:
- AWS Cloudwatch Metrics Sink
- AWS Cloudwatch Logs Sink
- AWS SQS Source (this was migrated in a previous version)
- AWS SQS Sink
- AWS Kinesis Streams Sink
- AWS Kinesis Firehose Sink
- AWS S3 Sink
- AWS S3 Source
- Datadog Archives Sink (s3 config only)
- Elasticsearch Sink
For more details on configuring auth, you can visit these links:
- https://docs.aws.amazon.com/sdk-for-rust/latest/dg/credentials.html
- https://docs.aws.amazon.com/sdk-for-rust/latest/dg/environment-variables.html
Route transform metric event_discarded_total removed
Until now, when using the route transform, if an event didn’t match any configured route, this event would be discarded and lost for the following transforms and sinks.
A new _unmatched route has now been introduced and the events are no longer discarded, making the event_discarded_total metric irrelevant so it has been dropped.
You can still get the total number of events that match no routes via component_events_sent_total with a tag of output=_unmatched.
buffer_discarded_events_total now includes received events
The buffer_discarded_events_total metric now includes all events flowing into a buffer, even those discarded when the buffer is full and drop_newest is configured as the when_full behavior.
This brings the metric in line with the component-level received and discarded metrics, where events are counted as received before being discarded. It also leaves the door open for additional discard strategies like drop_oldest, where events would live in the buffer before possibly being discarded.
kubernetes_logs source rewritten to use kube-rs
The kubernetes_logs source has had two breaking changes:
- It now requires the list verb for Vector’s ClusterRole resource. If you are using the Helm chart, version 0.7.0 includes this change. Otherwise, make sure to add it to your manifest.
- The proxy configuration was dropped. Instead, configure any needed proxy configuration in your kubeconfig.
See the highlight for more information.
Published docker images no longer create implicit volumes
Previously /var/lib/vector was defined as a volume in the Dockerfiles for the published Vector images. This led to the creation of the volume each time you ran a Vector container from these images whether you wanted it or not.
Instead, if you need a volume for the data directory, you should provide one when launching the container.
When migrating from an earlier version to 0.21.0 or later using Docker compose and implicit volumes, you need to use docker inspect to find out which volumes your container is mapped to so that you can map them to the upgraded container as well.
See vectordotdev/vector#11982 for additional rationale.
VRL now includes lexical scoping for blocks
In preparation for VRL iteration support landing in the next release, this release of Vector includes a breaking change to the way variable scoping works.
Specifically, variables defined in nested blocks cannot be accessed by parent blocks.
This is best explained with an example:
# top-level scope
count1 = 1
# nested block
{
  count2 = 1
  count1 = count1 + 1
  # nested block
  {
    count2 = count2 + 1
    count1 = count1 + 1
  }
}
count1 # returns "3"
count2 # returns a compile-time error, because "count2" was defined in a nested block
CLI options: delimiters, wildcards, and boolean options
When using CLI options that can take multiple values, the provided values must be comma separated. For example:
vector --config foo.toml,bar.toml
Additionally, when passing values that contain wildcards (*), these values must be quoted. For example:
vector --config "*.toml"
The --watch-config option previously required a boolean value, which is no longer needed. For example, in earlier releases:
vector --watch-config=true
This should become:
vector --watch-config
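If you launch Vector from a wrapper script, the three rules above can be applied in one place. The following Python sketch builds a shell command line using the standard shlex module; the function name vector_command is hypothetical, and the sketch covers only the --config and --watch-config options discussed here.

```python
import shlex


def vector_command(configs: list[str], watch_config: bool = False) -> str:
    """Build a shell-safe vector command line following the rules above."""
    # Multiple values for one option are passed as a comma-separated list.
    args = ["vector", "--config", ",".join(configs)]
    # --watch-config is now a plain switch with no boolean value.
    if watch_config:
        args.append("--watch-config")
    # shlex.join quotes anything the shell would otherwise expand, such as *.
    return shlex.join(args)


print(vector_command(["foo.toml", "bar.toml"], watch_config=True))
# vector --config foo.toml,bar.toml --watch-config
print(vector_command(["*.toml"]))
# vector --config '*.toml'
```

shlex emits single quotes rather than the double quotes used in the example above; both prevent a POSIX shell from expanding the wildcard before Vector sees it.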
Iteration Sneak Preview
The introduction of lexical scoping is important for when iteration support lands, which (as a sneak preview) allows you to do the following:
data = { "foo": 1, "bar": 2 }
map(data) -> |key, value| {
  new_key = upcase(key)
  [new_key, value + 1]
}
data # returns { "FOO": 2, "BAR": 3 }
new_key # returns a compile-time error, because "new_key" is a variable scoped
        # to the enumeration closure block
Without lexical scoping, it would be ambiguous what new_key should return in the last example, but now it’s clear that the variable remains undefined outside of the closure block.
Deprecations
The GraphQL API subscriptions: receivedEventsTotal, sentEventsTotal, sentEventsThroughput, receivedEventsThroughput have been deprecated
While these subscriptions were intended to display aggregate metrics across all components, they currently only show a per-component metric and are made redundant by more informative subscriptions that include specific component information. To avoid misuse and confusion, we are deprecating them in favor of the following alternatives.
- receivedEventsTotal: use componentReceivedEventsTotals
- sentEventsTotal: use componentSentEventsTotals
- sentEventsThroughput: use componentSentEventsThroughputs
- receivedEventsThroughput: use componentReceivedEventsThroughputs
End-to-end acknowledgement configuration moved to sinks
Currently, end-to-end acknowledgements are opt-in at the source level via the acknowledgements.enabled setting. This made sense initially, since sources are the ones that acknowledge back to clients, but it makes durability difficult to achieve. Durability, which is the primary goal of acknowledgements, is sink-dependent rather than source-dependent. That is, it is important to assert that all data going to a system of record is fully acknowledged, across all the sources that it came from, thus guaranteeing delivery to that destination.
To achieve this, an acknowledgements option has been added to sinks. When the configuration is loaded, all sources that are connected to a sink that has this option enabled will automatically be configured to wait for sinks to acknowledge before issuing their own acknowledgements (where possible).
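To reason about which sources are affected in your own topology, the rule above can be modelled as a walk backwards from each acknowledging sink through its inputs. The Python sketch below is a simplified model, not Vector's implementation: the dictionary layout, the component names, and the function name are all hypothetical, and it ignores the "where possible" caveat for sources that cannot acknowledge.

```python
def sources_waiting_for_acks(components: dict[str, dict]) -> set[str]:
    """Return the sources connected to any sink with acknowledgements enabled."""
    waiting: set[str] = set()
    seen: set[str] = set()
    # Start from every sink that has acknowledgements turned on.
    stack = [
        name
        for name, component in components.items()
        if component["kind"] == "sink" and component.get("acknowledgements", False)
    ]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        component = components[name]
        if component["kind"] == "source":
            waiting.add(name)
        else:
            # Follow the inputs upstream through transforms to the sources.
            stack.extend(component.get("inputs", []))
    return waiting


# Hypothetical topology: only the archive sink enables acknowledgements.
topology = {
    "app_logs": {"kind": "source"},
    "metrics_in": {"kind": "source"},
    "parse": {"kind": "transform", "inputs": ["app_logs"]},
    "archive": {"kind": "sink", "inputs": ["parse"], "acknowledgements": True},
    "console": {"kind": "sink", "inputs": ["metrics_in"]},
}
print(sorted(sources_waiting_for_acks(topology)))  # ['app_logs']
```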
The source configuration acknowledgements option will remain in this version, but is deprecated and will be removed in a future version.
See the documentation for end-to-end acknowledgements for more details on the acknowledgement process.