Vector v0.19.0 release notes

The Vector team is pleased to announce version 0.19.0!

In addition to the features, enhancements, and fixes below, we’ve been hard at work improving Vector’s performance: most configurations in our soak test framework improved by 10-100% compared to the last release, v0.18.

Be sure to check out the upgrade guide for breaking changes in this release.

Upgrading Vector

When upgrading, we recommend stepping through minor versions as these can each contain breaking changes while Vector is pre-1.0. These breaking changes are noted in their respective upgrade guides.

Highlights

0.19 Upgrade Guide

December 28, 2021

001wwang

type: breaking change

Splunk HEC Improvements

December 15, 2021

barieom

type: enhancement

Improved concurrency model

December 14, 2021

barieom

type: announcement

Known issues

A regression was introduced that changed the name of the data directory for sinks using a disk buffer. As a result, when upgrading from v0.18.0, any data in existing disk buffers will not be sent. Vector instances starting with clean or empty disk buffers are unaffected. Fixed in v0.19.1. See #10430 for more details.
Presence of framing.character_delimited.delimiter in configs causes Vector to fail to start with the error invalid type: string. Fixed in v0.19.1.
When using decoding.codec on sources, invalid data will cause the source to cease processing. Fixed in v0.19.2.
encoding.only_fields failed to deserialize correctly for sinks that used fixed encodings (i.e. those that don’t have encoding.codec). Fixed in v0.19.2.
Buffers using when_full of block were incorrectly counting buffer_events_total by including discarded events. Fixed in v0.19.2.
Transforms neglect to tag logs and metrics with their component span tags (like component_id). Fixed in v0.19.2.

Vector Changelog

6 new features

The splunk_hec source and sink have added support for the acknowledgement part of Splunk’s HTTP Event Collector protocol. This improves delivery guarantees for data from Splunk clients and when sending events to Splunk. See the highlight article for more details.
The datadog_agent source is now able to accept metrics from the Datadog Agent; however, some changes are pending in the Datadog Agent to be able to send metrics to Vector. We expect this to be released in version 6.33 / 7.33 of the Datadog Agent.
The humio_logs sink now transmits sub-millisecond timestamps to Humio.
The splunk_hec source and sink have added support for passing the channel token that events were sent with from the source through to the sink. This makes it easier to place Vector between a Splunk sender and receiver to transform the data. See the highlight article for more details.
A new config option has been added to the elasticsearch sink to allow suppressing the type field from being sent by Vector. This field is deprecated in Elasticsearch v7 and will be removed in v8.
A new connection_limit option has been added to TCP-based sources like socket and syslog to limit the number of allowed TCP connections. This can be useful to limit resource utilization by Vector.
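As a minimal sketch of the connection_limit option on a TCP socket source (the source name, address, and limit value here are illustrative assumptions, not recommended settings):
```toml
# Hypothetical source name and listen address.
[sources.tcp_in]
type = "socket"
mode = "tcp"
address = "0.0.0.0:9000"
# Allow at most 100 TCP connections to this source at any given time.
connection_limit = 100
```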

10 enhancements

Added end-to-end acknowledgement for the aws_s3 source.
VRL now allows for writing multi-line string literals by ending the line with a backslash (\).
Example:
```
.thing = "foo \
bar"

assert(.thing == "foo bar")
```
A couple of enhancements have been made to the influxdb_logs sink to modify how Vector encodes events.
A measurement config field was added to allow overriding the measurement name (previously hardcoded to vector). Along with this the namespace option was made optional as measurement can be used to set the full name directly.
For example:
```
[sinks.log_to_influxdb]
type = "influxdb_logs"
measurement = "vector-logs"
endpoint = "http://localhost:9999"
```
Now outputs events like:
```
vector-logs,metric_type=logs,host=example.com message="hello world",size=10 {timestamp}
```
A metric_type config field was added to allow customizing the metric type (previously hardcoded to logs).
For example:
```
[sinks.log_to_influxdb]
type = "influxdb_logs"
namespace = "ns"
metric_type = "foo"
endpoint = "http://localhost:9999"
```
Now outputs events like:
```
ns.vector,metric_type=foo,host=example.com message="hello world",size=10 {timestamp}
```
Thanks to juvenn for contributing this change!
The kubernetes_logs source has been updated to read older pod logs first. This should result in better behavior with Vector releasing file handles for rotated pod files more quickly.
Vector’s support for environment variable expansion in configuration files now allows periods (.) in variable names, as these commonly appear in environment variables set by Java properties files.
VRL has added new functions for interacting with event metadata:
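For illustration, a config could reference such a variable as sketched below. The variable name app.log.dir and the source name are hypothetical, and the sketch assumes the variable is set in Vector’s environment:
```toml
# Hypothetical file source; app.log.dir is an assumed environment variable
# whose name contains periods, as a Java properties file might define it.
[sources.app_logs]
type = "file"
include = ["${app.log.dir}/*.log"]
```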
- get_metadata_field("key")
- set_metadata_field("key", "value")
- remove_metadata_field("key")
Right now, the only event metadata that is accessible is Datadog API keys (datadog_api_key) or Splunk HEC channel tokens (splunk_hec_token) that are attached by the source, but we expect metadata use-cases to grow.
This can be used with, for example, CSV enrichment tables to look up the datadog_api_key to use with events based on other metadata.
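A rough sketch of calling these functions from a remap transform follows. The transform name, input name, and the .api_key_present field are hypothetical, and the sketch assumes the functions can be called without error handling:
```toml
# Hypothetical transform and input names.
[transforms.inspect_metadata]
type = "remap"
inputs = ["datadog_agent_in"]
source = '''
# Record whether the event arrived with a Datadog API key attached.
.api_key_present = get_metadata_field("datadog_api_key") != null

# Remove the key from the event metadata.
remove_metadata_field("datadog_api_key")
'''
```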
The statsd sink now compresses histograms to result in smaller payloads without data loss.
Vector’s CPU utilization has improved by running eligible transforms on multiple cores when possible. Previously, a transform could be a significant bottleneck since only one copy of it was run, which would result in Vector under-utilizing available CPU resources. See the highlight article for more details.
We have improved Vector’s performance for most use-cases by re-introducing the jemalloc memory allocator as the allocator for *nix platforms. We continue to evaluate other allocators to see if they are a better fit for Vector’s allocation patterns.
Fixed metric emission for the blackhole sink when a rate limit lower than 1024 was used.

7 bug fixes

The headers_key config option for the kafka source was restored. This was accidentally renamed to headers_field in v0.18.0. For compatibility with v0.18.0, headers_field will also be accepted.
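For example, a kafka source using the restored option might look like the sketch below (the source name, broker address, group, and topic are placeholder assumptions):
```toml
# Hypothetical source name and Kafka connection details.
[sources.kafka_in]
type = "kafka"
bootstrap_servers = "localhost:9092"
group_id = "vector"
topics = ["app-logs"]
# Event field under which Kafka message headers are stored.
headers_key = "headers"
```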
The loki sink now accepts any 200-level HTTP response from servers as success. This was added for compatibility with other Loki-compatible APIs like cLoki which didn’t respond with the expected 204 response code.
The host and pid fields are correctly added to all internal logs now. Previously they were only added to start-up logs, but not logs while Vector was running.
Fixed a panic that could occur when reloading a Vector config that requires shutting down and recreating a sink.
Vector, when using --config-dir, no longer tries to load unknown file extensions. This was a regression in v0.18.
The max_length option available on some decoding framers for sources previously caused Vector to stop decoding a given input stream (like a TCP connection) when a frame that was too big was encountered. It now correctly just discards that frame and continues.
Previously the framing.character_delimited.delimiter option available on some sources allowed for characters greater than a byte, but the implementation assumed the delimiter was only one byte. Vector now correctly errors if a delimiter that is greater than a byte is used. Only byte delimiters are allowed for efficiency in scanning.
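The two framing fixes above can be sketched together in one source config. The source name, address, delimiter, and length below are illustrative assumptions, and the framing.method and framing.character_delimited.max_length paths are assumed to follow the same layout as the delimiter option. Note that, per the known issues, setting framing.character_delimited.delimiter prevents v0.19.0 from starting, so this applies to v0.19.1 and later:
```toml
# Hypothetical source name and listen address.
[sources.framed_tcp]
type = "socket"
mode = "tcp"
address = "0.0.0.0:9000"
framing.method = "character_delimited"
# The delimiter must fit in a single byte; a multi-byte character is rejected.
framing.character_delimited.delimiter = "|"
# Frames longer than this many bytes are discarded and decoding continues.
framing.character_delimited.max_length = 8192
```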

What’s next

Faster disk buffers

We are in the process of replacing our current disk buffer implementation, which leverages LevelDB in a way that doesn’t quite match common LevelDB use-cases, with a custom implementation specific to Vector’s needs. The end result is faster disk buffers.

Component metric standardization

We are in the process of ensuring that all Vector components report a consistent set of metrics to make it easier to monitor the performance of Vector. These metrics are outlined in this new instrumentation specification.
