The COSE team is excited to announce version 0.56.0!
A new databricks_zerobus sink that streams log data to Databricks Unity Catalog tables through the Zerobus ingestion service. The sink supports OAuth 2.0 authentication, automatic schema fetching from Unity Catalog, and protobuf batch encoding.
A new delay transform that delays each event by a fixed duration. Events can also be delayed based on a condition, which includes VRL transforms.
HTTP-based sinks now support a retry_strategy configuration option to control which HTTP response codes are retried. The http sink also includes a new example showing how to retry only specific transient status codes.
The vector sink now supports zstd compression in addition to gzip. This provides better compression ratios and performance for Vector-to-Vector communication.
The tag_cardinality_limit transform received major enhancements: per-tag cardinality overrides (per_tag_limits), per-metric tracking isolation (tracking_scope: per_metric), a global key cap (max_tracked_keys), and the ability to opt entire metrics out of cardinality tracking.
Parquet encoding in the aws_s3 sink is now available out of the box in official release binaries for all users.
Fixed a CPU regression affecting sinks that use metric normalization, such as prometheus_remote_write, aws_cloudwatch_metrics, statsd, and others.
A new expected_event_count field on test outputs, allowing assertions on the number of events emitted by a transform.
The greptimedb_metrics and greptimedb_logs sinks now require GreptimeDB v1.x. Users running GreptimeDB v0.x must upgrade their GreptimeDB instance before upgrading Vector.
HTTP-based sinks that use the shared retry helpers now support a retry_strategy configuration option to control which HTTP response codes are retried. The http sink also includes a new example showing how to retry only specific transient status codes.
A new expected_event_count field on test outputs, allowing assertions on the number of events emitted by a transform.
A new databricks_zerobus sink that streams log data to Databricks Unity Catalog tables through the Zerobus ingestion service. The sink supports OAuth 2.0 authentication, automatic schema fetching from Unity Catalog, and protobuf batch encoding.
A new delay transform that delays each event by a fixed duration.
New ratio_field and rate_field options for the sample transform to support dynamic per-event sampling. A static rate or ratio is still required as a fallback, and ratio_field and rate_field cannot be set together.
HTTP-based sinks using the shared retry logic now classify transport-layer failures with HttpError::is_retriable: connection and TLS connector issues may be retried, while failures such as invalid HTTP request construction or an invalid proxy URI are not. Setting retry_strategy to none disables retries for these transport errors and for request timeouts, in addition to status-code-based retries.
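The transport-error classification described above can be pictured with a small Python model. This is only an illustrative sketch, not Vector's Rust implementation: the TransportError enum, its variant names, the should_retry function, and the "default" strategy label are all hypothetical names chosen for this example. Only the retriable/non-retriable split and the effect of setting retry_strategy to none come from the notes above.

```python
from enum import Enum, auto


class TransportError(Enum):
    """Hypothetical transport-layer failure kinds (illustrative names only)."""
    CONNECT = auto()            # connection could not be established
    TLS_CONNECTOR = auto()      # TLS connector failed
    INVALID_REQUEST = auto()    # HTTP request could not be constructed
    INVALID_PROXY_URI = auto()  # proxy URI could not be parsed


# Per the notes: connection and TLS connector issues may be retried,
# while request-construction and proxy-URI failures are not.
RETRIABLE = {TransportError.CONNECT, TransportError.TLS_CONNECTOR}


def should_retry(error: TransportError, retry_strategy: str = "default") -> bool:
    """Return True if this transport error would be retried.

    retry_strategy is modelled as a plain string (an assumption of this
    sketch); the value "none" disables retries for every transport error.
    """
    if retry_strategy == "none":
        return False
    return error in RETRIABLE
```

In this model, a connection failure is retried under the default strategy, an invalid proxy URI never is, and nothing is retried once the strategy is none.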
The vector sink now supports zstd compression in addition to gzip. This provides better
compression ratios and performance for Vector-to-Vector communication.
The compression configuration has been enhanced to support multiple algorithms while maintaining full backward compatibility:
sinks:
  my_vector:
    type: vector
    address: "localhost:6000"
    compression: true # Uses gzip (default)
    # or
    # compression: false # No compression

sinks:
  my_vector:
    type: vector
    address: "localhost:6000"
    compression: "zstd" # Use zstd compression
    # Supported values: "none", "gzip", "zstd"
The Vector source automatically accepts both gzip and zstd compressed data, enabling seamless communication between Vector instances using different compression algorithms.
The opentelemetry source’s gRPC OTLP receiver now accepts zstd-compressed requests in addition to gzip, matching the compression schemes advertised via the grpc-accept-encoding response header. No configuration change is required; clients can send OTLP payloads with grpc-encoding: zstd and they will be transparently decompressed.
The custom auth strategy for the http_server source now supports event enrichment via metadata writes. VRL programs can write %field = value during authentication; those values are injected into every successfully authenticated event. The event body (.field) remains read-only. Existing custom programs that do not write metadata are unaffected.
Updated serde_json to 1.0.149 and serde_with to 3.18.0. serde_json switched its float-to-string formatter from Ryū to Żmij in 1.0.147, so floats serialized via the native_json codec may render in a slightly different textual form (for example 1e+16 instead of 1e16). The change is purely cosmetic: parsed f32/f64 values round-trip identically, and Vector-to-Vector communication between old and new versions is unaffected.
The splunk_hec source now accepts optional per-endpoint codec configuration via event: { framing, decoding } and raw: { framing, decoding }. When decoding is set on an endpoint, Vector applies a second decoding pass after the HEC envelope is parsed: on /services/collector/event the envelope’s event field is fed through the codec, and on /services/collector/raw the request body is fed through the codec directly. A single payload can fan out to multiple events.
For example, to decode JSON payloads in /event requests while splitting /raw bodies on newlines:
sources:
  hec:
    type: splunk_hec
    address: 0.0.0.0:8088
    event:
      decoding:
        codec: json
    raw:
      framing:
        method: newline_delimited
      decoding:
        codec: bytes
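To make the second decoding pass concrete, here is a rough Python model of what the configuration above asks for. It is a sketch under stated assumptions, not Vector's code: the function names are hypothetical, the envelope's event field is assumed to carry a JSON string, and a JSON array is assumed to fan out into one event per element.

```python
import json


def decode_event_endpoint(body: str) -> list:
    """Model /services/collector/event with decoding.codec set to json.

    The HEC envelope is parsed first, then its "event" field is fed
    through the JSON codec a second time.
    """
    envelope = json.loads(body)
    inner = json.loads(envelope["event"])  # second decoding pass
    # Assumption: a JSON array fans out into one event per element.
    return inner if isinstance(inner, list) else [inner]


def decode_raw_endpoint(body: bytes) -> list:
    """Model /services/collector/raw with newline framing and the bytes codec.

    The request body is split on newlines; each non-empty frame becomes
    one event whose payload is the raw bytes.
    """
    return [frame for frame in body.split(b"\n") if frame]
```

Under these assumptions, a single /event request whose event field holds a two-element JSON array yields two events, and a /raw body with three lines yields three.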
The tag_cardinality_limit transform now accepts a top-level per_tag_limits map, mirroring the per-metric one: mode: limit_override to set a per-tag cap, or mode: excluded to bypass cardinality tracking for that tag on every metric without a per_metric_limits entry.
Reduced memory usage of the tag_cardinality_limit transform when running in exact mode by allocating less unused memory on initialization.
The tag_cardinality_limit transform gained two new configuration capabilities:
Per-tag limits (per_tag_limits): configure cardinality limits per tag key within a metric, or exclude individual tags from tracking.
Per-metric exclusion: opt entire metrics out of cardinality tracking with mode: excluded in per_metric_limits.
The tag_cardinality_limit transform gained two new settings:
tracking_scope: isolate tag tracking per metric (per_metric) instead of sharing a single bucket across all metrics (global, the default).
max_tracked_keys: cap the total number of tag keys tracked to bound memory usage.
The default /etc/vector/vector.yaml config file is no longer installed by the Debian, RPM, Alpine, and distroless-static Docker packages. The previous default ran a demo_logs source and printed synthesized syslog lines to stdout, which then surfaced in journald or /var/log/ on hosts running Vector as a service and was a common source of confusion.
New installs will now have no active config on disk. Provide your own configuration at /etc/vector/vector.yaml (or pass --config <path>) before starting Vector. A reference example is shipped at /usr/share/vector/examples/vector.yaml, and more sample configs remain at /etc/vector/examples/.
Existing installs are unaffected on upgrade: package managers preserve the on-disk /etc/vector/vector.yaml if you already had one.
Fixed a CPU regression introduced in 0.50.0 affecting all sinks that use metric normalization such as prometheus_remote_write, aws_cloudwatch_metrics, statsd, and others.
The only exception is the incremental_to_absolute transform when max_bytes or max_events are configured, where the overhead is expected and necessary for eviction to work correctly.
grpc-encoding (e.g. identity or a missing header). Previously, such malformed frames were silently decoded as gzip, which could mask client/server compression-negotiation bugs.
The windows_event_log source no longer freezes after periods of inactivity.
ComponentEventsDropped for every encode failure path. Previously some build_record_batch failures (notably type mismatches) dropped events silently. A new EncoderRecordBatchError internal event also reports component_errors_total with error_code="arrow_json_decode" or "arrow_record_batch_creation" at stage="sending" for granular alerting.
The error that the splunk_hec source emits on missing or invalid auth headers now specifies “authentication_failed” as the error_type.
get_env_var, in the standalone VRL CLI and web playground by default.
Parquet encoding in the aws_s3 sink (batch_encoding) now works out of the box in the official release binaries. Previously it required compiling Vector from source with the codecs-parquet feature.
The windows_event_log source now adds standard source metadata, including source_type, to emitted log events.
Fixed a bug in the file source where checkpoints recording the last-read file position were not always fully written before Vector shut down. On the next startup, the file source could start reading from an earlier position, causing events to be re-processed.
The aggregate transform now correctly passes through or ignores metrics whose kind is not supported by the configured mode. Previously, these metrics were silently dropped, contrary to the documented behavior. For example, absolute metrics flowing through a sum-mode aggregate transform are now forwarded unchanged to the next step in the pipeline rather than being dropped:
{kind: incremental, type: counter, name: "http.requests", value: 10} → summed into aggregate
{kind: absolute, type: gauge, name: "cpu.usage", value: 0.83} → previously dropped, now passes through unchanged
{kind: incremental, type: counter, name: "http.requests", value: 5} → summed into aggregate
If you want to preserve the previous drop behavior, add a filter transform before the aggregate transform to discard the unwanted metric kind.
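The pass-through behavior for the three example metrics above can be modelled in a few lines of Python. This is a simplified sketch, not the transform's implementation: aggregate_sum is a hypothetical name, metrics are plain dictionaries, and flush intervals are ignored.

```python
def aggregate_sum(metrics):
    """Sum-mode sketch: returns (totals, passed_through).

    Incremental metrics are summed by name. Any other kind is forwarded
    unchanged instead of being dropped, matching the fixed behavior.
    """
    totals = {}
    passed_through = []
    for metric in metrics:
        if metric["kind"] == "incremental":
            name = metric["name"]
            totals[name] = totals.get(name, 0) + metric["value"]
        else:
            passed_through.append(metric)
    return totals, passed_through


metrics = [
    {"kind": "incremental", "type": "counter", "name": "http.requests", "value": 10},
    {"kind": "absolute", "type": "gauge", "name": "cpu.usage", "value": 0.83},
    {"kind": "incremental", "type": "counter", "name": "http.requests", "value": 5},
]
totals, passed_through = aggregate_sum(metrics)
```

Here totals holds a single http.requests entry with the value 15, and passed_through contains the untouched cpu.usage gauge.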
The aws_s3 and clickhouse sinks now correctly advertise only the batch_encoding.codec values they actually support: parquet for aws_s3 and arrow_stream for clickhouse. Previously, the documentation and configuration schema listed both codecs for both sinks, even though picking the wrong one produced a startup error.
The output of the demo_logs source has changed: the pool of fake usernames and the pool of fake domain TLDs are now both defined inside Vector rather than pulled from an external crate. The line formats (apache_common, apache_error, json, syslog, bsd_syslog) are unchanged. If any of your tests or downstream pipelines assert on specific generated usernames or TLDs, update those expectations.
Fixed a bug in the topology builder causing component metrics registered at build time to miss the component tags if the component build function awaits non-trivially. This notably affected sinks using a disk buffer, and sources or sinks performing IO work in the build function.
Fixed a bug in the mqtt source where user-provided TLS client certificates (crt_file / key_file) were being silently ignored, breaking mTLS connections to strict brokers like AWS IoT Core.
DD-API-KEY, X-Honeycomb-Team, x-api-key, Api-Key) in debug-level HTTP request and response logs, alongside the existing standard headers (Authorization, Proxy-Authorization, Proxy-Authenticate, WWW-Authenticate, Cookie, Set-Cookie, Cookie2).
TCP-based sources that emit acknowledgements (fluent, logstash) no longer log a spurious "Error writing acknowledgement, dropping connection." at ERROR level when the ack write fails because the peer cleanly closed its TLS session (for example, during a rolling pod restart). These graceful shutdowns now log at WARN and no longer increment component_errors_total{error_code="ack_failed", ...}, preventing operator dashboards/alerts from firing on routine peer disconnects. Genuine ack write failures are still logged at ERROR and continue to increment component_errors_total.
The connection_shutdown_total{mode="tcp"} counter is now incremented exactly once per accepted source connection when its per-connection task exits, pairing with ConnectionOpen — regardless of cause (TLS handshake failure, shutdown signal during handshake, graceful peer EOF, decoder failure, downstream closed, ack write failure, tripwire, max connection duration). Previously it was not emitted by TCP sources at all.
The greptimedb_metrics and greptimedb_logs sinks now require GreptimeDB v1.x. Users running GreptimeDB v0.x must upgrade their GreptimeDB instance before upgrading Vector.
parse_regex changes from 0.33.0 which introduced a performance regression in multi-threaded scenarios. (https://github.com/vectordotdev/vrl/pull/1789)
Added support for \u{HEX} Unicode escape sequences. Any valid Unicode scalar value can be expressed, e.g. "hello\u{1F30E}world". Invalid sequences (empty braces, non-hex digits, surrogate codepoints, or values above U+10FFFF) are reported as a compile-time error. (https://github.com/vectordotdev/vrl/pull/1771)
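The validity rules listed for these escape sequences map onto a short check. The Python sketch below is an illustration of those rules only, not VRL's compiler code; decode_unicode_escape is a hypothetical helper that receives just the text between the braces.

```python
import string

MAX_CODE_POINT = 0x10FFFF
SURROGATES = range(0xD800, 0xE000)  # U+D800 through U+DFFF


def decode_unicode_escape(hex_digits: str) -> str:
    """Validate the text between the braces and return the character."""
    if not hex_digits:
        raise ValueError("empty braces")
    if any(ch not in string.hexdigits for ch in hex_digits):
        raise ValueError("non-hex digit in escape")
    code_point = int(hex_digits, 16)
    if code_point in SURROGATES:
        raise ValueError("surrogate code point")
    if code_point > MAX_CODE_POINT:
        raise ValueError("value above U+10FFFF")
    return chr(code_point)


# The example from the notes: the escape 1F30E sits between two words.
greeting = "hello" + decode_unicode_escape("1F30E") + "world"
```

Each of the four invalid cases named above raises an error in this model, which corresponds to the compile-time error VRL reports.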
parse_regex now accepts dynamic regex patterns (variables and runtime expressions), consistent with parse_regex_all. When the pattern is a literal, return type information remains precise based on named capture groups. (https://github.com/vectordotdev/vrl/pull/1774)
parse_user_agent function (https://github.com/vectordotdev/vrl/pull/1776)
bool fields (using the same parsing as to_bool), and integers are accepted for float/double fields. Previously these inputs failed encoding and required explicit conversion in VRL. (https://github.com/vectordotdev/vrl/pull/1763)
Added an allow_lossy_string_coercion argument to encode_proto. VRL’s protobuf encoding stringifies Boolean, Integer, Float, and Timestamp values when assigned to a protobuf string field as a convenience for callers handling loosely typed input. The protobuf JSON mapping only accepts a JSON string for a string field, so callers who want strict spec-compliant encoding can now pass allow_lossy_string_coercion: false. The default stays true, so today’s behavior is unchanged. (https://github.com/vectordotdev/vrl/pull/1764)
Improved the performance of parse_regex/parse_regex_all by pre-computing capture group names and indices at compile time. Users may see speedups of 4% to 13% in some cases. (https://github.com/vectordotdev/vrl/pull/1773)
Improved the performance of parse_regex_all by reusing the compiled regex across invocations. (https://github.com/vectordotdev/vrl/pull/1775)
{
  push(.x, 1)
  .b = push(.y, 2)
}
now reports both push(.x, 1) (unhandled error) and .b = push(.y, 2) (unhandled fallible assignment) in one go. Previously you’d only see the second one, fix it, recompile, and only then discover the first.
(https://github.com/vectordotdev/vrl/pull/1759)
{
  .a = 1
  push(.x, 1) # the unhandled error is actually here
  .b = 2 # but the compiler used to flag this line
}
The error is now reported on the actual fallible expression, so adding ! or the , err = form fixes it where you’d expect. This also fixes the same shape inside closure bodies, e.g. inside for_each/map_values.
(https://github.com/vectordotdev/vrl/pull/453)
E900) where a variable used before being reassigned (shadowed) was incorrectly flagged as unused at its original assignment. (https://github.com/vectordotdev/vrl/pull/1743)
encode_proto and parse_proto now support proto maps whose keys are integers or booleans, not just strings. Because VRL object keys are always strings, integer and boolean keys are written in their string form:
encode_proto({ "by_id": { "42": "alice" } }, "schema.desc", "MyMessage")
Previously parse_proto errored on these maps and encode_proto silently dropped the field. Note that encode_proto will now return an error if a key string can’t be parsed into the schema’s key type (for example, "abc" against a map<int32, ...>).
(https://github.com/vectordotdev/vrl/pull/1762)
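The key handling described for proto maps can be sketched as a string-to-key conversion. The Python below is a hypothetical model, not VRL's implementation: coerce_map_key and its key_type labels are invented for illustration, only int32 is shown among the integer types, and booleans are assumed to be written as "true" and "false".

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def coerce_map_key(key: str, key_type: str):
    """Convert a VRL object key (always a string) to a proto map key."""
    if key_type == "string":
        return key
    if key_type == "bool":
        # Assumption: boolean keys use the string forms "true" and "false".
        if key in ("true", "false"):
            return key == "true"
        raise ValueError("cannot parse " + repr(key) + " as a bool key")
    if key_type == "int32":
        value = int(key)  # raises ValueError for input such as "abc"
        if not INT32_MIN <= value <= INT32_MAX:
            raise ValueError(repr(key) + " is out of range for int32")
        return value
    raise ValueError("unsupported key type: " + key_type)
```

In this model the key "42" from the example above becomes the integer 42 for a map<int32, ...> field, while "abc" raises an error rather than being silently dropped.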
SCREAMING_SNAKE in casing functions such as pascalcase, camelcase and others.
pascalcase("hello", original_case: "SCREAMING_SNAKE") now compiles properly.
(https://github.com/vectordotdev/vrl/pull/1770)
Allowed the else keyword (and else if) to appear on a new line after the closing } of an if-block. Previously the trailing newline terminated the if-expression at the parser level, forcing else to share a line with }.
authors: pront