Vector v0.47.0 release notes

The Vector team is excited to announce version 0.47.0!

Release highlights:

  • The opentelemetry source now supports metrics ingestion.
  • A new window transform has been introduced which enables log noise reduction by filtering out events when the system is in a healthy state.
  • A new mqtt source is now available, enabling ingestion from MQTT brokers.
  • The datadog_logs sink now supports a new conforms_as_agent option to format logs like the Datadog Agent, ensuring compatibility with reserved fields.

Upgrading Vector

When upgrading, we recommend stepping through minor versions as these can each contain breaking changes while Vector is pre-1.0. These breaking changes are noted in their respective upgrade guides.

Vector Changelog

13 new features

  • The opentelemetry source now supports metrics ingestion.
    Thanks to cmcmacs for contributing this change!
  • Add a new window transform, a variant of ring buffer or backtrace logging implemented as a sliding window. It reduces log volume by filtering out logs when the system is healthy while preserving detailed logs when they are most relevant.
    Thanks to ilinas for contributing this change!
  • Add a new mqtt source enabling Vector to receive logs from an MQTT broker.
    Thanks to mladedav, pront, StormStake for contributing this change!
  • Add support for rendering to Mermaid format in vector graph.
    Thanks to Firehed for contributing this change!
  • Allow users to specify AWS authentication and the AWS service name for HTTP sinks to support AWS API endpoints that require SigV4.
    Thanks to johannesfloriangeiger for contributing this change!
  • Add support for fluentd forwarding over a Unix socket.
    Thanks to tustvold for contributing this change!
  • Add ACK support to the message buffering feature of the websocket_server sink, allowing this component to cache the latest received messages per client.
    Thanks to esensar, Quad9DNS for contributing this change!
  • Introduce a configuration option in the StatsD source: convert_to of type ConversionUnit. By default, timing values in milliseconds (ms) are converted to seconds (s). Users can set convert_to to milliseconds to preserve the original millisecond values.
    Thanks to devkoriel for contributing this change!
  • The address field is now available within VRL scripts when using the auth.strategy.custom authentication method.
    Thanks to esensar, Quad9DNS for contributing this change!
  • Add support for the Sec-WebSocket-Protocol header in the websocket_server sink to better accommodate clients that require it.
    Thanks to esensar, Quad9DNS for contributing this change!
  • The redis sink now supports any input event type that the configured encoding supports. It previously only supported log events.
    Thanks to ynachi for contributing this change!
  • Add a timeout config option to the sink healthcheck configuration. Previously the timeout was hardcoded to 10 seconds across all components; it can now be configured per component.
    Thanks to esensar, Quad9DNS for contributing this change!
  • Add a wildcard_matching global config option to set the wildcard matching mode for inputs. Relaxed mode accepts configurations whose wildcards do not match any inputs instead of raising an error.

    Example config:

    wildcard_matching: relaxed
    
    sources:
      stdin:
        type: stdin
    
    # note - no transforms
    
    sinks:
      stdout:
        type: console
        encoding:
          codec: json
        inputs:
          - "runtime-added-transform-*"
    

    Thanks to simplepad for contributing this change!
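
As an illustration of two of the features above, the following sketch sends StatsD metrics to a redis sink, which previously accepted only log events, and uses convert_to to keep timing values in milliseconds. The component names, the listen address, the Redis endpoint, and the key are placeholders chosen for this example, not values taken from the release.

    sources:
      statsd_in:
        type: statsd
        address: "0.0.0.0:8125"
        mode: udp
        # keep timings in ms instead of converting them to seconds (the default)
        convert_to: milliseconds

    sinks:
      redis_out:
        type: redis
        inputs:
          - statsd_in
        endpoint: "redis://127.0.0.1:6379/0"
        key: "vector-metrics"
        encoding:
          # json is assumed here as an encoding that can represent metric events
          codec: json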

6 enhancements

  • Zlib compression and decompression are now more efficient by using zlib-rs.
    Thanks to JakubOnderka for contributing this change!
  • Reduce unnecessary buffer reallocation when using framing.method = length_delimited in sinks, which significantly improves performance with large (more than 10 MB) batches.
    Thanks to Ilmarii for contributing this change!
  • The enrichment functions now support bounded date range filtering using optional from and to parameters. There are no changes to the function signatures.
    Thanks to nzxwang for contributing this change!
  • Files specified in the file and files fields of remap transforms are now watched when --watch-config is enabled. Changes to these files automatically trigger a configuration reload, so there’s no need to restart Vector.
    Thanks to nekorro for contributing this change!
  • The amqp sink now supports setting the priority for messages. The value can be templated to an integer 0-255 (inclusive).
    Thanks to aramperes for contributing this change!
  • Add deferred.max_age_secs and deferred.queue_url options to the aws_s3 and aws_sqs sinks, to automatically route older event notifications to a separate queue, allowing prioritized processing of recent files.
    Thanks to akutta for contributing this change!
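
The remap file watching described above can be sketched with a configuration along these lines. The component names and the path /etc/vector/parse.vrl are hypothetical; only the file option and the --watch-config flag come from the notes.

    sources:
      app_logs:
        type: stdin

    transforms:
      parse:
        type: remap
        inputs:
          - app_logs
        # external VRL program; edits to this file should trigger a reload
        # when Vector is started with --watch-config
        file: /etc/vector/parse.vrl

    sinks:
      out:
        type: console
        inputs:
          - parse
        encoding:
          codec: json

Assuming this is saved as vector.yaml (a name picked for this example), starting Vector with "vector --config vector.yaml --watch-config" and then editing parse.vrl should reload the configuration without a restart.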

10 bug fixes

  • Fix a Vector crash that occurred when the internal metrics generated too many groups, by increasing the maximum number of groups from 128 to 256.
    Thanks to triggerhappy17 for contributing this change!
  • Fix a file source bug where known small files were not deleted after the specified remove_after_secs.
    Thanks to linw1995 for contributing this change!
  • Fix an AWS authentication bug where region was missing from the STS authentication endpoint.
    Thanks to cahartma for contributing this change!
  • Increase the max event size for the aws_cloudwatch_logs sink to ~1 MB.
    Thanks to cahartma for contributing this change!
  • Fix a kubernetes source bug where, with use_apiserver_cache=true, the list request did not include the resourceVersion=0 parameter. Per this issue, when resourceVersion=0 and !page_size.is_none in ListParams, the kube-rs SDK ignores the resourceVersion=0 parameter. If no resourceVersion parameter is passed to the apiserver, the apiserver lists pods from etcd instead of from its in-memory cache.
    Thanks to xiaozongyang for contributing this change!
  • Fix a bug that allows DNS records with an IPv6 prefix length greater than 128 to be transmitted; invalid prefixes are now rejected during parsing.
    Thanks to wooffie for contributing this change!
  • Add checks to prevent invalid timestamp operations during DNS tap parsing; such operations are now validated to ensure correctness.
    Thanks to wooffie for contributing this change!
  • Add an option in the datadog_logs sink to allow Vector to mutate the record to conform to the protocol used by the Datadog Agent itself. To enable it, use the conforms_as_agent option or include the appropriate agent header (DD-PROTOCOL: agent-json) in the additional HTTP headers list.

    Any top-level fields that are not Datadog-reserved keywords are moved into a new object named message. If message doesn’t exist, it is created first. For example:

    {
      "key1": "value1",
      "key2": { "key2-1" : "value2" },
      "message" : "Hello world",
      ... rest of reserved fields
    }
    

    will be modified to:

    {
      "message" : {
        "message" : "Hello world",
        "key1": "value1",
        "key2": { "key2-1" : "value2" }
      },
      ... rest of reserved fields
    }
    

    Thanks to graphcareful for contributing this change!
  • Fix a bug in the datadog_logs sink where the content of the log message was dropped when log namespacing was enabled.
    Thanks to graphcareful for contributing this change!
  • Fix a misleading error message for invalid field names in the gelf encoder.
    Thanks to mprasil for contributing this change!
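
For the conforms_as_agent option described above, a minimal configuration sketch might look like the following. The component names and the DATADOG_API_KEY environment variable are placeholders for this example, and treating conforms_as_agent as a boolean is an assumption, since the notes name the option without stating its type.

    sources:
      app_logs:
        type: stdin

    sinks:
      datadog:
        type: datadog_logs
        inputs:
          - app_logs
        default_api_key: "${DATADOG_API_KEY}"
        # assumed to be a boolean flag; opts in to Agent-style formatting
        conforms_as_agent: true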

1 chore

  • Add a new extra_headers option to greptimedb_logs sink configuration to set additional headers for outgoing requests.

    Change the greptimedb_logs sink default content type to application/x-ndjson to match the default content type of the greptimedb sink. If you use GreptimeDB v0.12 or earlier, you need to set the content type to application/json in the sink configuration.

    Example:

    sinks:
      greptime_logs:
        type: greptimedb_logs
        inputs: ["my_source_id"]
        endpoint: "http://localhost:4000"
        table: "demo_logs"
        dbname: "public"
        extra_headers:
          x-source: vector
    
    [sinks.greptime_logs]
    type = "greptimedb_logs"
    inputs = ["my_source_id"]
    endpoint = "http://localhost:4000"
    table = "demo_logs"
    dbname = "public"
    
    [sinks.greptime_logs.extra_headers]
    x-source = "vector"
    

    Thanks to greptimedb for contributing this change!

VRL Changelog

VRL was updated to v0.24.0. This includes the following changes:

Enhancements

  • The encode_gzip, decode_gzip, encode_zlib, and decode_zlib methods now use the zlib-rs backend, which is much faster than the previous backend, miniz_oxide.

  • The decode_base64, encode_base64, and decode_mime_q functions now use the SIMD backend, which is faster than the previous backend.

Fixes

  • Add BOM stripping logic to the parse_json function.
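
As a rough sketch of where the parse_json fix matters, a remap transform such as the one below parses a JSON payload from the message field; with the BOM stripping described above, a payload that starts with a byte order mark should now parse rather than fail. The component names and the parsed target field are hypothetical.

    transforms:
      parse_payload:
        type: remap
        inputs:
          - app_logs
        source: |
          # the "!" variants raise an error if .message is not a string
          # or does not contain valid JSON
          .parsed = parse_json!(string!(.message))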
