Tag cardinality limit transform

Limit the cardinality of tags on metric events as a safeguard against cardinality explosion.

Limits the cardinality of tags on metric events, protecting against accidental high-cardinality usage that commonly disrupts the stability of metrics storage systems.

The default behavior is to drop the tag from incoming metrics when the configured limit would be exceeded. Note that this is usually only useful when applied to incremental counter metrics and can have unintended effects when applied to other metric types. The default action to take can be modified with the limit_exceeded_action option.

Configuration

Example configurations

{
  "transforms": {
    "my_transform_id": {
      "type": "tag_cardinality_limit",
      "inputs": [
        "my-source-or-transform-id"
      ],
      "mode": "exact"
    }
  }
}

[transforms.my_transform_id]
type = "tag_cardinality_limit"
inputs = [ "my-source-or-transform-id" ]
mode = "exact"

transforms:
  my_transform_id:
    type: tag_cardinality_limit
    inputs:
      - my-source-or-transform-id
    mode: exact

{
  "transforms": {
    "my_transform_id": {
      "type": "tag_cardinality_limit",
      "inputs": [
        "my-source-or-transform-id"
      ],
      "cache_size_per_key": 5120,
      "limit_exceeded_action": "drop_tag",
      "mode": "exact",
      "tracking_scope": "global",
      "value_limit": 500
    }
  }
}

[transforms.my_transform_id]
type = "tag_cardinality_limit"
inputs = [ "my-source-or-transform-id" ]
cache_size_per_key = 5_120
limit_exceeded_action = "drop_tag"
mode = "exact"
tracking_scope = "global"
value_limit = 500

transforms:
  my_transform_id:
    type: tag_cardinality_limit
    inputs:
      - my-source-or-transform-id
    cache_size_per_key: 5120
    limit_exceeded_action: drop_tag
    mode: exact
    tracking_scope: global
    value_limit: 500

cache_size_per_key

optional uint

The size of the cache for detecting duplicate tags, in bytes.

A larger cache lowers the likelihood of a false positive, that is, a case where a new value for a tag is allowed even after the configured limit has been reached.

default: 5120

Relevant when: mode = "probabilistic"

graph

optional object

Extra graph configuration

Configures the output for this component when it is generated with the graph command.

graph.edge_attributes

optional object

Edge attributes to add to the edges linked to this component’s node in the resulting graph.

They are added to the edges as provided.

graph.edge_attributes.*

required object

A collection of graph edge attributes in graphviz DOT language, related to a single input component.

graph.edge_attributes.*.*

required string literal

A single graph edge attribute in graphviz DOT language.

Examples

{
  "color": "red",
  "label": "Example Edge",
  "width": "5.0"
}

Examples

{
  "example_input": {
    "color": "red",
    "label": "Example Edge",
    "width": "5.0"
  }
}

graph.node_attributes

optional object

Node attributes to add to this component’s node in the resulting graph.

They are added to the node as provided.

graph.node_attributes.*

required string literal

A single graph node attribute in graphviz DOT language.

Examples

{
  "color": "red",
  "name": "Example Node",
  "width": "5.0"
}

inputs

required [string]

A list of upstream source or transform IDs.

Wildcards (*) are supported.

See configuration for more info.

Array string literal

Examples

[
  "my-source-or-transform-id",
  "prefix-*"
]

internal_metrics

optional object

Configuration of internal metrics for the TagCardinalityLimit transform.

internal_metrics.include_extended_tags

optional bool

Whether to include extended tags (metric_name, tag_key) in the tag_value_limit_exceeded_total metric.

This helps identify which metrics and tag keys are hitting cardinality limits, but can significantly increase metric cardinality. Defaults to false because these tags have potentially unbounded cardinality.

default: false
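
As a minimal sketch of how this option might be enabled while investigating which metrics and tag keys are hitting their limits, the fragment below uses only options documented on this page. The component and input IDs are the placeholders used elsewhere in this document, and the mode and value_limit shown are illustrative.

```yaml
transforms:
  my_transform_id:
    type: tag_cardinality_limit
    inputs:
      - my-source-or-transform-id
    mode: exact
    value_limit: 500
    internal_metrics:
      # Adds metric_name and tag_key labels to tag_value_limit_exceeded_total.
      # These labels have potentially unbounded cardinality themselves.
      include_extended_tags: true
```

Because the extra labels can significantly increase the cardinality of the internal metric, it may be reasonable to turn this on only for the duration of an investigation.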

limit_exceeded_action

optional string literal enum

Possible actions to take when an event arrives that would exceed the cardinality limit for one or more of its tags.

Enum options string literal

Option	Description
`drop_event`	Drop the entire event itself.
`drop_tag`	Drop the tag(s) that would exceed the configured limit.

default: drop_tag

max_tracked_keys

optional uint

Maximum number of distinct (metric, tag-key) pairs to track across the entire transform. When this cap is reached, additional tag keys on new metrics or new tag keys on existing metrics are not tracked, and tag values for those pairs pass through unchecked. Users can detect this via the tag_cardinality_untracked_events_total counter and the tag_cardinality_tracked_keys gauge.

When unset (default), there is no cap and the transform tracks all pairs it encounters. In global tracking scope mode, this limit still applies (the metric key is set to None unless there is a per-metric override).
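
A sketch of a configuration that caps the tracking state, assuming per-metric tracking is wanted. The cap of 10000 is an arbitrary illustrative value, not a recommendation, and the IDs are the placeholders used elsewhere on this page.

```yaml
transforms:
  my_transform_id:
    type: tag_cardinality_limit
    inputs:
      - my-source-or-transform-id
    mode: exact
    tracking_scope: per_metric
    value_limit: 500
    # Illustrative cap on distinct (metric, tag-key) pairs. Pairs beyond the
    # cap are not tracked, and their tag values pass through unchecked.
    max_tracked_keys: 10000
```

With a cap in place, the tag_cardinality_untracked_events_total counter and the tag_cardinality_tracked_keys gauge described above indicate whether the cap is being reached.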

mode

required string literal enum

Controls the approach taken for tracking tag cardinality.

Examples

"exact"

"probabilistic"

Enum options string literal

Option	Description
`exact`	Tracks cardinality exactly. This mode has higher memory requirements than `probabilistic`, but never falsely outputs metrics with new tags after the limit has been hit.
`probabilistic`	Tracks cardinality probabilistically. This mode has lower memory requirements than `exact`, but may occasionally allow metric events to pass through the transform even when they contain new tags that exceed the configured limit. The rate at which this happens can be controlled by changing the value of `cache_size_per_key`.
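
For illustration, a configuration that selects probabilistic mode and enlarges the per-key cache might look like the following. The cache size of 10240 bytes is an assumed example value (twice the documented default of 5120), chosen only to show the option in use.

```yaml
transforms:
  my_transform_id:
    type: tag_cardinality_limit
    inputs:
      - my-source-or-transform-id
    mode: probabilistic
    # Size in bytes of the cache kept for each tag key. A larger value
    # lowers the chance of letting a new tag value through over the limit.
    cache_size_per_key: 10240
    value_limit: 500
```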

per_metric_limits

optional object

Tag cardinality limits configuration per metric name.

per_metric_limits.*

required object

An individual metric configuration.

per_metric_limits.*.cache_size_per_key

optional uint

The size of the cache for detecting duplicate tags, in bytes.

A larger cache lowers the likelihood of a false positive, that is, a case where a new value for a tag is allowed even after the configured limit has been reached.

Relevant when: mode = "probabilistic"

default: 5120

per_metric_limits.*.internal_metrics

optional object

Configuration of internal metrics for the TagCardinalityLimit transform.

per_metric_limits.*.internal_metrics.include_extended_tags

optional bool

Whether to include extended tags (metric_name, tag_key) in the tag_value_limit_exceeded_total metric.

default: false

per_metric_limits.*.limit_exceeded_action

optional string literal enum

Possible actions to take when an event arrives that would exceed the cardinality limit for one or more of its tags.

Enum options

Option	Description
`drop_event`	Drop the entire event itself.
`drop_tag`	Drop the tag(s) that would exceed the configured limit.

default: drop_tag

per_metric_limits.*.mode

required string literal enum

Controls the approach taken for tracking tag cardinality.

Enum options

Option	Description
`exact`	Tracks cardinality exactly. See the top-level `mode` option for details.
`excluded`	Skip cardinality tracking for this metric. All tag values pass through and nothing is limited. Other fields in this per-metric configuration are ignored when this is selected.
`probabilistic`	Tracks cardinality probabilistically. See the top-level `mode` option for details.

Examples

"exact"

"excluded"

"probabilistic"

per_metric_limits.*.namespace

optional string literal

Namespace of the metric this configuration refers to.

per_metric_limits.*.per_tag_limits

optional object

Per-tag-key overrides scoped to this metric. Each entry sets a mode:

mode: limit_override + value_limit: N — track with a per-tag cap.
mode: excluded — opt this tag out of tracking entirely.

All other settings (tracking algorithm, limit_exceeded_action, etc.) are inherited from the enclosing per-metric configuration. Tags not listed here use the per-metric configuration.

per_metric_limits.*.per_tag_limits.*

required object

An individual tag configuration.

per_metric_limits.*.per_tag_limits.*.mode

required string literal enum

Controls how this tag key is handled.

Enum options

Option	Description
`excluded`	Opt this tag out of cardinality tracking entirely. All values pass through without being recorded or checked against any `value_limit`.
`limit_override`	Track this tag with a per-tag value limit. The enclosing per-metric tracking algorithm and all other settings still apply.

Examples

"excluded"

"limit_override"

per_metric_limits.*.per_tag_limits.*.value_limit

required uint

Maximum number of distinct values to accept for this tag key.

Relevant when: mode = "limit_override"

per_metric_limits.*.value_limit

optional uint

How many distinct values to accept for any given key. Ignored when mode: excluded.

default: 500
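
The per-metric options above can be combined as in the following sketch. The metric names (http_requests_total, debug_samples) and the namespace value are hypothetical, and entries are keyed by metric name as in the per-tag override example under “How it works”.

```yaml
transforms:
  my_transform_id:
    type: tag_cardinality_limit
    inputs:
      - my-source-or-transform-id
    mode: exact
    value_limit: 500
    per_metric_limits:
      http_requests_total:        # hypothetical metric name
        namespace: app            # hypothetical namespace
        mode: exact
        value_limit: 1000
        limit_exceeded_action: drop_event
      debug_samples:              # hypothetical metric name
        # No tracking for this metric. Other fields in this entry
        # would be ignored.
        mode: excluded
```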

per_tag_limits

optional object

Global per-tag-key overrides, applied to every metric that does not match a per_metric_limits entry. Each entry sets mode: limit_override (with a per-tag value_limit) or mode: excluded (bypass tracking for that tag).

See the “Per-tag overrides” section under “How it works” for a worked example and the precedence rules.

per_tag_limits.*

required object

An individual tag configuration.

per_tag_limits.*.mode

required string literal enum

Controls how this tag key is handled.

Enum options

Option	Description
`excluded`	Opt this tag out of cardinality tracking entirely. All values pass through without being recorded or checked against any `value_limit`.
`limit_override`	Track this tag with a per-tag value limit. The top-level tracking algorithm and all other top-level settings still apply.

Examples

"excluded"

"limit_override"

per_tag_limits.*.value_limit

required uint

Maximum number of distinct values to accept for this tag key.

Relevant when: mode = "limit_override"

tracking_scope

optional string literal enum

Controls how tag tracking state is partitioned across metrics.

Enum options string literal

Option	Description
`global`	All metrics share a single tracking bucket. Tag values pool across metrics and the global `value_limit` caps the combined set.
`per_metric`	Every distinct metric gets its own tracking bucket, providing tag cardinality limiting for each metric in isolation at the cost of higher memory usage.

default: global

value_limit

optional uint

How many distinct values to accept for any given key.

default: 500

Input Types

The following table lists all telemetry data types supported by the component across possible configurations. Be aware that the available data types may differ based on the specified codec configuration.

Metrics

The following metrics are supported:

counter, distribution, gauge, histogram, set, summary

Outputs

<component_id>

Default output stream of the component. Use this component’s ID as an input to downstream transforms and sinks.

Output Types

Metrics

The modified input metric event.

Telemetry

Metrics

component_discarded_events_total

counter

The number of events dropped by this component.

component_id

The Vector component ID.

component_kind

The Vector component kind.

component_type

The Vector component type.

host optional

The hostname of the system Vector is running on.

intentional

True if the events were discarded intentionally, like a filter transform, or false if due to an error.

pid optional

The process ID of the Vector instance.

component_errors_total

counter

The total number of errors encountered by this component.

component_id

The Vector component ID.

component_kind

The Vector component kind.

component_type

The Vector component type.

error_type

The type of the error.

host optional

The hostname of the system Vector is running on.

pid optional

The process ID of the Vector instance.

stage

The stage within the component at which the error occurred.

component_latency_mean_seconds

gauge

The mean elapsed time, in fractional seconds, that an event spends in a single transform.

This includes both the time spent queued in the transform’s input buffer and the time spent executing the transform itself.

This value is smoothed over time using an exponentially weighted moving average (EWMA).

host optional

The hostname of the system Vector is running on.

pid optional

The process ID of the Vector instance.

component_latency_seconds

histogram

The elapsed time, in fractional seconds, that an event spends in a single transform.

This includes both the time spent queued in the transform’s input buffer and the time spent executing the transform itself.

host optional

The hostname of the system Vector is running on.

pid optional

The process ID of the Vector instance.

component_received_event_bytes_total

counter

The number of event bytes accepted by this component either from tagged origins like file and uri, or cumulatively from other origins.

component_id

The Vector component ID.

component_kind

The Vector component kind.

component_type

The Vector component type.

container_name optional

The name of the container from which the data originated.

file optional

The file from which the data originated.

host optional

The hostname of the system Vector is running on.

mode optional

The connection mode used by the component.

peer_addr optional

The IP from which the data originated.

peer_path optional

The pathname from which the data originated.

pid optional

The process ID of the Vector instance.

pod_name optional

The name of the pod from which the data originated.

uri optional

The sanitized URI from which the data originated.

component_received_events_count

histogram

A histogram of the number of events passed in each internal batch in Vector’s internal topology.

Note that this is separate from sink-level batching. It is mostly useful for low-level debugging of performance issues in Vector caused by small internal batches.

component_id

The Vector component ID.

component_kind

The Vector component kind.

component_type

The Vector component type.

container_name optional

The name of the container from which the data originated.

file optional

The file from which the data originated.

host optional

The hostname of the system Vector is running on.

mode optional

The connection mode used by the component.

peer_addr optional

The IP from which the data originated.

peer_path optional

The pathname from which the data originated.

pid optional

The process ID of the Vector instance.

pod_name optional

The name of the pod from which the data originated.

uri optional

The sanitized URI from which the data originated.

component_received_events_total

counter

The number of events accepted by this component either from tagged origins like file and uri, or cumulatively from other origins.

component_id

The Vector component ID.

component_kind

The Vector component kind.

component_type

The Vector component type.

container_name optional

The name of the container from which the data originated.

file optional

The file from which the data originated.

host optional

The hostname of the system Vector is running on.

mode optional

The connection mode used by the component.

peer_addr optional

The IP from which the data originated.

peer_path optional

The pathname from which the data originated.

pid optional

The process ID of the Vector instance.

pod_name optional

The name of the pod from which the data originated.

uri optional

The sanitized URI from which the data originated.

component_sent_event_bytes_total

counter

The total number of event bytes emitted by this component.

component_id

The Vector component ID.

component_kind

The Vector component kind.

component_type

The Vector component type.

host optional

The hostname of the system Vector is running on.

output optional

The specific output of the component.

pid optional

The process ID of the Vector instance.

component_sent_events_total

counter

The total number of events emitted by this component.

component_id

The Vector component ID.

component_kind

The Vector component kind.

component_type

The Vector component type.

host optional

The hostname of the system Vector is running on.

output optional

The specific output of the component.

pid optional

The process ID of the Vector instance.

tag_value_limit_exceeded_total

counter

The total number of events discarded because the tag has been rejected after hitting the configured value_limit. When internal_metrics.include_extended_tags is enabled in the tag_cardinality_limit transform, this metric includes metric_name and tag_key labels. By default, this metric has no labels to keep cardinality low.

component_id

The Vector component ID.

component_kind

The Vector component kind.

component_type

The Vector component type.

host optional

The hostname of the system Vector is running on.

metric_name optional

The name of the metric whose tag value limit was exceeded. Only present when internal_metrics.include_extended_tags is enabled.

pid optional

The process ID of the Vector instance.

tag_key optional

The key of the tag whose value limit was exceeded. Only present when internal_metrics.include_extended_tags is enabled.

transform_buffer_max_byte_size

gauge

The maximum number of bytes the buffer that feeds into a transform can hold.

Deprecated

This metric has been deprecated in favor of transform_buffer_max_size_bytes.

component_id

The Vector component ID.

component_kind

The Vector component kind.

component_type

The Vector component type.

host optional

The hostname of the system Vector is running on.

output optional

The specific output of the component.

pid optional

The process ID of the Vector instance.

transform_buffer_max_event_size

gauge

The maximum number of events the buffer that feeds into a transform can hold.

Deprecated

This metric has been deprecated in favor of transform_buffer_max_size_events.

component_id

The Vector component ID.

component_kind

The Vector component kind.

component_type

The Vector component type.

host optional

The hostname of the system Vector is running on.

output optional

The specific output of the component.

pid optional

The process ID of the Vector instance.

transform_buffer_max_size_bytes

gauge

The maximum number of bytes the buffer that feeds into a transform can hold.

component_id

The Vector component ID.

component_kind

The Vector component kind.

component_type

The Vector component type.

host optional

The hostname of the system Vector is running on.

output optional

The specific output of the component.

pid optional

The process ID of the Vector instance.

transform_buffer_max_size_events

gauge

The maximum number of events the buffer that feeds into a transform can hold.

component_id

The Vector component ID.

component_kind

The Vector component kind.

component_type

The Vector component type.

host optional

The hostname of the system Vector is running on.

output optional

The specific output of the component.

pid optional

The process ID of the Vector instance.

transform_buffer_utilization

histogram

The utilization level of the buffer that feeds into a transform.

component_id

The Vector component ID.

component_kind

The Vector component kind.

component_type

The Vector component type.

host optional

The hostname of the system Vector is running on.

output optional

The specific output of the component.

pid optional

The process ID of the Vector instance.

transform_buffer_utilization_level

gauge

The current utilization level of the buffer that feeds into a transform.

component_id

The Vector component ID.

component_kind

The Vector component kind.

component_type

The Vector component type.

host optional

The hostname of the system Vector is running on.

output optional

The specific output of the component.

pid optional

The process ID of the Vector instance.

transform_buffer_utilization_mean

gauge

The mean utilization level of the buffer that feeds into a transform. This value is smoothed over time using an exponentially weighted moving average (EWMA).

component_id

The Vector component ID.

component_kind

The Vector component kind.

component_type

The Vector component type.

host optional

The hostname of the system Vector is running on.

output optional

The specific output of the component.

pid optional

The process ID of the Vector instance.

utilization

gauge

A ratio from 0 to 1 of the load on a component. A value of 0 would indicate a completely idle component that is simply waiting for input. A value of 1 would indicate a component that is never idle. This value is updated every 5 seconds.

component_id

The Vector component ID.

component_kind

The Vector component kind.

component_type

The Vector component type.

host optional

The hostname of the system Vector is running on.

pid optional

The process ID of the Vector instance.

value_limit_reached_total

counter

The total number of times new values for a key have been rejected because the value limit has been reached.

component_id

The Vector component ID.

component_kind

The Vector component kind.

component_type

The Vector component type.

host optional

The hostname of the system Vector is running on.

pid optional

The process ID of the Vector instance.

Examples

Drop high-cardinality tag

Given these events...

[{"metric":{"counter":{"value":2},"kind":"incremental","name":"logins","tags":{"user_id":"user_id_1"}}},{"metric":{"counter":{"value":2},"kind":"incremental","name":"logins","tags":{"user_id":"user_id_2"}}}]

...and this configuration...

transforms:
  my_transform_id:
    type: tag_cardinality_limit
    inputs:
      - my-source-or-transform-id
    mode: exact
    value_limit: 1
    limit_exceeded_action: drop_tag

[transforms.my_transform_id]
type = "tag_cardinality_limit"
inputs = [ "my-source-or-transform-id" ]
mode = "exact"
value_limit = 1
limit_exceeded_action = "drop_tag"

{
  "transforms": {
    "my_transform_id": {
      "type": "tag_cardinality_limit",
      "inputs": [
        "my-source-or-transform-id"
      ],
      "mode": "exact",
      "value_limit": 1,
      "limit_exceeded_action": "drop_tag"
    }
  }
}

...these Vector events are produced:

[{"metric":{"counter":{"value":2},"kind":"incremental","name":"logins","tags":{"user_id":"user_id_1"}}},{"metric":{"counter":{"value":2},"kind":"incremental","name":"logins","tags":{}}}]
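
For comparison, here is a sketch of the same configuration with limit_exceeded_action set to drop_event, the other documented action. Going by that option's description (drop the entire event), the second logins event would be expected to be discarded instead of being forwarded without its user_id tag, so only the first event would reach downstream components. This expectation is inferred from the option description and is not a captured output.

```yaml
transforms:
  my_transform_id:
    type: tag_cardinality_limit
    inputs:
      - my-source-or-transform-id
    mode: exact
    value_limit: 1
    # Drop the whole event when one of its tags would exceed the limit.
    limit_exceeded_action: drop_event
```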

How it works

Intended Usage

This transform is intended as a protection mechanism against upstream mistakes, such as a developer accidentally adding a request_id tag. When this happens, fix the upstream error as soon as possible: Vector’s cardinality cache is held in memory and is erased when Vector restarts, so new tag values pass through until the cardinality limit is reached again. In normal usage this should not be a common problem, since Vector processes are normally long-lived.

Memory Utilization Estimation

This transform stores in memory a copy of the key for every tag on every metric event seen by this transform. In mode exact, a copy of every distinct value for each key is also kept in memory, until value_limit distinct values have been seen for a given key, at which point new values for that key will be rejected. So to estimate the memory usage of this transform in mode exact you can use the following formula:

(number of distinct field names in the tags for your metrics * average length of
the field names for the tags) + (number of distinct field names in the tags of
your metrics * `value_limit` * average length of the values of tags for your
metrics)
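
As a worked instance of the formula above, the following is a rough estimate under assumed, purely illustrative numbers: 20 distinct tag keys averaging 12 bytes each, tag values averaging 16 bytes, and the default value_limit of 500. The symbols K, \bar{k}, V, and \bar{v} are shorthand introduced here and are not names used by Vector. The result counts only raw string bytes and ignores any per-entry overhead of the underlying data structures.

```latex
% K = distinct tag keys, \bar{k} = average key length (bytes),
% V = value_limit, \bar{v} = average value length (bytes)
\begin{align*}
M_{\text{exact}} &\approx K\,\bar{k} + K\,V\,\bar{v} \\
                 &= (20 \times 12) + (20 \times 500 \times 16) \\
                 &= 240 + 160{,}000 \\
                 &= 160{,}240 \text{ bytes} \approx 160 \text{ kB}
\end{align*}
```

The second term dominates, so in exact mode the estimate scales almost linearly with value_limit and with the average value length.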

In mode probabilistic, rather than storing all values seen for each key, each distinct key has a bloom filter which can probabilistically determine whether a given value has been seen for that key. The formula for estimating memory usage in mode probabilistic is:

(number of distinct field names in the tags for your metrics * average length of
the field names for the tags) + (number of distinct field names in the tags of
your metrics * `cache_size_per_key`)
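
A worked instance of the probabilistic formula, under assumed illustrative numbers: 20 distinct tag keys averaging 12 bytes each and the default cache_size_per_key of 5120 bytes. The symbols K, \bar{k}, and C are shorthand introduced here, not names used by Vector, and the estimate ignores any bookkeeping overhead.

```latex
% K = distinct tag keys, \bar{k} = average key length (bytes),
% C = cache_size_per_key (bytes)
\begin{align*}
M_{\text{prob}} &\approx K\,\bar{k} + K\,C \\
                &= (20 \times 12) + (20 \times 5120) \\
                &= 240 + 102{,}400 \\
                &= 102{,}640 \text{ bytes} \approx 103 \text{ kB}
\end{align*}
```

Unlike the exact-mode estimate, this one does not depend on the length of tag values or on value_limit, only on the number of keys and the cache size.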

The cache_size_per_key option controls the size of the bloom filter used for storing the set of acceptable values for any single key. The larger the bloom filter, the lower the false positive rate, which in our case means the less likely we are to allow a new tag value that would otherwise violate a configured limit.

If you want to know the exact false positive rate for a given cache_size_per_key and value_limit, there are many free online bloom filter calculators that can answer this. The formula is generally presented in terms of 'n', 'p', 'k', and 'm', where 'n' is the number of items in the filter (value_limit in our case), 'p' is the probability of false positives (what we want to solve for), 'k' is the number of hash functions used internally, and 'm' is the number of bits in the bloom filter. You should be able to provide values for just 'n' and 'm' and get back the value for 'p' with an optimal 'k' selected for you. When converting cache_size_per_key to the 'm' value to plug into the calculator, remember that cache_size_per_key is in bytes, while 'm' is often presented in bits (1 bit is 1/8 of a byte, so m = 8 × cache_size_per_key).
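
The calculation such calculators perform can be sketched with the standard bloom filter approximation. The example below uses the documented defaults (cache_size_per_key of 5120 bytes, value_limit of 500) and assumes the optimal number of hash functions is used. This page does not state how many hash functions Vector's filter actually uses, so treat the result as an estimate under that assumption.

```latex
\begin{align*}
m &= 8 \times \texttt{cache\_size\_per\_key} = 8 \times 5120 = 40{,}960 \text{ bits},
  \qquad n = \texttt{value\_limit} = 500 \\
k_{\text{opt}} &= \frac{m}{n}\,\ln 2 \approx 81.92 \times 0.693 \approx 56.8 \\
p &\approx \left(1 - e^{-kn/m}\right)^{k}
   \;\xrightarrow{\;k \,=\, k_{\text{opt}}\;}\;
   e^{-\frac{m}{n}(\ln 2)^{2}} \approx e^{-39.4} \approx 8 \times 10^{-18}
\end{align*}
```

Under the same assumption, shrinking cache_size_per_key to 512 bytes (m = 4,096 bits) gives m/n ≈ 8.19 and p ≈ e^{-3.94} ≈ 0.02, or roughly one false positive for every fifty unseen values checked once the filter is full.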

Per-tag overrides

per_tag_limits lets you override the cardinality settings for individual tag keys instead of changing the metric-level value_limit. It is supported at two scopes — the top level (applies to every metric that does not match a per_metric_limits entry) and inside a per_metric_limits.<name> block (applies only to that metric).

Each entry uses one of two mode values:

mode: limit_override — track the tag with its own value_limit, independent of the surrounding metric’s value_limit.
mode: excluded — bypass cardinality tracking for this tag entirely. Values pass through unchanged on every event, are not counted against any value_limit, and are never added to the cache.

type: tag_cardinality_limit
value_limit: 500
mode: exact

# Applies to every metric that does NOT match a per_metric_limits entry below.
per_tag_limits:
  kube_pod_name:
    # High cardinality is intentional for this tag — never track it.
    mode: excluded
  request_id:
    # Tighten the cap for this tag without lowering the metric-level limit.
    mode: limit_override
    value_limit: 50

per_metric_limits:
  http_requests_total:
    value_limit: 1000
    mode: exact
    # This metric has its own per-tag rules. The top-level per_tag_limits
    # above is IGNORED for http_requests_total — `kube_pod_name` on this
    # metric is therefore tracked against value_limit=1000.
    per_tag_limits:
      trace_id:
        mode: excluded

Precedence is “nearest wins”:

If the metric matches a per_metric_limits entry, only that entry’s per_tag_limits is consulted; the top-level per_tag_limits is ignored for that metric. (This mirrors how a per-metric value_limit shadows the global value_limit.)
Otherwise, the top-level per_tag_limits is consulted.
Tags not listed in the applicable per_tag_limits fall back to the surrounding metric’s value_limit (per-metric, or global).

Restarts

This transform’s cache is held in memory, and therefore, restarting Vector will reset the cache. This means that new values will be passed through until the cardinality limit is reached again. See intended usage for more info.

State

This component is stateful, meaning its behavior changes based on previous inputs (events). State is not preserved across restarts, therefore state-dependent behavior will reset between restarts and depend on the inputs (events) received since the most recent restart.