Tag cardinality limit
Limit the cardinality of tags on metrics events as a safeguard against cardinality explosion
Limits the cardinality of tags on metric events, protecting against accidental high-cardinality usage, which commonly disrupts the stability of metrics storage systems.
The default behavior is to drop the tag from incoming metrics when the configured
limit would be exceeded. Note that this is usually only useful when applied to
incremental counter metrics and can have unintended effects when applied to other
metric types. The default action to take can be modified with the
limit_exceeded_action option.
Configuration
Example configurations
{
  "transforms": {
    "my_transform_id": {
      "type": "tag_cardinality_limit",
      "inputs": [
        "my-source-or-transform-id"
      ],
      "mode": "exact"
    }
  }
}

[transforms.my_transform_id]
type = "tag_cardinality_limit"
inputs = [ "my-source-or-transform-id" ]
mode = "exact"

transforms:
  my_transform_id:
    type: tag_cardinality_limit
    inputs:
      - my-source-or-transform-id
    mode: exact

{
  "transforms": {
    "my_transform_id": {
      "type": "tag_cardinality_limit",
      "inputs": [
        "my-source-or-transform-id"
      ],
      "cache_size_per_key": 5120,
      "limit_exceeded_action": "drop_tag",
      "mode": "exact",
      "tracking_scope": "global",
      "value_limit": 500
    }
  }
}

[transforms.my_transform_id]
type = "tag_cardinality_limit"
inputs = [ "my-source-or-transform-id" ]
cache_size_per_key = 5_120
limit_exceeded_action = "drop_tag"
mode = "exact"
tracking_scope = "global"
value_limit = 500

transforms:
  my_transform_id:
    type: tag_cardinality_limit
    inputs:
      - my-source-or-transform-id
    cache_size_per_key: 5120
    limit_exceeded_action: drop_tag
    mode: exact
    tracking_scope: global
    value_limit: 500
cache_size_per_key
optional uint
The size of the cache for detecting duplicate tags, in bytes.
The larger the cache size, the less likely a false positive becomes, that is, a case where a new value for a tag is allowed even after the configured limit has been reached.
Default: 5120
Relevant when: mode = "probabilistic"
graph
optional object
Extra graph configuration
Configure output for component when generated with graph command
graph.edge_attributes
optional object
Edge attributes to add to the edges linked to this component’s node in resulting graph
They are added to the edge as provided
graph.edge_attributes.*
required object
graph.edge_attributes.*.*
required string literal
graph.node_attributes
optional object
Node attributes to add to this component’s node in resulting graph
They are added to the node as provided
graph.node_attributes.*
required string literal
inputs
required [string]
A list of upstream source or transform IDs.
Wildcards (*) are supported.
See configuration for more info.
internal_metrics
optional object
internal_metrics.include_extended_tags
optional bool
Whether to include extended tags (metric_name, tag_key) in the tag_value_limit_exceeded_total metric.
This helps identify which metrics and tag keys are hitting cardinality limits, but can significantly
increase metric cardinality. Defaults to false because these tags have potentially unbounded cardinality.
Default: false
limit_exceeded_action
optional string literal enum
| Option | Description |
|---|---|
| drop_event | Drop the entire event itself. |
| drop_tag | Drop the tag(s) that would exceed the configured limit. |
Default: drop_tag
max_tracked_keys
optional uint
Maximum number of distinct (metric, tag-key) pairs to track across the entire
transform. When this cap is reached, additional tag keys on new metrics or new
tag keys on existing metrics are not tracked, and tag values for those pairs
pass through unchecked. Users can detect this via the
tag_cardinality_untracked_events_total counter and the
tag_cardinality_tracked_keys gauge.
When unset (default), there is no cap and the transform tracks all pairs it
encounters. In global tracking scope mode, this limit still applies (the
metric key is set to None unless there is a per-metric override).
mode
required string literal enum
| Option | Description |
|---|---|
| exact | Tracks cardinality exactly. This mode has higher memory requirements than probabilistic mode. |
| probabilistic | Tracks cardinality probabilistically. This mode has lower memory requirements than exact mode. |
per_metric_limits
optional object
per_metric_limits.*
required object
per_metric_limits.*.cache_size_per_key
optional uint
The size of the cache for detecting duplicate tags, in bytes.
The larger the cache size, the less likely a false positive becomes, that is, a case where a new value for a tag is allowed even after the configured limit has been reached.
Relevant when: mode = "probabilistic"
Default: 5120
per_metric_limits.*.internal_metrics
optional object
Whether to include extended tags (metric_name, tag_key) in the tag_value_limit_exceeded_total metric.
This helps identify which metrics and tag keys are hitting cardinality limits, but can significantly
increase metric cardinality. Defaults to false because these tags have potentially unbounded cardinality.
Default: false
per_metric_limits.*.limit_exceeded_action
optional string literal enum
| Option | Description |
|---|---|
| drop_event | Drop the entire event itself. |
| drop_tag | Drop the tag(s) that would exceed the configured limit. |
Default: drop_tag
per_metric_limits.*.mode
required string literal enum
| Option | Description |
|---|---|
| exact | Tracks cardinality exactly. See Mode::Exact for details. |
| excluded | Skip cardinality tracking for this metric. All tag values pass through and nothing is limited. Other fields in this per-metric configuration are ignored when this is selected. |
| probabilistic | Tracks cardinality probabilistically. See Mode::Probabilistic for details. |
per_metric_limits.*.namespace
optional string literal
per_metric_limits.*.per_tag_limits
optional object
Per-tag-key overrides scoped to this metric. Each entry sets a mode:
- mode: limit_override + value_limit: N (track with a per-tag cap).
- mode: excluded (opt this tag out of tracking entirely).
All other settings (tracking algorithm, limit_exceeded_action, etc.)
are inherited from the enclosing per-metric configuration.
Tags not listed here use the per-metric configuration.
per_metric_limits.*.per_tag_limits.*
required object
| Option | Description |
|---|---|
| excluded | Opt this tag out of cardinality tracking entirely. All values pass through without being recorded or checked against any value_limit. |
| limit_override | Track this tag with a per-tag value limit. The enclosing per-metric tracking algorithm and all other settings still apply. |
Relevant when: mode = "limit_override"
per_metric_limits.*.value_limit
optional uint
mode: excluded.
Default: 500
per_tag_limits
optional object
Global per-tag-key overrides, applied to every metric that does not match a
per_metric_limits entry. Each entry sets mode: limit_override (with a
per-tag value_limit) or mode: excluded (bypass tracking for that tag).
See the “Per-tag overrides” section under “How it works” for a worked example and the precedence rules.
per_tag_limits.*
required object
per_tag_limits.*.mode
required string literal enum
| Option | Description |
|---|---|
| excluded | Opt this tag out of cardinality tracking entirely. All values pass through without being recorded or checked against any value_limit. |
| limit_override | Track this tag with a per-tag value limit. The enclosing per-metric tracking algorithm and all other settings still apply. |
per_tag_limits.*.value_limit
required uint
Relevant when: mode = "limit_override"
tracking_scope
optional string literal enum
| Option | Description |
|---|---|
| global | All metrics share a single tracking bucket. Tag values pool across metrics and the global value_limit caps the combined set. |
| per_metric | Every distinct metric gets its own tracking bucket, providing tag cardinality limiting for each metric in isolation at the cost of higher memory usage. |
Default: global
Input Types
Outputs
<component_id>
Output Types
Metrics
metric event.
Telemetry
Metrics
component_discarded_events_total
counter
filter transform, or false if due to an error.
component_errors_total
counter
component_latency_mean_seconds
gauge
The mean elapsed time, in fractional seconds, that an event spends in a single transform.
This includes both the time spent queued in the transform’s input buffer and the time spent executing the transform itself.
This value is smoothed over time using an exponentially weighted moving average (EWMA).
component_latency_seconds
histogram
The elapsed time, in fractional seconds, that an event spends in a single transform.
This includes both the time spent queued in the transform’s input buffer and the time spent executing the transform itself.
component_received_event_bytes_total
counter
component_received_events_count
histogram
A histogram of the number of events passed in each internal batch in Vector’s internal topology.
Note that this is separate from sink-level batching. It is mostly useful for low-level debugging of performance issues in Vector caused by small internal batches.
component_received_events_total
counter
component_sent_event_bytes_total
counter
component_sent_events_total
counter
tag_value_limit_exceeded_total
counter
value_limit. When internal_metrics.include_extended_tags
is enabled in the tag_cardinality_limit transform, this metric includes
metric_name and tag_key labels. By default, this metric has no labels to
keep cardinality low.
internal_metrics.include_extended_tags is enabled.
transform_buffer_max_byte_size
gauge
Deprecated
transform_buffer_max_size_bytes.
transform_buffer_max_event_size
gauge
Deprecated
transform_buffer_max_size_events.
transform_buffer_max_size_bytes
gauge
transform_buffer_max_size_events
gauge
transform_buffer_utilization
histogram
transform_buffer_utilization_level
gauge
transform_buffer_utilization_mean
gauge
utilization
gauge
value_limit_reached_total
counter
Examples
Drop high-cardinality tag
Given this event...
[{"metric":{"counter":{"value":2},"kind":"incremental","name":"logins","tags":{"user_id":"user_id_1"}}},{"metric":{"counter":{"value":2},"kind":"incremental","name":"logins","tags":{"user_id":"user_id_2"}}}]
...and this configuration...
transforms:
  my_transform_id:
    type: tag_cardinality_limit
    inputs:
      - my-source-or-transform-id
    value_limit: 1
    limit_exceeded_action: drop_tag

[transforms.my_transform_id]
type = "tag_cardinality_limit"
inputs = [ "my-source-or-transform-id" ]
value_limit = 1
limit_exceeded_action = "drop_tag"

{
  "transforms": {
    "my_transform_id": {
      "type": "tag_cardinality_limit",
      "inputs": [
        "my-source-or-transform-id"
      ],
      "value_limit": 1,
      "limit_exceeded_action": "drop_tag"
    }
  }
}
...this Vector event is produced:
[{"metric":{"counter":{"value":2},"kind":"incremental","name":"logins","tags":{"user_id":"user_id_1"}}},{"metric":{"counter":{"value":2},"kind":"incremental","name":"logins","tags":{}}}]
How it works
Intended Usage
request_id
tag. When this happens, fix the upstream error as soon as possible. Vector’s
cardinality cache is held in memory and is erased when Vector is restarted, so
after a restart new tag values pass through until the cardinality limit is
reached again. In normal usage this should rarely be a problem, because Vector
processes are typically long-lived.
Failed Parsing
This transform stores in memory a copy of the key for every tag on every metric
event seen by this transform. In mode exact, a copy of every distinct
value for each key is also kept in memory, until value_limit distinct values
have been seen for a given key, at which point new values for that key will be
rejected. So to estimate the memory usage of this transform in mode exact
you can use the following formula:
(number of distinct field names in the tags for your metrics * average length of
the field names for the tags) + (number of distinct field names in the tags of
your metrics * `value_limit` * average length of the values of tags for your
metrics)
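The exact-mode behavior described above (remember every distinct value per key until value_limit is reached, then reject new values) can be modelled in a few lines of Python. This is only an illustrative sketch and not Vector's implementation: the class name ExactTagLimiter is hypothetical, it assumes a single global tracking bucket, and it ignores per-metric and per-tag overrides.

```python
from collections import defaultdict


class ExactTagLimiter:
    """Toy model of exact-mode tracking (hypothetical, single global bucket)."""

    def __init__(self, value_limit, limit_exceeded_action="drop_tag"):
        self.value_limit = value_limit
        self.action = limit_exceeded_action
        # tag key -> set of distinct values accepted so far
        self.seen = defaultdict(set)

    def _would_exceed(self, key, value):
        values = self.seen[key]
        return value not in values and len(values) >= self.value_limit

    def process(self, tags):
        """Return the tags to emit, or None if the whole event is dropped."""
        if self.action == "drop_event" and any(
            self._would_exceed(k, v) for k, v in tags.items()
        ):
            return None
        kept = {}
        for key, value in tags.items():
            if self._would_exceed(key, value):
                continue  # drop_tag: omit this tag, keep the rest of the event
            self.seen[key].add(value)
            kept[key] = value
        return kept


# Mirrors the "Drop high-cardinality tag" example: value_limit = 1.
limiter = ExactTagLimiter(value_limit=1)
first = limiter.process({"user_id": "user_id_1"})
second = limiter.process({"user_id": "user_id_2"})
print(first)   # {'user_id': 'user_id_1'}
print(second)  # {}
```

With limit_exceeded_action set to drop_event, the second call would return None instead of an event with an empty tag set.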
In mode probabilistic, rather than storing all values seen for each key, each
distinct key has a bloom filter which can probabilistically determine whether
a given value has been seen for that key. The formula for estimating memory
usage in mode probabilistic is:
(number of distinct field names in the tags for your metrics * average length of
the field names for the tags) + (number of distinct field names in the tags of
your metrics * `cache_size_per_key`)
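As a quick way to plug numbers into the two formulas above, the following Python sketch evaluates both estimates. The function names and the sample inputs (20 distinct tag keys, 12-byte key names, 16-byte values) are made up for illustration, and the results are rough lower bounds that ignore any per-entry overhead of the underlying data structures.

```python
def exact_mode_bytes(distinct_keys, avg_key_len, value_limit, avg_value_len):
    """Estimate for mode exact: key names plus up to value_limit values per key."""
    return distinct_keys * avg_key_len + distinct_keys * value_limit * avg_value_len


def probabilistic_mode_bytes(distinct_keys, avg_key_len, cache_size_per_key):
    """Estimate for mode probabilistic: key names plus one bloom filter per key."""
    return distinct_keys * avg_key_len + distinct_keys * cache_size_per_key


# Hypothetical workload: 20 tag keys, 12-byte names, 16-byte values.
print(exact_mode_bytes(20, 12, value_limit=500, avg_value_len=16))   # 160240
print(probabilistic_mode_bytes(20, 12, cache_size_per_key=5120))     # 102640
```

Note that the probabilistic estimate does not depend on value_limit or on the length of the tag values, only on cache_size_per_key.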
The cache_size_per_key option controls the size of the bloom filter used
for storing the set of acceptable values for any single key. The larger the
bloom filter, the lower the false positive rate, which in this case means the
transform is less likely to allow a new tag value that would otherwise violate
a configured limit. To find the exact false positive rate for a given
cache_size_per_key and value_limit, you can use one of the many free online
bloom filter calculators. The formula is generally presented in terms of
'n', 'p', 'k', and 'm', where 'n' is the number of items in the filter
(value_limit in this case), 'p' is the probability of false positives (the
value to solve for), 'k' is the number of hash functions used internally, and
'm' is the number of bits in the bloom filter. You should be able to provide
values for just 'n' and 'm' and get back the value for 'p' with an optimal 'k'
selected for you. When converting cache_size_per_key to the 'm' value to plug
into the calculator, remember that cache_size_per_key is in bytes, while 'm' is
usually expressed in bits (a bit is 1/8 of a byte), so 'm' is
cache_size_per_key * 8.
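Instead of an online calculator, the standard bloom filter approximation p = (1 - e^(-k*n/m))^k can be evaluated directly. The sketch below assumes the filter uses the optimal number of hash functions, k = (m/n) * ln 2 rounded to an integer; whether Vector's bloom filter picks exactly this 'k' is an assumption, so treat the results as estimates. The function name is hypothetical.

```python
import math


def bloom_false_positive_rate(cache_size_per_key, value_limit):
    """Approximate 'p' for a bloom filter of cache_size_per_key bytes
    holding value_limit items, assuming an optimal number of hashes."""
    m = cache_size_per_key * 8  # bytes -> bits
    n = value_limit
    k = max(1, round(m / n * math.log(2)))  # assumed optimal 'k'
    return (1 - math.exp(-k * n / m)) ** k


# Compare a few cache sizes against the default value_limit of 500.
for size in (512, 1024, 5120):
    print(size, bloom_false_positive_rate(size, 500))
```

The printed rates shrink rapidly as cache_size_per_key grows relative to value_limit, which is the trade-off between memory and accuracy described above.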
Per-tag overrides
per_tag_limits lets you override the cardinality settings for individual
tag keys instead of changing the metric-level value_limit. It is supported
at two scopes: the top level (applies to every metric that does not match a
per_metric_limits entry) and inside a per_metric_limits.<name> block
(applies only to that metric).
Each entry uses one of two mode values:
- mode: limit_override tracks the tag with its own value_limit, independent of the surrounding metric’s value_limit.
- mode: excluded bypasses cardinality tracking for this tag entirely. Values pass through unchanged on every event, are not counted against any value_limit, and are never added to the cache.
type: tag_cardinality_limit
value_limit: 500
mode: exact
# Applies to every metric that does NOT match a per_metric_limits entry below.
per_tag_limits:
  kube_pod_name:
    # High cardinality is intentional for this tag, so never track it.
    mode: excluded
  request_id:
    # Tighten the cap for this tag without lowering the metric-level limit.
    mode: limit_override
    value_limit: 50
per_metric_limits:
  http_requests_total:
    value_limit: 1000
    mode: exact
    # This metric has its own per-tag rules. The top-level per_tag_limits
    # above is IGNORED for http_requests_total, so `kube_pod_name` on this
    # metric is tracked against value_limit=1000.
    per_tag_limits:
      trace_id:
        mode: excluded
Precedence is “nearest wins”:
- If the metric matches a per_metric_limits entry, only that entry’s per_tag_limits is consulted; the top-level per_tag_limits is ignored for that metric. (This mirrors how a per-metric value_limit shadows the global value_limit.)
- Otherwise, the top-level per_tag_limits is consulted.
- Tags not listed in the applicable per_tag_limits fall back to the surrounding metric’s value_limit (per-metric, or global).
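The precedence rules above can be sketched as a small lookup function. This is a hypothetical illustration in Python, not Vector's code: the config dictionary mirrors the YAML example in this section, resolve_tag_rule is an invented name, and falling back to 500 when a scope omits value_limit is an assumption based on the documented default.

```python
DEFAULT_VALUE_LIMIT = 500  # assumed fallback, taken from the documented default

config = {
    "value_limit": 500,
    "mode": "exact",
    "per_tag_limits": {
        "kube_pod_name": {"mode": "excluded"},
        "request_id": {"mode": "limit_override", "value_limit": 50},
    },
    "per_metric_limits": {
        "http_requests_total": {
            "value_limit": 1000,
            "mode": "exact",
            "per_tag_limits": {"trace_id": {"mode": "excluded"}},
        },
    },
}


def resolve_tag_rule(config, metric_name, tag_key):
    """Return ("excluded", None) or ("tracked", limit) for one metric/tag pair."""
    per_metric = config.get("per_metric_limits", {}).get(metric_name)
    # Nearest wins: a matching per-metric entry shadows the top level entirely.
    scope = per_metric if per_metric is not None else config
    if scope.get("mode") == "excluded":
        return ("excluded", None)
    rule = scope.get("per_tag_limits", {}).get(tag_key)
    if rule is None:
        # Tag not listed: fall back to the surrounding scope's value_limit.
        return ("tracked", scope.get("value_limit", DEFAULT_VALUE_LIMIT))
    if rule["mode"] == "excluded":
        return ("excluded", None)
    return ("tracked", rule["value_limit"])


for metric, tag in [
    ("logins", "kube_pod_name"),
    ("logins", "request_id"),
    ("logins", "user_id"),
    ("http_requests_total", "kube_pod_name"),
    ("http_requests_total", "trace_id"),
]:
    print(metric, tag, resolve_tag_rule(config, metric, tag))
```

The fourth pair shows the shadowing behavior: kube_pod_name is excluded at the top level but is tracked against 1000 on http_requests_total, because that metric has its own per_tag_limits.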