Tag Cardinality Limit Transform
The Vector tag_cardinality_limit transform limits the cardinality of tags on
metric events, protecting against accidental high-cardinality usage that can
disrupt the stability of metrics storage systems.
Configuration
[transforms.my_transform_id]
type = "tag_cardinality_limit" # required
inputs = ["my-source-or-transform-id"] # required
limit_exceeded_action = "drop_tag" # optional, default
mode = "exact" # required
value_limit = 500 # optional, default
- cache_size_per_tag (optional, uint)
The size of the cache, in bytes, used to detect duplicate tags. The bigger the cache, the less likely a 'false positive' is, that is, a case where a new value for a tag is allowed even after the configured limit has been reached. See Failed Parsing for more info.
  - Only relevant when: mode = "probabilistic"
  - Default: 5120000 (bytes)
- limit_exceeded_action (optional, string)
Controls what happens when a metric comes in with a tag that would exceed the configured limit on cardinality.
  - Default: "drop_tag"
  - Enum, must be one of: "drop_tag", "drop_event"
- mode (required, string)
Controls what approach is used internally to keep track of previously seen tags and determine when a tag on an incoming metric exceeds the limit.
  - Enum, must be one of: "exact", "probabilistic"
- value_limit (optional, uint)
How many distinct values to accept for any given key. See Failed Parsing for more info.
  - Default: 500
Telemetry
This component provides the following metrics that can be retrieved through
the internal_metrics source. See the metrics section in the monitoring page
for more info.
- counter
tag_value_limit_exceeded_total
The total number of events discarded because the tag has been rejected after hitting the configured value_limit. This metric includes the following tags:
  - component_kind - The Vector component kind.
  - component_name - The Vector component ID.
  - component_type - The Vector component type.
  - instance - The Vector instance identified by host and port.
  - job - The name of the job producing Vector metrics.
- counter
processed_events_total
The total number of events processed by this component. This metric includes the following tags:
  - component_kind - The Vector component kind.
  - component_name - The Vector component ID.
  - component_type - The Vector component type.
  - file - The file that produced the error.
  - instance - The Vector instance identified by host and port.
  - job - The name of the job producing Vector metrics.
- counter
value_limit_reached_total
The total number of times new values for a key have been rejected because the value limit has been reached. This metric includes the following tags:
  - component_kind - The Vector component kind.
  - component_name - The Vector component ID.
  - component_type - The Vector component type.
  - instance - The Vector instance identified by host and port.
  - job - The name of the job producing Vector metrics.
- counter
processed_bytes_total
The total number of bytes processed by the component. This metric includes the following tags:
  - component_kind - The Vector component kind.
  - component_name - The Vector component ID.
  - component_type - The Vector component type.
  - instance - The Vector instance identified by host and port.
  - job - The name of the job producing Vector metrics.
Examples
Given the following Vector metric events:
[
  {"metric": {"kind": "incremental", "name": "logins", "counter": {"value": 2.0}, "tags": {"user_id": "user_id_1"}}},
  {"metric": {"kind": "incremental", "name": "logins", "counter": {"value": 2.0}, "tags": {"user_id": "user_id_2"}}}
]
And the following configuration:
[transforms.tag_cardinality_limit]
type = "tag_cardinality_limit"
mode = "exact"
value_limit = 1
limit_exceeded_action = "drop_tag"
The following Vector metric events will be output:
[
  {"metric": {"kind": "incremental", "name": "logins", "counter": {"value": 2.0}, "tags": {"user_id": "user_id_1"}}},
  {"metric": {"kind": "incremental", "name": "logins", "counter": {"value": 2.0}, "tags": {}}}
]
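The behavior in this example can be approximated with a few lines of Python. This is only an illustrative sketch of the bookkeeping described for mode exact, not Vector's implementation: the class name ExactTagLimiter and its process method are hypothetical, and the choice to check every tag before recording anything when the action is "drop_event" is an assumption.

```python
from typing import Dict, Optional, Set


class ExactTagLimiter:
    """Hypothetical stand-in for the transform in mode "exact"."""

    def __init__(self, value_limit: int = 500,
                 limit_exceeded_action: str = "drop_tag") -> None:
        if limit_exceeded_action not in ("drop_tag", "drop_event"):
            raise ValueError("unknown limit_exceeded_action")
        self.value_limit = value_limit
        self.limit_exceeded_action = limit_exceeded_action
        # Accepted values per tag key; held in memory only.
        self.seen: Dict[str, Set[str]] = {}

    def _would_exceed(self, key: str, value: str) -> bool:
        accepted = self.seen.get(key, set())
        return value not in accepted and len(accepted) >= self.value_limit

    def process(self, tags: Dict[str, str]) -> Optional[Dict[str, str]]:
        """Return the tags to emit, or None if the whole event is dropped."""
        if self.limit_exceeded_action == "drop_event":
            # Assumption: nothing is recorded for an event that gets dropped.
            if any(self._would_exceed(k, v) for k, v in tags.items()):
                return None
        kept: Dict[str, str] = {}
        for key, value in tags.items():
            if self._would_exceed(key, value):
                continue  # "drop_tag": omit only the offending tag
            self.seen.setdefault(key, set()).add(value)
            kept[key] = value
        return kept


limiter = ExactTagLimiter(value_limit=1, limit_exceeded_action="drop_tag")
outputs = [
    limiter.process({"user_id": "user_id_1"}),
    limiter.process({"user_id": "user_id_2"}),
]
print(outputs)  # [{'user_id': 'user_id_1'}, {}]
```

With value_limit = 1, the first user_id value is accepted and remembered; the second is a new value for a key that is already at its limit, so the tag is removed while the event itself is kept.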
How It Works
Failed Parsing
This transform stores in memory a copy of the key for every tag on every metric
event seen by this transform. In mode exact, a copy of every distinct value for
each key is also kept in memory, until value_limit distinct values have been
seen for a given key, at which point new values for that key will be rejected.
So to estimate the memory usage of this transform in mode exact, you can use
the following formula:
(number of distinct field names in the tags for your metrics * average length of the field names for the tags) + (number of distinct field names in the tags of your metrics * value_limit * average length of the values of tags for your metrics)
In mode probabilistic, rather than storing all values seen for each key, each
distinct key has a bloom filter which can probabilistically determine whether
a given value has been seen for that key. The formula for estimating memory
usage in mode probabilistic is:
(number of distinct field names in the tags for your metrics * average length of the field names for the tags) + (number of distinct field names in the tags of your metrics * cache_size_per_tag)
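Both formulas are simple enough to evaluate by hand, but a small helper makes it easier to compare the two modes. The sketch below transcribes the formulas directly; the function names and the workload numbers (20 distinct tag keys, 10-byte keys, 16-byte values) are made up for illustration, and the results ignore any per-entry overhead of the underlying data structures.

```python
def estimate_exact_bytes(distinct_keys: int, avg_key_len: int,
                         value_limit: int, avg_value_len: int) -> int:
    """Memory estimate for mode "exact", per the formula above."""
    return distinct_keys * avg_key_len + distinct_keys * value_limit * avg_value_len


def estimate_probabilistic_bytes(distinct_keys: int, avg_key_len: int,
                                 cache_size_per_tag: int = 5_120_000) -> int:
    """Memory estimate for mode "probabilistic", per the formula above."""
    return distinct_keys * avg_key_len + distinct_keys * cache_size_per_tag


# Hypothetical workload: 20 distinct tag keys, 10-byte keys, 16-byte values.
exact = estimate_exact_bytes(20, 10, value_limit=500, avg_value_len=16)
probabilistic = estimate_probabilistic_bytes(20, 10)
print(exact)          # 160200
print(probabilistic)  # 102400200
```

For this made-up workload and the default option values, the exact estimate is far smaller than the probabilistic one, so it is worth running the numbers for your own tag counts and value lengths before choosing a mode or tuning cache_size_per_tag.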
The cache_size_per_tag option controls the size of the bloom filter used for
storing the set of acceptable values for any single key. The larger the bloom
filter, the lower the false positive rate, which in our case means the less
likely we are to allow a new tag value that would otherwise violate a
configured limit. If you want to know the exact false positive rate for a given
cache_size_per_tag and value_limit, there are many free online bloom filter
calculators that can answer this. The formula is generally presented in terms of
'n', 'p', 'k', and 'm', where 'n' is the number of items in the filter
(value_limit in our case), 'p' is the probability of false positives (what we
want to solve for), 'k' is the number of hash functions used internally, and 'm'
is the number of bits in the bloom filter. You should be able to provide values
for just 'n' and 'm' and get back the value for 'p' with an optimal 'k' selected
for you. Remember, when converting from cache_size_per_tag to the 'm' value to
plug into the calculator, that cache_size_per_tag is in bytes while 'm' is
usually presented in bits (a bit is 1/8 of a byte), so multiply by 8.
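If you would rather compute the rate yourself than use a calculator, the standard bloom filter approximation p ≈ (1 - e^(-k*n/m))^k can be evaluated directly. The sketch below assumes an optimal number of hash functions, k = (m/n) * ln 2, as the calculators do; the number of hash functions Vector actually uses internally is not stated here, so treat the result as an estimate. The function name is hypothetical.

```python
import math


def bloom_false_positive_rate(cache_size_per_tag: int, value_limit: int) -> float:
    """Approximate false positive rate for one tag key's bloom filter.

    Assumes an optimal hash count k; the real filter may choose differently.
    """
    m = cache_size_per_tag * 8          # filter size in bits (option is in bytes)
    n = value_limit                     # items stored in the filter
    k = max(1, round((m / n) * math.log(2)))
    return (1.0 - math.exp(-k * n / m)) ** k


# A deliberately small 1000-byte cache with the default value_limit of 500.
small = bloom_false_positive_rate(cache_size_per_tag=1000, value_limit=500)
# The default 5120000-byte cache with the default value_limit of 500.
default = bloom_false_positive_rate(cache_size_per_tag=5_120_000, value_limit=500)
print(small, default)
```

Under these assumptions, a 1000-byte cache already gives a false positive rate below one in a thousand for 500 values, and the default cache size makes the rate effectively zero.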
Intended Usage
This transform is intended to be used as a protection mechanism against
upstream mistakes, such as a developer accidentally adding a request_id tag.
When this happens, it is recommended to fix the upstream error as soon as
possible, because Vector's cardinality cache is held in memory and is erased
when Vector is restarted. After a restart, new tag values pass through until
the cardinality limit is reached again. For normal usage this should not be a
common problem, since Vector processes are normally long-lived.
Restarts
This transform's cache is held in memory, and therefore, restarting Vector will reset the cache. This means that new values will be passed through until the cardinality limit is reached again. See intended usage for more info.