Tag Cardinality Limit Transform

The Vector tag_cardinality_limit transform accepts and outputs metric events, allowing you to limit the cardinality of metric tags to prevent downstream disruption of metrics services.

Configuration

vector.toml
[transforms.my_transform_id]
type = "tag_cardinality_limit" # required
inputs = ["my-source-or-transform-id"] # required
limit_exceeded_action = "drop_tag" # optional, default
mode = "exact" # required
value_limit = 500 # optional, default
  • cache_size_per_tag (uint, bytes, optional)

    The size of the cache, in bytes, used to detect duplicate tags. The bigger the cache, the less likely it is to produce a 'false positive', i.e. a case where a new value for a tag is allowed even after the configured limit has been reached. See Memory Utilization for more info, and see the example configuration after this list.

    • Only relevant when: mode = "probabilistic"
    • Default: 5120000 (bytes)
  • limit_exceeded_action (string, enum, common, optional)

    Controls what should happen when a metric comes in with a tag that would exceed the configured limit on cardinality.

    • Default: "drop_tag"
    • Enum, must be one of: "drop_tag", "drop_event"
  • mode (string, enum, common, required)

    Controls what approach is used internally to keep track of previously seen tags and determine when a tag on an incoming metric exceeds the limit.

    • No default
    • Enum, must be one of: "exact", "probabilistic"
  • value_limit (uint, common, optional)

    How many distinct values to accept for any given key. See Memory Utilization for more info.

    • Default: 500
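
As an illustration of how these options fit together, here is a minimal sketch that enables the probabilistic mode. The transform and input names are placeholders taken from the configuration above, and the numeric values simply restate the defaults:

vector.toml
[transforms.my_transform_id]
type = "tag_cardinality_limit"
inputs = ["my-source-or-transform-id"]
mode = "probabilistic" # track seen values with a bloom filter per tag key
cache_size_per_tag = 5120000 # bytes allocated to each tag key's bloom filter
value_limit = 500 # distinct values accepted per tag key
limit_exceeded_action = "drop_tag" # drop the offending tag rather than the whole event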

Examples

This example will demonstrate how metric tags are dropped if they exceed the configured value_limit.

For example, given this configuration:

vector.toml
[transforms.cardinality_protection]
type = "tag_cardinality_limit"
inputs = [...]
mode = "exact"
value_limit = 1
limit_exceeded_action = "drop_tag"

With this configuration, Vector will drop metric tags once they exceed a cardinality of 1. Note that this limit is unrealistically low and is used only for demonstration purposes.

If we were to receive the first metric event:

{
  "name": "login.count",
  "timestamp": "2019-11-01T21:15:47+00:00",
  "kind": "absolute",
  "tags": {
    "host": "my.host.com",
    "request_id": "f9ed4675f1c53513c61a3b3b4e25b4c0"
  },
  "counter": {
    "value": 10
  }
}

It would pass through unchanged; no tags would be removed. But if we were to receive a second metric event:

{
  "name": "login.count",
  "timestamp": "2019-11-01T21:15:48+00:00",
  "kind": "absolute",
  "tags": {
    "host": "my.host.com",
    "request_id": "30f14c6c1fc85cba12bfd093aa8f90e3"
  },
  "counter": {
    "value": 4
  }
}

The request_id tag would be removed:

{
  "name": "login.count",
  "timestamp": "2019-11-01T21:15:48+00:00",
  "kind": "absolute",
  "tags": {
    "host": "my.host.com"
  },
  "counter": {
    "value": 4
  }
}

This is because the request_id tag now has a cardinality of 2, exceeding the configured limit of 1. Note that the host tag was not removed, because its cardinality is still 1.

If you'd like to drop the entire event, just set limit_exceeded_action to "drop_event".
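
For instance, a minimal sketch of the configuration above with the action switched (the input name is a placeholder):

vector.toml
[transforms.cardinality_protection]
type = "tag_cardinality_limit"
inputs = ["my-source-or-transform-id"]
mode = "exact"
value_limit = 1
limit_exceeded_action = "drop_event" # discard the whole metric event instead of just the tag

With this setting, the second metric event above would be discarded entirely rather than forwarded without its request_id tag.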

How It Works

Complex Processing

If you encounter limitations with the tag_cardinality_limit transform, we recommend using a runtime transform. These transforms are designed for complex processing and give you the power of a full programming runtime.

Environment Variables

Environment variables are supported through all of Vector's configuration. Simply add ${MY_ENV_VAR} in your Vector configuration file and the variable will be replaced before being evaluated.
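
For example, a minimal sketch that reads the mode from an environment variable (CARDINALITY_MODE is a hypothetical variable name):

vector.toml
[transforms.my_transform_id]
type = "tag_cardinality_limit"
inputs = ["my-source-or-transform-id"]
mode = "${CARDINALITY_MODE}" # replaced with the variable's value, e.g. "exact", before evaluation
value_limit = 500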

You can learn more in the Environment Variables section.

Intended Usage

This transform is intended to be used as a protection mechanism to prevent upstream mistakes, such as a developer accidentally adding a request_id tag. When this happens, it is recommended to fix the upstream error as soon as possible, because Vector's cardinality cache is held in memory and will be erased when Vector is restarted. A restart causes new tag values to pass through until the cardinality limit is reached again. For normal usage this should not be a common problem, since Vector processes are normally long-lived.

Memory Utilization

This transform stores in memory a copy of the key for every tag on every metric event seen by this transform. In mode exact, a copy of every distinct value for each key is also kept in memory, until value_limit distinct values have been seen for a given key, at which point new values for that key will be rejected. So to estimate the memory usage of this transform in mode exact you can use the following formula:

(number of distinct field names in the tags for your metrics * average length of
the field names for the tags) + (number of distinct field names in the tags of
your metrics * value_limit * average length of the tag values for your metrics)
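
For example, assuming (hypothetically) 20 distinct tag keys averaging 16 bytes each, the default value_limit of 500, and tag values averaging 32 bytes, the estimate would be (20 * 16) + (20 * 500 * 32) = 320 + 320,000 bytes, or roughly 320 KB.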

In mode probabilistic, rather than storing all values seen for each key, each distinct key has a bloom filter which can probabilistically determine whether a given value has been seen for that key. The formula for estimating memory usage in mode probabilistic is:

(number of distinct field names in the tags for your metrics * average length of
the field names for the tags) + (number of distinct field names in the tags of
your metrics * cache_size_per_tag)
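
Using the same hypothetical 20 distinct tag keys averaging 16 bytes each and the default cache_size_per_tag of 5120000 bytes, the estimate would be (20 * 16) + (20 * 5120000) = 320 + 102,400,000 bytes, or roughly 102 MB.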

The cache_size_per_tag option controls the size of the bloom filter used for storing the set of acceptable values for any single key. The larger the bloom filter, the lower the false positive rate, which in our case means the less likely we are to allow a new tag value that would otherwise violate a configured limit. If you want to know the exact false positive rate for a given cache_size_per_tag and value_limit, there are many free online bloom filter calculators that can answer this.

The formula is generally presented in terms of 'n', 'p', 'k', and 'm', where 'n' is the number of items in the filter (value_limit in our case), 'p' is the probability of false positives (what we want to solve for), 'k' is the number of hash functions used internally, and 'm' is the number of bits in the bloom filter. You should be able to provide values for just 'n' and 'm' and get back the value for 'p', with an optimal 'k' selected for you. Remember when converting from cache_size_per_tag to the 'm' value to plug into the calculator that cache_size_per_tag is in bytes, while 'm' is presented in bits (a bit is 1/8 of a byte).
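
For example, with the default cache_size_per_tag of 5120000 bytes, 'm' = 5120000 * 8 = 40,960,000 bits, and with the default value_limit, 'n' = 500; plugging those two values into a calculator yields the false positive rate 'p'.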

Restarts

This transform's cache is held in memory, and therefore, restarting Vector will reset the cache. This means that new values will be passed through until the cardinality limit is reached again. See intended usage for more info.