Install Vector on Kubernetes

Kubernetes, also known as k8s, is an open-source container-orchestration system for automating application deployment, scaling, and management. This page covers installing and managing Vector on the Kubernetes platform.

Install

Helm 3
Vector daemon deployment strategy:

1. Your service logs to STDOUT, following the 12-factor principles.
2. STDOUT is captured by your platform.
3. Vector collects data from your platform and fans it out to your destinations.

The agent role is designed to collect all Kubernetes log data on each Node. Vector runs as a DaemonSet and tails logs for the entire Pod, automatically enriching them with Kubernetes metadata via the Kubernetes API. Collection is handled automatically, and it is intended for you to adjust your pipeline as necessary using Vector's sources, transforms, and sinks.


  1. Add the Vector repo

    helm repo add timberio https://packages.timber.io/helm/latest
  2. Check available Helm chart configuration options

    helm show values timberio/vector-agent
  3. Configure Vector

    cat <<-'VALUES' > values.yaml
    # The Vector Kubernetes integration automatically defines a
    # kubernetes_logs source that is made available to you.
    # You do not need to define a log source.
    sinks:
      # Adjust as necessary. By default we use the console sink
      # to print all data. This allows you to see Vector working.
      # https://vector.dev/docs/reference/sinks/
      stdout:
        type: console
        inputs: ["kubernetes_logs"]
        rawConfig: |
          target = "stdout"
          encoding = "json"
    VALUES
  4. Install Vector

    helm install --namespace vector --create-namespace vector timberio/vector-agent --values values.yaml
  5. Observe Vector

    kubectl logs --namespace vector daemonset/vector-agent

Deployment

Vector is an end-to-end observability data platform designed to be deployed under various roles. You mix and match these roles to create topologies. The intent is to make Vector as flexible as possible, allowing you to fluidly integrate Vector into your infrastructure over time. The deployment section demonstrates common Vector pipelines:

Common Deployment Pipelines

Administration

Helm 3

Restart

kubectl rollout restart --namespace vector daemonset/vector-agent

Observe

kubectl logs --namespace vector daemonset/vector-agent

Upgrade

helm repo update && helm upgrade --namespace vector vector timberio/vector-agent --reuse-values

Uninstall

Helm 3
helm uninstall --namespace vector vector

How it works

Checkpointing

Vector checkpoints the current read position after each successful read. This ensures that Vector resumes where it left off if restarted, preventing data from being read twice. The checkpoint positions are stored in the data directory, which is specified via the global data_dir option but can be overridden via the data_dir option in the file source directly.
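For example, a minimal sketch of where this lives in a raw Vector configuration (the path below is illustrative; the Helm chart configures a suitable default for you):

# Global data directory; checkpoint positions are kept here.
# The path is illustrative -- the Helm chart sets a suitable default.
data_dir = "/var/lib/vector"

[sources.kubernetes_logs]
type = "kubernetes_logs"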

Container exclusion

The kubernetes_logs source can skip the logs from the individual containers of a particular Pod. Add an annotation vector.dev/exclude-containers to the Pod, and enumerate the names of all the containers to exclude in the value of the annotation like so:

vector.dev/exclude-containers: "container1,container2"

This annotation will make Vector skip logs originating from the container1 and container2 of the Pod marked with the annotation, while logs from other containers in the Pod will still be collected.
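For example, a minimal Pod manifest sketch carrying this annotation (the Pod name and images are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: my-app                  # illustrative
  annotations:
    # Vector skips logs from these two containers; logs from "app" are still collected.
    vector.dev/exclude-containers: "container1,container2"
spec:
  containers:
    - name: app
      image: my-app:latest      # illustrative image
    - name: container1
      image: busybox:latest
    - name: container2
      image: busybox:latest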

Context

By default, the kubernetes_logs source will augment events with helpful context keys as shown in the "Output" section.

Enrichment

Vector will enrich data with Kubernetes context. A comprehensive list of fields can be found in the kubernetes_logs source output docs.

Filtering

Vector provides rich filtering options for Kubernetes log collection, combined in the configuration sketch after this list:

  • Built-in Pod and container exclusion rules.
  • The exclude_paths_glob_patterns option allows you to exclude Kubernetes log files by file name and path.
  • The extra_field_selector option specifies the field selector to filter Pods with, to be used in addition to the built-in Node filter.
  • The extra_label_selector option specifies the label selector to filter Pods with, to be used in addition to the built-in vector.dev/exclude filter.
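A combined sketch of these options on the kubernetes_logs source, expressed as raw Vector TOML (the glob pattern and selector values are illustrative):

[sources.k8s_logs]
type = "kubernetes_logs"
# Exclude log files by file name and path (pattern is illustrative).
exclude_paths_glob_patterns = ["**/kube-system_*/**"]
# Applied in addition to the built-in Node filter.
extra_field_selector = "metadata.name!=pod-name-to-exclude"
# Applied in addition to the built-in vector.dev/exclude filter.
extra_label_selector = "my_custom_label!=my_value"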

Kubernetes API access control

Vector requires access to the Kubernetes API. Specifically, the kubernetes_logs source uses the /api/v1/pods endpoint to "watch" the pods from all namespaces.

Modern Kubernetes clusters run with the RBAC (role-based access control) scheme. RBAC-enabled clusters require some configuration to grant Vector the authorization to access the Kubernetes API endpoints. As RBAC is currently the standard way of controlling access to the Kubernetes API, we ship the necessary configuration out of the box: see the ClusterRole, ClusterRoleBinding, and ServiceAccount in our kubectl YAML config, and the rbac configuration in the Helm chart.
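For orientation, the granted permissions boil down to listing and watching Pods. A minimal sketch of the kind of objects involved follows (names are illustrative; prefer the manifests shipped with the chart over hand-rolling these):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: vector-agent              # illustrative
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: vector-agent              # illustrative
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: vector-agent
subjects:
  - kind: ServiceAccount
    name: vector-agent            # must match the ServiceAccount the DaemonSet uses
    namespace: vector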

If your cluster doesn't use any access control scheme and doesn't restrict access to the Kubernetes API, you don't need to do any extra configuration; Vector will just work.

Clusters using the legacy ABAC scheme are not officially supported (although Vector might work if you configure access properly); we encourage switching to RBAC. If you use a custom access control scheme, make sure the Vector Pod/ServiceAccount is granted access to the /api/v1/pods resource.

Kubernetes API communication

Vector communicates with the Kubernetes API to enrich the data it collects with Kubernetes context. Therefore, Vector must have access to communicate with the Kubernetes API server. If Vector is running in a Kubernetes cluster, it connects to that cluster using the Kubernetes-provided access information.

In addition to access, Vector implements proper desync handling to ensure communication is safe and reliable. This ensures that Vector will not overwhelm the Kubernetes API or compromise its stability.

Partial message merging

Vector, by default, will merge partial messages that are split due to the Docker size limit. For everything else, the kubernetes_logs source offers multiline options to configure custom merging, for example to reassemble stack traces.
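A minimal sketch of the relevant source option in raw Vector TOML (auto_partial_merge defaults to true and is shown here only for clarity; see the kubernetes_logs source reference for the multiline options):

[sources.k8s_logs]
type = "kubernetes_logs"
# Merge messages the container runtime split due to its line-length limit.
# This is the default; shown explicitly for clarity.
auto_partial_merge = true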

Pod exclusion

By default, the kubernetes_logs source will skip logs from the Pods that have a vector.dev/exclude: "true" label. You can configure additional exclusion rules via label or field selectors, see the available options.
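For example, a minimal Pod manifest sketch carrying the exclusion label (the name and image are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: noisy-batch-job           # illustrative
  labels:
    # Vector's built-in rule skips logs from this Pod entirely.
    vector.dev/exclude: "true"
spec:
  containers:
    - name: job
      image: busybox:latest       # illustrative image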

Pod removal

To ensure all data is collected, Vector will continue to collect logs from the Pod for some time after its removal. This ensures that Vector obtains some of the most important data, such as crash details.

Resource limits

Vector recommends the following resource limits.

Agent resource limits

If you deploy Vector as an agent (collecting data on each of your Nodes), we recommend the following limits:

resources:
  requests:
    memory: "64Mi"
    cpu: "500m"
  limits:
    memory: "1024Mi"
    cpu: "6000m"

As with all Kubernetes resource limit recommendations, use these as a reference point and adjust as necessary. If your configured Vector pipeline is complex, you may need more resources; if it is simple, you may need fewer.

State management

Agent state management

For the agent role, Vector stores its state in a host-mapped directory with a static path, so if it is redeployed it continues from where it was interrupted.
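Conceptually, the DaemonSet maps Vector's data directory onto the Node with a hostPath volume, along the lines of the sketch below (paths and names are illustrative; the Helm chart wires this up for you):

# Excerpt from a DaemonSet Pod template (illustrative only).
containers:
  - name: vector
    volumeMounts:
      - name: data-dir
        mountPath: /vector-data-dir   # illustrative container path
volumes:
  - name: data-dir
    hostPath:
      path: /var/lib/vector           # static path on the Node, survives redeploys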

Testing & reliability

Vector is tested extensively against Kubernetes. In addition to Kubernetes being Vector's most popular installation method, Vector implements a comprehensive end-to-end test suite for all minor Kubernetes versions starting with 1.14.