Kubernetes logs
Collect logs from Kubernetes Nodes
Requirements
Kubernetes version >= 1.19 is required.
Configuration
Example configurations
{
  "sources": {
    "my_source_id": {
      "type": "kubernetes_logs",
      "ignore_older_secs": 600,
      "read_from": "beginning"
    }
  }
}

[sources.my_source_id]
type = "kubernetes_logs"
ignore_older_secs = 600
read_from = "beginning"

---
sources:
  my_source_id:
    type: kubernetes_logs
    ignore_older_secs: 600
    read_from: beginning
{
  "sources": {
    "my_source_id": {
      "type": "kubernetes_logs",
      "pod_annotation_fields": null,
      "namespace_annotation_fields": null,
      "node_annotation_fields": null,
      "auto_partial_merge": true,
      "ingestion_timestamp_field": null,
      "kube_config_file": null,
      "ignore_older_secs": 600,
      "read_from": "beginning",
      "self_node_name": "${VECTOR_SELF_NODE_NAME}",
      "exclude_paths_glob_patterns": [
        "**/exclude/**"
      ],
      "extra_field_selector": "metadata.name!=pod-name-to-exclude",
      "extra_label_selector": "my_custom_label!=my_value",
      "extra_namespace_label_selector": "my_custom_label!=my_value",
      "max_read_bytes": 2048,
      "max_line_bytes": 32768,
      "fingerprint_lines": 1,
      "glob_minimum_cooldown_ms": 60000,
      "delay_deletion_ms": 60000,
      "data_dir": "/var/lib/vector",
      "timezone": "local"
    }
  }
}

[sources.my_source_id]
type = "kubernetes_logs"
auto_partial_merge = true
ignore_older_secs = 600
read_from = "beginning"
self_node_name = "${VECTOR_SELF_NODE_NAME}"
exclude_paths_glob_patterns = [ "**/exclude/**" ]
extra_field_selector = "metadata.name!=pod-name-to-exclude"
extra_label_selector = "my_custom_label!=my_value"
extra_namespace_label_selector = "my_custom_label!=my_value"
max_read_bytes = 2_048
max_line_bytes = 32_768
fingerprint_lines = 1
glob_minimum_cooldown_ms = 60_000
delay_deletion_ms = 60_000
data_dir = "/var/lib/vector"
timezone = "local"

---
sources:
  my_source_id:
    type: kubernetes_logs
    pod_annotation_fields: null
    namespace_annotation_fields: null
    node_annotation_fields: null
    auto_partial_merge: true
    ingestion_timestamp_field: null
    kube_config_file: null
    ignore_older_secs: 600
    read_from: beginning
    self_node_name: ${VECTOR_SELF_NODE_NAME}
    exclude_paths_glob_patterns:
      - "**/exclude/**"
    extra_field_selector: metadata.name!=pod-name-to-exclude
    extra_label_selector: my_custom_label!=my_value
    extra_namespace_label_selector: my_custom_label!=my_value
    max_read_bytes: 2048
    max_line_bytes: 32768
    fingerprint_lines: 1
    glob_minimum_cooldown_ms: 60000
    delay_deletion_ms: 60000
    data_dir: /var/lib/vector
    timezone: local
auto_partial_merge
optional bool, default: true
Whether or not to automatically merge partial events.

data_dir
optional string (file system path)
The directory used to persist file checkpoint positions. By default, the global data_dir option is used. Please make sure the Vector project has write permissions to this dir.

delay_deletion_ms
optional uint, default: 60000 (milliseconds)
The delay between receiving a DELETE event and removing any related metadata Vector has stored. This controls how quickly Vector removes metadata for resources that have been removed from Kubernetes; a longer delay allows Vector to continue processing and enriching logs after the source Pod has been deleted. If Vector tries to process logs from a Pod whose metadata has already been removed from the local cache, it will fail to enrich the event with metadata and log a warning.

exclude_paths_glob_patterns
optional [string], default: ["**/*.gz", "**/*.tmp"]
A list of glob patterns to exclude from reading the files.

extra_field_selector
optional string literal
Specifies the field selector to filter Pods with, to be used in addition to the built-in Node filter.

extra_label_selector
optional string literal
Specifies the label selector to filter Pods with, to be used in addition to the built-in vector.dev/exclude filter.

extra_namespace_label_selector
optional string literal
Specifies the label selector to filter Namespaces with, to be used in addition to the built-in vector.dev/exclude filter.

fingerprint_lines
optional uint, default: 1 (lines)
The number of lines to read when generating the fingerprint of a log file. Note that the greater this value is, the greater the chance of it not reading the last file/logs of the container.

glob_minimum_cooldown_ms
optional uint, default: 60000 (milliseconds)
The delay between file discovery calls. This controls the interval at which Vector searches for new log files.

ignore_older_secs
common, optional uint
Ignore files with a data modification date older than the specified number of seconds.

ingestion_timestamp_field
optional string literal

kube_config_file
optional string literal
Optional path to a readable kubeconfig file. If not set, Vector will try to connect to Kubernetes using the in-cluster configuration.

max_line_bytes
optional uint, default: 32768 (bytes)
The maximum number of bytes a line can contain before being discarded. This protects against malformed lines or tailing incorrect files.

max_read_bytes
optional uint, default: 2048 (bytes)
The maximum number of bytes to read from a single file before switching over to the next file.

namespace_annotation_fields
optional object

namespace_annotation_fields.namespace_labels
optional string literal, default: kubernetes.namespace_labels

node_annotation_fields
optional object

node_annotation_fields.node_labels
optional string literal, default: kubernetes.node_labels

pod_annotation_fields
optional object

pod_annotation_fields.container_id
optional string literal, default: kubernetes.container_id

pod_annotation_fields.container_image
optional string literal, default: kubernetes.container_image

pod_annotation_fields.container_name
optional string literal, default: kubernetes.container_name

pod_annotation_fields.pod_annotations
optional string literal, default: kubernetes.pod_annotations

pod_annotation_fields.pod_ip
optional string literal, default: kubernetes.pod_ip

pod_annotation_fields.pod_ips
optional string literal, default: kubernetes.pod_ips

pod_annotation_fields.pod_labels
optional string literal, default: kubernetes.pod_labels

pod_annotation_fields.pod_name
optional string literal, default: kubernetes.pod_name

pod_annotation_fields.pod_namespace
optional string literal, default: kubernetes.pod_namespace

pod_annotation_fields.pod_node_name
optional string literal, default: kubernetes.pod_node_name

pod_annotation_fields.pod_owner
optional string literal, default: kubernetes.pod_owner

pod_annotation_fields.pod_uid
optional string literal, default: kubernetes.pod_uid

read_from
common, optional string literal enum, default: beginning
Option | Description |
---|---|
beginning | Read from the beginning of the file. |
end | Start reading from the current end of the file. |

self_node_name
optional string literal, default: ${VECTOR_SELF_NODE_NAME}
The name of the Kubernetes Node this Vector instance runs at. Configured to use an env var by default, to be evaluated to a value provided by Kubernetes at Pod deploy time.

timezone
optional string literal, default: local
Overrides the global timezone option. The time zone name may be any name in the TZ database, or local to indicate system local time.
Outputs
<component_id>
Output Data
Logs
Line
An individual line from a Pod log file. Fields with example values:

file: /var/log/pods/pod-namespace_pod-name_pod-uid/container/1.log
kubernetes.container_id: docker://f24c81dcd531c5d353751c77fe0556a4f602f7714c72b9a58f9b26c0628f1fa6
kubernetes.container_image: busybox:1.30
kubernetes.container_name: coredns
kubernetes.namespace_labels: {"mylabel": "myvalue"}
kubernetes.pod_annotations: {"myannotation": "myvalue"}
kubernetes.pod_ip: 192.168.1.1
kubernetes.pod_ips: ["192.168.1.1", "::1"]
kubernetes.pod_labels: {"mylabel": "myvalue"}
kubernetes.pod_name: coredns-qwertyuiop-qwert
kubernetes.pod_namespace: kube-system
kubernetes.pod_node_name: minikube
kubernetes.pod_owner: ReplicaSet/coredns-565d847f94
kubernetes.pod_uid: ba46d8c9-9541-4f6b-bbf9-d23b36f2f136
message: 53.126.150.246 - - [01/Oct/2020:11:25:58 -0400] "GET /disintermediate HTTP/2.0" 401 20308
source_type: kubernetes_logs
stream: stdout or stderr
timestamp: 2020-10-10T17:07:36.452332Z
Telemetry
Metrics
Each metric is tagged with component_id; the deprecated component_name tag is also emitted and its value is the same as component_id.

component_discarded_events_total (counter)
component_errors_total (counter)
component_received_bytes_total (counter)
component_received_event_bytes_total (counter)
component_received_events_total (counter)
component_sent_event_bytes_total (counter)
component_sent_events_total (counter)
events_in_total (counter, deprecated: use component_received_events_total instead)
events_out_total (counter, deprecated: use component_sent_events_total instead)
k8s_docker_format_parse_failures_total (counter)
k8s_event_annotation_failures_total (counter)
k8s_format_picker_edge_cases_total (counter)
k8s_reflector_desyncs_total (counter)
k8s_state_ops_total (counter)
k8s_stream_chunks_processed_total (counter)
k8s_stream_processed_bytes_total (counter)
k8s_watch_requests_failed_total (counter)
k8s_watch_requests_invoked_total (counter)
k8s_watch_stream_failed_total (counter)
k8s_watch_stream_items_obtained_total (counter)
k8s_watcher_http_error_total (counter)
processed_bytes_total (counter)
processed_events_total (counter, deprecated: superseded by the component_received_events_total and component_sent_events_total metrics)
source_lag_time_seconds (histogram)
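These metrics come from Vector's internal telemetry. As a hedged sketch (the component IDs and listen address below are illustrative assumptions, not defaults from this page), they can be exposed for scraping with the internal_metrics source and a prometheus_exporter sink:

sources:
  vector_internal_metrics:
    type: internal_metrics
sinks:
  vector_prometheus_exporter:
    type: prometheus_exporter
    inputs:
      - vector_internal_metrics
    # Prometheus can scrape Vector's own metrics from this address.
    address: "0.0.0.0:9598"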
Examples
Sample Output
Given this event:

F1015 11:01:46.499073 1 main.go:39] error getting server version: Get "https://10.96.0.1:443/version?timeout=32s": dial tcp 10.96.0.1:443: connect: network is unreachable
...and this source configuration:

[sources.my_source_id]
type = "kubernetes_logs"

---
sources:
  my_source_id:
    type: kubernetes_logs

{
  "sources": {
    "my_source_id": {
      "type": "kubernetes_logs"
    }
  }
}

...Vector emits a log event like the following:
{
  "file": "/var/log/pods/kube-system_storage-provisioner_93bde4d0-9731-4785-a80e-cd27ba8ad7c2/storage-provisioner/1.log",
  "kubernetes.container_image": "gcr.io/k8s-minikube/storage-provisioner:v3",
  "kubernetes.container_name": "storage-provisioner",
  "kubernetes.namespace_labels": {
    "kubernetes.io/metadata.name": "kube-system"
  },
  "kubernetes.pod_annotations": {
    "prometheus.io/scrape": "false"
  },
  "kubernetes.pod_ip": "192.168.1.1",
  "kubernetes.pod_ips": [
    "192.168.1.1",
    "::1"
  ],
  "kubernetes.pod_labels": {
    "addonmanager.kubernetes.io/mode": "Reconcile",
    "gcp-auth-skip-secret": "true",
    "integration-test": "storage-provisioner"
  },
  "kubernetes.pod_name": "storage-provisioner",
  "kubernetes.pod_namespace": "kube-system",
  "kubernetes.pod_node_name": "minikube",
  "kubernetes.pod_uid": "93bde4d0-9731-4785-a80e-cd27ba8ad7c2",
  "message": "F1015 11:01:46.499073 1 main.go:39] error getting server version: Get \"https://10.96.0.1:443/version?timeout=32s\": dial tcp 10.96.0.1:443: connect: network is unreachable",
  "source_type": "kubernetes_logs",
  "stream": "stderr",
  "timestamp": "2020-10-15T11:01:46.499555308Z"
}
How it works
Checkpointing
Vector checkpoints the current read position after each successful read. This ensures that Vector resumes where it left off if restarted, preventing data from being read twice. The checkpoint positions are stored in the data directory, which is specified via the global data_dir option, but can be overridden via the data_dir option in the file source directly.
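For example, a minimal sketch that overrides the checkpoint directory on this source (the path below is an illustrative assumption):

sources:
  my_source_id:
    type: kubernetes_logs
    # File checkpoint positions are persisted here instead of the global data_dir.
    data_dir: /var/lib/vector/kubernetes-checkpoints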
Container exclusion
The kubernetes_logs source can skip the logs from the individual containers of a particular Pod. Add an annotation vector.dev/exclude-containers to the Pod, and enumerate the names of all the containers to exclude in the value of the annotation like so:

vector.dev/exclude-containers: "container1,container2"

This annotation will make Vector skip logs originating from container1 and container2 of the Pod marked with the annotation, while logs from other containers in the Pod will still be collected.
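As an illustration, a hypothetical Pod manifest carrying this annotation (names and images are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: my-app
  annotations:
    # Vector skips logs from these two containers only.
    vector.dev/exclude-containers: "container1,container2"
spec:
  containers:
    - name: container1
      image: busybox:1.30
    - name: container2
      image: busybox:1.30
    # Logs from container3 are still collected.
    - name: container3
      image: busybox:1.30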
Enrichment
Vector enriches collected data with Kubernetes context. A comprehensive list of fields can be found in the kubernetes_logs source output docs.
Filtering
Vector provides rich filtering options for Kubernetes log collection:
- Built-in Pod and container exclusion rules.
- The exclude_paths_glob_patterns option allows you to exclude Kubernetes log files by the file name and path.
- The extra_field_selector option specifies the field selector to filter Pods with, to be used in addition to the built-in Node filter.
- The extra_label_selector option specifies the label selector to filter Pods with, to be used in addition to the built-in vector.dev/exclude filter (see the combined example below).
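For instance, a hedged configuration sketch that combines path exclusion with the extra selectors (all values are illustrative, not defaults):

sources:
  my_source_id:
    type: kubernetes_logs
    # Skip compressed and temporary files by path.
    exclude_paths_glob_patterns:
      - "**/*.gz"
      - "**/*.tmp"
    # Skip a specific Pod by name, in addition to the built-in Node filter.
    extra_field_selector: metadata.name!=pod-name-to-exclude
    # Skip Pods carrying a custom label, in addition to vector.dev/exclude.
    extra_label_selector: my_custom_label!=my_value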
Globbing
By default, the kubernetes_logs source ignores compressed and temporary files. This behavior can be configured with the exclude_paths_glob_patterns option.
Globbing is used to continually discover Pods' log files at a rate defined by the glob_minimum_cooldown_ms option. In environments where files are rotated rapidly, we recommend lowering glob_minimum_cooldown_ms to catch files before they are compressed.
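In an environment with fast log rotation, a sketch like the following lowers the discovery interval (the 10-second value is an illustrative assumption, not a recommendation from this page):

sources:
  my_source_id:
    type: kubernetes_logs
    # Poll for new log files every 10 seconds instead of the default 60 seconds.
    glob_minimum_cooldown_ms: 10000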
Kubernetes API access control
Vector requires access to the Kubernetes API. Specifically, the kubernetes_logs source uses the /api/v1/pods, /api/v1/namespaces, and /api/v1/nodes endpoints to "list" and "watch" the resources used to enrich events with additional metadata.
Modern Kubernetes clusters run with the RBAC (role-based access control) scheme. RBAC-enabled clusters require some configuration to grant Vector the authorization to access the Kubernetes API endpoints. As RBAC is currently the standard way of controlling access to the Kubernetes API, we ship the necessary configuration out of the box: see the ClusterRole, ClusterRoleBinding, and ServiceAccount in our kubectl YAML config, and the rbac configuration in the Helm chart.
If your cluster doesn't use any access control scheme and doesn't restrict access to the Kubernetes API, you don't need any extra configuration; Vector will just work.
Clusters using the legacy ABAC scheme are not officially supported (although Vector might work if you configure access properly); we encourage switching to RBAC. If you use a custom access control scheme, make sure the Vector Pod/ServiceAccount is granted "list" and "watch" access to the /api/v1/pods, /api/v1/namespaces, and /api/v1/nodes resources.
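The kubectl YAML config and the Helm chart already ship an equivalent of the following; this is only a hedged sketch of the minimum RBAC rules implied above (object names and the ServiceAccount namespace are placeholders):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: vector
rules:
  # "list" and "watch" access to Pods, Namespaces, and Nodes for metadata enrichment.
  - apiGroups: [""]
    resources: ["pods", "namespaces", "nodes"]
    verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: vector
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: vector
subjects:
  - kind: ServiceAccount
    name: vector
    namespace: vector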
Kubernetes API communication
Vector communicates with the Kubernetes API to enrich the data it collects with Kubernetes context, so it must be able to reach the Kubernetes API server. If Vector runs inside a Kubernetes cluster, it connects to that cluster using the Kubernetes-provided access information.
In addition to access, Vector implements proper desync handling to ensure communication is safe and reliable. This ensures that Vector will not overwhelm the Kubernetes API or compromise its stability.
Partial message merging
Vector merges partial messages by default; this behavior is controlled by the auto_partial_merge option. For more involved cases, you can use the reduce transform, which offers the ability to handle custom merging of things like stacktraces.
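For custom merging, here is a hedged sketch with the reduce transform; the transform name, the VRL condition, and the assumption that stacktrace continuation lines are indented are all illustrative:

transforms:
  merge_stacktraces:
    type: reduce
    inputs:
      - my_source_id
    # Start a new aggregate whenever a line does not begin with whitespace;
    # indented continuation lines are merged into the preceding event.
    starts_when: "!match(string!(.message), r'^\\s')"
    merge_strategies:
      # Join the merged lines back together with newlines.
      message: concat_newline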
Pod exclusion
By default, the kubernetes_logs source will skip logs from the Pods that have a vector.dev/exclude: "true" label. You can configure additional exclusion rules via label or field selectors; see the available options.
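For example, a hypothetical Pod excluded from log collection via this label (metadata values are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: noisy-batch-job
  labels:
    # Vector skips all logs from this Pod.
    vector.dev/exclude: "true"
spec:
  containers:
    - name: worker
      image: busybox:1.30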
Pod removal
To ensure all data is collected, Vector continues to collect logs from the Pod for some time after its removal. This ensures that Vector obtains some of the most important data, such as crash details.
Resource limits
Agent resource limits
If you deploy Vector as an agent (collecting data for each of your Nodes), we recommend the following limits:

resources:
  requests:
    memory: "64Mi"
    cpu: "500m"
  limits:
    memory: "1024Mi"
    cpu: "6000m"
As with all Kubernetes resource limit recommendations, use these as a reference point and adjust as necessary. If your configured Vector pipeline is complex, you may need more resources; if you have a more straightforward pipeline, you may need less.