# Monitoring and observing Vector

Use logs and metrics generated by Vector itself in your Vector topology

Although Vector is primarily used to handle observability data from a wide variety of sources, we also strive to
make Vector highly observable itself. To that end, Vector provides two sources, `internal_logs` and
`internal_metrics`, that you can use to handle logs and metrics produced by Vector just like you
would logs and metrics from any other source.

## Logs

Vector provides clear, informative, well-structured logs via the `internal_logs` source. This section
shows you how to use them in your Vector topology.

*Which* logs Vector pipes through the `internal_logs` source is determined by the log level, which defaults
to `info`.

### Accessing logs

You can access Vector’s logs by adding an `internal_logs` source to your topology. Here’s an example
configuration that takes Vector’s logs and pipes them to the console as plain text:

```toml
[sources.vector_logs]
type = "internal_logs"

[sinks.console]
type = "console"
inputs = ["vector_logs"]
encoding.codec = "text"
```

### Using Vector logs

Once Vector’s logs enter your topology through the `internal_logs` source, you can treat them like logs from any other
system: you can transform them and send them off to any number of sinks. The configuration below, for example,
transforms Vector’s logs using the `remap` transform and Vector Remap Language (VRL) and then stores those
logs in ClickHouse:

```toml
[sources.vector_logs]
type = "internal_logs"

[transforms.modify]
type = "remap"
inputs = ["vector_logs"]
# Reformat the timestamp to Unix time
source = '''
.timestamp = to_unix_timestamp!(to_timestamp!(.timestamp))
'''

[sinks.database]
type = "clickhouse"
inputs = ["modify"]
host = "http://localhost:8123"
table = "vector-log-data"
```

### Configuring logs

#### Levels

Vector logs at the `info` level by default. You can set a different level when starting up your instance using either
command-line flags or the `LOG` environment variable. The table below details these options:

| Method | Description |
|---|---|
| `-v` flag | Drops the log level to `debug` |
| `-vv` flag | Drops the log level to `trace` |
| `-q` flag | Raises the log level to `warn` |
| `-qq` flag | Raises the log level to `error` |
| `-qqq` flag | Disables logging |
| `LOG=<level>` environment variable | Sets the log level. Must be one of `trace`, `debug`, `info`, `warn`, `error`, `off`. |
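For instance, these two invocations are equivalent ways of running Vector at the `debug` level (the config file path is a placeholder; adjust it for your setup):

```shell
# Drop the log level to debug via the -v flag
vector -v --config /etc/vector/vector.toml

# Equivalent: set the level explicitly via the LOG environment variable
LOG=debug vector --config /etc/vector/vector.toml
```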

#### Stack traces

You can enable full error backtraces by setting the `RUST_BACKTRACE=full` environment variable. More on this in the
Troubleshooting guide.
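For example (the config file path is a placeholder):

```shell
# Emit full Rust backtraces when Vector encounters an error
RUST_BACKTRACE=full vector --config /etc/vector/vector.toml
```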

## Metrics

You can monitor metrics produced by Vector using the `internal_metrics` source. As with Vector’s
internal logs, you can configure an `internal_metrics` source and use the piped-in metrics
however you wish.
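As a sketch, the configuration below exposes Vector’s internal metrics for scraping via the `prometheus_exporter` sink (the listen address shown is an assumption; adjust it for your environment):

```toml
[sources.vector_metrics]
type = "internal_metrics"

[sinks.prometheus]
type = "prometheus_exporter"
inputs = ["vector_metrics"]
address = "0.0.0.0:9598"
```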

### Metrics catalogue

The table below provides a list of internal metrics provided by Vector. See the docs for the `internal_metrics`
source for more detailed information about the available metrics.

| Name | Description | Data type |
|---|---|---|
| `adaptive_concurrency_averaged_rtt` | The average round-trip time (RTT) for the current window. | histogram |
| `adaptive_concurrency_in_flight` | The number of outbound requests currently awaiting a response. | histogram |
| `adaptive_concurrency_limit` | The concurrency limit that the adaptive concurrency feature has decided on for this current window. | histogram |
| `adaptive_concurrency_observed_rtt` | The observed round-trip time (RTT) for requests. | histogram |
| `aggregate_events_recorded_total` | The number of events recorded by the aggregate transform. | counter |
| `aggregate_failed_updates` | The number of failed metric updates, `incremental` adds, encountered by the aggregate transform. | counter |
| `aggregate_flushes_total` | The number of flushes done by the aggregate transform. | counter |
| `api_started_total` | The number of times the Vector GraphQL API has been started. | counter |
| `checkpoint_write_errors_total` | The total number of errors writing checkpoints. | counter |
| `checkpoints_total` | The total number of files checkpointed. | counter |
| `checksum_errors_total` | The total number of errors identifying files via checksum. | counter |
| `collect_completed_total` | The total number of metrics collections completed for this component. | counter |
| `collect_duration_seconds` | The duration spent collecting metrics for this component. | histogram |
| `command_executed_total` | The total number of times a command has been executed. | counter |
| `command_execution_duration_seconds` | The command execution duration in seconds. | histogram |
| `communication_errors_total` | The total number of errors stemming from communication with the Docker daemon. | counter |
| `config_load_errors_total` | The total number of errors loading the Vector configuration. | counter |
| `connection_errors_total` | The total number of connection errors for this Vector instance. | counter |
| `connection_established_total` | The total number of times a connection has been established. | counter |
| `connection_failed_total` | The total number of times a connection has failed. | counter |
| `connection_read_errors_total` | The total number of errors reading datagrams. | counter |
| `connection_send_ack_errors_total` | The total number of protocol acknowledgement errors for this Vector instance for source protocols that support acknowledgements. | counter |
| `connection_send_errors_total` | The total number of errors sending data via the connection. | counter |
| `connection_shutdown_total` | The total number of times the connection has been shut down. | counter |
| `consumer_offset_updates_failed_total` | The total number of failures to update a Kafka consumer offset. | counter |
| `container_metadata_fetch_errors_total` | The total number of errors encountered when fetching container metadata. | counter |
| `container_processed_events_total` | The total number of container events processed. | counter |
| `containers_unwatched_total` | The total number of times Vector stopped watching for container logs. | counter |
| `containers_watched_total` | The total number of times Vector started watching for container logs. | counter |
| `decode_errors_total` | The total number of decode errors seen when decoding data in a source component. | counter |
| `encode_errors_total` | The total number of errors encountered when encoding an event. | counter |
| `events_discarded_total` | The total number of events discarded by this component. | counter |
| `events_failed_total` | The total number of failures to read a Kafka message. | counter |
| `events_in_total` | The number of events accepted by this component, either from a tagged origin like file and uri, or cumulatively from other origins. | counter |
| `events_out_total` | The total number of events emitted by this component. | counter |
| `file_delete_errors_total` | The total number of failures to delete a file. | counter |
| `file_watch_errors_total` | The total number of errors encountered when watching files. | counter |
| `files_added_total` | The total number of files Vector has found to watch. | counter |
| `files_deleted_total` | The total number of files deleted. | counter |
| `files_resumed_total` | The total number of times Vector has resumed watching a file. | counter |
| `files_unwatched_total` | The total number of times Vector has stopped watching a file. | counter |
| `fingerprint_read_errors_total` | The total number of times Vector failed to read a file for fingerprinting. | counter |
| `glob_errors_total` | The total number of errors encountered when globbing paths. | counter |
| `http_bad_requests_total` | The total number of HTTP `400 Bad Request` errors encountered. | counter |
| `http_client_response_rtt_seconds` | The round-trip time (RTT) of HTTP requests, tagged with the response code. | histogram |
| `http_client_responses_total` | The total number of HTTP requests, tagged with the response code. | counter |
| `http_client_rtt_seconds` | The round-trip time (RTT) of HTTP requests. | histogram |
| `http_error_response_total` | The total number of HTTP error responses for this component. | counter |
| `http_request_errors_total` | The total number of HTTP request errors for this component. | counter |
| `http_requests_total` | The total number of HTTP requests issued by this component. | counter |
| `invalid_record_bytes_total` | The total number of bytes from invalid records that have been discarded. | counter |
| `invalid_record_total` | The total number of invalid records that have been discarded. | counter |
| `k8s_docker_format_parse_failures_total` | The total number of failures to parse a message as a JSON object. | counter |
| `k8s_event_annotation_failures_total` | The total number of failures to annotate Vector events with Kubernetes Pod metadata. | counter |
| `k8s_format_picker_edge_cases_total` | The total number of edge cases encountered while picking the format of the Kubernetes log message. | counter |
| `k8s_reflector_desyncs_total` | The total number of desyncs for the reflector. | counter |
| `k8s_state_ops_total` | The total number of state operations. | counter |
| `k8s_stream_chunks_processed_total` | The total number of chunks processed from the stream of Kubernetes resources. | counter |
| `k8s_stream_processed_bytes_total` | The number of bytes processed from the stream of Kubernetes resources. | counter |
| `k8s_watch_requests_failed_total` | The total number of failed watch requests. | counter |
| `k8s_watch_requests_invoked_total` | The total number of watch requests invoked. | counter |
| `k8s_watch_stream_failed_total` | The total number of failed watch streams. | counter |
| `k8s_watch_stream_items_obtained_total` | The total number of items obtained from a watch stream. | counter |
| `k8s_watcher_http_error_total` | The total number of HTTP error responses for the Kubernetes watcher. | counter |
| `kafka_consumed_messages_bytes_total` | Total number of message bytes (including framing) received from Kafka brokers. | counter |
| `kafka_consumed_messages_total` | Total number of messages consumed, not including ignored messages (due to offset, etc.), from Kafka brokers. | counter |
| `kafka_produced_messages_bytes_total` | Total number of message bytes (including framing, such as per-Message framing and MessageSet/batch framing) transmitted to Kafka brokers. | counter |
| `kafka_produced_messages_total` | Total number of messages transmitted (produced) to Kafka brokers. | counter |
| `kafka_queue_messages` | Current number of messages in producer queues. | gauge |
| `kafka_queue_messages_bytes` | Current total size of messages in producer queues. | gauge |
| `kafka_requests_bytes_total` | Total number of bytes transmitted to Kafka brokers. | counter |
| `kafka_requests_total` | Total number of requests sent to Kafka brokers. | counter |
| `kafka_responses_bytes_total` | Total number of bytes received from Kafka brokers. | counter |
| `kafka_responses_total` | Total number of responses received from Kafka brokers. | counter |
| `logging_driver_errors_total` | The total number of logging driver errors encountered, caused by not using either the `jsonfile` or `journald` driver. | counter |
| `memory_used_bytes` | The total memory currently being used by Vector (in bytes). | gauge |
| `metadata_refresh_failed_total` | The total number of failed attempts to refresh AWS EC2 metadata. | counter |
| `metadata_refresh_successful_total` | The total number of AWS EC2 metadata refreshes. | counter |
| `open_connections` | The number of current open connections to Vector. | gauge |
| `parse_errors_total` | The total number of errors parsing metrics for this component. | counter |
| `processed_bytes_total` | The number of bytes processed by the component. | counter |
| `processed_events_total` | The total number of events processed by this component. This metric is deprecated in favor of the `events_in_total` and `events_out_total` metrics. | counter |
| `processing_errors_total` | The total number of processing errors encountered by this component. | counter |
| `protobuf_decode_errors_total` | The total number of Protocol Buffers errors thrown during communication between Vector instances. | counter |
| `quit_total` | The total number of times the Vector instance has quit. | counter |
| `recover_errors_total` | The total number of errors caused by Vector failing to recover from a failed reload. | counter |
| `reload_errors_total` | The total number of errors encountered when reloading Vector. | counter |
| `reloaded_total` | The total number of times the Vector instance has been reloaded. | counter |
| `request_automatic_decode_errors_total` | The total number of request errors for this component when it attempted to automatically discover and handle the content-encoding of incoming request data. | counter |
| `request_duration_seconds` | The total request duration in seconds. | histogram |
| `request_errors_total` | The total number of request errors for this component. | counter |
| `request_read_errors_total` | The total number of request read errors for this component. | counter |
| `requests_completed_total` | The total number of requests completed by this component. | counter |
| `requests_received_total` | The total number of requests received by this component. | counter |
| `send_errors_total` | The total number of errors sending messages. | counter |
| `sqs_message_delete_failed_total` | The total number of failures to delete SQS messages. | counter |
| `sqs_message_delete_succeeded_total` | The total number of successful deletions of SQS messages. | counter |
| `sqs_message_processing_failed_total` | The total number of failures to process SQS messages. | counter |
| `sqs_message_processing_succeeded_total` | The total number of SQS messages successfully processed. | counter |
| `sqs_message_receive_failed_total` | The total number of failures to receive SQS messages. | counter |
| `sqs_message_receive_succeeded_total` | The total number of times SQS messages were successfully received. | counter |
| `sqs_message_received_messages_total` | The total number of received SQS messages. | counter |
| `sqs_s3_event_record_ignored_total` | The total number of times an S3 record in an SQS message was ignored (for an event that was not `ObjectCreated`). | counter |
| `stale_events_flushed_total` | The number of stale events that Vector has flushed. | counter |
| `started_total` | The total number of times the Vector instance has been started. | counter |
| `stdin_reads_failed_total` | The total number of errors reading from stdin. | counter |
| `stopped_total` | The total number of times the Vector instance has been stopped. | counter |
| `tag_value_limit_exceeded_total` | The total number of events discarded because the tag has been rejected after hitting the configured `value_limit`. | counter |
| `timestamp_parse_errors_total` | The total number of errors encountered parsing RFC 3339 timestamps. | counter |
| `uptime_seconds` | The total number of seconds the Vector instance has been up. | gauge |
| `utf8_convert_errors_total` | The total number of errors converting bytes to a UTF-8 string in UDP mode. | counter |
| `value_limit_reached_total` | The total number of times new values for a key have been rejected because the value limit has been reached. | counter |
| `windows_service_does_not_exist_total` | The total number of errors raised due to the Windows service not existing. | counter |
| `windows_service_install_total` | The total number of times the Windows service has been installed. | counter |
| `windows_service_restart_total` | The total number of times the Windows service has been restarted. | counter |
| `windows_service_start_total` | The total number of times the Windows service has been started. | counter |
| `windows_service_stop_total` | The total number of times the Windows service has been stopped. | counter |
| `windows_service_uninstall_total` | The total number of times the Windows service has been uninstalled. | counter |

## Troubleshooting

For more information, see our troubleshooting guide.

## How it works

### Event-driven observability

Vector employs an event-driven observability strategy that ensures consistent and correlated telemetry data. You can read more about our approach in RFC 2064.

### Log rate limiting

Vector rate-limits log events in the hot path. This gives you granular insight without the risk of saturating I/O and disrupting the service. The trade-off is that repetitive logs aren’t logged.