File

Output observability events into files

status: beta
delivery: at-least-once
acknowledgements: yes
egress: stream
state: stateless

Configuration

Example configurations

Common (JSON)

{
  "sinks": {
    "my_sink_id": {
      "type": "file",
      "inputs": [
        "my-source-or-transform-id"
      ],
      "acknowledgements": null,
      "compression": "none",
      "encoding": {
        "codec": "json"
      },
      "healthcheck": null,
      "path": "/tmp/vector-%Y-%m-%d.log"
    }
  }
}

Common (TOML)

[sinks.my_sink_id]
type = "file"
inputs = [ "my-source-or-transform-id" ]
compression = "none"
path = "/tmp/vector-%Y-%m-%d.log"

  [sinks.my_sink_id.encoding]
  codec = "json"

Common (YAML)

---
sinks:
  my_sink_id:
    type: file
    inputs:
      - my-source-or-transform-id
    acknowledgements: null
    compression: none
    encoding:
      codec: json
    healthcheck: null
    path: /tmp/vector-%Y-%m-%d.log

Advanced (JSON)

{
  "sinks": {
    "my_sink_id": {
      "type": "file",
      "inputs": [
        "my-source-or-transform-id"
      ],
      "idle_timeout_secs": 30,
      "buffer": null,
      "acknowledgements": null,
      "compression": "none",
      "framing": null,
      "encoding": {
        "codec": "json"
      },
      "healthcheck": null,
      "path": "/tmp/vector-%Y-%m-%d.log"
    }
  }
}

Advanced (TOML)

[sinks.my_sink_id]
type = "file"
inputs = [ "my-source-or-transform-id" ]
idle_timeout_secs = 30
compression = "none"
path = "/tmp/vector-%Y-%m-%d.log"

  [sinks.my_sink_id.encoding]
  codec = "json"

Advanced (YAML)

---
sinks:
  my_sink_id:
    type: file
    inputs:
      - my-source-or-transform-id
    idle_timeout_secs: 30
    buffer: null
    acknowledgements: null
    compression: none
    framing: null
    encoding:
      codec: json
    healthcheck: null
    path: /tmp/vector-%Y-%m-%d.log

acknowledgements

common optional object
Controls how acknowledgements are handled by this sink. When enabled, all connected sources that support end-to-end acknowledgements will wait for the destination of this sink to acknowledge receipt of events before providing acknowledgement to the sending source. These settings override the global acknowledgement settings.

acknowledgements.enabled

common optional bool
Controls whether all connected sources wait for this sink to deliver the events before acknowledging receipt.
default: false
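As a minimal sketch, the TOML example configuration can opt into end-to-end acknowledgements for this sink alone. The sink and input IDs are the same placeholders used in the examples, and only upstream sources that support end-to-end acknowledgements will actually wait.

```toml
[sinks.my_sink_id]
type = "file"
inputs = [ "my-source-or-transform-id" ]
path = "/tmp/vector-%Y-%m-%d.log"

  [sinks.my_sink_id.encoding]
  codec = "json"

  # Overrides the global acknowledgement setting for this sink only.
  [sinks.my_sink_id.acknowledgements]
  enabled = true
```

Because this sink does not use any "sync" write mode (see Durability of Created Files), an acknowledgement here means the operating system accepted the write without error, not that the data has reached the disk.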

buffer

optional object

Configures the sink-specific buffer behavior.

More information about the individual buffer types and buffer behavior can be found in the Buffering Model section.

buffer.max_events

common optional uint
The maximum number of events allowed in the buffer.
Relevant when: type = "memory"
default: 500 (events)

buffer.max_size

required uint
The maximum size of the buffer on the disk. Must be at least ~256 megabytes (268435488 bytes).
Relevant when: type = "disk"
Examples
268435488

buffer.type

common optional string literal enum
The type of buffer to use.
Enum options
disk: Events are buffered on disk. This is less performant, but more durable. Data that has been synchronized to disk will not be lost if Vector is restarted forcefully or crashes. Data is synchronized to disk every 500ms.
memory: Events are buffered in memory. This is more performant, but less durable. Data will be lost if Vector is restarted forcefully or crashes.
default: memory

buffer.when_full

optional string literal enum
The behavior when the buffer becomes full.
Enum options
block: Waits for capacity in the buffer. This will cause backpressure to propagate to upstream components, which can cause data to pile up on the edge.
drop_newest: Drops the event without waiting for capacity in the buffer. The data is lost. This should only be used when performance is the highest priority.
default: block
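A sketch of a disk-backed buffer for this sink, using the same placeholder IDs as the examples. It sets max_size to the minimum disk buffer size stated above and keeps the default block behavior, so a full buffer applies backpressure upstream instead of dropping events.

```toml
[sinks.my_sink_id]
type = "file"
inputs = [ "my-source-or-transform-id" ]
path = "/tmp/vector-%Y-%m-%d.log"

  [sinks.my_sink_id.encoding]
  codec = "json"

  # Buffer on disk: slower than memory, but synchronized data survives a crash.
  [sinks.my_sink_id.buffer]
  type = "disk"
  # Minimum allowed disk buffer size, in bytes (~256 megabytes).
  max_size = 268435488
  # Propagate backpressure upstream when the buffer is full.
  when_full = "block"
```

To favor throughput over completeness instead, a memory buffer with `when_full = "drop_newest"` discards incoming events once `max_events` is reached.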

compression

common optional string literal enum

The compression strategy used to compress the encoded event data before transmission.

The default compression level of the chosen algorithm is used. Some cloud storage API clients and browsers handle decompression transparently, so files may not always appear to be compressed, depending on how they are accessed.

Enum options string literal
gzip: Gzip standard DEFLATE compression. Compression level is 6 unless otherwise specified.
none: No compression.
zstd: zstd compression. Compression level is 3 unless otherwise specified. Dictionaries are not supported.
default: none
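A sketch of gzip-compressed output. The path option states that the compression format extension must be explicit, so the `.gz` suffix is written into the path by hand here; the IDs and path are placeholders.

```toml
[sinks.my_sink_id]
type = "file"
inputs = [ "my-source-or-transform-id" ]
# The compression extension is spelled out explicitly in the file name.
path = "/tmp/vector-%Y-%m-%d.log.gz"
# Gzip at its default level (6).
compression = "gzip"

  [sinks.my_sink_id.encoding]
  codec = "json"
```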

encoding

required object
Configures the encoding-specific sink behavior.

encoding.codec

required string literal enum
The encoding codec used to serialize the events before outputting.
Enum options
json: JSON-encoded event.
text: The message field from the event.
Examples
"json"
"text"

encoding.except_fields

optional [string]
Prevents the sink from encoding the specified fields.

encoding.only_fields

optional [string]
Makes the sink encode only the specified fields.

encoding.timestamp_format

optional string literal enum
How to format event timestamps.
Enum options
rfc3339: Formats as an RFC 3339 string.
unix: Formats as a Unix timestamp.
default: rfc3339
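A sketch combining the encoding options. The field names `timestamp` and `message` are assumptions chosen for illustration; substitute the fields your events actually carry.

```toml
[sinks.my_sink_id]
type = "file"
inputs = [ "my-source-or-transform-id" ]
path = "/tmp/vector-%Y-%m-%d.log"

  [sinks.my_sink_id.encoding]
  codec = "json"
  # Illustrative field names: encode only these fields of each event.
  only_fields = [ "timestamp", "message" ]
  # Write timestamps as Unix timestamps rather than RFC 3339 strings.
  timestamp_format = "unix"
```

Setting `codec = "text"` instead would write only each event's message field.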

framing

optional object
Configures how events encoded as byte frames are separated within a payload.

framing.character_delimited

required object
Options for character_delimited framing.
Relevant when: method = `character_delimited`

framing.character_delimited.delimiter

required ascii_char
The character used to separate frames.
Examples
"\n"
"\t"

framing.method

common optional string literal enum
The framing method.
Enum options
bytes: Byte frames are concatenated.
character_delimited: Byte frames are delimited by a chosen character.
length_delimited: Byte frames are prefixed by an unsigned big-endian 32-bit integer indicating the length.
newline_delimited: Byte frames are delimited by a newline character.
default: A suitable default is chosen depending on the sink type and the selected codec.
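A sketch of tab-separated output, assuming the delimiter is configured under `framing.character_delimited` as the character_delimited options describe. Paired with the text codec, each event's message field is separated from the next by a tab instead of a newline; the IDs and path are placeholders.

```toml
[sinks.my_sink_id]
type = "file"
inputs = [ "my-source-or-transform-id" ]
path = "/tmp/vector-%Y-%m-%d.log"

  [sinks.my_sink_id.encoding]
  codec = "text"

  # Separate encoded events with a chosen character.
  [sinks.my_sink_id.framing]
  method = "character_delimited"

  # Assumed nesting: a tab character as the frame delimiter.
  [sinks.my_sink_id.framing.character_delimited]
  delimiter = "\t"
```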

healthcheck

common optional object
Health check options for the sink.

healthcheck.enabled

common optional bool
Enables or disables the health check when Vector starts.
default: true

idle_timeout_secs

optional uint
The amount of time, in seconds, that a file can remain idle and stay open. If no events are received for this duration, the file is flushed and closed.
default: 30 (seconds)

inputs

required [string]

A list of upstream source or transform IDs. Wildcards (*) are supported.

See configuration for more info.

Array string literal
Examples
[
  "my-source-or-transform-id",
  "prefix-*"
]

path

required string template
File name to write events to. The compression format extension (for example, `.gz`) must be included explicitly in the path.
Note: This parameter supports Vector's template syntax, which enables you to use dynamic per-event values.
Examples
"/tmp/vector-%Y-%m-%d.log"
"/tmp/application-{{ application_id }}-%Y-%m-%d.log"

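A sketch of a templated path based on the second path example. Each distinct rendered path is written to its own file, so idle_timeout_secs determines when files that stop receiving events (for instance, a previous day's file or an inactive application's file) are flushed and closed. The `application_id` field is assumed to exist on each event, and 60 seconds is an arbitrary illustrative value.

```toml
[sinks.my_sink_id]
type = "file"
inputs = [ "my-source-or-transform-id" ]
# One file per application per day; assumes events carry an application_id field.
path = "/tmp/application-{{ application_id }}-%Y-%m-%d.log"
# Flush and close any file that receives no events for 60 seconds.
idle_timeout_secs = 60

  [sinks.my_sink_id.encoding]
  codec = "json"
```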
Telemetry

Metrics


buffer_byte_size

gauge
The number of bytes currently in the buffer.
component_id
The Vector component ID.
component_kind
The Vector component kind.
component_name
Deprecated, use component_id instead. The value is the same as component_id.
component_type
The Vector component type.
host optional
The hostname of the system Vector is running on.
pid optional
The process ID of the Vector instance.

buffer_discarded_events_total

counter
The number of events dropped by this non-blocking buffer.
component_id
The Vector component ID.
component_kind
The Vector component kind.
component_name
Deprecated, use component_id instead. The value is the same as component_id.
component_type
The Vector component type.
host optional
The hostname of the system Vector is running on.
pid optional
The process ID of the Vector instance.

buffer_events

gauge
The number of events currently in the buffer.
component_id
The Vector component ID.
component_kind
The Vector component kind.
component_name
Deprecated, use component_id instead. The value is the same as component_id.
component_type
The Vector component type.
host optional
The hostname of the system Vector is running on.
pid optional
The process ID of the Vector instance.

buffer_received_event_bytes_total

counter
The number of bytes received by this buffer.
component_id
The Vector component ID.
component_kind
The Vector component kind.
component_name
Deprecated, use component_id instead. The value is the same as component_id.
component_type
The Vector component type.
host optional
The hostname of the system Vector is running on.
pid optional
The process ID of the Vector instance.

buffer_received_events_total

counter
The number of events received by this buffer.
component_id
The Vector component ID.
component_kind
The Vector component kind.
component_name
Deprecated, use component_id instead. The value is the same as component_id.
component_type
The Vector component type.
host optional
The hostname of the system Vector is running on.
pid optional
The process ID of the Vector instance.

buffer_sent_event_bytes_total

counter
The number of bytes sent by this buffer.
component_id
The Vector component ID.
component_kind
The Vector component kind.
component_name
Deprecated, use component_id instead. The value is the same as component_id.
component_type
The Vector component type.
host optional
The hostname of the system Vector is running on.
pid optional
The process ID of the Vector instance.

buffer_sent_events_total

counter
The number of events sent by this buffer.
component_id
The Vector component ID.
component_kind
The Vector component kind.
component_name
Deprecated, use component_id instead. The value is the same as component_id.
component_type
The Vector component type.
host optional
The hostname of the system Vector is running on.
pid optional
The process ID of the Vector instance.

component_received_event_bytes_total

counter
The number of event bytes accepted by this component either from tagged origins like file and uri, or cumulatively from other origins.
component_id
The Vector component ID.
component_kind
The Vector component kind.
component_name
Deprecated, use component_id instead. The value is the same as component_id.
component_type
The Vector component type.
container_name optional
The name of the container from which the data originated.
file optional
The file from which the data originated.
host optional
The hostname of the system Vector is running on.
mode optional
The connection mode used by the component.
peer_addr optional
The IP from which the data originated.
peer_path optional
The pathname from which the data originated.
pid optional
The process ID of the Vector instance.
pod_name optional
The name of the pod from which the data originated.
uri optional
The sanitized URI from which the data originated.

component_received_events_count

histogram

A histogram of the number of events passed in each internal batch in Vector’s internal topology.

Note that this is separate from sink-level batching. It is mostly useful for low-level debugging of performance issues in Vector caused by small internal batches.

component_id
The Vector component ID.
component_kind
The Vector component kind.
component_name
Deprecated, use component_id instead. The value is the same as component_id.
component_type
The Vector component type.
container_name optional
The name of the container from which the data originated.
file optional
The file from which the data originated.
host optional
The hostname of the system Vector is running on.
mode optional
The connection mode used by the component.
peer_addr optional
The IP from which the data originated.
peer_path optional
The pathname from which the data originated.
pid optional
The process ID of the Vector instance.
pod_name optional
The name of the pod from which the data originated.
uri optional
The sanitized URI from which the data originated.

component_received_events_total

counter
The number of events accepted by this component either from tagged origins like file and uri, or cumulatively from other origins.
component_id
The Vector component ID.
component_kind
The Vector component kind.
component_name
Deprecated, use component_id instead. The value is the same as component_id.
component_type
The Vector component type.
container_name optional
The name of the container from which the data originated.
file optional
The file from which the data originated.
host optional
The hostname of the system Vector is running on.
mode optional
The connection mode used by the component.
peer_addr optional
The IP from which the data originated.
peer_path optional
The pathname from which the data originated.
pid optional
The process ID of the Vector instance.
pod_name optional
The name of the pod from which the data originated.
uri optional
The sanitized URI from which the data originated.

component_sent_bytes_total

counter
The number of raw bytes sent by this component to destination sinks.
component_id
The Vector component ID.
component_kind
The Vector component kind.
component_name
Deprecated, use component_id instead. The value is the same as component_id.
component_type
The Vector component type.
endpoint optional
The endpoint to which the bytes were sent. For HTTP, this will be the host and path only, excluding the query string.
file optional
The absolute path of the destination file.
host optional
The hostname of the system Vector is running on.
pid optional
The process ID of the Vector instance.
protocol
The protocol used to send the bytes.
region optional
The AWS region name to which the bytes were sent. In some configurations, this may be a literal hostname.

component_sent_event_bytes_total

counter
The total number of event bytes emitted by this component.
component_id
The Vector component ID.
component_kind
The Vector component kind.
component_name
Deprecated, use component_id instead. The value is the same as component_id.
component_type
The Vector component type.
host optional
The hostname of the system Vector is running on.
output optional
The specific output of the component.
pid optional
The process ID of the Vector instance.

component_sent_events_total

counter
The total number of events emitted by this component.
component_id
The Vector component ID.
component_kind
The Vector component kind.
component_name
Deprecated, use component_id instead. The value is the same as component_id.
component_type
The Vector component type.
host optional
The hostname of the system Vector is running on.
output optional
The specific output of the component.
pid optional
The process ID of the Vector instance.

events_discarded_total

counter
The total number of events discarded by this component.
host optional
The hostname of the system Vector is running on.
pid optional
The process ID of the Vector instance.
reason
The type of the error

events_in_total

counter
The number of events accepted by this component either from tagged origins like file and uri, or cumulatively from other origins. This metric is deprecated and will be removed in a future version. Use component_received_events_total instead.
component_id
The Vector component ID.
component_kind
The Vector component kind.
component_name
Deprecated, use component_id instead. The value is the same as component_id.
component_type
The Vector component type.
container_name optional
The name of the container from which the data originated.
file optional
The file from which the data originated.
host optional
The hostname of the system Vector is running on.
mode optional
The connection mode used by the component.
peer_addr optional
The IP from which the data originated.
peer_path optional
The pathname from which the data originated.
pid optional
The process ID of the Vector instance.
pod_name optional
The name of the pod from which the data originated.
uri optional
The sanitized URI from which the data originated.

processing_errors_total

counter
The total number of processing errors encountered by this component. This metric is deprecated in favor of component_errors_total.
component_id
The Vector component ID.
component_kind
The Vector component kind.
component_name
Deprecated, use component_id instead. The value is the same as component_id.
component_type
The Vector component type.
error_type
The type of the error
host optional
The hostname of the system Vector is running on.
pid optional
The process ID of the Vector instance.

utilization

gauge
A ratio from 0 to 1 of the load on a component. A value of 0 indicates a completely idle component that is simply waiting for input. A value of 1 indicates a component that is never idle. This value is updated every 5 seconds.
component_id
The Vector component ID.
component_kind
The Vector component kind.
component_name
Deprecated, use component_id instead. The value is the same as component_id.
component_type
The Vector component type.
host optional
The hostname of the system Vector is running on.
pid optional
The process ID of the Vector instance.
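The metrics above are part of Vector's internal telemetry. One way to collect them, sketched here assuming the `internal_metrics` source and `prometheus_exporter` sink (documented separately) are available in your Vector build, is to expose them on a Prometheus scrape endpoint. The component IDs and listen address are hypothetical.

```toml
# Hypothetical ID: gathers Vector's own metrics, including this sink's.
[sources.vector_metrics]
type = "internal_metrics"

# Hypothetical ID: serves the collected metrics for Prometheus to scrape.
[sinks.vector_metrics_exporter]
type = "prometheus_exporter"
inputs = [ "vector_metrics" ]
# Illustrative listen address; adjust for your environment.
address = "0.0.0.0:9598"
```

Filtering the scraped series by `component_id = "my_sink_id"` isolates the metrics for a single file sink.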

How it works

File & Directory Creation

Vector will attempt to create the entire directory structure and the file when emitting events to the file sink. This requires that the Vector agent have the correct permissions to create and write to files in the specified directories.

Durability of Created Files

Vector does not use any of the “sync” write modes to ensure that files written by this sink are durably persisted to disk. As a result, this sink only ensures that the operating system does not report an error; it does not wait until the data is written to disk before acknowledging the events.

Health checks

Health checks ensure that the downstream service is accessible and ready to accept data. This check is performed upon sink initialization. If the health check fails, an error is logged and Vector proceeds to start.

Require health checks

If you’d like to exit immediately upon a health check failure, you can pass the --require-healthy flag:

vector --config /etc/vector/vector.toml --require-healthy

Disable health checks

If you’d like to disable health checks for this sink, set the healthcheck.enabled option to false.
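Because healthcheck is an object in this sink's configuration, the switch lives under healthcheck.enabled. A minimal sketch using the placeholder IDs from the examples:

```toml
[sinks.my_sink_id]
type = "file"
inputs = [ "my-source-or-transform-id" ]
path = "/tmp/vector-%Y-%m-%d.log"

  [sinks.my_sink_id.encoding]
  codec = "json"

  # Skip this sink's health check when Vector starts.
  [sinks.my_sink_id.healthcheck]
  enabled = false
```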

State

This component is stateless, meaning its behavior is consistent across each input.