Configuration

This section covers configuring Vector and creating pipelines like the example below. Vector's configuration supports TOML, YAML, and JSON to ensure Vector fits into your workflow. The configuration file(s) must be passed via the --config flag when starting Vector:

vector --config /etc/vector/vector.toml
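
The same flag is used for YAML or JSON configs; the format should be picked up from the file extension (worth confirming for your Vector version):

vector --config /etc/vector/vector.yaml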

Example

# Set global options
data_dir = "/var/lib/vector"
# Ingest data by tailing one or more files
[sources.apache_logs]
type = "file"
include = ["/var/log/apache2/*.log"] # supports globbing
ignore_older = 86400 # 1 day
# Structure and parse the data
[transforms.apache_parser]
inputs = ["apache_logs"]
type = "regex_parser" # fast/powerful regex
patterns = ['^(?P<host>[\w.]+) - (?P<user>[\w]+) (?P<bytes_in>[\d]+) \[(?P<timestamp>.*)\] "(?P<method>[\w]+) (?P<path>.*)" (?P<status>[\d]+) (?P<bytes_out>[\d]+)$']
# Sample the data to save on cost
[transforms.apache_sampler]
inputs = ["apache_parser"]
type = "sampler"
rate = 50 # only keep 50%
# Send structured data to a short-term storage
[sinks.es_cluster]
inputs = ["apache_sampler"] # only take sampled data
type = "elasticsearch"
host = "http://79.12.221.222:9200" # local or external host
index = "vector-%Y-%m-%d" # daily indices
# Send structured data to a cost-effective long-term storage
[sinks.s3_archives]
inputs = ["apache_parser"] # don't sample for S3
type = "aws_s3"
region = "us-east-1"
bucket = "my-log-archives"
key_prefix = "date=%Y-%m-%d" # daily partitions, hive friendly format
compression = "gzip" # compress final objects
encoding = "ndjson" # new line delimited JSON
batch.max_size = 10000000 # 10mb uncompressed
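
Putting it together, the example above builds the following topology (data flows left to right):

apache_logs --> apache_parser --> apache_sampler --> es_cluster
                            |
                            +--> s3_archives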

The key thing to note above is the use of the inputs option. This connects Vector's components to create a pipeline. For a simple introduction, please refer to the:

Getting Started Guide

Reference

Vector provides a full reference that you can use to build your configuration files.

Sources
Transforms
Sinks

And for more advanced techniques:

Env Vars
Global options
Template syntax
Tests

How It Works

Configuration File Location

The location of your Vector configuration file depends on your installation method. On most Linux-based systems, the file can be found at /etc/vector/vector.toml.

Environment Variables

Vector will interpolate environment variables within your configuration file using the following syntax:

vector.toml
[transforms.add_host]
type = "add_fields"
[transforms.add_host.fields]
host = "${HOSTNAME}"
environment = "${ENV:-development}" # default value when not present
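
The variables are read from Vector's environment when the configuration is loaded. For example, to supply a value for the ENV variable used above when starting Vector:

ENV=production vector --config /etc/vector/vector.toml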

Default Values

Default values can be supplied via the :- syntax:

option = "${ENV_VAR:-default}"

Escaping

You can escape environment variables by preceding them with a $ character. For example, $${HOSTNAME} will be treated literally in the above environment variable example.
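
As a minimal sketch, the escaped form looks like this in a config file; the field keeps the literal text instead of the variable's value:

host = "$${HOSTNAME}" # stored as the literal string "${HOSTNAME}"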

Multiple Configuration Files

You can pass multiple configuration files when starting Vector:

vector --config vector1.toml --config vector2.toml

Or use a globbing syntax:

vector --config /etc/vector/*.toml
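
One way to use this, assuming Vector merges all of the files into a single topology so a component in one file can reference a component in another, is to split concerns across files:

# /etc/vector/sources.toml
[sources.apache_logs]
type = "file"
include = ["/var/log/apache2/*.log"]

# /etc/vector/sinks.toml
[sinks.s3_archives]
inputs = ["apache_logs"] # defined in sources.toml
type = "aws_s3"
region = "us-east-1"
bucket = "my-log-archives"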

Multiple Formats

Vector supports TOML, YAML, and JSON to ensure Vector fits into your workflow. A side benefit of supporting JSON is that it enables data templating languages like Jsonnet and Cue.
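
As a sketch, the file source from the example above might look like this in JSON, assuming option names carry over unchanged between formats:

{
  "sources": {
    "apache_logs": {
      "type": "file",
      "include": ["/var/log/apache2/*.log"]
    }
  }
}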

Template Syntax

Select configuration options support Vector's template syntax to produce dynamic values derived from the event's data. Fields that support interpolation accept two syntaxes:

  1. Strptime specifiers. Ex: date=%Y/%m/%d
  2. Event fields. Ex: {{ field_name }}

For example:

vector.toml
[sinks.es_cluster]
type = "elasticsearch"
index = "user-{{ user_id }}-%Y-%m-%d"

The above index value will be calculated for each event. For example, given the following event:

{
"timestamp": "2019-05-02T00:23:22Z",
"message": "message",
"user_id": 2
}

The index value will be:

index = "user-2-2019-05-02"

Learn more in the template reference.