Configuration

This section covers configuring Vector and building pipelines like the example below. Vector's configuration uses the TOML syntax, and the configuration file must be passed via the --config flag when starting Vector:

vector --config /etc/vector/vector.toml

Example

vector.toml
# Set global options
data_dir = "/var/lib/vector"
# Ingest data by tailing one or more files
[sources.apache_logs]
type = "file"
include = ["/var/log/apache2/*.log"] # supports globbing
ignore_older = 86400 # 1 day
# Structure and parse the data
[transforms.apache_parser]
inputs = ["apache_logs"]
type = "regex_parser" # fast/powerful regex
regex = '^(?P<host>[\w.]+) - (?P<user>[\w]+) (?P<bytes_in>[\d]+) \[(?P<timestamp>.*)\] "(?P<method>[\w]+) (?P<path>.*)" (?P<status>[\d]+) (?P<bytes_out>[\d]+)$'
# Sample the data to save on cost
[transforms.apache_sampler]
inputs = ["apache_parser"]
type = "sampler"
hash_field = "request_id" # sample _entire_ requests
rate = 50 # only keep 50%
# Send structured data to a short-term storage
[sinks.es_cluster]
inputs = ["apache_sampler"] # only take sampled data
type = "elasticsearch"
host = "http://79.12.221.222:9200" # local or external host
index = "vector-%Y-%m-%d" # daily indices
# Send structured data to a cost-effective long-term storage
[sinks.s3_archives]
inputs = ["apache_parser"] # don't sample for S3
type = "aws_s3"
region = "us-east-1"
bucket = "my-log-archives"
key_prefix = "date=%Y-%m-%d" # daily partitions, hive friendly format
batch_size = 10000000 # 10mb uncompressed
gzip = true # compress final objects
encoding = "ndjson" # new line delimited JSON
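The apache_sampler above keeps 50% of events, hashing the request_id field so that all log lines belonging to one request are kept or dropped together. A minimal sketch of that behavior (the hashing scheme here is an illustrative assumption, not Vector's actual implementation):

```python
import zlib

def keep_event(event: dict, hash_field: str, rate: int) -> bool:
    """Hypothetical sketch of hash-based sampling: events sharing the
    same hash_field value hash to the same bucket, so an entire
    request's log lines survive (or are dropped from) sampling as a
    unit, instead of being thinned out line by line."""
    value = str(event.get(hash_field, "")).encode()
    return zlib.crc32(value) % 100 < rate

# Every event carrying request_id "abc123" gets the same verdict:
keep_event({"request_id": "abc123"}, "request_id", 50)
```

Sampling on a stable hash rather than a random draw is what makes `hash_field = "request_id"` meaningful: the decision is deterministic per request, not per line.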

Quick Start

At the very minimum, a Vector configuration file must contain a source and a sink; transforms are optional. To get started:

  1. Choose a source

    To begin, you'll need to ingest data into Vector. This happens through one or more sources. For example:

    vector.toml
    [sources.nginx_logs]
    type = "file"
    include = ["/var/log/nginx*.log"]
  2. Optionally choose a transform

    Next, you'll want to choose a transform. Transforms are optional, but most configurations include at least one, since transforms improve your data through parsing, structuring, and enriching. For example, let's use the regex_parser transform to parse and structure our data:

    vector.toml
    [sources.nginx_logs]
    type = "file"
    include = ["/var/log/nginx*.log"]
    [transforms.nginx_parser]
    inputs = ["nginx_logs"] # <--- connect the transform to our source
    type = "regex_parser"
    regex = '^(?P<host>[\w.]+) - (?P<user>[\w]+) (?P<bytes_in>[\d]+) \[(?P<timestamp>.*)\] "(?P<method>[\w]+) (?P<path>.*)" (?P<status>[\d]+) (?P<bytes_out>[\d]+)$'

    Notice how we connected the new transform to our source via the inputs option.

  3. Choose a sink

    Finally, you'll want to choose a sink. Sinks are responsible for emitting data out of Vector. For this example, we'll use the console sink, which simply writes the data to STDOUT:

    vector.toml
    [sources.nginx_logs]
    type = "file"
    include = ["/var/log/nginx*.log"]
    [transforms.nginx_parser]
    inputs = ["nginx_logs"]
    type = "regex_parser"
    regex = '^(?P<host>[\w.]+) - (?P<user>[\w]+) (?P<bytes_in>[\d]+) \[(?P<timestamp>.*)\] "(?P<method>[\w]+) (?P<path>.*)" (?P<status>[\d]+) (?P<bytes_out>[\d]+)$'
    [sinks.print]
    inputs = ["nginx_parser"] # <--- connect the sink to our transform
    type = "console"

    Again, notice how we connected the new sink via the inputs option.

  4. Next steps

    This serves as a basic example of how to build a minimal Vector configuration file. It's likely you'll want to build more advanced pipelines, which are covered in the guides section.

How It Works

Config File Location

The location of your Vector configuration file depends on your installation method. On most Linux-based systems, the file can be found at /etc/vector/vector.toml.

Environment Variables

Vector will interpolate environment variables within your configuration file with the following syntax:

vector.toml
[transforms.add_host]
type = "add_fields"
[transforms.add_host.fields]
host = "${HOSTNAME}"
environment = "${ENV:-development}" # default value when not present

Environment Variable Escaping

You can escape environment variables by prefixing them with an additional $ character. For example, $${HOSTNAME} will be treated as the literal text ${HOSTNAME} rather than interpolated in the above example.
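To make the rules concrete, here is a small sketch of this interpolation behavior in Python (an illustration only, not Vector's implementation; it assumes an unset variable without a default expands to an empty string):

```python
import re

# ${VAR}          -> value of VAR (empty string when unset, an assumption)
# ${VAR:-default} -> value of VAR, or "default" when VAR is unset
# $${VAR}         -> literal ${VAR} (the extra $ escapes interpolation)
_VAR = re.compile(r"\$\$|\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")

def interpolate(text: str, env: dict) -> str:
    def replace(match):
        if match.group(0) == "$$":
            return "$"  # escaped: leaves the following {VAR} untouched
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        return default if default is not None else ""
    return _VAR.sub(replace, text)

interpolate('host = "${HOSTNAME}"', {"HOSTNAME": "web-1"})  # -> 'host = "web-1"'
interpolate("${ENV:-development}", {})                      # -> "development"
interpolate("$${HOSTNAME}", {"HOSTNAME": "web-1"})          # -> "${HOSTNAME}"
```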

Field Interpolation

Select configuration options support Vector's field interpolation syntax to produce dynamic values derived from the event's data. Two syntaxes are supported:

  1. Strptime specifiers. Ex: date=%Y/%m/%d
  2. Event fields. Ex: {{ field_name }}

For example:

vector.toml
[sinks.es_cluster]
type = "elasticsearch"
index = "user-{{ user_id }}-%Y-%m-%d"

The above index value will be calculated for each event. For example, given the following event:

{
  "timestamp": "2019-05-02T00:23:22Z",
  "message": "message",
  "user_id": 2
}

The index value will result in:

index = "user-2-2019-05-02"
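The calculation above can be sketched as follows (an illustration of combining the two syntaxes, not Vector's implementation; it assumes the event's timestamp field drives the strptime specifiers):

```python
import re
from datetime import datetime

def render(template: str, event: dict) -> str:
    # 1. Substitute {{ field_name }} placeholders with the event's fields.
    rendered = re.sub(r"\{\{\s*(\w+)\s*\}\}",
                      lambda m: str(event[m.group(1)]), template)
    # 2. Expand strptime-style specifiers from the event's timestamp.
    ts = datetime.strptime(event["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
    return ts.strftime(rendered)

event = {"timestamp": "2019-05-02T00:23:22Z", "message": "message", "user_id": 2}
render("user-{{ user_id }}-%Y-%m-%d", event)  # -> "user-2-2019-05-02"
```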

Syntax

The Vector configuration file follows the TOML syntax for its simplicity, explicitness, and relaxed white-space parsing. For more information, please refer to the TOML documentation.

Types

All TOML value types are supported. For convenience, this includes: