Building & Managing Complex Configs

Strategies for building and managing complex Vector configs
type: guide
domain: config

Writing large configuration files is not yet an official Olympic event. However, it's still a good idea to get yourself ahead of the competition. In this guide we're going to cover some tips and tricks that will help you write clear, bug-free Vector configs that are easy to maintain.

Generating Configs

In Vector, each component of a pipeline specifies which components it consumes events from via its inputs option. This makes it very easy to build multiplexed topologies, where one component feeds several others or several components feed into one. However, writing a chain of transforms this way can become frustrating as the number of transforms increases.
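For example, a single source can fan out to two transforms, and a sink can fan in from both, simply by listing them in inputs. The component names below are our own invention, for illustration only:

vector.toml
[sources.app_logs]
type = "socket"
mode = "tcp"
address = "0.0.0.0:9000"
# Both transforms consume from the same source (fan-out)
[transforms.parse_json]
type = "json_parser"
inputs = ["app_logs"]
[transforms.scrub_fields]
type = "remove_fields"
inputs = ["app_logs"]
fields = ["email"]
# The sink consumes from both transforms (fan-in)
[sinks.console_out]
type = "console"
inputs = ["parse_json", "scrub_fields"]
encoding = "json"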

Luckily, the Vector team are desperate for your approval and have worked hard to mitigate this with the generate subcommand, which can be used to generate the boilerplate for you. The command expects a list of components and creates a config with all of those components connected in a linear chain.

For example, if we wish to create a chain of three transforms: json_parser, add_fields, and remove_fields, we can run:

vector generate /json_parser,add_fields,remove_fields > vector.toml
# Find out more with `vector generate --help`

And most of the boilerplate will be written for us, with each component printed with an inputs field that specifies the component before it:

vector.toml
[transforms.transform0]
inputs = [ "somewhere" ]
type = "json_parser"
# etc ...
[transforms.transform1]
inputs = [ "transform0" ]
type = "add_fields"
# etc ...
[transforms.transform2]
inputs = [ "transform1" ]
type = "remove_fields"
# etc ...

The names of the generated components are sequential (transform0, transform1, and so on). It's therefore worth doing a search and replace with your editor to give them better names, e.g. s/transform2/scrub_emails/g.
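After renaming, the chain might read like the sketch below (the new names here are our own invention). Note that when you rename a component, every inputs entry that references it must be updated as well:

vector.toml
[transforms.parse_json]
inputs = [ "somewhere" ]
type = "json_parser"
[transforms.add_env_tags]
inputs = [ "parse_json" ]
type = "add_fields"
[transforms.scrub_emails]
inputs = [ "add_env_tags" ]
type = "remove_fields"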

Testing Configs

Test-driven configuration is a paradigm we just made up, so there's still time for you to adopt it before it's cool. Vector supports complementing your configs with unit tests, and as it turns out they're also pretty useful during the building stage.

Let's imagine we're in the process of building the config from the unit test guide. We might start off with our source and the grok parser:

vector.toml
[sources.over_tcp]
type = "socket"
mode = "tcp"
address = "0.0.0.0:9000"
[transforms.foo]
type = "grok_parser"
inputs = ["over_tcp"]
pattern = "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}"

A common way to test this transform might be to temporarily change the source into a stdin type, add a console sink pointed to our target transform, and run it with some sample data. However, this is awkward as it means distorting our config to run tests rather than focusing on features.
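For reference, that manual approach might look something like this sketch, with a temporary stdin source and a temporary console sink that would both need to be reverted afterwards (the sink name is illustrative):

vector.toml
[sources.over_tcp]
# Temporarily swapped from "socket" to "stdin" so we can pipe in sample data
type = "stdin"
[transforms.foo]
type = "grok_parser"
inputs = ["over_tcp"]
pattern = "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}"
# Temporary sink for inspecting the transform's output
[sinks.inspect]
type = "console"
inputs = ["foo"]
encoding = "json"

We would then pipe a sample line into vector and eyeball the console output, before undoing both changes.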

Instead, we can leave our source as a socket type and add a unit test to the end of our config:

vector.toml
[[tests]]
name = "check_simple_log"
[[tests.inputs]]
insert_at = "foo"
type = "raw"
value = "2019-11-28T12:00:00+00:00 info Sorry, I'm busy this week Cecil"
[[tests.outputs]]
extract_from = "foo"

When we add a unit test output without any conditions it will simply print the input and output events of a transform, allowing us to inspect its behavior:

$ vector test ./vector.toml
Running vector.toml tests
test vector.toml: check_simple_log ... passed
inspections:
--- vector.toml ---
test 'check_simple_log':
check transform 'foo' payloads (events encoded as JSON):
input: {"timestamp":"2020-02-11T15:04:02.361999Z", "message":"2019-11-28T12:00:00+00:00 info Sorry, I'm busy this week Cecil"}
output: {"level":"info","message":"Sorry, I'm busy this week Cecil", "timestamp":"2019-11-28T12:00:00+00:00"}

As we introduce new transforms to our config we can change the test output to check the latest transform. Or, occasionally, we can add conditions to an output in order to turn it into a regression test:

vector.toml
[[tests]]
name = "check_simple_log"
[[tests.inputs]]
insert_at = "foo"
type = "raw"
value = "2019-11-28T12:00:00+00:00 info Sorry, I'm busy this week Cecil"
# This is now a regression test
[[tests.outputs]]
extract_from = "foo"
[[tests.outputs.conditions]]
type = "check_fields"
"message.equals" = "Sorry, I'm busy this week Cecil"
# And we add a new output without conditions for inspecting
# a new transform
[[tests.outputs]]
extract_from = "bar"

How many tests you add is at your discretion, but you probably don't need to test every single transform. We recommend every four transforms, except during a full moon when you should test every two just to be sure.

Organizing Configs

Building configs is only the beginning. Once a config is built you need to make sure pesky meddlers don't ruin it. The best way to keep on top of that is to break large configs down into smaller, more manageable pieces.

With Vector you can split a config down into as many files as you like and run them all as a larger topology:

# These three examples run the same two configs together:
$ vector -c ./configs/foo.toml -c ./configs/bar.toml
$ vector -c ./configs/*.toml
$ vector -c ./configs/foo.toml ./configs/bar.toml

If you have a large chain of components it's a good idea to break them out into individual files, each with its own unit tests.
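For instance, we might split a pipeline into an ingest file and a delivery file. Components in one file can reference components defined in another, since Vector merges all the files into a single topology. The file layout and sink name below are illustrative:

configs/ingest.toml
[sources.over_tcp]
type = "socket"
mode = "tcp"
address = "0.0.0.0:9000"
[transforms.foo]
type = "grok_parser"
inputs = ["over_tcp"]
pattern = "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}"

configs/deliver.toml
[sinks.console_out]
type = "console"
# "foo" is defined in ingest.toml; Vector resolves it across files
inputs = ["foo"]
encoding = "json"

Each file can also carry its own [[tests]] blocks covering the components it defines.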

Updating Configs

Sometimes it's useful to update Vector configs on the fly. If you find yourself tinkering with a config that Vector is already running, you can prompt it to reload the changes you've made by sending it a SIGHUP signal.

If you're running Vector in environments where it's not possible to issue SIGHUP signals you can instead run it with the --watch-config flag and it'll automatically gobble up changes whenever the file is written to.