Structured logs are like cocktails; they're cool because they're complicated. In this guide we'll build a pipeline of transforms that lets us send in unstructured events like this:
```text
172.128.80.109 - Bins5273 656 [2019-05-03T13:11:48-04:00] "PUT /mesh" 406 10272
```
And have them coming out the other end in a structured format like this:
{"bytes_in":"656","timestamp":"2019-05-03T13:11:48-04:00","method":"PUT","bytes_out":"10272","host":"172.128.80.109","status":"406","user":"Bins5273","path":"/mesh"}
Tutorial
Set up a basic pipeline
In the last guide we simply piped stdin to stdout. I'm not trying to diminish your sense of achievement, but that was pretty basic.
This time we're going to build a config we might use in the real world. It's going to consume logs over TCP with a `socket` source and write them to an `elasticsearch` sink. The basic source-to-sink version of our pipeline looks like this:
```toml
# vector.toml
[sources.foo]
  type = "socket"
  address = "0.0.0.0:9000"
  mode = "tcp"

[sinks.bar]
  inputs = ["foo"]
  type = "elasticsearch"
  index = "example-index"
  host = "http://10.24.32.122:9000"
```

If we were to run it, the raw data we consume over TCP would be captured in the field `message`, and the log event we'd publish to Elasticsearch would look like this:

```json
{"message":"172.128.80.109 - Bins5273 656 [2019-05-03T13:11:48-04:00] \"PUT /mesh\" 406 10272","host":"foo","timestamp":"2019-05-03T13:11:48-04:00"}
```

That's hardly structured at all! Let's remedy that by adding our first transform.
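Before we do: if you'd like to reproduce that log event yourself, here's a minimal smoke test, assuming Vector is running locally with the config above and you have netcat installed (the port matches our `socket` source):

```sh
# Terminal 1 (hypothetical local run): start Vector with our config.
vector --config vector.toml

# Terminal 2: send one raw log line to the TCP socket source.
echo '172.128.80.109 - Bins5273 656 [2019-05-03T13:11:48-04:00] "PUT /mesh" 406 10272' | nc localhost 9000
```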
Add a parsing transform
Nothing in this world is ever good enough for you; why should events be any different?
Vector makes it easy to mutate events into a more (or less) structured format with transforms. Let's parse our logs into a structured format by capturing named regular expression groups with a `regex_parser` transform.

A config can have any number of transforms, and it's entirely up to you how they are chained together. Similar to sinks, a transform requires you to specify where its data comes from. When a sink is configured to accept data from a transform, the pipeline is complete.
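As a quick aside on what "capturing named groups" means here, each `(?P<name>...)` group in the pattern becomes a field of that name on the output event. An illustrative fragment (not part of our config):

```toml
# Illustration only: run against the tail of our example line,
# "... 406 10272", these two named groups yield the event fields
# status = "406" and bytes_out = "10272".
regex = '(?P<status>[\d]+) (?P<bytes_out>[\d]+)$'
```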
Let's place our new transform in between our existing source and sink:
```diff
 [sources.foo]
   type = "socket"
   address = "0.0.0.0:9000"
   mode = "tcp"

+[transforms.apache_parser]
+  inputs = ["foo"]
+  type = "regex_parser"
+  field = "message"
+  regex = '^(?P<host>[\w\.]+) - (?P<user>[\w]+) (?P<bytes_in>[\d]+) \[(?P<timestamp>.*)\] "(?P<mathod>[\w]+) (?P<path>.*)" (?P<status>[\d]+) (?P<bytes_out>[\d]+)$'
+
 [sinks.bar]
-  inputs = ["foo"]
+  inputs = ["apache_parser"]
   type = "elasticsearch"
   index = "example-index"
   host = "http://10.24.32.122:9000"
```

This regular expression looks great and it probably works, but it's best to be sure, right? Which leads us on to the next step.
Test it
No one is saying that unplanned explosions aren't cool, but you should be doing that in your own time. In order to test our transform we could set up a local Elasticsearch instance and run the whole pipeline, but that's an awful bother and Vector has a much better way.
Instead, we can write unit tests as part of our config, just as we would for regular code:
```diff
 # Write the data
 [sinks.bar]
   inputs = ["apache_parser"]
   type = "elasticsearch"
   index = "example-index"
   host = "http://10.24.32.122:9000"
+
+[[tests]]
+  name = "test apache regex"
+
+  [[tests.inputs]]
+    insert_at = "apache_parser"
+    type = "raw"
+    value = "172.128.80.109 - Bins5273 656 [2019-05-03T13:11:48-04:00] \"PUT /mesh\" 406 10272"
+
+  [[tests.outputs]]
+    extract_from = "apache_parser"
+    [[tests.outputs.conditions]]
+      type = "check_fields"
+      "method.eq" = "PUT"
+      "host.eq" = "172.128.80.109"
+      "timestamp.eq" = "2019-05-03T13:11:48-04:00"
+      "path.eq" = "/mesh"
+      "status.eq" = "406"
```

This unit test spec has a name, defines an input event to feed into our pipeline at a specific transform (in this case our only transform), and defines where we'd like to capture resulting events coming out, along with a condition to check those events against.
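The conditions above check five of our eight captured fields. If we wanted the test to pin down the rest as well, we could keep adding `eq` predicates for the remaining groups in the same way. A sketch, extending the condition block above:

```toml
    [[tests.outputs.conditions]]
      type = "check_fields"
      # ...the predicates from above, plus the remaining captured fields:
      "user.eq" = "Bins5273"
      "bytes_in.eq" = "656"
      "bytes_out.eq" = "10272"
```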
When we run:
```sh
vector test ./vector.toml
```

It will parse and execute our test:
```text
Running vector.toml tests
test vector.toml: test apache regex ... failed

failures:

--- vector.toml ---

test 'test apache regex':

check transform 'apache_parser' failed conditions:
  condition[0]: predicates failed: [ method.eq: "PUT" ]
payloads (events encoded as JSON):
  input: {"timestamp":"2020-02-20T10:19:27.283745Z","message":"172.128.80.109 - Bins5273 656 [2019-05-03T13:11:48-04:00] \"PUT /mesh\" 406 10272"}
  output: {"bytes_in":"656","timestamp":"2019-05-03T13:11:48-04:00","mathod":"PUT","bytes_out":"10272","host":"172.128.80.109","status":"406","user":"Bins5273","path":"/mesh"}
```

By Jove! There was a problem with our regular expression! Our test has pointed out that the predicate `method.eq` failed, and has helpfully printed our input and resulting events in JSON format. This allows us to inspect exactly what our transform is doing, and it turns out that the method from our Apache log is actually being captured in a field called `mathod`.
See if you can spot the typo. Once it's fixed, we can run `vector test ./vector.toml` again and we should get this:

```text
Running vector.toml tests
test vector.toml: test apache regex ... passed
```

Success! Next, try experimenting by adding more transforms to your pipeline before moving on to the next guide.
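If you need an idea for that experiment, one option is to chain a second transform after the parser and point the sink at the end of the chain. A minimal sketch, assuming the `add_fields` transform (which attaches static fields to each event); the transform name `tag` and the field it adds are just illustrative:

```toml
[transforms.tag]
  inputs = ["apache_parser"]    # read the parsed events from our regex transform
  type = "add_fields"
  [transforms.tag.fields]
    environment = "production"  # hypothetical static field added to every event

[sinks.bar]
  inputs = ["tag"]              # the sink now completes the longer chain
  type = "elasticsearch"
  index = "example-index"
  host = "http://10.24.32.122:9000"
```

And since `extract_from` names a transform, another `[[tests]]` block extracting from `tag` could cover this step too.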
Next Steps
Now that you're a Vector pro you'll have endless ragtag groups of misfits trying to recruit you as their hacker, but it won't mean much if you can't deploy Vector. On to the next guide!