Getting Started with Vector

Welcome to Vector!

Vector is a high-performance observability data pipeline that allows you to collect, transform, and route all your logs and metrics.

In this getting started guide, we’ll walk you through using Vector for the first time. We’ll install Vector and create our first observability data pipeline so you can begin to see what Vector can do.

Install Vector

Installing Vector is quick and easy. We can use this handy installation script:

curl --proto '=https' --tlsv1.2 -sSfL https://sh.vector.dev | bash

Or you can choose your preferred installation method.

Once we have Vector installed, let’s check to make sure that it’s working correctly:

vector --version

Configure Vector

A Vector topology is defined in a configuration file that tells Vector which components to run and how they should interact. Topologies are made up of three types of components:

Sources collect or receive data from observability data sources into Vector
Transforms can manipulate or change that observability data inside Vector
Sinks send data onwards from Vector to external services or destinations

Let’s create a configuration file called vector.toml:

[sources.in]
  type = "stdin"

[sinks.out]
  inputs = ["in"]
  type = "console"
  encoding.codec = "text"

Each component has a unique ID, prefixed with the type of component (for example, sources for a source). Our first component, sources.in, has the ID in and uses the stdin source, which tells Vector to receive data over stdin.

Our second component, sinks.out, uses the console sink, which tells Vector to simply print the data to stdout. The encoding.codec option tells Vector to print the data as plain text (unencoded).

The inputs option in our sinks.out component tells Vector where this sink’s events are coming from. In our case, events are received from our other component, the source with ID in.
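Because inputs is a list, a sink can read from more than one upstream component. The following is a minimal sketch rather than part of the walkthrough: it assumes a second source with the made-up ID demo, using the demo_logs source that appears later in this guide.

```toml
# Two sources feeding one sink. The ID "demo" is a hypothetical name for this sketch.
[sources.in]
  type = "stdin"

[sources.demo]
  type = "demo_logs"
  format = "syslog"
  count = 5

[sinks.out]
  # Events from both sources arrive at this one sink.
  inputs = ["in", "demo"]
  type = "console"
  encoding.codec = "text"
```

Every ID listed in inputs has to match the ID of a source or transform defined elsewhere in the same configuration.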

Hello World!

That’s it for our first config. Now let’s pipe an event through it:

echo 'Hello World!' | vector --config ./vector.toml

The echo statement sends a single log to Vector via stdin. The vector... command starts Vector with our previously created config file.

The event we’ve just sent will be received by our sources.in component, then sent onto the sinks.out component, which will, in turn, echo it back to the console:

... some logs ...
Hello World!

If you want to see something cool, try setting encoding.codec = "json" in the sink config.
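As a sketch of that change, only the codec line in the sink differs from our first config. With the json codec, the console sink should print the whole event as a JSON object, with the original line in a message field alongside metadata that Vector adds. The exact set of fields can vary between versions, so none are listed here.

```toml
[sources.in]
  type = "stdin"

[sinks.out]
  inputs = ["in"]
  type = "console"
  # Changed from "text": print the full event as JSON instead of only the message.
  encoding.codec = "json"
```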

Hello World Mark II

The echoing of events isn’t very exciting. Let’s see what we can do with some real observability data. Let’s take a classic problem, collecting and processing Syslog events, and see how Vector handles it.

To do this, we’re going to add two new components to our configuration file. Let’s look at our updated configuration now:

[sources.generate_syslog]
  type = "demo_logs"
  format = "syslog"
  count = 100

[transforms.remap_syslog]
  inputs = ["generate_syslog"]
  type = "remap"
  source = '''
    structured = parse_syslog!(.message)
    . = merge(., structured)
  '''

[sinks.emit_syslog]
  inputs = ["remap_syslog"]
  type = "console"
  encoding.codec = "json"

The first component uses the demo_logs source. The demo_logs source creates sample log data that can allow you to simulate different types of events in various formats.

But didn’t we say “real” observability data? We chose generated data because it’s hard for us to know what platform you’re trying Vector on. That means it’s also hard to document a single way for everyone to get data into Vector.

The second component is a transform called remap. The remap transform is at the heart of what makes Vector so powerful for processing observability data. The transform exposes a simple language called Vector Remap Language that allows you to parse, manipulate, and decorate your event data as it passes through Vector. Using remap, you can turn static events into informational data that can help you ask and answer questions about your environment’s state.

You can see we’ve added the sources.generate_syslog component. The format option tells the demo_logs source what type of logs to emit, here syslog, and the count option tells the demo_logs source how many lines to emit, here 100.
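The same two options can be varied to produce other kinds of sample data. The sketch below assumes that the demo_logs source also accepts "json" as a format and an interval option (seconds between lines). Neither is used elsewhere in this guide, so check both against the component reference for the Vector version you have installed. The component IDs are made up for illustration.

```toml
[sources.generate_json]
  type = "demo_logs"
  # Assumed alternative format; this guide itself only uses "syslog".
  format = "json"
  # Stop after 10 lines instead of 100.
  count = 10
  # Assumed option: wait half a second between lines.
  interval = 0.5

[sinks.emit_json]
  inputs = ["generate_json"]
  type = "console"
  encoding.codec = "text"
```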

In our second component, transforms.remap_syslog, we’ve specified an inputs option of generate_syslog, which means it will receive events from our generate_syslog source. We’ve also specified the type of transform: remap.

Inside the source option of the remap_syslog component is where we start to see Vector’s power. The source contains the list of remapping transformations to apply to each event Vector receives. We’re only doing one piece of real work here: parse_syslog. We’re passing this function a single field called message, which contains the Syslog event we’re generating. This all-in-one function takes a Syslog-formatted message, parses its contents, and returns them as a structured object, which the second line folds back into the event with merge. Wait, I can hear you saying, what have you done with my many lines of Syslog parsing regular expressions? Remap removes the need for them and allows you to focus on the event’s value, not on how to extract that value.

We support parsing a variety of logging formats. Of course, if you have an event format that we don’t support, you can also specify your own custom regular expression using remap too! The ! after the parse_syslog function tells Vector to emit an error if the message fails to parse, meaning you’ll know if some non-standard Syslog is received, and you can adjust your remapping to accommodate it!
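If you would rather handle a parse failure yourself than abort on it, VRL also lets you capture the error in a second variable. The following sketch shows one way to do that for the same transform. The parse_error field name is hypothetical, chosen here only to mark events that did not parse.

```toml
[transforms.remap_syslog]
  inputs = ["generate_syslog"]
  type = "remap"
  source = '''
    # Capture the error instead of aborting with "!".
    structured, err = parse_syslog(.message)
    if err == null {
      # Parsing worked: fold the parsed fields into the event.
      . = merge(., structured)
    } else {
      # Parsing failed: keep the original event and record why.
      .parse_error = err
    }
  '''
```

With this form, events that fail to parse continue through the pipeline with their original message intact, so a downstream component can inspect or route them.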

Lastly, we’ve changed the ID of our sink component to emit_syslog, updated the inputs option to process events generated by the remap_syslog transform, and specified that we want to emit events in JSON format.

Let’s rerun Vector. This time we don’t need to echo any data to it; just run it on the command line. It’ll process 100 lines of generated Syslog data, emit the processed data as JSON, and exit:

vector --config ./vector.toml

Now you should have a series of JSON-formatted events, something like this:

{"appname":"benefritz","facility":"authpriv","hostname":"some.de","message":"We're gonna need a bigger boat","msgid":"ID191","procid":9473,"severity":"crit","timestamp":"2021-01-20T19:38:55.329Z"}
{"appname":"meln1ks","facility":"local1","hostname":"for.com","message":"Take a breath, let it go, walk away","msgid":"ID451","procid":484,"severity":"debug","timestamp":"2021-01-20T19:38:55.329Z"}
{"appname":"shaneIxD","facility":"uucp","hostname":"random.com","message":"A bug was encountered but not in Vector, which doesn't have bugs","msgid":"ID428","procid":3093,"severity":"alert","timestamp":"2021-01-20T19:38:55.329Z"}

We can see that Vector has parsed the Syslog message and created a structured event containing all of the Syslog fields. All with two lines of Vector’s remap language. This example is just the beginning of Vector’s capabilities. You can receive logs and events from dozens of sources. You can use Vector and remap to change data, add fields to decorate data, convert logs into metrics, drop fields, and dozens of other tasks you use daily to process your observability data. You can then route and output your events to dozens of destinations.
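To make "add fields" and "drop fields" concrete, here is a small sketch of a second remap transform placed between remap_syslog and the sink. The decorate_syslog ID, the environment field, and its value are all made up for illustration.

```toml
[transforms.decorate_syslog]
  inputs = ["remap_syslog"]
  type = "remap"
  source = '''
    # Add a field to decorate the event (name and value are illustrative).
    .environment = "demo"
    # Drop a field we do not need downstream.
    del(.msgid)
  '''

[sinks.emit_syslog]
  # The sink now reads from the new transform instead of remap_syslog.
  inputs = ["decorate_syslog"]
  type = "console"
  encoding.codec = "json"
```

Chaining transforms through inputs like this keeps each step small. The same two statements could equally be appended to the source of remap_syslog.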

What’s next?

We’re just scratching the surface in this guide. To get your hands dirty with Vector, check out:

All of Vector’s sources, transforms, and sinks.
The Vector Remap Language, the heart of data processing in Vector.
More details on component configuration in Vector.
Finally, deploying Vector to launch Vector in your production environment.