When Vector runs in the service role, its purpose is to efficiently receive, aggregate, and route data downstream. In this scenario, Vector is the primary service on the host and should take full advantage of all available resources.
By default, Vector is designed to use all available system resources, which is usually what you want in the service role. As a result, you don't need to do anything special to improve performance.
To ensure Vector does not lose data between restarts, you'll need to switch the buffer to use the disk for all relevant sinks. You can do this by adding a simple `buffer` table to each of your configured sinks. In addition, we recommend specifying an explicit `data_dir` for Vector's buffer.
```toml
data_dir = "/var/lib/vector"

[sinks.backups]
type = "s3"
# ...

[sinks.backups.buffer]
type = "disk"
max_size = 5000000000 # 5gb
```
Please note that enabling on-disk buffers incurs a performance hit of roughly 3X. We believe this is a worthwhile tradeoff to ensure data is not lost across restarts.
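The rule above (a disk buffer on every relevant sink, plus an explicit `data_dir`) is easy to check mechanically. The following is a minimal sketch, assuming the configuration has already been parsed from TOML into a plain Python dict (for example with the standard library's `tomllib` on Python 3.11+). The helper `durability_problems` is hypothetical and not part of Vector; it treats a sink without a `buffer` table as using the default in-memory buffer.

```python
from typing import Any


def durability_problems(config: dict[str, Any]) -> list[str]:
    """Return problems that could cause data loss across restarts.

    Hypothetical helper: `config` is a Vector config already parsed into a dict.
    Only `data_dir` and `sinks.<name>.buffer.type` are inspected.
    """
    problems: list[str] = []

    # An explicit data_dir is recommended so disk buffers land in a known location.
    if "data_dir" not in config:
        problems.append("no explicit data_dir is set")

    # A sink without a buffer table falls back to the in-memory buffer,
    # which does not survive a restart.
    for name, sink in config.get("sinks", {}).items():
        buffer_type = sink.get("buffer", {}).get("type", "memory")
        if buffer_type != "disk":
            problems.append(f"sink {name!r} uses a {buffer_type} buffer")

    return problems


example = {
    "data_dir": "/var/lib/vector",
    "sinks": {
        "backups": {"type": "s3", "buffer": {"type": "disk", "max_size": 5_000_000_000}},
        "debug": {"type": "console"},
    },
}
print(durability_problems(example))  # ["sink 'debug' uses a memory buffer"]
```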
By default, Vector is tuned for performance; there are no extra system-level configuration steps necessary to improve performance.
The hardware needed is highly dependent on your configuration and data volume. Typically, Vector is CPU-bound rather than memory-bound, especially if all buffers are configured to use the disk. Our benchmarks should give you a general idea of resource usage in relation to specific pipelines and data volume.
Vector benefits greatly from parallel processing; the more cores, the better. For example, if you're on AWS, the `c5d.*` instances will give you the most bang for your buck, given their optimization for CPU and the fact that they include a fast NVMe drive for on-disk buffers.
If you've configured on-disk buffers, then memory should not be your bottleneck. If you opted to keep buffers in memory, then you'll want to make sure you have at least 2X your cumulative buffer size. For example, if you have two in-memory buffers configured to use 100mb and 1gb, then you should ensure you have at least 2.2gb (1.1gb * 2) of memory available.
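The memory rule of thumb is simple arithmetic. Here is a minimal sketch, assuming buffer sizes are expressed in bytes with decimal units (1 MB = 10^6 bytes), as in the example above; `min_memory_bytes` is a hypothetical helper, not a Vector API.

```python
MB = 1_000_000
GB = 1_000 * MB


def min_memory_bytes(in_memory_buffer_sizes: list[int], factor: int = 2) -> int:
    """Recommended minimum memory: `factor` times the cumulative in-memory buffer size."""
    if any(size < 0 for size in in_memory_buffer_sizes):
        raise ValueError("buffer sizes must be non-negative")
    return factor * sum(in_memory_buffer_sizes)


# Two in-memory buffers of 100mb and 1gb -> (0.1 + 1.0) * 2 = 2.2gb.
required = min_memory_bytes([100 * MB, 1 * GB])
print(required / GB)  # 2.2
```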
If you've configured on-disk buffers, then we recommend using local NVMe SSD drives when possible. This helps ensure that disk IO does not become your bottleneck. For example, if you're on AWS, you'll want to choose an instance that includes a local NVMe SSD drive, such as the `c5d.*` instances. The size of the disk should be at least 3 times your cumulative buffer size.
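A similar check applies to the disk rule. The sketch below assumes the disk buffers' `max_size` values are known in bytes and that `data_dir` lives on the local drive being sized. It compares three times the cumulative buffer size against the total size of the filesystem holding that directory, using the standard library's `shutil.disk_usage`. Both helper names are hypothetical.

```python
import shutil


def min_disk_bytes(disk_buffer_sizes: list[int], factor: int = 3) -> int:
    """Recommended minimum disk size: `factor` times the cumulative disk buffer size."""
    return factor * sum(disk_buffer_sizes)


def disk_is_large_enough(data_dir: str, disk_buffer_sizes: list[int]) -> bool:
    """True if the filesystem containing `data_dir` meets the 3x rule of thumb.

    Compares against the filesystem's *total* size ("the size of the disk");
    shutil.disk_usage(...).free would give the currently free space instead.
    """
    total = shutil.disk_usage(data_dir).total
    return total >= min_disk_bytes(disk_buffer_sizes)


# A single 5gb disk buffer calls for at least 15gb of disk.
print(min_disk_bytes([5_000_000_000]))  # 15000000000

# Point this at your data_dir (e.g. "/var/lib/vector"); "/" is used so the example runs anywhere.
print(disk_is_large_enough("/", [5_000_000_000]))
```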
If you've configured Vector to receive data over the network, then you'll benefit from load balancing. Some sinks, such as the `vector` sink, offer built-in load balancing. This is a very rudimentary form of load balancing that requires all clients to know about the available downstream hosts. A more formal load balancing strategy is outside the scope of this document, but is typically achieved with services such as AWS ELB, HAProxy, Nginx, and similar tools.
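To illustrate what "every client must know the downstream hosts" means in practice, here is a minimal client-side round-robin sketch. It is not Vector's implementation: the `RoundRobin` class and the host addresses are hypothetical. A real client would also need health checks and retries, which is part of why a dedicated load balancer is usually preferred.

```python
from itertools import cycle
from typing import Iterator


class RoundRobin:
    """Client-side round-robin over a fixed, client-known list of downstream hosts."""

    def __init__(self, hosts: list[str]) -> None:
        if not hosts:
            raise ValueError("at least one downstream host is required")
        self._hosts: Iterator[str] = cycle(hosts)

    def next_host(self) -> str:
        # Each call returns the next host in order, wrapping around at the end.
        return next(self._hosts)


# Hypothetical aggregator addresses; every client must be configured with this list.
balancer = RoundRobin(["10.0.0.1:6000", "10.0.0.2:6000", "10.0.0.3:6000"])
print([balancer.next_host() for _ in range(4)])
# ['10.0.0.1:6000', '10.0.0.2:6000', '10.0.0.3:6000', '10.0.0.1:6000']
```

Because the host list lives in every client, adding or removing a downstream host means reconfiguring all clients. A central load balancer avoids this.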
Vector can be reloaded to apply configuration changes without a full restart. Reloading is the recommended strategy and should be preferred over restarting whenever possible.
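Vector reloads its configuration when its process receives a `SIGHUP` signal (the shell equivalent is `kill -HUP <pid>`). Below is a minimal, POSIX-only Python sketch of triggering a reload from a deployment script. How you obtain the Vector PID (service manager, `pgrep`, and so on) is left as an assumption, and `reload_vector` is a hypothetical helper.

```python
import os
import signal


def reload_vector(pid: int) -> None:
    """Ask a running Vector process to reload its configuration by sending SIGHUP.

    POSIX only. `pid` must be a positive process ID: os.kill treats 0 and
    negative values as process-group targets, which could signal unrelated processes.
    """
    if pid <= 0:
        raise ValueError(f"refusing to signal non-positive pid {pid}")
    os.kill(pid, signal.SIGHUP)
```

The guard on `pid` is deliberate. A stray `0` or `-1` from a failed PID lookup would otherwise broadcast `SIGHUP` to a whole process group, or to every process the caller may signal.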