Vector’s runtime model and how it manages concurrency
Vector’s runtime is futures-based and asynchronous: nodes in Vector’s DAG topology map roughly to asynchronous tasks that communicate via channels, all scheduled by the Tokio runtime.
Sources are tasks with an output channel. This interface is intentionally simple and favors internal composability to allow for maximum flexibility across Vector’s wide array of sources.
Transforms can be either tasks or stateless functions, depending on their purpose.
Stateless function transforms are single-operation transforms that do not maintain state across multiple events. For example, the remap transform performs individual operations on each event as it is received and returns immediately. This function-like simplicity allows function transforms to be inlined at the source level, which is how they participate in our concurrency model.
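The inlining idea can be sketched as follows. This is a minimal illustration rather than Vector's implementation: Vector is built on Tokio, while this sketch uses Python's asyncio tasks and queues as stand-ins, and the names `uppercase_message`, `source`, and `run_pipeline` are hypothetical.

```python
import asyncio


def uppercase_message(event: dict) -> dict:
    """Hypothetical stateless transform: the result depends only on this event."""
    return {**event, "message": event["message"].upper()}


async def source(name, lines, out, transform):
    """Hypothetical source task that writes events to its output channel.

    The function transform is applied inline, inside the source task, so each
    source runs its own copy without an extra task or channel hop.
    """
    for line in lines:
        event = {"source": name, "message": line}
        await out.put(transform(event))


async def run_pipeline():
    # Bounded queue standing in for the channel between nodes.
    out = asyncio.Queue(maxsize=8)
    # Two sources run as separate tasks; each applies the transform itself.
    await asyncio.gather(
        source("a", ["one", "two"], out, uppercase_message),
        source("b", ["three"], out, uppercase_message),
    )
    events = []
    while not out.empty():
        events.append(out.get_nowait())
    return events


if __name__ == "__main__":
    for event in asyncio.run(run_pipeline()):
        print(event)
```

Because the transform holds no state, running one copy per source is safe, and adding sources adds transform concurrency without extra coordination.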
Task transforms can optionally maintain state across multiple events. Therefore, they run as separate tasks and cannot be inlined at the source level for concurrency. An example of a task transform is the dedupe transform, which maintains state to drop duplicate events.
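A task transform of this kind can be sketched as a task with an input channel, an output channel, and state that lives across events. The sketch below uses Python's asyncio in place of Tokio and is not Vector's dedupe implementation: the `DONE` marker, the unbounded `seen` set, and the function names are assumptions made for illustration.

```python
import asyncio

DONE = object()  # hypothetical end-of-stream marker used only in this sketch


async def dedupe(inp, out, field):
    """Task transform: remembers values of `field` it has already forwarded."""
    seen = set()  # state that outlives any single event
    while True:
        event = await inp.get()
        if event is DONE:
            await out.put(DONE)
            return
        key = event.get(field)
        if key in seen:
            continue  # duplicate: drop it
        seen.add(key)
        await out.put(event)


async def run_dedupe(events):
    inp, out = asyncio.Queue(), asyncio.Queue()
    # The transform runs as its own task because it owns the `seen` state.
    task = asyncio.create_task(dedupe(inp, out, "id"))
    for event in events:
        await inp.put(event)
    await inp.put(DONE)
    await task
    kept = []
    while True:
        item = out.get_nowait()
        if item is DONE:
            break
        kept.append(item)
    return kept


if __name__ == "__main__":
    print(asyncio.run(run_dedupe([{"id": 1}, {"id": 2}, {"id": 1}])))
```

The single `seen` set is the reason this transform cannot simply be copied into every source: each copy would only see part of the stream and would miss duplicates arriving through another source.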
Sinks are tasks with an input channel. This interface is intentionally simple and favors internal composability to allow for maximum flexibility. Sinks share a lot of infrastructure that makes them easy and flexible to build, such as streaming, batching, partitioning, networking, retries, and buffers.
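Two of the shared pieces named above, batching and retries, can be sketched around a sink task that reads from its input channel. This is an illustrative asyncio sketch, not Vector's sink infrastructure: `batching_sink`, `send_with_retries`, the `DONE` marker, and the simulated failure are all hypothetical.

```python
import asyncio

DONE = object()  # hypothetical end-of-stream marker used only in this sketch


async def send_with_retries(batch, send, max_attempts=3):
    """Try to deliver one batch, retrying on ConnectionError."""
    for attempt in range(1, max_attempts + 1):
        try:
            await send(batch)
            return True
        except ConnectionError:
            if attempt == max_attempts:
                return False
            await asyncio.sleep(0)  # placeholder where a real backoff would go
    return False


async def batching_sink(inp, send, batch_size):
    """Sink task: reads its input channel and flushes events in batches."""
    batch = []
    delivered = 0
    while True:
        event = await inp.get()
        if event is not DONE:
            batch.append(event)
        # Flush when the batch is full or the stream has ended.
        if batch and (len(batch) >= batch_size or event is DONE):
            if await send_with_retries(list(batch), send):
                delivered += len(batch)
            batch.clear()
        if event is DONE:
            return delivered


async def run_sink(events, batch_size=2):
    inp = asyncio.Queue()
    for event in events:
        inp.put_nowait(event)
    inp.put_nowait(DONE)
    sent = []
    calls = 0

    async def flaky_send(batch):
        # Simulated downstream service that fails on the first call only.
        nonlocal calls
        calls += 1
        if calls == 1:
            raise ConnectionError("simulated failure")
        sent.append(batch)

    delivered = await batching_sink(inp, flaky_send, batch_size)
    return delivered, sent


if __name__ == "__main__":
    print(asyncio.run(run_sink([1, 2, 3])))
```

Keeping the retry logic separate from the batching loop mirrors the point of the paragraph: the sink-specific part shrinks to the `send` callable, while the rest can be reused.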
Vector uses the Tokio runtime for task scheduling.
Nodes in Vector’s DAG topology communicate via channels. Edge nodes are customized channels with dynamic output control: back pressure is the default, but each sink can be configured to shed load or persist to disk instead.
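The difference between back pressure and load shedding can be sketched with a bounded channel and two policies for a full channel. This is an asyncio illustration under stated assumptions, not Vector's channel implementation: the policy names `"block"` and `"drop_newest"` and the helper functions are labels chosen for this sketch, and persisting to disk is not modeled.

```python
import asyncio


async def send(channel, event, when_full):
    """Return True if the event was accepted, False if it was shed."""
    if when_full == "block":
        # Back pressure: the sender waits until the channel has free space.
        await channel.put(event)
        return True
    try:
        channel.put_nowait(event)
        return True
    except asyncio.QueueFull:
        return False  # load shedding: the newest event is dropped


async def run_demo(when_full, n_events=5, capacity=2):
    channel = asyncio.Queue(maxsize=capacity)  # bounded channel
    received = []

    async def producer():
        accepted = 0
        for i in range(n_events):
            if await send(channel, i, when_full):
                accepted += 1
        await channel.put(None)  # end-of-stream marker for this sketch
        return accepted

    async def consumer():
        while True:
            item = await channel.get()
            if item is None:
                return
            received.append(item)

    accepted, _ = await asyncio.gather(producer(), consumer())
    return accepted, received


if __name__ == "__main__":
    print("block:", asyncio.run(run_demo("block")))
    print("drop_newest:", asyncio.run(run_demo("drop_newest")))
```

With the blocking policy the producer slows to the consumer's pace and every event arrives. With the dropping policy the producer never waits, at the cost of losing whatever does not fit in the channel.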