Concurrency model

[Diagram: a socket-based source runs one "Parse, Add fields" chain per connection, and a file-based source runs one "Parse, Add fields" chain per file. The chains feed a Dedupe stage and then a Sink.]

Vector implements a concurrency model that scales naturally with incoming data volume, as shown above. Each Vector source defines its own unit of concurrency and implements it accordingly. The model therefore adapts to however Vector is being used, avoiding the need for tedious concurrency tuning and configuration.

For example, the file source implements concurrency across the files it’s tailing, and the socket source implements concurrency across the active open connections it’s maintaining.
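The idea of a source-defined unit of concurrency can be sketched in a few lines. This is an illustrative Python model only, not Vector's implementation: `process_unit`, `run_source`, and the sample file names are hypothetical, and `asyncio` tasks stand in for whatever scheduling Vector uses internally. Each unit (a file or a connection) gets its own task that runs its own parse and add-fields steps.

```python
import asyncio


async def process_unit(unit_name, lines, sink):
    """Run one independent pipeline for a single unit of concurrency (hypothetical helper)."""
    for line in lines:
        event = {"message": line.strip()}  # stand-in for a Parse step
        event["origin"] = unit_name        # stand-in for an Add fields step
        await sink.put(event)


async def run_source(units):
    """Start one task per unit (a file or a connection) and collect every event."""
    sink = asyncio.Queue()
    tasks = [
        asyncio.create_task(process_unit(name, lines, sink))
        for name, lines in units.items()
    ]
    await asyncio.gather(*tasks)

    events = []
    while not sink.empty():
        events.append(sink.get_nowait())
    return events


# Two hypothetical tailed files; a socket source would key on connections instead.
units = {
    "app.log": ["started\n", "ready\n"],
    "db.log": ["connected\n"],
}
events = asyncio.run(run_source(units))
```

In this sketch, adding a third file to `units` adds a third task with no configuration change, which is the property the text describes.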

Stateless function transforms

As covered in the pipeline model documentation, Vector’s concurrency relies on stateless function transforms that can be parallelized. Task transforms currently cannot be parallelized and so can introduce bottlenecks in processing (we hope to improve this in the future).
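A small Python sketch can show why statelessness matters for parallelism. It is an illustration under assumptions, not Vector's code: `add_fields` and the `Dedupe` class are hypothetical stand-ins, and a thread pool stands in for parallel execution. The stateless function gives the same result whether it is run sequentially or across workers, while the stateful stage depends on every event it has already seen and so is run in a single place.

```python
from concurrent.futures import ThreadPoolExecutor


def add_fields(event):
    """Stateless function transform: the output depends only on the input event."""
    return {**event, "env": "prod"}


class Dedupe:
    """Stateful stage: the output depends on every event seen so far."""

    def __init__(self):
        self.seen = set()

    def process(self, event):
        key = event["message"]
        if key in self.seen:
            return None  # drop the duplicate
        self.seen.add(key)
        return event


events = [{"message": m} for m in ["a", "b", "a", "c"]]

# The stateless step can be applied one by one or fanned out across workers.
sequential = [add_fields(e) for e in events]
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(add_fields, events))  # map preserves input order

# The stateful step runs in one place so that it sees the whole stream.
dedupe = Dedupe()
deduped = [e for e in map(dedupe.process, parallel) if e is not None]
```

If `Dedupe` were split across independent workers, each copy would hold only part of `seen` and duplicates could slip through, which is the kind of constraint that makes such a stage a potential bottleneck.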