Secrets in Disk Buffers
Starting with Vector’s 0.34.0 release, secrets in events are stored in disk buffers. These secrets are stored unencrypted.
Event secrets
Some source components in Vector, such as the Datadog Agent or Splunk HEC sources, can store the API key received in incoming requests so that the same key is re-used when those events are sent back out to a compatible service. This allows users, for example, to set up Vector as an aggregator for all of their Datadog Agent processes, reusing the original API key, or keys, as the events are forwarded to the Datadog API.
Disk buffers and event secrets and metadata
Prior to #18816, these event secrets (and other event metadata) were not stored when using disk buffers. This represented a loss of functionality when users switched from the default in-memory buffers to disk buffers. In order to bring this functionality up to par, we added support for storing event secrets/metadata when writing events to disk buffers.
Naturally, event secrets represent sensitive data such as API keys and more. However, Vector currently stores these event secrets unencrypted in disk buffers.
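For context, a disk buffer is something a sink opts into through its buffer settings. The following is a minimal sketch rather than a recommended configuration: the sink name datadog_out, the input name dd_agent, the data directory path, and the environment variable are all placeholders chosen for illustration.

```yaml
# Global data directory; disk buffer files are written beneath it.
# The path is an assumed example.
data_dir: /var/lib/vector

sinks:
  datadog_out:                    # hypothetical sink name
    type: datadog_logs
    inputs: ["dd_agent"]          # hypothetical upstream component
    default_api_key: "${DATADOG_API_KEY}"
    buffer:
      type: disk                  # switches this sink from the in-memory default
      max_size: 268435488         # bytes; sized here at roughly 256 MiB
      when_full: block
```

With a buffer like this in place, any secrets carried on events routed to this sink would be written to the buffer files along with the event data.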
Do I need to worry about this change?
If you’re not using disk buffers, there is no change to Vector’s behavior and you can stop reading here.
There are two main scenarios where a configuration might now start storing secrets in disk buffers:
- When you are using a source component which has the ability to store secrets
- When you are using remap and adding secrets directly to events
Source components that can store secrets
Some source components store secrets (specifically, API keys) on an event in order to facilitate Vector acting similarly to a proxy, using as much of the original request/event data as possible. Only two sources currently provide such behavior:
- datadog_agent source (stores the DD-API-KEY header value; enabled by default)
- splunk_hec source (stores the Authorization header value; disabled by default)
For both of these sources, this behavior can be disabled: set store_api_key to false for the datadog_agent source, or set store_hec_token to false for the splunk_hec source.
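As an illustration, the two options could be set as follows. This is a sketch under assumptions: the component names dd_agent and hec_in and the listen addresses are placeholders, not values taken from this article.

```yaml
sources:
  dd_agent:                       # hypothetical component name
    type: datadog_agent
    address: 0.0.0.0:8282         # assumed listen address
    store_api_key: false          # do not keep the DD-API-KEY value on events

  hec_in:                         # hypothetical component name
    type: splunk_hec
    address: 0.0.0.0:8088         # assumed listen address
    store_hec_token: false        # already the default; shown for explicitness
```

Once the original key is no longer carried on the event, downstream sinks have nothing to re-use and would need their own credentials configured.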
Manually-stored secrets using remap
When using the remap transform, VRL exposes helper functions to set secrets on events. If your remap usage includes setting secrets, then those secrets are now also in scope for being stored in disk buffers.
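To help recognize this pattern in an existing configuration, here is a sketch of a remap transform that sets a secret with the VRL set_secret function. The transform name, the input name, the secret name upstream_token, and the environment variable are hypothetical.

```yaml
transforms:
  attach_token:                   # hypothetical transform name
    type: remap
    inputs: ["app_logs"]          # hypothetical upstream component
    source: |
      # Attach a secret to the event. If a downstream sink uses a disk
      # buffer, this value would be written to disk unencrypted.
      set_secret("upstream_token", get_env_var!("UPSTREAM_TOKEN"))
```

VRL also provides get_secret and remove_secret, so a remap transform placed ahead of a disk-buffered sink could drop a secret that does not need to travel further.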
Securing disk buffers
As mentioned above, secrets will now be stored in disk buffer data files, and will be unencrypted. The data directory that Vector is configured to use should be locked down as tightly as possible so that only the user/group that runs the Vector process has read/write access.
By default on Unix-based platforms, Vector will attempt to set file permissions for the disk buffer directory/files to only be readable/writeable by the process user, and only readable by the process group. This does not occur on Windows.
Future improvements to disk buffers and securely buffering events
This is not the end of the story for storing secrets in disk buffers. We do have tentative plans to eventually support encrypting secrets in disk buffers, and potentially to support encrypting all event data itself. This work depends on capabilities Vector does not currently have, such as securely passing a decryption key into the process and deciding where a long-lived decryption key would live.
These issues need to be tackled first before we can provide a robust encryption solution for disk buffers.