Log Namespacing
Changing Vector’s data model
The Vector team has been hard at work improving the data model of events in Vector. These changes are now available for beta testing for those who want to try them out and give feedback. This is an opt-in feature. Nothing should change unless you specifically enable it.
Why
Currently, all data for events is placed at the root of the event, regardless of where the data came
from or how it was obtained. Not only can that make it confusing to understand what a certain field
represents (eg: was the timestamp field generated by Vector when the event was ingested, or is it when
the source originally created the event?), but it can also easily cause data collisions.
Log namespacing also unblocks powerful features being worked on, such as end-to-end type checking of events in Vector.
How to enable
The global config option schema.log_namespace can be set to true to enable the new Log Namespacing feature for all components. The default is false.
Every source also has a log_namespace config option. This will override the global setting, so you can try out Log Namespacing on individual sources.
The following example enables the log_namespace feature globally, then disables it for a single source.
schema.log_namespace = true
[sources.input_with_log_namespace]
type = "demo_logs"
format = "shuffle"
lines = ["input_with_log_namespace"]
interval = 1
[sources.input_without_log_namespace]
type = "demo_logs"
format = "shuffle"
lines = ["input_without_log_namespace"]
interval = 1
log_namespace = false
[sinks.console]
type = "console"
inputs = ["input_with_log_namespace", "input_without_log_namespace"]
encoding.codec = "json"
How It Works
Data Layout
When handling log events, information is categorized into one of the following groups:
(Examples are from the datadog_agent source)
- Event Data: The decoded event data. (eg: the log itself)
- Source Metadata: Metadata provided by the source of the event. (eg: hostname / tags)
- Vector Metadata: Metadata provided by Vector. (eg: the time when Vector received the event)
Without Log Namespacing
All three of these are placed at the root of the event. The exact layout depends on the source, some fields are configurable, and the global log schema can change the name / location of some fields.
Example log event from the datadog_agent source (with the JSON decoder)
{
"ddsource": "vector",
"ddtags": "env:prod",
"hostname": "alpha",
"foo": "foo field",
"service": "cernan",
"source_type": "datadog_agent",
"bar": "bar field",
"status": "warning",
"timestamp": "1970-02-14T20:44:57.570Z"
}
With Log Namespacing
When enabled, the layout of this data is well-defined and consistent.
- Event Data (and only Event Data) is placed at the root of the event. (eg: .)
- Source metadata is placed in event metadata, prefixed by the source type. (eg: %datadog_agent)
- Vector metadata is placed in event metadata, prefixed by vector. (eg: %vector)
Generally sinks will only send the event data. If you want to include any metadata fields, it’s recommended to use a remap transform to add data to the event as needed.
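As a rough sketch of that recommendation, the config below copies two metadata fields into the event root with a remap transform. The component names (dd_in, copy_metadata) and the listen address are hypothetical placeholders, and the sketch assumes the event root is an object, as it is with the JSON decoder in the examples on this page.

```toml
# Hypothetical component names; the address is a placeholder.
[sources.dd_in]
type = "datadog_agent"
address = "0.0.0.0:8080"
decoding.codec = "json"
log_namespace = true

[transforms.copy_metadata]
type = "remap"
inputs = ["dd_in"]
source = '''
# Copy selected metadata into the event so that sinks will send it.
.hostname = %datadog_agent.hostname
.ingest_timestamp = %vector.ingest_timestamp
'''
```

Copying only the fields a sink actually needs keeps the event root small and avoids reintroducing the collisions that namespacing is meant to prevent.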
It’s important to note that previously the type of an event (.) would always be an object with fields. Now it is possible for the event to be any type, such as a string.
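Transforms written for the old layout may assume that the root is an object. One possible way to handle a string root is sketched below: the remap transform wraps the string in an object before adding fields. The component names and the message field name are illustrative assumptions, and whether a given source emits a string root depends on its decoder.

```toml
# Hypothetical component names, reusing the demo_logs settings from the example above.
[sources.raw_lines]
type = "demo_logs"
format = "shuffle"
lines = ["a plain text line"]
interval = 1
log_namespace = true

[transforms.wrap_string_root]
type = "remap"
inputs = ["raw_lines"]
source = '''
# With log namespacing the root may be a plain string rather than an object.
if is_string(.) {
  line = .
  # Wrap the string so that later steps can add fields next to it.
  . = { "message": line }
}
'''
```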
Example log event from the datadog_agent source (same data as the example above)
Event root (.)
{
"foo": "foo field",
"bar": "bar field"
}
Source metadata fields (%datadog_agent)
{
"ddsource": "vector",
"ddtags": "env:prod",
"hostname": "alpha",
"service": "cernan",
"status": "warning",
"timestamp": "1970-02-14T20:44:57.570Z"
}
Vector metadata fields (%vector)
{
"source_type": "datadog_agent",
"ingest_timestamp": "1970-02-14T20:44:58.236Z"
}
Here is a sample VRL script accessing different parts of an event when log namespacing is enabled.
event = .
field_from_event = .foo
all_metadata = %
tags = %datadog_agent.ddtags
timestamp = %vector.ingest_timestamp
Semantic Meaning
Before Log Namespacing, Vector used the global log schema to keep certain types of information at known locations. This is changing: when log namespacing is enabled, the global log schema is no longer used. A new feature called “semantic meaning” replaces it. Semantic meaning assigns a meaning to individual fields of an event, which lets sinks find the information they need, such as the timestamp, hostname, or message.
Semantic meaning will automatically be assigned by all sources. Sinks will check on startup to make sure a meaning exists for all required fields. If a source does not provide a required field, or a meaning needs to be manually adjusted for any reason, the VRL function set_semantic_meaning can be used.
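A minimal sketch of a manual adjustment follows, using a remap transform that calls set_semantic_meaning. The component names are hypothetical, the .foo field is borrowed from the example event above, and the meaning name "message" is an assumption about which meanings sinks look for; check the sink documentation for the meanings it actually requires.

```toml
# Hypothetical component names; "my_source" stands in for any configured source.
[transforms.assign_meaning]
type = "remap"
inputs = ["my_source"]
source = '''
# Mark the .foo field as the one carrying the "message" meaning (assumed meaning name).
set_semantic_meaning(.foo, "message")
'''
```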