Log Namespacing

Changing Vector’s data model

The Vector team has been hard at work improving the data model of events in Vector. These changes are now available for beta testing for anyone who wants to try them out and give feedback. This is an opt-in feature: nothing should change unless you specifically enable it.

Why

Currently, all data for events is placed at the root of the event, regardless of where the data came from or how it was obtained. Not only does that make it hard to tell what a certain field represents (eg: was the timestamp field generated by Vector at ingestion, or is it the time the source originally created the event?), but it can also easily cause data collisions.

Log namespacing also unblocks powerful features being worked on, such as end-to-end type checking of events in Vector.

How to enable

The global config option schema.log_namespace can be set to true to enable the new Log Namespacing feature for all components. The default is false.

Every source also has a log_namespace config option. This will override the global setting, so you can try out Log Namespacing on individual sources.

The following example enables the log_namespace feature globally, then disables it for a single source.

schema.log_namespace = true

[sources.input_with_log_namespace]
type = "demo_logs"
format = "shuffle"
lines = ["input_with_log_namespace"]
interval = 1

[sources.input_without_log_namespace]
type = "demo_logs"
format = "shuffle"
lines = ["input_without_log_namespace"]
interval = 1
log_namespace = false

[sinks.console]
type = "console"
inputs = ["input_with_log_namespace", "input_without_log_namespace"]
encoding.codec = "json"
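
To go the other way and trial the feature on a single source, the global setting can be left at its default and the per-source option turned on instead. A minimal sketch, reusing the demo_logs source from above (the component names are placeholders):

```toml
# Global default stays off: schema.log_namespace is not set.

[sources.namespaced_input]
type = "demo_logs"
format = "shuffle"
lines = ["namespaced_input"]
interval = 1
# Opt in for this source only.
log_namespace = true

[sinks.console]
type = "console"
inputs = ["namespaced_input"]
encoding.codec = "json"
```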

How It Works

Data Layout

When handling log events, information is categorized into one of the following groups: (Examples are from the datadog_agent source)

  • Event Data: The decoded event data. (eg: the log itself)
  • Source Metadata: Metadata provided by the source of the event. (eg: hostname / tags)
  • Vector Metadata: Metadata provided by Vector. (eg: the time when Vector received the event)

Without Log Namespacing

All three of these are placed at the root of the event. The exact layout depends on the source, some fields are configurable, and the global log schema can change the name / location of some fields.

Example log event from the datadog_agent source (with the JSON decoder)

{
  "ddsource": "vector",
  "ddtags": "env:prod",
  "hostname": "alpha",
  "foo": "foo field",
  "service": "cernan",
  "source_type": "datadog_agent",
  "bar": "bar field",
  "status": "warning",
  "timestamp": "1970-02-14T20:44:57.570Z"
}

With Log Namespacing

When enabled, the layout of this data is well-defined and consistent.

Event Data (and only Event Data) is placed at the root of the event (eg: .). Source metadata is placed in event metadata, prefixed by the source type (eg: %datadog_agent). Vector metadata is placed in event metadata, prefixed by vector (eg: %vector).

Generally sinks will only send the event data. If you want to include any metadata fields, it’s recommended to use a remap transform to add data to the event as needed.
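
As a rough sketch of that recommendation, a remap transform could copy selected metadata into the event before it reaches a sink. The component names, the listen address, and the choice of fields to copy are assumptions made for illustration; the metadata paths follow the datadog_agent layout described in this post.

```toml
[sources.dd_agent]
type = "datadog_agent"
address = "0.0.0.0:8080"   # placeholder listen address
log_namespace = true

[transforms.add_metadata]
type = "remap"
inputs = ["dd_agent"]
source = '''
# Copy source metadata into the event so sinks will send it.
.hostname = %datadog_agent.hostname
.ddtags = %datadog_agent.ddtags

# Copy Vector metadata as well.
.ingest_timestamp = %vector.ingest_timestamp
'''
```

This assumes the event root is an object, as it is for JSON-decoded datadog_agent logs.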

It’s important to note that previously the type of an event (.) would always be an object with fields. Now it is possible for an event to be any type, such as a string.
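
If downstream components expect an object, one option is to wrap a string event in a remap transform. The sketch below assumes an upstream source (here the placeholder name my_source) that emits a plain string at the event root; the message field name is an arbitrary choice.

```toml
[transforms.wrap_string_events]
type = "remap"
inputs = ["my_source"]   # placeholder input name
source = '''
# When the event root is a bare string, wrap it in an object.
if is_string(.) {
  . = { "message": . }
}
'''
```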

Example log event from the datadog_agent source (same data as the example above)

Event root (.)

{
  "foo": "foo field",
  "bar": "bar field"
}

Source metadata fields (%datadog_agent)

{
  "ddsource": "vector",
  "ddtags": "env:prod",
  "hostname": "alpha",
  "service": "cernan",
  "status": "warning",
  "timestamp": "1970-02-14T20:44:57.570Z"
}

Vector metadata fields (%vector)

{
  "source_type": "datadog_agent",
  "ingest_timestamp": "1970-02-14T20:44:58.236Z"
}

Here is a sample VRL script accessing different parts of an event when log namespacing is enabled.

event = .
field_from_event = .foo

all_metadata = %
tags = %datadog_agent.ddtags
timestamp = %vector.ingest_timestamp

Semantic Meaning

Before Log Namespacing, Vector used the global log schema to keep certain types of information at known locations. This is changing: when log namespacing is enabled, the global log schema will no longer be used. It is replaced by a new feature called “semantic meaning”. This assigns meaning to specific fields of an event, so sinks can locate the information they need, such as the timestamp, hostname, or message.

Semantic meaning will automatically be assigned by all sources. Sinks will check on startup to make sure a meaning exists for all required fields. If a source does not provide a required field, or a meaning needs to be manually adjusted for any reason, the VRL function set_semantic_meaning can be used.
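
A manual adjustment might look like the sketch below, which marks a custom field as the log message. The component names are placeholders, and the meaning name "message" is an assumption; check the VRL function reference for the meanings your sinks require.

```toml
[transforms.assign_meaning]
type = "remap"
inputs = ["my_source"]   # placeholder input name
source = '''
# Tell downstream sinks that .foo holds the log message.
# "message" is an assumed meaning name.
set_semantic_meaning(.foo, "message")
'''
```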