GCP Cloud Storage (GCS) Sink

The Vector gcp_cloud_storage sink sends logs to GCP Cloud Storage.

Configuration

[sinks.my_sink_id]
# General
type = "gcp_cloud_storage" # required
inputs = ["my-source-or-transform-id"] # required
bucket = "my-bucket" # required
compression = "none" # optional, default
healthcheck = true # optional, default
# Auth
credentials_path = "/path/to/credentials.json" # optional, no default
# Encoding
encoding.codec = "ndjson" # required
# File Naming
key_prefix = "date=%F/" # optional, default

  • enum, optional, string

    acl

    Predefined ACL to apply to the created objects. For more information, see Predefined ACLs. If this is not set, GCS will apply a default ACL when the object is created. See Object access control list (ACL) for more info.

    • Enum, must be one of: "authenticated-read" "bucket-owner-full-control" "bucket-owner-read" "private" "project-private" "public-read"
  • optional, table

    batch

    Configures the sink batching behavior.

    • common, optional, uint

      max_bytes

      The maximum size of a batch, in bytes, before it is flushed.

      • Default: 10485760 (bytes)
    • common, optional, uint

      timeout_secs

      The maximum age of a batch before it is flushed. See Buffers & batches for more info.

      • Default: 300 (seconds)
  • common, required, string

    bucket

    The GCS bucket name.

  • optional, table

    buffer

    Configures the sink-specific buffer behavior.

    • common, optional, uint

      max_events

      The maximum number of events allowed in the buffer. See Buffers & batches for more info.

      • Only relevant when: type = "memory"
      • Default: 500 (events)
    • common, required*, uint

      max_size

      The maximum size of the buffer on disk, in bytes. See Buffers & batches for more info.

      • Only required when: type = "disk"
    • enum, common, optional, string

      type

      The buffer's type and storage mechanism.

      • Default: "memory"
      • Enum, must be one of: "memory" "disk"
    • enum, optional, string

      when_full

      The behavior when the buffer becomes full.

      • Default: "block"
      • Enum, must be one of: "block" "drop_newest"
  • enum, common, optional, string

    compression

    The compression strategy used to compress the encoded event data before transmission.

    • Default: "none"
    • Enum, must be one of: "none" "gzip"
  • common, optional, string

    credentials_path

    The filename for a Google Cloud service account credentials JSON file used to authenticate access to the Cloud Storage API. If this is unset, Vector checks the GOOGLE_APPLICATION_CREDENTIALS environment variable for a filename.

    If neither is set, Vector will attempt to fetch an instance service account for the compute instance it is running on. If Vector is not running on a GCE instance, you must provide a credentials file in one of the two ways above. See GCP Authentication for more info.

  • common, required, table

    encoding

    Configures the encoding-specific sink behavior.

    • common, required, string

      codec

      The encoding codec used to serialize the events before outputting.

    • optional, [string]

      except_fields

      Prevents the sink from encoding the specified fields.

    • optional, [string]

      only_fields

      Makes the sink encode only the specified fields.

    • enum, optional, string

      timestamp_format

      How to format event timestamps.

      • Default: "rfc3339"
      • Enum, must be one of: "rfc3339" "unix"
  • optional, bool

    filename_append_uuid

    Whether or not to append a UUID v4 token to the end of the object name. This ensures there are no name collisions in high-volume use cases. See Object Naming for more info.

    • Default: true
  • optional, string

    filename_extension

    The filename extension to use in the object name.

    • Default: "log"
  • optional, string

    filename_time_format

    The timestamp format used in the resulting object name. strftime specifiers are supported; the default, "%s", produces a Unix timestamp in seconds. See Object Naming for more info.

    • Default: "%s"
  • common, optional, bool

    healthcheck

    Enables or disables the sink health check when Vector starts. See Health checks for more info.

    • Default: true
  • templateable, common, optional, string

    key_prefix

    A prefix to apply to all object key names. Use it to partition your objects, and end the value with a / if you want the prefix to act as a top-level GCS "folder". See Object Naming for more info.

    • Default: "date=%F/"
  • optional, table

    metadata

    The set of metadata key:value pairs for the created objects. See the GCS custom metadata documentation for more details.

  • optional, table

    request

    Configures the sink request behavior.

    • optional, table

      adaptive_concurrency

      Configures the adaptive concurrency algorithm. These values have been tuned by optimizing simulated results. In general you should not need to adjust them.

      • optional, float

        decrease_ratio

        The fraction of the current value to set the new concurrency limit to when decreasing the limit. Valid values are greater than 0 and less than 1. Smaller values cause the algorithm to scale back rapidly when latency increases. Note that the new limit is rounded down after applying this ratio.

        • Default: 0.9
      • optional, float

        ewma_alpha

        The adaptive concurrency algorithm uses an exponentially weighted moving average (EWMA) of past RTT measurements as a reference to compare with the current RTT. This value controls how heavily new measurements are weighted compared to older ones. Valid values are greater than 0 and less than 1. Smaller values cause this reference to adjust more slowly, which may be useful if a service has unusually high response variability.

        • Default: 0.7
      • optional, float

        rtt_threshold_ratio

        When comparing the past RTT average to the current measurements, Vector ignores changes that are less than this ratio higher than the past RTT. Valid values are greater than or equal to 0. Larger values cause the algorithm to ignore larger increases in the RTT.

        • Default: 0.05
    • common, optional, uint

      concurrency

      The maximum number of in-flight requests allowed at any given time, or "adaptive" to let Vector set the limit automatically based on current network and service conditions.

      • Default: 25 (requests)
    • common, optional, uint

      rate_limit_duration_secs

      The time window, in seconds, used for the rate_limit_num option.

      • Default: 1 (seconds)
    • common, optional, uint

      rate_limit_num

      The maximum number of requests allowed within the rate_limit_duration_secs time window.

      • Default: 1000
    • optional, uint

      retry_attempts

      The maximum number of retries to make for failed requests. The default, the largest unsigned 64-bit integer, for all intents and purposes represents an infinite number of retries.

      • Default: 18446744073709551615
    • optional, uint

      retry_initial_backoff_secs

      The amount of time to wait before attempting the first retry for a failed request. Once the first retry has failed, the Fibonacci sequence is used to select future backoffs.

      • Default: 1 (seconds)
    • optional, uint

      retry_max_duration_secs

      The maximum amount of time, in seconds, to wait between retries.

      • Default: 10 (seconds)
    • common, optional, uint

      timeout_secs

      The maximum time a request can take before being aborted. It is highly recommended that you do not lower this value below the service's internal timeout, as this could create orphaned requests, pile on retries, and result in duplicate data downstream. See Buffers & batches for more info.

      • Default: 60 (seconds)
  • enum, optional, string

    storage_class

    The storage class for the created objects. If this is not set, the bucket's default storage class applies. See the GCP storage classes documentation and Storage Class for more info.

    • Enum, must be one of: "STANDARD" "NEARLINE" "COLDLINE" "ARCHIVE"
  • optional, table

    tls

    Configures the TLS options for outgoing connections.

    • optional, string

      ca_file

      Absolute path to an additional CA certificate file, in DER or PEM format (X.509), or an inline CA certificate in PEM format.

    • common, optional, string

      crt_file

      Absolute path to a certificate file used to identify this connection, in DER or PEM format (X.509) or PKCS#12, or an inline certificate in PEM format. If this is set and is not a PKCS#12 archive, key_file must also be set.

    • common, optional, string

      key_file

      Absolute path to a private key file used to identify this connection, in DER or PEM format (PKCS#8), or an inline private key in PEM format. If this is set, crt_file must also be set.

    • optional, string

      key_pass

      Passphrase used to unlock the encrypted key file. This has no effect unless key_file is set.

    • optional, bool

      verify_hostname

      If true (the default), Vector validates the configured remote hostname against the remote host's TLS certificate. Do NOT set this to false unless you understand the risks of not verifying the remote hostname.

      • Default: true
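
Putting several of the options above together, here is a sketch of a sink that gzip-compresses its output, writes newline-delimited JSON, drops one field, and renders timestamps as Unix epoch values. The field name "password" is a hypothetical placeholder, and the values are illustrative rather than recommended defaults.

```toml
[sinks.my_sink_id]
type = "gcp_cloud_storage"
inputs = ["my-source-or-transform-id"]
bucket = "my-bucket"

# Compress the encoded data before upload ("none" is the default).
compression = "gzip"

# Serialize events as newline-delimited JSON.
encoding.codec = "ndjson"
# Hypothetical field: exclude it from every encoded event.
encoding.except_fields = ["password"]
# Emit timestamps as Unix epoch values instead of RFC 3339 strings.
encoding.timestamp_format = "unix"
```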

Env Vars

  • common, optional, string

    GOOGLE_APPLICATION_CREDENTIALS

    The filename for a Google Cloud service account credentials JSON file used for authentication. See GCP Authentication for more info.

    • Only relevant when: credentials_path is not set

Telemetry

This component provides the following metrics that can be retrieved through the internal_metrics source. See the metrics section in the monitoring page for more info.

  • counter

    processed_events_total

    The total number of events processed by this component. This metric includes the following tags:

    • component_kind - The Vector component kind.

    • component_name - The Vector component ID.

    • component_type - The Vector component type.

    • file - The file that produced the error

    • instance - The Vector instance identified by host and port.

    • job - The name of the job producing Vector metrics.

  • counter

    processed_bytes_total

    The total number of bytes processed by the component. This metric includes the following tags:

    • component_kind - The Vector component kind.

    • component_name - The Vector component ID.

    • component_type - The Vector component type.

    • instance - The Vector instance identified by host and port.

    • job - The name of the job producing Vector metrics.

How It Works

Buffers & batches

This component buffers and batches data. Rather than treating buffers and batches as global concepts, Vector treats them as sink-specific concepts. This isolates sinks, ensuring that service disruptions are contained and delivery guarantees are honored.

Batches are flushed when either of two conditions is met:

1. The batch age meets or exceeds the configured batch.timeout_secs.
2. The batch size meets or exceeds the configured batch.max_bytes.

Buffers are controlled via the buffer.* options.
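
As an illustration of the two flush conditions and the buffer options, the sketch below flushes a batch at roughly 5 MiB or after 60 seconds, whichever comes first, and switches to a disk buffer. The sizes are illustrative values chosen for this example, not tuned recommendations.

```toml
[sinks.my_sink_id]
type = "gcp_cloud_storage"
inputs = ["my-source-or-transform-id"]
bucket = "my-bucket"
encoding.codec = "ndjson"

# Flush when the batch reaches 5 MiB (5 * 1048576 bytes) or is 60 seconds old.
batch.max_bytes = 5242880
batch.timeout_secs = 60

# Buffer on disk instead of in memory; max_size is required for disk buffers.
buffer.type = "disk"
buffer.max_size = 268435456 # 256 MiB
# Apply back-pressure upstream instead of dropping events when the buffer is full.
buffer.when_full = "block"
```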

GCP Authentication

GCP offers a variety of authentication methods. Vector supports the server-to-server methods and looks for credentials in the following order:

1. The credentials_path option, if set.
2. The api_key option, if set.
3. The GOOGLE_APPLICATION_CREDENTIALS environment variable, if set.
4. Finally, an instance service account.

If no credentials are found, the health check will fail and an error will be logged.
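
A minimal sketch that pins authentication to an explicit service account key file. The path is a hypothetical placeholder. Removing credentials_path makes Vector fall back to the later steps in the lookup order listed above.

```toml
[sinks.my_sink_id]
type = "gcp_cloud_storage"
inputs = ["my-source-or-transform-id"]
bucket = "my-bucket"
encoding.codec = "ndjson"

# Checked first; hypothetical path to a service account credentials JSON file.
credentials_path = "/etc/vector/gcp-service-account.json"
```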

Health checks

Health checks ensure that the downstream service is accessible and ready to accept data. This check is performed upon sink initialization. If the health check fails, an error will be logged and Vector will continue to start.

Require health checks

If you'd like Vector to exit immediately upon a health check failure, pass the --require-healthy flag:

vector --config /etc/vector/vector.toml --require-healthy

Disable health checks

If you'd like to disable health checks for this sink, set the healthcheck option to false.
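
For example, the following sketch turns off the startup health check for this one sink while leaving other sinks unaffected:

```toml
[sinks.my_sink_id]
type = "gcp_cloud_storage"
inputs = ["my-source-or-transform-id"]
bucket = "my-bucket"
encoding.codec = "ndjson"

# Skip the health check for this sink only.
healthcheck = false
```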

Object Naming

By default, Vector names your GCS objects in the following format:

<key_prefix><timestamp>-<uuidv4>.log

For example:

date=2019-06-18/1560886634-fddd7a0e-fad9-4f7e-9bce-00ae5debc563.log

Vector appends a UUID v4 token to ensure there are no name conflicts in the unlikely event that two Vector instances write data at the same time.

You can control the resulting name via the key_prefix, filename_time_format, filename_append_uuid, and filename_extension options.
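
A sketch that replaces the default Unix-seconds timestamp with a human-readable one and changes the extension. The format string and extension are illustrative choices. Following the naming pattern above, objects would look roughly like date=2019-06-18/2019-06-18T19-37-14-<uuidv4>.ndjson.

```toml
[sinks.my_sink_id]
type = "gcp_cloud_storage"
inputs = ["my-source-or-transform-id"]
bucket = "my-bucket"
encoding.codec = "ndjson"

key_prefix = "date=%F/"                    # strftime %F renders as e.g. 2019-06-18
filename_time_format = "%Y-%m-%dT%H-%M-%S" # hyphens instead of colons in the time part
filename_extension = "ndjson"
filename_append_uuid = true                # keep the collision-avoiding UUID suffix
```

Leaving filename_append_uuid enabled matters more with a coarse time format like this one, since two objects flushed within the same second would otherwise share a name.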

Object access control list (ACL)

GCP Cloud Storage supports access control lists (ACL) for buckets and objects. In the context of Vector, only object ACLs are relevant (Vector does not create or modify buckets). You can set the object-level ACL with the acl option, which applies one of the predefined ACLs to each created object.
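
For instance, this sketch applies the "project-private" predefined ACL to every object the sink creates. The choice of ACL is illustrative; see Predefined ACLs for what each value grants.

```toml
[sinks.my_sink_id]
type = "gcp_cloud_storage"
inputs = ["my-source-or-transform-id"]
bucket = "my-bucket"
encoding.codec = "ndjson"

# One of the enum values listed under the acl option.
acl = "project-private"
```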

Partitioning

Vector supports dynamic configuration values through a simple template syntax. If an option supports templating, it is marked with the templateable badge, and you can use event fields to create dynamic values. For example:

vector.toml
[sinks.my-sink]
dynamic_option = "application={{ application_id }}"

In the above example, the application_id field of each event is used to partition outgoing data.
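
On this sink, key_prefix is the templateable option. The sketch below partitions objects first by a hypothetical application_id event field and then by date. It assumes field templates and strftime specifiers can be combined in one value, as the default "date=%F/" already relies on strftime.

```toml
[sinks.my_sink_id]
type = "gcp_cloud_storage"
inputs = ["my-source-or-transform-id"]
bucket = "my-bucket"
encoding.codec = "ndjson"

# application_id is a hypothetical event field; %F is the strftime date (YYYY-MM-DD).
key_prefix = "application={{ application_id }}/date=%F/"
```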

Rate limits & adaptive concurrency

Adaptive Request Concurrency (ARC)

Adaptive Request Concurrency is a feature of Vector that does away with static rate limits and automatically optimizes HTTP concurrency limits based on downstream service responses. The underlying mechanism is a feedback loop inspired by TCP congestion control algorithms. Check out the announcement blog post for more details.

We highly recommend enabling this feature, as it improves the performance and reliability of Vector and the systems it communicates with.

To enable it, set the request.concurrency option to adaptive:

vector.toml
[sinks.my-sink]
request.concurrency = "adaptive"

Static rate limits

If Adaptive Request Concurrency is not for you, you can manually set static rate limits with the request.rate_limit_duration_secs, request.rate_limit_num, and request.concurrency options:

vector.toml
[sinks.my-sink]
request.rate_limit_duration_secs = 1
request.rate_limit_num = 10
request.concurrency = 10

Retry policy

Vector retries failed requests that return status 429, or any status of 500 or higher except 501. Other responses are not retried. You can control the number of retry attempts and the backoff rate with the request.retry_attempts, request.retry_initial_backoff_secs, and request.retry_max_duration_secs options.
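
A sketch that bounds retries instead of using the effectively infinite default. The numbers are illustrative. Per the option descriptions, the first retry waits retry_initial_backoff_secs, later waits follow a Fibonacci sequence, and no single wait exceeds retry_max_duration_secs.

```toml
[sinks.my_sink_id]
type = "gcp_cloud_storage"
inputs = ["my-source-or-transform-id"]
bucket = "my-bucket"
encoding.codec = "ndjson"

request.retry_attempts = 10            # give up after 10 retries
request.retry_initial_backoff_secs = 1 # wait 1 second before the first retry
request.retry_max_duration_secs = 30   # cap any single wait at 30 seconds
```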

Storage Class

GCS offers several storage classes. You can apply defaults and rules at the bucket level, or set the storage class at the object level. In the context of Vector, only the object level is relevant (Vector does not create or modify buckets). You can set the storage class via the storage_class option.
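
For example, logs you expect to read rarely could be written directly to Nearline storage. The choice of class here is illustrative:

```toml
[sinks.my_sink_id]
type = "gcp_cloud_storage"
inputs = ["my-source-or-transform-id"]
bucket = "my-bucket"
encoding.codec = "ndjson"

# Overrides the bucket's default storage class for every object this sink creates.
storage_class = "NEARLINE"
```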

Tags & Metadata

Vector supports adding custom metadata to created objects via the metadata option. Metadata items associate extra data with an object without being part of the uploaded data.
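
A sketch that attaches two custom metadata entries to every created object. It assumes metadata accepts a table of string key/value pairs, as its key:value description suggests. The keys and values are hypothetical.

```toml
[sinks.my_sink_id]
type = "gcp_cloud_storage"
inputs = ["my-source-or-transform-id"]
bucket = "my-bucket"
encoding.codec = "ndjson"

# Hypothetical pairs; each becomes custom metadata on the created objects.
metadata.team = "observability"
metadata.pipeline = "vector"
```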

Transport Layer Security (TLS)

Vector uses OpenSSL for TLS because of its maturity. You can enable and adjust TLS behavior via the tls.* options.
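
As an illustration, this sketch trusts one additional CA certificate while keeping hostname verification on. That can be useful, for example, when connections pass through a proxy that presents its own certificate. The certificate path is a hypothetical placeholder.

```toml
[sinks.my_sink_id]
type = "gcp_cloud_storage"
inputs = ["my-source-or-transform-id"]
bucket = "my-bucket"
encoding.codec = "ndjson"

# Hypothetical path to an extra CA certificate in PEM format.
tls.ca_file = "/etc/vector/certs/extra-ca.pem"
# Keep the default; only disable this if you understand the risks.
tls.verify_hostname = true
```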