AWS S3 Source

The Vector aws_s3 source collects logs from AWS S3.

Configuration

[sources.my_source_id]
# General
type = "aws_s3" # required
region = "us-east-1" # required when endpoint = null
# Sqs
sqs.delete_message = true # optional, default
sqs.poll_secs = 15 # optional, default, seconds
sqs.queue_url = "https://sqs.us-east-2.amazonaws.com/123456789012/MyQueue" # required
  • optional, string

    assume_role

    The ARN of an IAM role to assume at startup. See AWS Authentication for more info.

  • enum, optional, string

    compression

    The compression format of the S3 objects.

    • Default: "auto"
    • Enum, must be one of: "auto" "gzip" "zstd" "none"
  • optional, string

    endpoint

    Custom endpoint for use with AWS-compatible services. Providing a value for this option will make region moot.

    • Only relevant when: region = null
  • optional, table

    multiline

    Multiline parsing configuration. If not specified, multiline parsing is disabled. See Handling events from the aws_s3 source for more info.

    • common, required, string

      condition_pattern

      Condition regex pattern to look for. Exact behavior is configured via mode.

    • enum, common, required, string

      mode

      Mode of operation. Specifies how condition_pattern is interpreted.

      • Enum, must be one of: "continue_through" "continue_past" "halt_before" "halt_with"
    • common, required, string

      start_pattern

      Regex pattern that marks the beginning of a message.

    • common, required, uint

      timeout_ms

      The maximum time, in milliseconds, to wait for the continuation. Once this timeout is reached, the buffered message is guaranteed to be flushed, even if incomplete.

  • common, required*, string

    region

    The AWS region of the target service. If endpoint is provided it will override this value since the endpoint includes the region.

    • Only required when: endpoint = null
  • common, optional, table

    sqs

    SQS strategy options. Required if strategy=sqs.

    • common, optional, bool

      delete_message

      Whether to delete the message once Vector processes it. It can be useful to set this to false to debug or during initial Vector setup.

      • Default: true
    • common, optional, uint

      poll_secs

      How often to poll the queue for new messages in seconds.

      • Default: 15 (seconds)
    • common, required, string

      queue_url

      The URL of the SQS queue to receive bucket notifications from.

    • optional, uint

      visibility_timeout_secs

      The visibility timeout to use for messages, in seconds. This controls how long a message is left unavailable after a Vector instance receives it. If that instance does not delete the message before the timeout expires, the message becomes available again to other consumers; this can happen if, for example, the Vector process crashes.

      • WARNING: Should be set higher than the length of time it takes to process an individual message to avoid that message being reprocessed.
      • Default: 300 (seconds)
  • enum, optional, string

    strategy

    The strategy to use to consume objects from AWS S3.

    • Default: "sqs"
    • Enum, must be one of: "sqs"
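
The options above can be combined into a single source definition. The following is a minimal sketch rather than a recommended production setup: the source ID and queue URL are placeholders, and the multiline patterns assume a log format in which a new message starts with a non-whitespace character and continuation lines are indented.

```toml
[sources.my_source_id]
type = "aws_s3"
region = "us-east-1"
compression = "gzip" # one of "auto", "gzip", "zstd", "none"
strategy = "sqs"

# SQS queue that receives the bucket notifications (placeholder URL).
sqs.queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/MyQueue"
sqs.poll_secs = 15
sqs.delete_message = true
# Keep this longer than the time needed to process a single message.
sqs.visibility_timeout_secs = 300

# Assumed log shape: indented lines continue the previous message.
multiline.start_pattern = '^[^\s]'
multiline.mode = "continue_through"
multiline.condition_pattern = '^[\s]+'
multiline.timeout_ms = 1000
```

During initial setup it may help to set sqs.delete_message = false temporarily, so that messages stay in the queue and can be received again.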

Env Vars

  • common, optional, string

    AWS_ACCESS_KEY_ID

    The AWS access key ID. Used for AWS authentication when communicating with AWS services. See AWS Authentication for more info.

  • common, optional, string

    AWS_CONFIG_FILE

    Specifies the location of the file that the AWS CLI uses to store configuration profiles.

    • Default: "~/.aws/config"
  • common, optional, string

    AWS_CREDENTIAL_EXPIRATION

    Expiration time in RFC 3339 format. If unset, credentials won't expire.

  • common, optional, string

    AWS_DEFAULT_REGION

    The default AWS region.

    • Only relevant when: endpoint = null
  • common, optional, string

    AWS_PROFILE

    Specifies the name of the CLI profile with the credentials and options to use. This can be the name of a profile stored in a credentials or config file.

    • Default: "default"
  • common, optional, string

    AWS_ROLE_SESSION_NAME

    Specifies a name to associate with the role session. This value appears in CloudTrail logs for commands performed by the user of this profile.

  • common, optional, string

    AWS_SECRET_ACCESS_KEY

    The AWS secret access key. Used for AWS authentication when communicating with AWS services. See AWS Authentication for more info.

  • common, optional, string

    AWS_SESSION_TOKEN

    The AWS session token. Used for AWS authentication when communicating with AWS services.

  • common, optional, string

    AWS_SHARED_CREDENTIALS_FILE

    Specifies the location of the file that the AWS CLI uses to store access keys.

    • Default: "~/.aws/credentials"

Output

This component outputs log events with the following fields:

{
  "bucket": "my-bucket",
  "message": "53.126.150.246 - - [01/Oct/2020:11:25:58 -0400] \"GET /disintermediate HTTP/2.0\" 401 20308",
  "object": "AWSLogs/111111111111/vpcflowlogs/us-east-1/2020/10/26/111111111111_vpcflowlogs_us-east-1_fl-0c5605d9f1baf680d_20201026T1950Z_b1ea4a7a.log.gz",
  "region": "us-east-1",
  "timestamp": "2020-10-10T17:07:36+00:00"
}
  • common, required, string

    bucket

    The bucket of the object the line came from.

  • common, required, string

    message

    A line from the S3 object.

  • common, required, string

    object

    The object the line came from.

  • common, required, string

    region

    The AWS region the bucket is in.

  • common, required, timestamp

    timestamp

    The Last-Modified time of the object. Defaults to the current timestamp if this information is missing.


Telemetry

This component provides the following metrics that can be retrieved through the internal_metrics source. See the metrics section in the monitoring page for more info.

  • counter

    sqs_message_delete_failed_total

    The total number of failures to delete SQS messages. This metric includes the following tags:

    • component_kind - The Vector component kind.

    • component_name - The Vector component ID.

    • component_type - The Vector component type.

    • instance - The Vector instance identified by host and port.

    • job - The name of the job producing Vector metrics.

  • counter

    sqs_message_delete_succeeded_total

    The total number of successful deletions of SQS messages. This metric includes the following tags:

    • component_kind - The Vector component kind.

    • component_name - The Vector component ID.

    • component_type - The Vector component type.

    • instance - The Vector instance identified by host and port.

    • job - The name of the job producing Vector metrics.

  • counter

    sqs_message_processing_failed_total

    The total number of failures to process SQS messages. This metric includes the following tags:

    • component_kind - The Vector component kind.

    • component_name - The Vector component ID.

    • component_type - The Vector component type.

    • instance - The Vector instance identified by host and port.

    • job - The name of the job producing Vector metrics.

  • counter

    sqs_message_processing_succeeded_total

    The total number of SQS messages successfully processed. This metric includes the following tags:

    • component_kind - The Vector component kind.

    • component_name - The Vector component ID.

    • component_type - The Vector component type.

    • instance - The Vector instance identified by host and port.

    • job - The name of the job producing Vector metrics.

  • counter

    sqs_message_receive_failed_total

    The total number of failures to receive SQS messages. This metric includes the following tags:

    • component_kind - The Vector component kind.

    • component_name - The Vector component ID.

    • component_type - The Vector component type.

    • instance - The Vector instance identified by host and port.

    • job - The name of the job producing Vector metrics.

  • counter

    sqs_message_receive_succeeded_total

    The total number of times SQS messages were successfully received. This metric includes the following tags:

    • component_kind - The Vector component kind.

    • component_name - The Vector component ID.

    • component_type - The Vector component type.

    • instance - The Vector instance identified by host and port.

    • job - The name of the job producing Vector metrics.

  • counter

    sqs_message_received_messages_total

    The total number of received SQS messages. This metric includes the following tags:

    • component_kind - The Vector component kind.

    • component_name - The Vector component ID.

    • component_type - The Vector component type.

    • instance - The Vector instance identified by host and port.

    • job - The name of the job producing Vector metrics.

  • counter

    processed_events_total

    The total number of events processed by this component. This metric includes the following tags:

    • component_kind - The Vector component kind.

    • component_name - The Vector component ID.

    • component_type - The Vector component type.

    • file - The file that produced the error

    • instance - The Vector instance identified by host and port.

    • job - The name of the job producing Vector metrics.

  • counter

    sqs_s3_event_record_ignored_total

    The total number of times an S3 record in an SQS message was ignored (for an event that was not ObjectCreated). This metric includes the following tags:

    • component_kind - The Vector component kind.

    • component_name - The Vector component ID.

    • component_type - The Vector component type.

    • ignore_type - The reason for ignoring the S3 record

    • instance - The Vector instance identified by host and port.

    • job - The name of the job producing Vector metrics.

  • counter

    processed_bytes_total

    The total number of bytes processed by the component. This metric includes the following tags:

    • component_kind - The Vector component kind.

    • component_name - The Vector component ID.

    • component_type - The Vector component type.

    • instance - The Vector instance identified by host and port.

    • job - The name of the job producing Vector metrics.
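
To inspect these counters, the internal_metrics source can be connected to a sink. The following is a minimal sketch, assuming a console sink with JSON encoding is acceptable for ad hoc debugging; the component IDs are arbitrary, and the exact sink options may differ between Vector versions.

```toml
# Exposes Vector's own metrics, including the aws_s3 counters listed above.
[sources.vector_metrics]
type = "internal_metrics"

# Prints each metric to stdout as JSON (intended for debugging only).
[sinks.metrics_debug]
type = "console"
inputs = ["vector_metrics"]
encoding.codec = "json"
```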

How It Works

AWS Authentication

Vector checks for AWS credentials in the following order:

  1. Environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
  2. The credential_process command in the AWS config file (usually located at ~/.aws/config).
  3. The AWS credentials file (usually located at ~/.aws/credentials).
  4. The IAM instance profile (only works when running on an EC2 instance with an instance profile/role).

If credentials are not found, the health check will fail and an error will be logged.

Obtaining an access key

In general, we recommend using instance profiles/roles whenever possible. In cases where this is not possible you can generate an AWS access key for any user within your AWS account. AWS provides a detailed guide on how to do this.

Assuming roles

Vector can assume an AWS IAM role via the assume_role option. This optional setting is helpful for a variety of use cases, such as cross-account access.
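
As an illustration of cross-account access, the sketch below sets assume_role on the source. The account ID, role name, and queue URL are placeholders, not values from a real account.

```toml
[sources.my_source_id]
type = "aws_s3"
region = "us-east-1"
# Hypothetical role in the account that owns the bucket and the queue.
assume_role = "arn:aws:iam::123456789012:role/vector-s3-reader"
sqs.queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/MyQueue"
```

The credentials Vector finds through the lookup order above still need permission to assume that role.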

Context

By default, the aws_s3 source will augment events with helpful context keys as shown in the "Output" section.

Handling events from the aws_s3 source

This source behaves very similarly to the file source in that it will output one event per line (unless the multiline configuration option is used).

You will commonly want to use transforms to parse the data. For example, to parse VPC flow logs sent to S3 you can chain the tokenizer transform:

[transforms.flow_logs]
type = "tokenizer" # required
inputs = ["s3"]
field_names = ["version", "account_id", "interface_id", "srcaddr", "dstaddr", "srcport", "dstport", "protocol", "packets", "bytes", "start", "end", "action", "log_status"]
types.srcport = "int"
types.dstport = "int"
types.packets = "int"
types.bytes = "int"
types.start = "timestamp|%s"
types.end = "timestamp|%s"

To parse AWS load balancer logs, the regex_parser transform can be used:

[transforms.elasticloadbalancing_fields_parsed]
type = "regex_parser"
inputs = ["s3"]
regex = '(?x)^
(?P<type>[\w]+)[ ]
(?P<timestamp>[\w:.-]+)[ ]
(?P<elb>[^\s]+)[ ]
(?P<client_host>[\d.:-]+)[ ]
(?P<target_host>[\d.:-]+)[ ]
(?P<request_processing_time>[\d.-]+)[ ]
(?P<target_processing_time>[\d.-]+)[ ]
(?P<response_processing_time>[\d.-]+)[ ]
(?P<elb_status_code>[\d-]+)[ ]
(?P<target_status_code>[\d-]+)[ ]
(?P<received_bytes>[\d-]+)[ ]
(?P<sent_bytes>[\d-]+)[ ]
"(?P<request_method>[\w-]+)[ ]
(?P<request_url>[^\s]+)[ ]
(?P<request_protocol>[^"\s]+)"[ ]
"(?P<user_agent>[^"]+)"[ ]
(?P<ssl_cipher>[^\s]+)[ ]
(?P<ssl_protocol>[^\s]+)[ ]
(?P<target_group_arn>[\w.:/-]+)[ ]
"(?P<trace_id>[^\s"]+)"[ ]
"(?P<domain_name>[^\s"]+)"[ ]
"(?P<chosen_cert_arn>[\w:./-]+)"[ ]
(?P<matched_rule_priority>[\d-]+)[ ]
(?P<request_creation_time>[\w.:-]+)[ ]
"(?P<actions_executed>[\w,-]+)"[ ]
"(?P<redirect_url>[^"]+)"[ ]
"(?P<error_reason>[^"]+)"'
field = "message"
drop_failed = false
types.received_bytes = "int"
types.request_processing_time = "float"
types.sent_bytes = "int"
types.target_processing_time = "float"
types.response_processing_time = "float"

[transforms.elasticloadbalancing_url_parsed]
type = "regex_parser"
inputs = ["elasticloadbalancing_fields_parsed"]
regex = '^(?P<url_scheme>[\w]+)://(?P<url_hostname>[^\s:/?#]+)(?::(?P<request_port>[\d-]+))?-?(?:/(?P<url_path>[^\s?#]*))?(?P<request_url_query>\?[^\s#]+)?'
field = "request_url"
drop_failed = false
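
The transforms above read from an input named "s3", so they need a source with that ID. A minimal sketch of the surrounding wiring follows; the queue URL is a placeholder, and the console sink is only an assumption for checking the parsed events during setup.

```toml
# Source whose ID matches inputs = ["s3"] in the transforms above.
[sources.s3]
type = "aws_s3"
region = "us-east-1"
sqs.queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/MyQueue"

# Prints the fully parsed load balancer events to stdout as JSON.
[sinks.parsed_debug]
type = "console"
inputs = ["elasticloadbalancing_url_parsed"]
encoding.codec = "json"
```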