AWS S3 Source
The Vector aws_s3 source collects logs from AWS S3.
Requirements
Configuration
[sources.my_source_id]
# General
type = "aws_s3" # required
region = "us-east-1" # required, required when endpoint = null

# Sqs
sqs.delete_message = true # optional, default
sqs.poll_secs = 15 # optional, default, seconds
sqs.queue_url = "https://sqs.us-east-2.amazonaws.com/123456789012/MyQueue" # required
- optional, table
auth
Options for the authentication strategy.
- optional, string
access_key_id
The AWS access key id. Used for AWS authentication when communicating with AWS services. See AWS Authentication for more info.
  - Syntax: literal
- optional, string
assume_role
The ARN of an IAM role to assume at startup. See AWS Authentication for more info.
  - Syntax: literal
- optional, string
secret_access_key
The AWS secret access key. Used for AWS authentication when communicating with AWS services. See AWS Authentication for more info.
  - Syntax: literal
- enum, optional, string
compression
The compression format of the S3 objects.
  - Syntax: literal
  - Default: "auto"
  - Enum, must be one of:
    "auto"
    "gzip"
    "zstd"
    "none"
- optional, string
endpoint
Custom endpoint for use with AWS-compatible services. Providing a value for this option will make region moot.
  - Syntax: literal
  - Only relevant when: region = null
- optional, table
multiline
Multiline parsing configuration. If not specified, multiline parsing is disabled. See Handling events from the aws_s3 source for more info.
- common, required, string
condition_pattern
Condition regex pattern to look for. Exact behavior is configured via mode. This field accepts a valid Rust regular expression. Wrapping / characters are not required or permitted.
  - Syntax: regex
- enum, common, required, string
mode
Mode of operation, specifies how the condition_pattern is interpreted.
  - Syntax: literal
  - Enum, must be one of:
    "continue_through"
    "continue_past"
    "halt_before"
    "halt_with"
- common, required, string
start_pattern
Start regex pattern to look for as a beginning of the message. This field accepts a valid Rust regular expression. Wrapping / characters are not required or permitted.
  - Syntax: regex
- common, required, uint
timeout_ms
The maximum time to wait for the continuation, in milliseconds. Once this timeout is reached, the buffered message is guaranteed to be flushed, even if incomplete.
- common, required*, string
region
The AWS region of the target service. If endpoint is provided it will override this value since the endpoint includes the region.
  - Syntax: literal
  - Only required when: endpoint = null
- common, optional, table
sqs
SQS strategy options. Required if strategy = sqs.
- common, optional, bool
delete_message
Whether to delete the message once Vector processes it. It can be useful to set this to false to debug or during initial Vector setup.
  - Default: true
- common, optional, uint
poll_secs
How often to poll the queue for new messages, in seconds.
  - Default: 15 (seconds)
- common, required, string
queue_url
The URL of the SQS queue to receive bucket notifications from.
  - Syntax: literal
- optional, uint
visibility_timeout_secs
The visibility timeout to use for messages, in seconds. This controls how long a message is left unavailable after a Vector instance receives it. If Vector does not delete the message before the timeout expires, the message becomes available again to other consumers; this can happen if, for example, the vector process crashes.
  - WARNING: Should be set higher than the length of time it takes to process an individual message to avoid that message being reprocessed.
  - Default: 300 (seconds)
- enum, optional, string
strategy
The strategy to use to consume objects from AWS S3.
  - Syntax: literal
  - Default: "sqs"
  - Enum, must be one of:
    "sqs"
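As a rough illustration of how several of the options above can combine, the following sketch reads gzip-compressed objects, lengthens the SQS visibility timeout, and merges indented continuation lines into the preceding event. The source ID, queue URL, regex patterns, and timeout values are illustrative assumptions, not recommended settings.

```toml
[sources.my_source_id]
type = "aws_s3"
region = "us-east-1"
compression = "gzip" # objects are assumed to be gzip-compressed

# SQS queue that receives the bucket notifications (placeholder URL).
sqs.queue_url = "https://sqs.us-east-2.amazonaws.com/123456789012/MyQueue"
# Assumed value: longer than the time needed to process one message.
sqs.visibility_timeout_secs = 600

# Multiline: a line starting with a non-whitespace character begins a message,
# and every following line that starts with whitespace is appended to it.
multiline.start_pattern = '^[^\s]'
multiline.mode = "continue_through"
multiline.condition_pattern = '^[\s]+'
multiline.timeout_ms = 1000
```

All four multiline keys are set because each is listed as required once the multiline table is present.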
Env Vars
- common, optional, string
AWS_ACCESS_KEY_ID
The AWS access key id. Used for AWS authentication when communicating with AWS services. See AWS Authentication for more info.
  - Syntax: literal
- common, optional, string
AWS_CONFIG_FILE
Specifies the location of the file that the AWS CLI uses to store configuration profiles.
  - Syntax: literal
  - Default: "~/.aws/config"
- common, optional, string
AWS_CREDENTIAL_EXPIRATION
Expiration time in RFC 3339 format. If unset, credentials won't expire.
  - Syntax: literal
- common, optional, string
AWS_DEFAULT_REGION
The default AWS region.
  - Syntax: literal
  - Only relevant when: endpoint = null
- common, optional, string
AWS_PROFILE
Specifies the name of the CLI profile with the credentials and options to use. This can be the name of a profile stored in a credentials or config file.
  - Syntax: literal
  - Default: "default"
- common, optional, string
AWS_ROLE_SESSION_NAME
Specifies a name to associate with the role session. This value appears in CloudTrail logs for commands performed by the user of this profile.
  - Syntax: literal
- common, optional, string
AWS_SECRET_ACCESS_KEY
The AWS secret access key. Used for AWS authentication when communicating with AWS services. See AWS Authentication for more info.
  - Syntax: literal
- common, optional, string
AWS_SESSION_TOKEN
The AWS session token. Used for AWS authentication when communicating with AWS services.
  - Syntax: literal
- common, optional, string
AWS_SHARED_CREDENTIALS_FILE
Specifies the location of the file that the AWS CLI uses to store access keys.
  - Syntax: literal
  - Default: "~/.aws/credentials"
Output
This component outputs log events with the following fields:
{
  "bucket": "my-bucket",
  "message": "53.126.150.246 - - [01/Oct/2020:11:25:58 -0400] \"GET /disintermediate HTTP/2.0\" 401 20308",
  "object": "AWSLogs/111111111111/vpcflowlogs/us-east-1/2020/10/26/111111111111_vpcflowlogs_us-east-1_fl-0c5605d9f1baf680d_20201026T1950Z_b1ea4a7a.log.gz",
  "region": "us-east-1",
  "timestamp": "2020-10-10T17:07:36+00:00"
}
- common, required, string
bucket
The bucket of the object the line came from.
  - Syntax: literal
- common, required, string
message
A line from the S3 object.
  - Syntax: literal
- common, required, string
object
The object the line came from.
  - Syntax: literal
- common, required, string
region
The AWS region the bucket is in.
  - Syntax: literal
- common, required, timestamp
timestamp
The Last-Modified time of the object. Defaults to the current timestamp if this information is missing.
Telemetry
This component provides the following metrics that can be retrieved through
the internal_metrics source. See the metrics section in the monitoring page
for more info.
- counter
events_in_total
The number of events accepted by this component either from tagged origin like file and uri, or cumulatively from other origins. This metric includes the following tags:
  - component_kind - The Vector component kind.
  - component_name - The Vector component ID.
  - component_type - The Vector component type.
  - container_name - The name of the container from which the event originates.
  - file - The file from which the event originates.
  - instance - The Vector instance identified by host and port.
  - job - The name of the job producing Vector metrics.
  - mode - The connection mode used by the component.
  - peer_addr - The IP from which the event originates.
  - peer_path - The pathname from which the event originates.
  - pod_name - The name of the pod from which the event originates.
  - uri - The sanitized uri from which the event originates.
- counter
processed_bytes_total
The number of bytes processed by the component. This metric includes the following tags:
  - component_kind - The Vector component kind.
  - component_name - The Vector component ID.
  - component_type - The Vector component type.
  - container_name - The name of the container from which the bytes originate.
  - file - The file from which the bytes originate.
  - instance - The Vector instance identified by host and port.
  - job - The name of the job producing Vector metrics.
  - mode - The connection mode used by the component.
  - peer_addr - The IP from which the bytes originate.
  - peer_path - The pathname from which the bytes originate.
  - pod_name - The name of the pod from which the bytes originate.
  - uri - The sanitized uri from which the bytes originate.
- counter
sqs_message_delete_failed_total
The total number of failures to delete SQS messages. This metric includes the following tags:
  - component_kind - The Vector component kind.
  - component_name - The Vector component ID.
  - component_type - The Vector component type.
  - instance - The Vector instance identified by host and port.
  - job - The name of the job producing Vector metrics.
- counter
sqs_message_delete_succeeded_total
The total number of successful deletions of SQS messages. This metric includes the following tags:
  - component_kind - The Vector component kind.
  - component_name - The Vector component ID.
  - component_type - The Vector component type.
  - instance - The Vector instance identified by host and port.
  - job - The name of the job producing Vector metrics.
- counter
sqs_message_processing_failed_total
The total number of failures to process SQS messages. This metric includes the following tags:
  - component_kind - The Vector component kind.
  - component_name - The Vector component ID.
  - component_type - The Vector component type.
  - instance - The Vector instance identified by host and port.
  - job - The name of the job producing Vector metrics.
- counter
sqs_message_processing_succeeded_total
The total number of SQS messages successfully processed. This metric includes the following tags:
  - component_kind - The Vector component kind.
  - component_name - The Vector component ID.
  - component_type - The Vector component type.
  - instance - The Vector instance identified by host and port.
  - job - The name of the job producing Vector metrics.
- counter
sqs_message_receive_failed_total
The total number of failures to receive SQS messages. This metric includes the following tags:
  - component_kind - The Vector component kind.
  - component_name - The Vector component ID.
  - component_type - The Vector component type.
  - instance - The Vector instance identified by host and port.
  - job - The name of the job producing Vector metrics.
- counter
sqs_message_receive_succeeded_total
The total number of times SQS messages were successfully received. This metric includes the following tags:
  - component_kind - The Vector component kind.
  - component_name - The Vector component ID.
  - component_type - The Vector component type.
  - instance - The Vector instance identified by host and port.
  - job - The name of the job producing Vector metrics.
- counter
sqs_message_received_messages_total
The total number of received SQS messages. This metric includes the following tags:
  - component_kind - The Vector component kind.
  - component_name - The Vector component ID.
  - component_type - The Vector component type.
  - instance - The Vector instance identified by host and port.
  - job - The name of the job producing Vector metrics.
- counter
events_out_total
The total number of events emitted by this component. This metric includes the following tags:
  - component_kind - The Vector component kind.
  - component_name - The Vector component ID.
  - component_type - The Vector component type.
  - instance - The Vector instance identified by host and port.
  - job - The name of the job producing Vector metrics.
- counter
sqs_s3_event_record_ignored_total
The total number of times an S3 record in an SQS message was ignored (for an event that was not ObjectCreated). This metric includes the following tags:
  - component_kind - The Vector component kind.
  - component_name - The Vector component ID.
  - component_type - The Vector component type.
  - ignore_type - The reason for ignoring the S3 record.
  - instance - The Vector instance identified by host and port.
  - job - The name of the job producing Vector metrics.
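To look at these counters locally, one option is to route the internal_metrics source to a sink that prints to stdout. This is a minimal sketch: the component IDs vector_metrics and metrics_out are hypothetical, and the choice of a console sink with JSON encoding is an assumption made for illustration rather than a recommended monitoring setup.

```toml
# Exposes Vector's own metrics, including those of the aws_s3 source.
[sources.vector_metrics]
type = "internal_metrics"

# Hypothetical sink: prints each metric as JSON for quick inspection.
[sinks.metrics_out]
type = "console"
inputs = ["vector_metrics"]
encoding.codec = "json"
```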
How It Works
AWS Authentication
Vector checks for AWS credentials in the following order:
- Options access_key_id and secret_access_key.
- Environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
- The credential_process command in the AWS config file (usually located at ~/.aws/config).
- The AWS credentials file (usually located at ~/.aws/credentials).
- The IAM instance profile (will only work if running on an EC2 instance with an instance profile/role).
If credentials are not found, the healthcheck will fail and an error will be logged.
Obtaining an access key
In general, we recommend using instance profiles/roles whenever possible. In
cases where this is not possible, you can generate an AWS access key for any user
within your AWS account. AWS provides a detailed guide on how to do this. The
resulting access key can be supplied through the access_key_id and
secret_access_key options.
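A sketch of what supplying a static access key could look like. It assumes both options are nested under the auth table, as the option listing suggests, and uses the placeholder credentials from the AWS documentation; real keys are better kept out of config files that are committed to version control.

```toml
[sources.my_source_id]
type = "aws_s3"
region = "us-east-1"
sqs.queue_url = "https://sqs.us-east-2.amazonaws.com/123456789012/MyQueue"

# Assumption: static credentials live under the auth table.
# The values below are AWS documentation placeholders, not working keys.
auth.access_key_id = "AKIAIOSFODNN7EXAMPLE"
auth.secret_access_key = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
```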
Assuming roles
Vector can assume an AWS IAM role via the assume_role option. This is an
optional setting that is helpful for a variety of use cases, such as
cross-account access.
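For cross-account access, the configuration might look like the sketch below. The role ARN is a made-up placeholder, and nesting assume_role under the auth table is an assumption based on the option listing.

```toml
[sources.my_source_id]
type = "aws_s3"
region = "us-east-1"
sqs.queue_url = "https://sqs.us-east-2.amazonaws.com/123456789012/MyQueue"

# Hypothetical role in another account that grants access to the bucket and queue.
auth.assume_role = "arn:aws:iam::123456789012:role/my-vector-role"
```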
Context
By default, the aws_s3 source will augment events with helpful
context keys as shown in the "Output" section.
Handling events from the aws_s3 source
This source behaves very similarly to the file source in that
it will output one event per line (unless the multiline
configuration option is used).
You will commonly want to use transforms to
parse the data. For example, to parse VPC flow logs sent to S3 you can
chain the tokenizer transform:
[transforms.flow_logs]
type = "tokenizer" # required
inputs = ["s3"]
field_names = ["version", "account_id", "interface_id", "srcaddr", "dstaddr", "srcport", "dstport", "protocol", "packets", "bytes", "start", "end", "action", "log_status"]
types.srcport = "int"
types.dstport = "int"
types.packets = "int"
types.bytes = "int"
types.start = "timestamp|%s"
types.end = "timestamp|%s"
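The inputs value in the tokenizer example implies an aws_s3 source whose ID is s3. A minimal sketch of the surrounding pieces, assuming that source ID and a hypothetical console sink named flow_logs_out for inspecting the parsed events:

```toml
# Assumption: the source ID "s3" matches the inputs of the flow_logs transform.
[sources.s3]
type = "aws_s3"
region = "us-east-1"
sqs.queue_url = "https://sqs.us-east-2.amazonaws.com/123456789012/MyQueue"

# Hypothetical sink: prints the tokenized flow log events as JSON.
[sinks.flow_logs_out]
type = "console"
inputs = ["flow_logs"]
encoding.codec = "json"
```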
To parse AWS load balancer logs, the regex_parser transform can be used:
[transforms.elasticloadbalancing_fields_parsed]
type = "regex_parser"
inputs = ["s3"]
regex = '(?x)^(?P<type>[\w]+)[ ](?P<timestamp>[\w:.-]+)[ ](?P<elb>[^\s]+)[ ](?P<client_host>[\d.:-]+)[ ](?P<target_host>[\d.:-]+)[ ](?P<request_processing_time>[\d.-]+)[ ](?P<target_processing_time>[\d.-]+)[ ](?P<response_processing_time>[\d.-]+)[ ](?P<elb_status_code>[\d-]+)[ ](?P<target_status_code>[\d-]+)[ ](?P<received_bytes>[\d-]+)[ ](?P<sent_bytes>[\d-]+)[ ]"(?P<request_method>[\w-]+)[ ](?P<request_url>[^\s]+)[ ](?P<request_protocol>[^"\s]+)"[ ]"(?P<user_agent>[^"]+)"[ ](?P<ssl_cipher>[^\s]+)[ ](?P<ssl_protocol>[^\s]+)[ ](?P<target_group_arn>[\w.:/-]+)[ ]"(?P<trace_id>[^\s"]+)"[ ]"(?P<domain_name>[^\s"]+)"[ ]"(?P<chosen_cert_arn>[\w:./-]+)"[ ](?P<matched_rule_priority>[\d-]+)[ ](?P<request_creation_time>[\w.:-]+)[ ]"(?P<actions_executed>[\w,-]+)"[ ]"(?P<redirect_url>[^"]+)"[ ]"(?P<error_reason>[^"]+)"'
field = "message"
drop_failed = false
types.received_bytes = "int"
types.request_processing_time = "float"
types.sent_bytes = "int"
types.target_processing_time = "float"
types.response_processing_time = "float"

[transforms.elasticloadbalancing_url_parsed]
type = "regex_parser"
inputs = ["elasticloadbalancing_fields_parsed"]
regex = '^(?P<url_scheme>[\w]+)://(?P<url_hostname>[^\s:/?#]+)(?::(?P<request_port>[\d-]+))?-?(?:/(?P<url_path>[^\s?#]*))?(?P<request_url_query>\?[^\s#]+)?'
field = "request_url"
drop_failed = false
State
This component is stateless, meaning its behavior is consistent across each input.