AWS S3 Source
The Vector aws_s3 source collects logs from AWS S3.
Configuration
[sources.my_source_id]
# General
type = "aws_s3" # required
region = "us-east-1" # required, required when endpoint = null

# Sqs
sqs.delete_message = true # optional, default
sqs.poll_secs = 15 # optional, default, seconds
sqs.queue_url = "https://sqs.us-east-2.amazonaws.com/123456789012/MyQueue" # required
- optional string
assume_role
The ARN of an IAM role to assume at startup. See AWS Authentication for more info.
- optional string
compression
The compression format of the S3 objects.
- Default:
"auto"
- Enum, must be one of:
"auto"
"gzip"
"zstd"
"none"
- optional string
endpoint
Custom endpoint for use with AWS-compatible services. Providing a value for this option makes region moot.
- Only relevant when: region = null
- optional table
multiline
Multiline parsing configuration. If not specified, multiline parsing is disabled. See Handling events from the aws_s3 source for more info.
- required string
condition_pattern
Condition regex pattern to look for. Exact behavior is configured via mode.
- required string
mode
Mode of operation; specifies how the condition_pattern is interpreted.
- Enum, must be one of:
"continue_through"
"continue_past"
"halt_before"
"halt_with"
- required string
start_pattern
Start regex pattern to look for as the beginning of the message.
- required uint
timeout_ms
The maximum time to wait for the continuation, in milliseconds. Once this timeout is reached, the buffered message is guaranteed to be flushed, even if incomplete.
- required* string
region
The AWS region of the target service. If endpoint is provided, it overrides this value, since the endpoint includes the region.
- Only required when: endpoint = null
- optional table
sqs
SQS strategy options. Required if strategy = sqs.
- optional bool
delete_message
Whether to delete the message once Vector processes it. Setting this to false can be useful for debugging or during initial Vector setup.
- Default:
true
- optional uint
poll_secs
How often to poll the queue for new messages, in seconds.
- Default:
15
(seconds)
- required string
queue_url
The URL of the SQS queue to receive bucket notifications from.
- optional uint
visibility_timeout_secs
The visibility timeout to use for messages, in seconds. This controls how long a message is left unavailable after a Vector instance receives it. If that instance does not delete the message before the timeout expires, the message becomes available again to other consumers; this can happen if, for example, the vector process crashes.
- WARNING: Should be set higher than the time it takes to process an individual message, to avoid that message being reprocessed.
- Default:
300
(seconds)
- optional string
strategy
The strategy to use to consume objects from AWS S3.
- Default:
"sqs"
- Enum, must be one of:
"sqs"
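As a minimal sketch of how the sqs options above fit together, the fragment below keeps messages on the queue while debugging and raises the visibility timeout. The source ID and queue URL are reused from the sample at the top of this page; the timeout value of 600 is an illustrative assumption, not a recommendation.

```toml
[sources.my_source_id]
type = "aws_s3"
region = "us-east-1"
strategy = "sqs" # the default and only listed strategy

sqs.queue_url = "https://sqs.us-east-2.amazonaws.com/123456789012/MyQueue"
sqs.poll_secs = 15                  # default polling interval, in seconds
sqs.delete_message = false          # assumed debugging setup: leave messages on the queue
sqs.visibility_timeout_secs = 600   # assumed value; should exceed per-message processing time
```

With delete_message set to false, messages are expected to become visible again once the visibility timeout expires, so the same objects may be processed more than once.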
Env Vars
- optional string
AWS_ACCESS_KEY_ID
The AWS access key id. Used for AWS authentication when communicating with AWS services. See AWS Authentication for more info.
- optional string
AWS_CONFIG_FILE
Specifies the location of the file that the AWS CLI uses to store configuration profiles.
- Default:
"~/.aws/config"
- optional string
AWS_CREDENTIAL_EXPIRATION
Expiration time in RFC 3339 format. If unset, credentials won't expire.
- optional string
AWS_DEFAULT_REGION
The default AWS region.
- Only relevant when: endpoint = null
- optional string
AWS_PROFILE
Specifies the name of the CLI profile with the credentials and options to use. This can be the name of a profile stored in a credentials or config file.
- Default:
"default"
- optional string
AWS_ROLE_SESSION_NAME
Specifies a name to associate with the role session. This value appears in CloudTrail logs for commands performed by the user of this profile.
- optional string
AWS_SECRET_ACCESS_KEY
The AWS secret access key. Used for AWS authentication when communicating with AWS services. See AWS Authentication for more info.
- optional string
AWS_SESSION_TOKEN
The AWS session token. Used for AWS authentication when communicating with AWS services.
- optional string
AWS_SHARED_CREDENTIALS_FILE
Specifies the location of the file that the AWS CLI uses to store access keys.
- Default:
"~/.aws/credentials"
Output
This component outputs log events with the following fields:
{
  "bucket": "my-bucket",
  "message": "53.126.150.246 - - [01/Oct/2020:11:25:58 -0400] \"GET /disintermediate HTTP/2.0\" 401 20308",
  "object": "AWSLogs/111111111111/vpcflowlogs/us-east-1/2020/10/26/111111111111_vpcflowlogs_us-east-1_fl-0c5605d9f1baf680d_20201026T1950Z_b1ea4a7a.log.gz",
  "region": "us-east-1",
  "timestamp": "2020-10-10T17:07:36+00:00"
}
- required string
bucket
The bucket of the object the line came from.
- required string
message
A line from the S3 object.
- required string
object
The object the line came from.
- required string
region
The AWS region the bucket is in.
- required timestamp
timestamp
The Last-Modified time of the object. Defaults to the current timestamp if this information is missing.
Telemetry
This component provides the following metrics that can be retrieved through the internal_metrics source. See the metrics section in the monitoring page for more info.
- counter
sqs_message_delete_failed_total
The total number of failures to delete SQS messages. This metric includes the following tags:
component_kind - The Vector component kind.
component_name - The Vector component ID.
component_type - The Vector component type.
instance - The Vector instance identified by host and port.
job - The name of the job producing Vector metrics.
- counter
sqs_message_delete_succeeded_total
The total number of successful deletions of SQS messages. This metric includes the following tags:
component_kind - The Vector component kind.
component_name - The Vector component ID.
component_type - The Vector component type.
instance - The Vector instance identified by host and port.
job - The name of the job producing Vector metrics.
- counter
sqs_message_processing_failed_total
The total number of failures to process SQS messages. This metric includes the following tags:
component_kind - The Vector component kind.
component_name - The Vector component ID.
component_type - The Vector component type.
instance - The Vector instance identified by host and port.
job - The name of the job producing Vector metrics.
- counter
sqs_message_processing_succeeded_total
The total number of SQS messages successfully processed. This metric includes the following tags:
component_kind - The Vector component kind.
component_name - The Vector component ID.
component_type - The Vector component type.
instance - The Vector instance identified by host and port.
job - The name of the job producing Vector metrics.
- counter
sqs_message_receive_failed_total
The total number of failures to receive SQS messages. This metric includes the following tags:
component_kind - The Vector component kind.
component_name - The Vector component ID.
component_type - The Vector component type.
instance - The Vector instance identified by host and port.
job - The name of the job producing Vector metrics.
- counter
sqs_message_receive_succeeded_total
The total number of times SQS messages were successfully received. This metric includes the following tags:
component_kind - The Vector component kind.
component_name - The Vector component ID.
component_type - The Vector component type.
instance - The Vector instance identified by host and port.
job - The name of the job producing Vector metrics.
- counter
sqs_message_received_messages_total
The total number of received SQS messages. This metric includes the following tags:
component_kind - The Vector component kind.
component_name - The Vector component ID.
component_type - The Vector component type.
instance - The Vector instance identified by host and port.
job - The name of the job producing Vector metrics.
- counter
processed_events_total
The total number of events processed by this component. This metric includes the following tags:
component_kind - The Vector component kind.
component_name - The Vector component ID.
component_type - The Vector component type.
file - The file that produced the error.
instance - The Vector instance identified by host and port.
job - The name of the job producing Vector metrics.
- counter
sqs_s3_event_record_ignored_total
The total number of times an S3 record in an SQS message was ignored (for an event that was not ObjectCreated). This metric includes the following tags:
component_kind - The Vector component kind.
component_name - The Vector component ID.
component_type - The Vector component type.
ignore_type - The reason for ignoring the S3 record.
instance - The Vector instance identified by host and port.
job - The name of the job producing Vector metrics.
- counter
processed_bytes_total
The total number of bytes processed by the component. This metric includes the following tags:
component_kind - The Vector component kind.
component_name - The Vector component ID.
component_type - The Vector component type.
instance - The Vector instance identified by host and port.
job - The name of the job producing Vector metrics.
How It Works
AWS Authentication
Vector checks for AWS credentials in the following order:
- Environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
- The credential_process command in the AWS config file (usually located at ~/.aws/config).
- The AWS credentials file (usually located at ~/.aws/credentials).
- The IAM instance profile (only works when running on an EC2 instance with an instance profile/role).
If credentials are not found, the healthcheck will fail and an error will be logged.
Obtaining an access key
In general, we recommend using instance profiles/roles whenever possible. Where this is not possible, you can generate an AWS access key for any user within your AWS account. AWS provides a detailed guide on how to do this.
Assuming roles
Vector can assume an AWS IAM role via the assume_role option. This is an optional setting that is helpful for a variety of use cases, such as cross-account access.
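A minimal sketch of a cross-account setup using assume_role might look like the fragment below. The role ARN is a hypothetical placeholder (the account ID and role name are invented for illustration); the source ID and queue URL are reused from the sample at the top of this page.

```toml
[sources.my_source_id]
type = "aws_s3"
region = "us-east-1"

# Hypothetical role ARN; replace with a role your credentials are allowed to assume.
assume_role = "arn:aws:iam::123456789012:role/my-cross-account-role"

sqs.queue_url = "https://sqs.us-east-2.amazonaws.com/123456789012/MyQueue"
```

The credentials Vector finds through the lookup order described under AWS Authentication still need permission to assume the role.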
Context
By default, the aws_s3 source will augment events with helpful context keys as shown in the "Output" section.
Handling events from the aws_s3 source
This source behaves very similarly to the file source in that it outputs one event per line (unless the multiline configuration option is used).
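As a rough sketch of the multiline option, the fragment below groups indented continuation lines (for example, stack traces) with the line that precedes them, so they are emitted as one event instead of one event per line. The regex patterns and the 1000 ms timeout are illustrative assumptions; the source ID and queue URL are reused from the sample at the top of this page.

```toml
[sources.my_source_id]
type = "aws_s3"
region = "us-east-1"
sqs.queue_url = "https://sqs.us-east-2.amazonaws.com/123456789012/MyQueue"

# Assumed pattern: a new message starts on any line that does not begin with whitespace.
multiline.start_pattern = '^[^\s]'
# continue_through: keep appending consecutive lines that match condition_pattern.
multiline.mode = "continue_through"
# Assumed pattern: continuation lines begin with whitespace.
multiline.condition_pattern = '^[\s]+'
# Flush the buffered message after 1000 ms even if no further line arrives.
multiline.timeout_ms = 1000
```

All four multiline keys are listed as required once the multiline table is present, so a partial table is expected to be rejected.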
You will commonly want to use transforms to parse the data. For example, to parse VPC flow logs sent to S3, you can chain the tokenizer transform:
[transforms.flow_logs]
type = "tokenizer" # required
inputs = ["s3"]
field_names = ["version", "account_id", "interface_id", "srcaddr", "dstaddr", "srcport", "dstport", "protocol", "packets", "bytes", "start", "end", "action", "log_status"]
types.srcport = "int"
types.dstport = "int"
types.packets = "int"
types.bytes = "int"
types.start = "timestamp|%s"
types.end = "timestamp|%s"
To parse AWS load balancer logs, the regex_parser transform can be used:
[transforms.elasticloadbalancing_fields_parsed]
type = "regex_parser"
inputs = ["s3"]
regex = '(?x)^(?P<type>[\w]+)[ ](?P<timestamp>[\w:.-]+)[ ](?P<elb>[^\s]+)[ ](?P<client_host>[\d.:-]+)[ ](?P<target_host>[\d.:-]+)[ ](?P<request_processing_time>[\d.-]+)[ ](?P<target_processing_time>[\d.-]+)[ ](?P<response_processing_time>[\d.-]+)[ ](?P<elb_status_code>[\d-]+)[ ](?P<target_status_code>[\d-]+)[ ](?P<received_bytes>[\d-]+)[ ](?P<sent_bytes>[\d-]+)[ ]"(?P<request_method>[\w-]+)[ ](?P<request_url>[^\s]+)[ ](?P<request_protocol>[^"\s]+)"[ ]"(?P<user_agent>[^"]+)"[ ](?P<ssl_cipher>[^\s]+)[ ](?P<ssl_protocol>[^\s]+)[ ](?P<target_group_arn>[\w.:/-]+)[ ]"(?P<trace_id>[^\s"]+)"[ ]"(?P<domain_name>[^\s"]+)"[ ]"(?P<chosen_cert_arn>[\w:./-]+)"[ ](?P<matched_rule_priority>[\d-]+)[ ](?P<request_creation_time>[\w.:-]+)[ ]"(?P<actions_executed>[\w,-]+)"[ ]"(?P<redirect_url>[^"]+)"[ ]"(?P<error_reason>[^"]+)"'
field = "message"
drop_failed = false
types.received_bytes = "int"
types.request_processing_time = "float"
types.sent_bytes = "int"
types.target_processing_time = "float"
types.response_processing_time = "float"

[transforms.elasticloadbalancing_url_parsed]
type = "regex_parser"
inputs = ["elasticloadbalancing_fields_parsed"]
regex = '^(?P<url_scheme>[\w]+)://(?P<url_hostname>[^\s:/?#]+)(?::(?P<request_port>[\d-]+))?-?(?:/(?P<url_path>[^\s?#]*))?(?P<request_url_query>\?[^\s#]+)?'
field = "request_url"
drop_failed = false