Send logs from Splunk HEC to AWS S3

A simple guide to sending logs from Splunk HEC to AWS S3 in just a few minutes.

Logs are an essential part of observing any service; without them you'll have significant blind spots. But collecting and analyzing them can be a real challenge -- especially at scale. Not only do you need to solve the basic task of collecting your logs, but you must do it in a reliable, performant, and robust manner. Nothing is more frustrating than having your log pipeline fall on its face during an outage, or even worse, cause the outage!

Fear not! In this guide we'll build an observability pipeline that will send logs from Splunk HEC to AWS S3.

Background

What is Splunk HTTP Event Collector (HEC)?

The Splunk HTTP Event Collector (HEC) is a fast and efficient way to send data to Splunk Enterprise and Splunk Cloud. Notably, HEC enables you to send data over HTTP (or HTTPS) directly to Splunk Enterprise or Splunk Cloud from your application.
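
A typical HEC request is a JSON payload POSTed to the collector endpoint with a token-based Authorization header. For example (a sketch -- the host and token are placeholders for your own deployment):

    curl -k https://splunk.example.com:8088/services/collector/event \
      -H "Authorization: Splunk <your-hec-token>" \
      -d '{"event": "Hello, HEC!", "sourcetype": "manual"}'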

What is AWS S3?

Amazon Simple Storage Service (Amazon S3) is a scalable, high-speed, web-based cloud storage service designed for online backup and archiving of data and applications on Amazon Web Services. It is very commonly used to store log data.

Strategy

How This Guide Works

We'll be using Vector to accomplish this task. Vector is a popular open-source observability data platform. It's written in Rust, making it lightweight, ultra-fast, and highly reliable. And we'll be deploying Vector as an agent.

Vector daemon deployment strategy
1. Your service logs to STDOUT
Logging to STDOUT follows 12-factor principles.
2. STDOUT is captured
STDOUT is captured and sent to a Splunk HEC client.
3. Vector collects & fans out data
Vector sends logs to [AWS S3](https://aws.amazon.com/s3/).

What We'll Accomplish

We'll build an observability data platform that:

  • Receives logs from Splunk HEC.
    • Enriches data with useful Splunk HEC context.
    • Securely receives data via Transport Layer Security (TLS).
  • Sends logs to AWS S3.
    • Buffers data in-memory or on-disk for performance and durability.
    • Compresses data to optimize bandwidth.
    • Automatically retries failed requests, with backoff.
    • Securely transmits data via Transport Layer Security (TLS).
    • Batches data to maximize throughput.

All in just a few minutes!
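
Most of these behaviors map to a handful of options on the splunk_hec source and aws_s3 sink (retries with backoff are enabled by default and need no configuration). Here's a rough sketch -- the option names follow Vector's docs, while the file paths and size values are illustrative assumptions:

    [sources.splunk_hec.tls]
    enabled = true
    crt_file = "/etc/vector/tls/cert.pem"  # hypothetical certificate path
    key_file = "/etc/vector/tls/key.pem"   # hypothetical private key path

    [sinks.out.buffer]
    type = "disk"            # "memory" (the default) for speed, "disk" for durability
    max_size = 1073741824    # illustrative 1 GiB buffer cap, in bytes

    [sinks.out.batch]
    max_bytes = 10485760     # flush a batch at ~10 MiB...
    timeout_secs = 300       # ...or after 5 minutes, whichever comes first

Compression is a single sink-level option: set compression = "gzip" under [sinks.out].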

Tutorial

  1. Install Vector

    curl --proto '=https' --tlsv1.2 -sSf https://sh.vector.dev | sh
  2. Configure Vector

    cat <<-'VECTORCFG' > ./vector.toml
    [sources.splunk_hec]
    type = "splunk_hec"
    [sinks.out]
    type = "aws_s3"
    inputs = [ "splunk_hec" ]
    bucket = "my-bucket"
    region = "us-east-1"
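    # credentials are resolved via the standard AWS chain (env vars, shared profile, or IAM role)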
    [sinks.out.encoding]
    codec = "ndjson"
    VECTORCFG
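
    The aws_s3 sink resolves AWS credentials through the standard AWS chain. If you're not running on an instance with an IAM role, one option is to export them as environment variables before starting Vector (the values below are placeholders):

    export AWS_ACCESS_KEY_ID="<your-access-key-id>"
    export AWS_SECRET_ACCESS_KEY="<your-secret-access-key>"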
  3. Start Vector

    vector --config ./vector.toml
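
    With Vector running, you can send a test event to its HEC-compatible endpoint to confirm logs flow through to S3. This sketch assumes the source's default listen address and no token configured; adjust the host, port, and Authorization header to match your setup:

    curl http://localhost:8088/services/collector/event \
      -H "Authorization: Splunk <any-token>" \
      -d '{"event": "Hello from HEC!"}'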
  4. Observe Vector

    vector top
    This opens a live dashboard of Vector's internal metrics, including event throughput for each configured source and sink.

Next Steps

Vector is a powerful tool, and we're just scratching the surface in this guide. Here are a few pages we recommend that demonstrate the power and flexibility of Vector:

Vector GitHub repo
Vector is free and open-source!
Vector getting started series
Go from zero to production in under 10 minutes!
Vector documentation
Thoughtful, detailed docs.