Logs are an essential part of observing any service; without them you'll have significant blind spots. But collecting and analyzing them can be a real challenge, especially at scale. Not only do you need to solve the basic task of collecting your logs, but you must do it in a reliable, performant, and robust manner. Nothing is more frustrating than having your log pipeline fall on its face during an outage, or even worse, cause the outage!
Fear not! In this guide we'll build an observability pipeline that will send logs from AWS S3 to Apache Pulsar.
Background
What is AWS S3?
Amazon Simple Storage Service (Amazon S3) is a scalable, high-speed, web-based cloud storage service designed for online backup and archiving of data and applications on Amazon Web Services. It is very commonly used to store log data.
What is Apache Pulsar?
Pulsar is a multi-tenant, high-performance solution for server-to-server messaging. Originally developed at Yahoo, it is now under the stewardship of the Apache Software Foundation. It is an excellent tool for streaming logs and metrics data.
Strategy
How This Guide Works
We'll be using Vector to accomplish this task. Vector is a popular open-source observability data platform. It's written in Rust, making it lightweight, ultra-fast, and highly reliable. We'll be deploying Vector as an agent.
What We'll Accomplish
We'll build an observability data pipeline that collects logs from AWS S3 and ships them to Apache Pulsar. All in just a few minutes!
Tutorial
Install Vector
curl --proto '=https' --tlsv1.2 -sSf https://sh.vector.dev | sh
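To confirm the install worked, you can check that the vector binary is now on your PATH and print its version (exact output varies by release):

vector --version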
Configure Vector
cat <<-'VECTORCFG' > ./vector.toml
[sources.aws_s3]
type = "aws_s3"
region = "us-east-1"

[sinks.out]
type = "pulsar"
inputs = [ "aws_s3" ]
endpoint = "pulsar://127.0.0.1:6650"
topic = "topic-1234"

[sinks.out.encoding]
codec = "text"
VECTORCFG
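Depending on your Vector version, the aws_s3 source also expects to be told which SQS queue receives your bucket's object-created notifications. The snippet below is a sketch only; the queue URL is hypothetical, and the option names should be checked against the aws_s3 source reference for your release:

[sources.aws_s3.sqs]
# Hypothetical queue URL; point this at the SQS queue that receives
# your S3 bucket notifications.
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/my-log-queue"

With the file in place, you can ask Vector to check it before starting anything:

vector validate ./vector.toml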
Start Vector
vector --config ./vector.toml
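If you'd prefer Vector to exit immediately when the Pulsar sink's health check fails, rather than retrying in the background, recent releases accept an additional flag; confirm it appears in your version's vector --help output before relying on it:

vector --config ./vector.toml --require-healthy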
Observe Vector
vector top
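The vector top command opens a live, terminal-based dashboard showing each component in the topology (the aws_s3 source and the pulsar sink here) together with the events and bytes flowing through it, which is a quick way to confirm data is actually moving.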
Next Steps
Vector is a powerful tool and we're just scratching the surface in this guide. Here are a few pages we recommend that demonstrate the power and flexibility of Vector: