Send logs from Kafka to ClickHouse

A simple guide to sending logs from Kafka to ClickHouse in just a few minutes.

Logs are an essential part of observing any service; without them you'll have significant blind spots. But collecting and analyzing them can be a real challenge, especially at scale. Not only do you need to solve the basic task of collecting your logs, but you must do it in a reliable, performant, and robust manner. Nothing is more frustrating than having your logs pipeline fall on its face during an outage, or even worse, cause the outage!

Fear not! In this guide we'll build an observability pipeline that sends logs from Kafka to ClickHouse.


What is Kafka?

Apache Kafka is an open-source project for a distributed publish-subscribe messaging system rethought as a distributed commit log. Kafka stores messages in topics that are partitioned and replicated across multiple brokers in a cluster. Producers send messages to topics from which consumers read. These features make it an excellent candidate for durably storing logs and metrics data.

What is ClickHouse?

ClickHouse is an open-source, column-oriented database management system that reliably stores and queries extremely large volumes of data, including non-aggregated data, and can generate custom reports in real time. It scales linearly and can store and process trillions of rows and petabytes of data. This makes it a best-in-class store for logs and metrics data.


How This Guide Works

We'll be using Vector to accomplish this task. Vector is a popular open-source observability data pipeline. It's written in Rust, making it lightweight, ultra-fast and highly reliable. We'll deploy Vector as an agent.

Vector daemon deployment strategy
1. Your service logs to STDOUT
Logging to STDOUT follows the Twelve-Factor App principles.
2. STDOUT is captured
STDOUT is captured and sent to Kafka topics.
3. Vector collects & fans out data
Vector consumes the logs from the Kafka topics and sends them to ClickHouse.
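To make the fan-out in step 3 concrete, here is a minimal sketch of a config in which one Kafka source feeds two sinks: the ClickHouse table used in this guide, plus a `console` sink that prints each event as JSON for debugging. The file name `vector-fanout.toml`, the single local broker `localhost:9092`, and the extra console sink are illustrative assumptions, and option names follow recent Vector releases, so check the reference for your version.

```shell
# Write a hypothetical fan-out config: one Kafka source, two sinks.
cat <<-'VECTORCFG' > ./vector-fanout.toml
[sources.kafka]
  type = "kafka"
  bootstrap_servers = "localhost:9092"  # assumption: a single local broker
  group_id = "consumer-group-name"
  topics = [ "topic-1" ]

# Sink 1: the ClickHouse table from this guide
[sinks.clickhouse]
  type = "clickhouse"
  inputs = [ "kafka" ]
  endpoint = "http://localhost:8123"
  table = "mytable"

# Sink 2 (illustrative): print every event to STDOUT as JSON
[sinks.console]
  type = "console"
  inputs = [ "kafka" ]
  encoding.codec = "json"
VECTORCFG
```

Because both sinks list `kafka` in `inputs`, every event read from Kafka is delivered to each of them; you would start this variant with `vector --config ./vector-fanout.toml`.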

What We'll Accomplish

We'll build an observability data pipeline that:

  • Collects logs from Kafka.
    • Enriches data with useful Kafka context.
    • Efficiently collects data and checkpoints read positions to ensure data is not lost between restarts.
    • Securely collects data via Transport Layer Security (TLS).
  • Sends logs to ClickHouse.
    • Buffers data in-memory or on-disk for performance and durability.
    • Compresses data to optimize bandwidth.
    • Automatically retries failed requests, with backoff.
    • Securely transmits data via Transport Layer Security (TLS).
    • Batches data to maximize throughput.

All in just a few minutes!
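Most of these features correspond to options on Vector's `kafka` source and `clickhouse` sink. The sketch below spells several of them out explicitly. The file name `vector-tuned.toml`, the broker and ClickHouse addresses, the certificate paths, and all numeric values are hypothetical placeholders; option names are taken from recent Vector releases and may differ in yours.

```shell
# Write a hypothetical config that makes the reliability and security options explicit.
cat <<-'VECTORCFG' > ./vector-tuned.toml
[sources.kafka]
  type = "kafka"
  bootstrap_servers = "kafka.example.internal:9093"  # hypothetical TLS listener
  group_id = "consumer-group-name"   # read positions (offsets) are tracked per consumer group
  topics = [ "topic-1" ]
  auto_offset_reset = "earliest"     # where to start if the group has no committed offset yet
  commit_interval_ms = 5000          # how often consumed offsets are committed
  tls.enabled = true
  tls.ca_file = "/etc/ssl/certs/kafka-ca.pem"  # hypothetical path

[sinks.clickhouse]
  type = "clickhouse"
  inputs = [ "kafka" ]
  endpoint = "https://clickhouse.example.internal:8443"  # hypothetical HTTPS endpoint
  table = "mytable"
  compression = "gzip"               # compress request bodies
  batch.max_events = 1000            # flush a batch after this many events...
  batch.timeout_secs = 1             # ...or after this many seconds
  buffer.type = "memory"             # "disk" survives restarts but also needs max_size
  buffer.max_events = 10000
  buffer.when_full = "block"         # apply backpressure instead of dropping events
  request.retry_attempts = 10        # cap on retries; Vector backs off between attempts
  tls.ca_file = "/etc/ssl/certs/clickhouse-ca.pem"  # hypothetical path
VECTORCFG
```

Blocking when the buffer is full pushes backpressure up to the Kafka consumer, which simply reads more slowly; since offsets live in Kafka, unread messages stay in the topic rather than being lost.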


  1. Install Vector

    curl --proto '=https' --tlsv1.2 -sSf | sh
  2. Configure Vector

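    cat <<-'VECTORCFG' > ./vector.toml
    [sources.kafka]
      type = "kafka"
      # Comma-separated host:port list of your Kafka brokers (placeholders; replace them)
      bootstrap_servers = "<broker-1>:9092,<broker-2>:9092"
      group_id = "consumer-group-name"
      # Entries starting with ^ are matched as regular expressions
      topics = [ "^(prefix1|prefix2)-.+", "topic-1", "topic-2" ]

    [sinks.clickhouse]
      type = "clickhouse"
      inputs = [ "kafka" ]
      endpoint = "http://localhost:8123"
      # Target table; Vector does not create it, so it must already exist
      table = "mytable"
    VECTORCFG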
  3. Start Vector

    vector --config ./vector.toml
  4. Observe Vector

    vector top
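`vector top` displays metrics that it reads from Vector's API, and that API is disabled by default. A minimal sketch that appends an `[api]` section to the `vector.toml` written in step 2; the address is assumed to be Vector's usual local default. Restart Vector afterwards so the change takes effect.

```shell
# Append an [api] section so `vector top` has an endpoint to connect to.
cat <<-'VECTORCFG' >> ./vector.toml

[api]
  enabled = true
  address = "127.0.0.1:8686"  # assumption: default local API address
VECTORCFG
```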

Next Steps

Vector is a powerful tool and we're just scratching the surface in this guide. Here are a few pages we recommend that demonstrate the power and flexibility of Vector:

Vector GitHub repo
Vector is free and open-source!
Vector quickstart
Get set up in just a few minutes
Vector documentation
Everything you need to know about Vector