Kafka Components
Topics
A topic is a particular stream of data.
Each topic is identified by its name.
You can have as many topics as you want.
Topics support any kind of message format.
To draw an analogy, topics are like tables in a database, but you cannot query them and they do not enforce any constraints.
Data is sent to topics by producers and read from topics by consumers.
The sequence of messages is called a data stream.
Data written to a topic (i.e., within a partition) is immutable.
Data is kept for one week by default; this retention period is configurable.
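The append-only, immutable nature of a topic can be sketched with a minimal in-memory model (illustrative only; the `Topic` class below is hypothetical and not part of any Kafka client API):

```python
class Topic:
    """Minimal in-memory model of a Kafka topic: an append-only message log."""

    def __init__(self, name):
        self.name = name
        self._messages = []  # producers can only append; records are never updated

    def produce(self, message):
        """Append a message; returns the position it was written at."""
        self._messages.append(message)
        return len(self._messages) - 1

    def consume(self, start=0):
        """Read messages in order, starting from the given position."""
        return self._messages[start:]


events = Topic("user_signups")
events.produce({"user": "alice"})
events.produce({"user": "bob"})
print(events.consume())  # messages are read back in the order they were written
```

Note that there is no update or delete method for individual records: the only way data leaves a topic is through retention-based expiry.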
Partitions
Topics are split into partitions. Each partition is made up of segments, i.e., files.
Each segment has a range of offsets.
The last segment is called the active segment.
A partition has only one active segment at a time.
Messages sent to a topic are placed inside a partition and ordered by offset. Note that the order of data is guaranteed within a partition, not across partitions.
If data is sent without a partition key, it is randomly assigned to a partition.
You can have as many partitions as you want.
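Key-based partition assignment can be sketched as below. Kafka's default partitioner actually uses murmur2 hashing of the key bytes; this sketch substitutes a trivial deterministic hash just to show the idea that the same key always lands on the same partition, preserving per-key ordering:

```python
import random

NUM_PARTITIONS = 3  # hypothetical partition count for this sketch


def choose_partition(key=None):
    """Pick a partition for a record, mimicking key-based assignment."""
    if key is None:
        # no key: the record can land on any partition, so no ordering guarantee
        return random.randrange(NUM_PARTITIONS)
    # same key -> same hash -> same partition (real clients use murmur2 here)
    return sum(key.encode()) % NUM_PARTITIONS


# records for the same key always go to the same partition
assert choose_partition("user-42") == choose_partition("user-42")
```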
Segments
There are two segment settings:
log.segment.bytes: the maximum size of a single segment in bytes (1 GB by default). If a segment grows beyond this size, it is closed and a new segment is created.
log.segment.ms: the time Kafka will wait before committing a segment that is not full (one week by default).
Segments come with 2 indexes:
An offset-to-position index: helps Kafka find where a message is stored.
A timestamp-to-offset index: helps Kafka find messages with a specific timestamp.
A smaller segment size means more segments per partition and more frequent log compaction.
A smaller log.segment.ms means more frequent triggers for log compaction.
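The size-based rolling behaviour of log.segment.bytes can be sketched as a small simulation (illustrative only, not Kafka's internal implementation; a 100-byte limit stands in for the 1 GB default):

```python
SEGMENT_MAX_BYTES = 100  # stand-in for log.segment.bytes (1 GB by default)

segments = [[]]  # each segment is a list of messages; the last one is active


def append(message: bytes):
    """Append to the active segment, rolling to a new segment when it is full."""
    active = segments[-1]
    if active and sum(len(m) for m in active) + len(message) > SEGMENT_MAX_BYTES:
        # the active segment would exceed the limit: close it, open a new one
        segments.append([message])
    else:
        active.append(message)


for _ in range(30):
    append(b"x" * 10)  # 300 bytes of messages in total

print(len(segments))  # -> 3: the partition's data was split across rolled segments
```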
Offset
Each message within a partition is given an incrementing ID called an offset. Offsets are meaningful only within a single partition.
Offsets are never reused, even after previous messages have been deleted.
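The fact that offsets keep incrementing even after retention deletes old messages can be sketched with a hypothetical model (not Kafka code):

```python
class PartitionLog:
    """Models one partition: offsets increment forever, even across deletions."""

    def __init__(self):
        self.next_offset = 0
        self.records = {}  # offset -> message

    def append(self, message):
        offset = self.next_offset
        self.records[offset] = message
        self.next_offset += 1
        return offset

    def expire_before(self, offset):
        """Retention: delete records older than `offset`; offsets are not reused."""
        self.records = {o: m for o, m in self.records.items() if o >= offset}


log = PartitionLog()
for msg in ["a", "b", "c"]:
    log.append(msg)        # offsets 0, 1, 2
log.expire_before(3)       # all three records expire...
print(log.append("d"))     # -> 3: the next offset continues where it left off
```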

Cluster
A Kafka cluster is composed of multiple brokers (servers).
Brokers
Brokers are servers; they receive and serve data.
Every broker in Kafka has a broker ID (an integer).
Each broker contains some topics and their partitions.
Connecting to a bootstrap broker (any broker you connect to) will connect you to the entire cluster.
Kafka clients, i.e., producers and consumers, have smart mechanics for this.
A cluster can have as many brokers as you want; a good starting number is 3, and some big clusters have over 100 brokers.
Each broker holds only its share of the data; no single broker has all of it.
This makes Kafka easy to scale horizontally.
Given below is a Kafka cluster with 3 brokers.
Topics
Topic A with 3 partitions
Topic B with 2 partitions
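The example above (Topic A with 3 partitions and Topic B with 2, spread over 3 brokers) can be sketched with a simple round-robin assignment. This is illustrative only; Kafka's real placement also accounts for replication and rack awareness, and the broker IDs below are hypothetical:

```python
brokers = [101, 102, 103]  # hypothetical broker IDs


def assign_partitions(topic, num_partitions, start=0):
    """Round-robin each partition of a topic onto the available brokers."""
    return {f"{topic}-{p}": brokers[(start + p) % len(brokers)]
            for p in range(num_partitions)}


layout = {}
layout.update(assign_partitions("topic-A", 3))
layout.update(assign_partitions("topic-B", 2, start=1))

for partition, broker in sorted(layout.items()):
    print(partition, "->", broker)
# no single broker ends up holding every partition: the data is spread out,
# which is what makes horizontal scaling work
```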

Broker Discovery
Every broker in Kafka is also called a bootstrap server. So when establishing a connection with a Kafka cluster, we only need to connect to one broker, and the Kafka client (a smart client) will discover how to connect to the entire cluster.
Each broker knows about all brokers, topics, and partitions via cluster metadata.
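Broker discovery can be sketched as follows: the client contacts any one bootstrap broker, asks for metadata, and from then on knows every broker and topic in the cluster. This is a hypothetical model of the idea, not the real Kafka wire protocol:

```python
# cluster side: every broker holds the same metadata about the whole cluster
CLUSTER_METADATA = {
    "brokers": {101: "kafka1:9092", 102: "kafka2:9092", 103: "kafka3:9092"},
    "topics": {"topic-A": [0, 1, 2], "topic-B": [0, 1]},
}


def fetch_metadata(bootstrap_broker):
    """Any single broker can answer a metadata request for the full cluster."""
    assert bootstrap_broker in CLUSTER_METADATA["brokers"].values()
    return CLUSTER_METADATA


# client side: connect to just one bootstrap server...
metadata = fetch_metadata("kafka2:9092")
# ...and discover every broker and topic in the cluster from its reply
print(sorted(metadata["brokers"].values()))
```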

Kafka Message Anatomy
