Kafka Components

Topics

  • A topic is a particular stream of data.

  • A topic is identified by its name.

  • You can have as many topics as you want.

  • A topic supports any kind of message format.

  • To draw an analogy, topics are like tables in a database, but you cannot query them and they have no constraints.

  • Producers are used to send data to a topic, and consumers are used to read data from it.

  • The sequence of messages in a topic is called a data stream.

  • Data written to a topic (i.e., within a partition) is immutable.

  • Data is kept for one week by default; the retention period is configurable.
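The points above can be sketched in plain Python as a toy model (hypothetical names like `Topic`, `produce`, and `consume` — this is not the Kafka API): a topic is an append-only, immutable sequence that producers write to and consumers read from.

```python
# Toy model of a topic as an append-only data stream.
# NOT the Kafka API; Topic/produce/consume are illustrative names only.

class Topic:
    def __init__(self, name):
        self.name = name
        self._messages = []  # append-only: existing entries are never modified

    def produce(self, message):
        """A producer appends a message; it cannot be changed afterwards."""
        self._messages.append(message)

    def consume(self, from_offset=0):
        """A consumer reads messages in order, starting from an offset."""
        return self._messages[from_offset:]

t = Topic("truck_gps")
t.produce({"truck_id": 1, "lat": 52.5, "lon": 13.4})
t.produce({"truck_id": 2, "lat": 48.1, "lon": 11.6})
print(t.consume())                # both messages, in write order
print(t.consume(from_offset=1))  # only the second message
```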

Partitions

  • Topics are split into partitions.

  • Each partition is made up of segments, i.e., files.

    • Each segment covers a range of offsets.

    • The last segment is called the active segment.

    • A partition has only one active segment at a time.

  • Messages sent to a topic are placed inside a partition and ordered by offset.

  • Note that the order of messages is guaranteed within a partition, not across partitions.

  • If data is sent without a partition key, it is randomly assigned to a partition.

  • You can have as many partitions per topic as you want.
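A sketch of partition assignment under these rules (Kafka actually hashes keys with murmur2; `crc32` below is a stand-in for illustration): messages with the same key always land in the same partition, which is what preserves per-key ordering, while keyless messages are spread across partitions.

```python
# Illustrative partition assignment; Kafka really uses a murmur2 hash,
# crc32 is just a deterministic stand-in for this sketch.
import random
import zlib

NUM_PARTITIONS = 3

def choose_partition(key=None):
    """Same key -> same partition; no key -> random partition."""
    if key is None:
        return random.randrange(NUM_PARTITIONS)
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

# Per-key ordering holds because the key pins the partition:
assert choose_partition("truck_1") == choose_partition("truck_1")
print(choose_partition("truck_1"))
```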

Segments

  • There are two segment settings:

    • log.segment.bytes: the maximum size of a single segment in bytes (1 GB by default).

      • If a segment grows beyond this size, it is closed and a new segment is created.

    • log.segment.ms: the time Kafka will wait before closing a segment if it is not full (one week by default).

    • Each segment comes with two indexes:

      • An offset-to-position index: helps Kafka find at which position in the segment file to read a message for a given offset.

      • A timestamp-to-offset index: helps Kafka find messages with a specific timestamp.

    • A smaller segment size means more segments, and log compaction happens more often.

    • A smaller log.segment.ms means segments close more frequently, triggering log compaction more often.
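The size-based rolling rule can be simulated in a few lines (a toy sketch with a tiny limit; real segments are files on the broker's disk and the default limit is 1 GB):

```python
# Toy simulation of size-based segment rolling (log.segment.bytes).
# Hypothetical in-memory model; real segments are files on disk.

SEGMENT_MAX_BYTES = 100   # tiny limit for illustration; Kafka defaults to 1 GB

segments = [[]]           # list of segments; the last one is the active segment
segment_bytes = 0

def append(record: bytes):
    global segment_bytes
    if segment_bytes + len(record) > SEGMENT_MAX_BYTES:
        segments.append([])   # close the active segment, open a new one
        segment_bytes = 0
    segments[-1].append(record)
    segment_bytes += len(record)

for _ in range(30):
    append(b"x" * 10)   # 300 bytes total rolls into multiple segments

print(len(segments))    # several segments; only the last one is active
```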

Offset

  • Each message within a partition gets an incrementing ID, called an offset.

  • This means offsets are meaningful only within a single partition.

  • Offsets are never reused, even after previous messages have been deleted.
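The "never reused" point can be sketched as follows (hypothetical `Partition` model, not the Kafka API): the next offset only ever increases, so even after retention deletes old messages, new messages do not reuse the freed offsets.

```python
# Sketch: offsets keep increasing even when old messages are deleted by
# retention. Hypothetical Partition model, not the Kafka API.

class Partition:
    def __init__(self):
        self.next_offset = 0   # monotonically increasing, never reset
        self.messages = {}     # offset -> message

    def append(self, msg):
        offset = self.next_offset
        self.messages[offset] = msg
        self.next_offset += 1
        return offset

    def expire(self, up_to_offset):
        """Retention deletes old messages, but their offsets are not reused."""
        for o in list(self.messages):
            if o < up_to_offset:
                del self.messages[o]

p = Partition()
for i in range(5):
    p.append(f"m{i}")
p.expire(3)             # offsets 0-2 deleted by retention
print(p.append("m5"))   # the new message still gets offset 5, not 0
```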

Fundamental Terminology

Cluster

  • A Kafka cluster is composed of multiple brokers (servers).

Brokers

  • Brokers are servers; they receive data from producers and serve it to consumers.

  • Every broker in Kafka has a broker ID (an integer).

  • Each broker contains certain topic partitions, i.e., the data is distributed across brokers.

  • Connecting to a bootstrap broker (any broker in the cluster) will connect you to the entire cluster.

    • Kafka clients, i.e., producers and consumers, have smart mechanics for this.

  • A cluster can have as many brokers as you want; a good starting number is 3, and some big clusters have up to 100 brokers.

  • Each broker holds only its share of the data; no single broker has all the data.

  • This makes Kafka easy to scale horizontally.

  • Given below is an example Kafka cluster with 3 brokers.

    • Topics

      • Topic A with 3 partitions

      • Topic B with 2 partitions
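One possible layout for this example can be sketched by spreading the five partitions round-robin over the three brokers (broker IDs and the round-robin rule are illustrative; the actual assignment is done by Kafka, not client code):

```python
# Sketch: distributing Topic A (3 partitions) and Topic B (2 partitions)
# round-robin across 3 brokers. Broker IDs are illustrative.

brokers = {101: [], 102: [], 103: []}
partitions = [("A", 0), ("A", 1), ("A", 2), ("B", 0), ("B", 1)]

broker_ids = sorted(brokers)
for i, tp in enumerate(partitions):
    brokers[broker_ids[i % len(broker_ids)]].append(tp)

for broker_id, assigned in brokers.items():
    print(broker_id, assigned)
# No single broker holds all five partitions: the data is distributed.
```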


Broker Discovery

  • Every broker in Kafka is also called a bootstrap server.

  • So when establishing a connection with a Kafka cluster, we only need to connect to one broker, and the Kafka client (smart client) will discover how to connect to the entire cluster.

  • Each broker knows about all brokers, topics, and partitions via metadata.
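The discovery flow can be modeled as a metadata request (a hypothetical in-memory model, not the Kafka protocol): the client asks any one bootstrap broker, and the answer describes every broker, topic, and partition in the cluster.

```python
# Sketch of broker discovery: the client contacts one bootstrap broker,
# which returns metadata describing the whole cluster. Hypothetical model.

cluster_metadata = {
    "brokers": [101, 102, 103],
    "topics": {"A": 3, "B": 2},   # topic -> number of partitions
}

def request_metadata(bootstrap_broker_id):
    """Any broker can answer a metadata request with the full cluster view."""
    assert bootstrap_broker_id in cluster_metadata["brokers"]
    return cluster_metadata

# Connecting to a single bootstrap broker reveals every broker in the cluster.
metadata = request_metadata(101)
print(metadata["brokers"])   # [101, 102, 103]
```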


Kafka Message Anatomy

