Kafka Guidelines

Choosing the Number of Partitions

  • Each partition should be able to handle a few MB/s of throughput.

  • For a small cluster (fewer than 6 brokers), start with 3 * number of brokers.

  • For a big cluster (more than 12 brokers), start with 2 * number of brokers.

  • Decide the number of partitions based on the number of consumers that need to run in parallel at peak throughput.

  • Adjust for producer throughput.

  • Do not create 1000 partitions from the get-go.
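
The rules of thumb above can be combined into a rough starting estimate. The sketch below is only an illustration: the function name `suggest_partitions`, its parameters, and the choice to take the largest of the three candidates are assumptions, not part of the guideline. The list gives no multiplier for clusters of 6 to 12 brokers, so the code falls back to 2 * number of brokers there, which is also an assumption.

```python
import math


def suggest_partitions(brokers, peak_consumers, target_mb_s, per_partition_mb_s):
    """Return a rough starting partition count (hypothetical helper)."""
    if brokers < 6:
        by_brokers = 3 * brokers  # small cluster rule
    elif brokers > 12:
        by_brokers = 2 * brokers  # big cluster rule
    else:
        # Assumption: 6 to 12 brokers is not covered by the guideline,
        # so the more conservative 2x multiplier is used here.
        by_brokers = 2 * brokers

    # Partitions needed so each one carries only its share of peak throughput.
    by_throughput = math.ceil(target_mb_s / per_partition_mb_s)

    # A consumer in a group needs at least one partition to do any work.
    return max(by_brokers, peak_consumers, by_throughput)


# 3 brokers, 4 consumers at peak, 20 MB/s target, 5 MB/s per partition
print(suggest_partitions(brokers=3, peak_consumers=4,
                         target_mb_s=20, per_partition_mb_s=5))  # prints 9
```

Treat the result as a starting point and measure real producer and consumer throughput before settling on a number.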

Replication Factor (RF)

  • At least 2, usually 3, and at most 4.

  • A higher RF, say N, means better durability: N-1 brokers can fail without losing data.

  • Better availability when the producer uses acks=all: with RF=N and min.insync.replicas=M, writes keep succeeding with up to N-M brokers down.

  • More replication also brings more latency if acks=all.

  • More disk space usage on your system.

  • Never set RF=1 in production.
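
The durability and availability trade-off above comes down to simple arithmetic. The sketch below is a minimal illustration; the helper name `tolerated_broker_failures` is hypothetical, and it assumes the failed brokers are the ones hosting replicas of the partition in question.

```python
def tolerated_broker_failures(rf, min_insync_replicas):
    """Return (failures survivable without data loss,
    failures survivable while acks=all writes are still accepted)."""
    if not 1 <= min_insync_replicas <= rf:
        raise ValueError("min.insync.replicas must be between 1 and RF")

    # One surviving replica is enough to keep the data.
    durability = rf - 1

    # acks=all writes need at least min.insync.replicas replicas in sync.
    availability = rf - min_insync_replicas
    return durability, availability


for rf, min_isr in [(2, 1), (3, 2), (4, 2)]:
    print(f"RF={rf}, min.insync.replicas={min_isr}:",
          tolerated_broker_failures(rf, min_isr))
```

With RF=3 and min.insync.replicas=2, for example, two brokers can fail without data loss, but only one can fail before acks=all producers start getting errors.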

Cluster

  • ZooKeeper

    • If ZooKeeper is used to manage the brokers in the Kafka cluster, the cluster can hold at most 200,000 partitions, due to a ZooKeeper limit.

    • Even so, it is recommended to stay at or below 4,000 partitions per broker.

  • KRaft

    • A KRaft cluster can have millions of partitions.
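
The ZooKeeper limits above can be turned into a quick sanity check for a planned cluster. This is a sketch under stated assumptions: the function name `check_zookeeper_limits` is hypothetical, replicas are assumed to be spread evenly across brokers, and every replica is assumed to count toward the per-broker figure.

```python
ZK_CLUSTER_PARTITION_LIMIT = 200_000  # cluster-wide limit from the guideline
PER_BROKER_PARTITION_LIMIT = 4_000    # recommended per-broker maximum


def check_zookeeper_limits(total_partitions, replication_factor, brokers):
    """Return a list of warnings for a ZooKeeper-based cluster plan."""
    warnings = []

    if total_partitions > ZK_CLUSTER_PARTITION_LIMIT:
        warnings.append(
            f"{total_partitions} partitions exceed the cluster limit "
            f"of {ZK_CLUSTER_PARTITION_LIMIT}"
        )

    # Assumption: replicas are evenly distributed and each one counts
    # as a partition hosted on its broker.
    per_broker = total_partitions * replication_factor / brokers
    if per_broker > PER_BROKER_PARTITION_LIMIT:
        warnings.append(
            f"about {per_broker:.0f} partitions per broker exceed the "
            f"recommended {PER_BROKER_PARTITION_LIMIT}"
        )

    return warnings


print(check_zookeeper_limits(total_partitions=10_000,
                             replication_factor=3, brokers=6))
```

An empty list means the plan stays within both limits under these assumptions.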

Topic Naming Convention

  • Kafka does not enforce any naming convention for topic names.

  • A good practice is to follow one convention and stick to it.

  • A good article here gives some useful insights on this.

  • Example: msg-type.dataset-name.data-name.data-format.

  • Use snake_case for each part of the name.
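
A convention is easier to stick to when it is checked automatically. The sketch below validates names against the four-part example pattern above. It assumes each dot-separated part is written in lowercase snake_case; the names `TOPIC_PATTERN` and `is_valid_topic_name` and the sample topic names are hypothetical.

```python
import re

# One part of the name: lowercase words or digits joined by underscores.
SEGMENT = r"[a-z0-9]+(?:_[a-z0-9]+)*"

# Four parts separated by dots: msg-type.dataset-name.data-name.data-format
TOPIC_PATTERN = re.compile(rf"{SEGMENT}(?:\.{SEGMENT}){{3}}")

# Kafka itself rejects topic names longer than 249 characters.
MAX_TOPIC_LENGTH = 249


def is_valid_topic_name(name):
    """Return True if the name follows the assumed four-part convention."""
    return (len(name) <= MAX_TOPIC_LENGTH
            and TOPIC_PATTERN.fullmatch(name) is not None)


for candidate in ["event.sales.order_created.avro",
                  "Event.Sales.OrderCreated.Avro",
                  "event.sales.avro"]:
    print(candidate, is_valid_topic_name(candidate))
```

A check like this could run in a topic-creation script or a CI step so that names which break the convention are caught before the topic exists.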
