Kafka Guidelines
Choosing the Number of Partitions
Each partition should be able to handle a few MB/s of throughput.
For a small cluster (fewer than 6 brokers), use 3 * number of brokers.
For a big cluster (more than 12 brokers), use 2 * number of brokers.
Base the number of partitions on the number of consumers that need to run in parallel at peak throughput.
Adjust for producer throughput.
Do not create 1000 partitions from the get-go.
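The rules of thumb above can be combined into a small helper. This is only a sketch: the function names are hypothetical, the default of 5 MB/s per partition is an assumed stand-in for "a few MB/s", and because the guidelines give no multiplier for clusters of 6 to 12 brokers, the broker-based rule is simply skipped in that range.

```python
import math


def broker_based_partitions(brokers):
    """Broker-count rule of thumb; returns None where the guidelines are silent."""
    if brokers < 6:
        return 3 * brokers
    if brokers > 12:
        return 2 * brokers
    return None  # 6 to 12 brokers: no multiplier is given in the guidelines


def suggest_partitions(brokers, peak_consumers, producer_mb_s, per_partition_mb_s=5.0):
    """Take the largest of the consumer, producer, and broker-based estimates.

    per_partition_mb_s is an assumption, not a measured value: benchmark your
    own cluster before relying on it.
    """
    candidates = [
        peak_consumers,                                 # one partition per parallel consumer
        math.ceil(producer_mb_s / per_partition_mb_s),  # enough partitions for producer throughput
    ]
    base = broker_based_partitions(brokers)
    if base is not None:
        candidates.append(base)
    return max(candidates)


if __name__ == "__main__":
    print(suggest_partitions(brokers=3, peak_consumers=4, producer_mb_s=20))
```

Taking the maximum keeps every constraint satisfied at once. Treat the result as a starting point, not a target to overshoot.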
Replication Factor (RF)
At least 2, usually 3, and at most 4.
A higher RF, say N, means better durability, i.e., N-1 brokers can fail without losing data.
Better availability: with producer acks=all, writes keep succeeding while up to N - min.insync.replicas brokers are down.
More replication also brings more latency if acks=all.
More disk space usage on your system.
Never set RF=1 in production.
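The trade-offs listed above come down to simple arithmetic. The sketch below assumes a hypothetical helper name and a topic size in GB chosen for illustration. It does not talk to Kafka, it only restates the rules.

```python
def replication_tradeoffs(rf, min_insync_replicas, topic_size_gb):
    """Summarize what a replication factor buys and what it costs."""
    if rf < 1 or not 1 <= min_insync_replicas <= rf:
        raise ValueError("need 1 <= min_insync_replicas <= rf")
    return {
        # Data survives as long as one replica is left.
        "broker_failures_without_data_loss": rf - 1,
        # With acks=all, writes need min.insync.replicas replicas in sync.
        "broker_failures_with_writes_still_accepted": rf - min_insync_replicas,
        # Every replica stores a full copy of the data.
        "total_disk_gb": rf * topic_size_gb,
    }


if __name__ == "__main__":
    print(replication_tradeoffs(rf=3, min_insync_replicas=2, topic_size_gb=100))
```

With RF=3 and min.insync.replicas=2, one broker can be down while acks=all producers keep writing, at the cost of three times the disk space.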
Cluster
ZooKeeper
If ZooKeeper is used to manage the brokers in a Kafka cluster, the cluster can hold at most 200,000 partitions, due to a ZooKeeper limit.
Even so, a maximum of 4,000 partitions per broker is recommended.
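A quick sanity check against these two limits might look like the sketch below. The function name is hypothetical, and it assumes partitions are spread evenly across brokers and that total_partitions already counts whatever you want held against the limits (for example, every replica hosted in the cluster).

```python
import math

ZK_MAX_CLUSTER_PARTITIONS = 200_000  # cluster-wide limit with ZooKeeper
MAX_PARTITIONS_PER_BROKER = 4_000    # recommended per-broker ceiling


def check_partition_limits(total_partitions, brokers):
    """Return a list of human-readable problems; an empty list means within limits."""
    problems = []
    if total_partitions > ZK_MAX_CLUSTER_PARTITIONS:
        problems.append(
            f"cluster has {total_partitions} partitions, above {ZK_MAX_CLUSTER_PARTITIONS}"
        )
    per_broker = math.ceil(total_partitions / brokers)  # assumes an even spread
    if per_broker > MAX_PARTITIONS_PER_BROKER:
        problems.append(
            f"about {per_broker} partitions per broker, above {MAX_PARTITIONS_PER_BROKER}"
        )
    return problems


if __name__ == "__main__":
    print(check_partition_limits(total_partitions=30_000, brokers=6))
```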
KRaft
A KRaft-based cluster can have millions of partitions.
Topic Naming Convention
Kafka does not enforce any naming convention for topic names.
A good practice is to follow one convention and stick to it.
A good article here to get some good insights for this.
Example:
msg-type.dataset-name.data-name.data-format
Use snake case.
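One way to keep everyone on the same convention is to build topic names through a single helper. The sketch below is illustrative only: the function names and the sample field values are hypothetical. The 249-character cap and the allowed characters (letters, digits, '.', '_' and '-') are Kafka's own topic-name rules.

```python
import re

MAX_TOPIC_NAME_LENGTH = 249  # Kafka's limit on topic name length


def to_snake_case(text):
    """Convert 'pageViews' or 'User Activity' style text to snake_case."""
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", text)  # split camelCase boundaries
    text = re.sub(r"[^A-Za-z0-9]+", "_", text)           # collapse other characters
    return text.strip("_").lower()


def build_topic_name(msg_type, dataset_name, data_name, data_format):
    """Join the four fields with dots, each field in snake case."""
    parts = [to_snake_case(p) for p in (msg_type, dataset_name, data_name, data_format)]
    if not all(parts):
        raise ValueError("every field must contain at least one letter or digit")
    name = ".".join(parts)
    if len(name) > MAX_TOPIC_NAME_LENGTH:
        raise ValueError(f"topic name longer than {MAX_TOPIC_NAME_LENGTH} characters")
    return name


if __name__ == "__main__":
    print(build_topic_name("logging", "User Activity", "pageViews", "avro"))
```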