Logs - At Topic Level
Cleanup Policies
Kafka expires data based on cleanup policies.
Cleanup allows Kafka to delete obsolete data and control the size of the log on disk.
Log cleanup happens on your partition segments.
Smaller / more segments mean that log cleanup runs more frequently, but it shouldn't happen too often, as cleanup consumes CPU and RAM.
log.cleaner.backoff.ms configures how often the cleaner checks for work; the default is 15 seconds.
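A minimal broker-config sketch tying the settings above together (values shown are the Kafka defaults; this is illustrative, not a recommended tuning):

```properties
# server.properties — broker-level log cleanup settings (sketch)
log.segment.bytes=1073741824     # 1 GiB per segment; smaller segments => more frequent cleanup
log.cleaner.backoff.ms=15000     # how often the cleaner checks whether there is work to do
```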
Delete
log.cleanup.policy=delete is the default for all user topics. Data is deleted based on its age, by default 1 week.
Deletion can also be triggered based on the max size of the log (default is -1 => infinite).
log.retention.hours configures the number of hours to keep the data. A higher number means more disk space.
A lower number means less disk space, but if consumers are down for too long, they can miss data.
log.retention.ms and log.retention.minutes also exist; the setting with the smaller unit takes precedence. log.retention.bytes is the max size in bytes for each partition (default is -1, infinite), useful for keeping the log size under a specific threshold.
So you can set retention by time or by size.
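At the topic level, the equivalent overrides are retention.ms and retention.bytes (without the log. prefix). A hedged sketch of setting both on a topic, assuming a local broker and a hypothetical topic name "purchases":

```shell
# sketch: override retention by time (7 days) and size (1 GiB per partition)
kafka-configs.sh --bootstrap-server localhost:9092 \
  --alter --entity-type topics --entity-name purchases \
  --add-config retention.ms=604800000,retention.bytes=1073741824
```

Whichever limit is hit first (time or size) triggers deletion of the oldest segments.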
Compact
log.cleanup.policy=compact is the default for internal topics such as __consumer_offsets. Deletion is based on keys.
Compaction deletes old duplicate values for a key after the active segment is committed.
This gives infinite time and space retention for the latest value of each key.
This policy is preferred when you only care about the most recent value per key and not the older values,
i.e. older data for a key is deleted while the latest value is kept.
The ordering of messages is kept; compaction does not reorder messages within a partition.
Offsets are immutable; an offset is simply skipped if its message has been compacted away.
Deleted records remain visible to consumers for the duration of the
delete.retention.ms setting, default 24 hours.
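A sketch of creating a compacted topic with these settings, assuming a local broker; the topic name "user-profiles" and the partition/replication counts are hypothetical:

```shell
# sketch: a compacted topic keeping only the latest value per key
kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic user-profiles \
  --partitions 3 --replication-factor 1 \
  --config cleanup.policy=compact \
  --config delete.retention.ms=86400000   # tombstones visible for 24 hours
```

Compacted topics fit key-value use cases such as storing the current state per user ID.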