Logs - At Topic Level

Cleanup Policies

  • Kafka expires data based on cleanup policies.

  • This lets Kafka delete obsolete data and keep disk usage under control.

  • Log cleanup happens on partition segments.

  • Smaller (and therefore more numerous) segments mean that log cleanup runs more frequently. It should not happen too often.

  • The cleaner checks for work every 15 seconds by default (log.cleaner.backoff.ms=15000).

Delete

  • log.cleanup.policy=delete is the default for all user topics.

  • Data is deleted based on its age; the default is 1 week.

  • Data can also be deleted based on the maximum size of the log (default is -1, meaning infinite).

  • log.retention.hours configures the number of hours to keep the data.

    • Higher number means more disk space.

    • Lower number means less disk space, but if consumers are down for too long, they can miss data.

  • log.retention.ms and log.retention.minutes are also available; when more than one is set, the smaller unit takes precedence.

  • log.retention.bytes is the maximum size in bytes for each partition; the default is -1 (no limit).

  • Useful to keep the log size under a specific threshold.

  • So you can set retention by time or size.
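
The two retention rules above can be sketched in a few lines of Python. This is a simplified model for illustration, not the broker's implementation: the function names and the (timestamp, size) segment tuples are hypothetical, and the real broker's size accounting differs in its details.

```python
# Simplified model of the "delete" cleanup policy. Illustrative only:
# the function names and the segment tuples are hypothetical, not Kafka APIs.

HOUR_MS = 60 * 60 * 1000
MINUTE_MS = 60 * 1000


def effective_retention_ms(config):
    """Resolve the retention window; the smallest unit that is set wins."""
    if "log.retention.ms" in config:
        return int(config["log.retention.ms"])
    if "log.retention.minutes" in config:
        return int(config["log.retention.minutes"]) * MINUTE_MS
    # Default is one week, i.e. 168 hours.
    return int(config.get("log.retention.hours", 168)) * HOUR_MS


def segments_to_delete(segments, now_ms, retention_ms, retention_bytes=-1):
    """Return indexes of segments eligible for deletion.

    segments: list of (newest_record_timestamp_ms, size_bytes), oldest first.
    The last entry stands for the active segment and is never deleted.
    A negative retention value means "no limit" for that dimension.
    """
    closed = segments[:-1]
    doomed = set()

    # Retention by time: the whole segment is older than the window.
    if retention_ms >= 0:
        for i, (newest_ts, _size) in enumerate(closed):
            if now_ms - newest_ts > retention_ms:
                doomed.add(i)

    # Retention by size: drop the oldest segments until the partition fits.
    if retention_bytes >= 0:
        total = sum(size for _ts, size in segments)
        for i, (_ts, size) in enumerate(closed):
            if total <= retention_bytes:
                break
            doomed.add(i)
            total -= size

    return sorted(doomed)
```

For example, effective_retention_ms({"log.retention.hours": 1, "log.retention.minutes": 5}) resolves to 300000 ms, because the minutes setting overrides the hours setting.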

Compact

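  • log.cleanup.policy=compact is the default for Kafka's internal topics, such as __consumer_offsets.

  • Data is deleted based on message keys.

  • Older records with a duplicate key are deleted once the active segment has been committed (closed); the active segment itself is not compacted.

  • No time-based or size-based expiry applies: the latest value for each key is kept indefinitely.

  • This policy is preferred when you only need the most recent data for each key and do not care about older data.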

    • i.e. delete older data for a key and keep only its latest value.

  • Message ordering is preserved; compaction removes records but never reorders them.

  • Offsets are immutable; an offset whose message has been removed is simply skipped.

  • Deleted records remain visible to consumers for the period set by delete.retention.ms (default is 24 hours).
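
The compaction behaviour described above can be illustrated with a small Python model. It is a sketch under simplifying assumptions, not Kafka's cleaner: compact() and the (offset, key, value) tuples are hypothetical names, and the handling of deleted records and delete.retention.ms is left out.

```python
# Simplified model of log compaction. Illustrative only: compact() and the
# record tuples are hypothetical, not part of any Kafka client or broker API.

def compact(closed_records, active_records):
    """Keep only the newest record per key within the closed segments.

    Each record is (offset, key, value), listed in offset order.
    Records in the active segment are passed through untouched.
    """
    newest_offset = {}
    for offset, key, _value in closed_records:
        newest_offset[key] = offset  # later offsets overwrite earlier ones

    cleaned = [
        (offset, key, value)
        for offset, key, value in closed_records
        if newest_offset[key] == offset
    ]
    # Surviving records keep their original offsets and relative order.
    return cleaned + list(active_records)


closed = [
    (0, "user-1", "a"),
    (1, "user-2", "b"),
    (2, "user-1", "c"),
    (3, "user-3", "d"),
    (4, "user-2", "e"),
]
active = [(5, "user-1", "f")]
compacted = compact(closed, active)
```

In this example the surviving offsets are 2, 3, 4 and 5: offsets 0 and 1 are gone and are never reused, the remaining records stay in their original order, and the record at offset 2 survives even though the active segment holds a newer value for the same key, because the model only compacts closed segments.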
