Records are added to an old Kafka log segment file when using the delete,compact cleanup policy - Stack Overflow


I have a Kafka topic with the delete,compact cleanup policy. I don't want to keep records older than 30 days, so retention.ms is set to 30 days. The other relevant configurations are left at their default values:

cleanup.policy=delete,compact
retention.ms=2592000000 (30 days)
segment.bytes=1073741824 (1GB)
segment.ms=604800000 (1 week)
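(These effective topic-level settings can be double-checked with the kafka-configs.sh tool that ships with Kafka; the broker address and topic name below are placeholders for my actual ones:)

# Describe the topic configuration; on recent Kafka versions --all also
# prints values inherited from broker defaults
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name my-topic \
  --describe --all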

The topic still contains records from the past 6 months. Examining the log files, I see the following:

00000000000000000000.log 581865395 Feb  7 15:45
00000000000003478578.log  37929134 Feb 14 15:45
00000000000003669403.log 276311746 Feb 21 15:45
00000000000003836847.log 336161019 Feb 28 15:46
00000000000004021954.log 288053840 Mar  6 15:51

It seems a new file is created weekly, as expected, and new records are written to the latest log file. The problem is that the file 00000000000000000000.log currently contains records from 2024-09-25 through 2025-02-07, so log retention won't delete it, because not all records in it are at least 30 days old. The other log files are as expected: each contains only records from the 7 days before its last modification time.
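(One way to check the timestamp range inside a particular segment is the kafka-dump-log.sh tool that ships with Kafka; the log directory path below is a placeholder:)

# Dump record batches with their CreateTime timestamps for one segment
bin/kafka-dump-log.sh --print-data-log \
  --files /var/kafka-logs/my-topic-0/00000000000000000000.log | head -n 20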

It looks like records are sometimes added to the 00000000000000000000.log file for some reason, and this prevents the file from being deleted by retention. What could explain this? From the Kafka docs, my understanding is that new records can only be appended to the active log segment. Older segments shouldn't grow: they can be deleted by retention, or the compactor can rewrite them into a copy that omits compacted records, but no new records should appear in an older segment.

On my other topics, where the cleanup policy is simply delete, everything works as expected. This strange behaviour only occurs on topics with the delete,compact policy.


asked Mar 6 at 20:52 by oriskop

1 Answer

It turned out that compaction works differently from what the documentation describes. After examining the source code of kafka.log.LogCleaner, I understood what was happening in my case.

Cleaning (compacting) deletes some records from the log segments. To avoid ending up with lots of very small segments, the cleaner merges adjacent cleaned segments back together, up to the size configured in segment.bytes. In my case a new segment was created weekly, but the compacted weekly data was only around 30-40 MB while segment.bytes was left at the default 1 GB. So during compaction Kafka always merged the compacted segments into a single file, because the merged result was still under 1 GB.
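As a rough illustration (the per-segment sizes are assumed, roughly in line with the listing above), the grouping decision goes like this:

# already-compacted weekly segments (sizes assumed for illustration)
#   00000000000000000000.log   ~35 MB
#   00000000000003478578.log   ~38 MB
#   00000000000003669403.log   ~32 MB
# total ~105 MB < segment.bytes (1 GB)
# => the cleaner merges them into one segment starting at offset 0, so the
#    oldest file keeps absorbing newer compacted records and its newest
#    record never becomes 30 days old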

So to get the expected result I had to choose an appropriate value for segment.bytes. It should be smaller than the size of the compacted data for the desired retention interval, so that the time period covered by one segment file cannot grow larger than the retention interval. With such a setup, compaction does not merge data from a wide time range into one file, and each segment becomes eligible for retention when it should.
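For example, with roughly 30-40 MB of compacted data per week, 30 days of compacted data is about 130-175 MB, so segment.bytes has to be below that; the 100 MB value below is only illustrative. The override can be applied per topic with kafka-configs.sh:

# Lower segment.bytes so one merged segment can never cover more than
# the 30-day retention window (104857600 bytes = 100 MB, illustrative)
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name my-topic \
  --alter --add-config segment.bytes=104857600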
