I have a Kafka topic with the delete,compact cleanup policy. I don't want to keep records older than 30 days, so retention.ms is set to 30 days. The other relevant configurations are left at their default values:
cleanup.policy=delete,compact
retention.ms=2592000000 (30 days)
segment.bytes=1073741824 (1GB)
segment.ms=604800000 (1 week)
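The effective topic configuration can be verified with kafka-configs.sh (the topic name and bootstrap server below are placeholders, not my actual values):
# Show all configuration overrides for the topic
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name my-compacted-topic --describe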
The topic still contains records from the past 6 months. When I examine the log files I see the following:
00000000000000000000.log 581865395 Feb 7 15:45
00000000000003478578.log 37929134 Feb 14 15:45
00000000000003669403.log 276311746 Feb 21 15:45
00000000000003836847.log 336161019 Feb 28 15:46
00000000000004021954.log 288053840 Mar 6 15:51
It seems a new file is created weekly, as expected, and new records are written to the latest log file. The problem is that the file 00000000000000000000.log currently contains records from 2024-09-25 through 2025-02-07, so log retention won't delete it, because not all of its records are at least 30 days old. The other log files look as expected: each contains only records from the 7 days before its last modification time.
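(The time range covered by a segment can be inspected with kafka-dump-log.sh, run from the partition's log directory; the exact output format may vary between Kafka versions:)
# Dump the oldest segment and inspect the batch timestamps it contains
kafka-dump-log.sh --files 00000000000000000000.log --print-data-log | head -n 20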
It looks like records are somehow being added to the 00000000000000000000.log file, and this prevents retention from ever deleting it. What could explain this? From the Kafka docs my understanding is that new records can be appended only to the active log segment. Older segments shouldn't grow: they can be deleted by retention, or the log cleaner can rewrite them into a copy that omits compacted records, but no new records should appear in an older segment.
For my other topics, where the cleanup policy is simply delete, everything works as expected. This strange behaviour occurs only on topics with the delete,compact policy.
1 Answer
It turned out that compaction works differently from how it is described in the documentation. After examining the source code of kafka.log.LogCleaner I understood what was happening in my case.
Cleaning (compacting) deletes some records from the log segments. To avoid producing lots of very small segment files, the cleaner groups adjacent segments and merges them, up to the size configured in segment.bytes.
In my case a new log file was created weekly, but the compacted weekly data was only around 30-40 MB, while segment.bytes was at the default 1 GB. So during compaction Kafka always merged the compacted segments back into a single file, because the merged result was still under 1 GB.
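A rough back-of-envelope check of why about half a year of data can end up in one merged segment (the ~35 MB/week figure is an assumption taken from the 30-40 MB range above):
# Weeks of ~35 MB compacted data that fit under the default 1 GiB segment.bytes
echo $(( 1073741824 / 35000000 ))   # prints 30, i.e. roughly 7 months in one segment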
So to get the expected result I had to choose a suitable value for segment.bytes. It should be smaller than the amount of compacted data produced within the desired retention interval, so that the time span covered by one segment file never exceeds the retention interval. With such a setup compaction does not merge data from a wide time range into one file, and each segment file becomes eligible for retention deletion when it should.
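As an illustration (not my exact values): with roughly 30-40 MB of compacted data per week, a 30-day retention window accumulates well over 100 MB, so setting segment.bytes to, say, 100 MB keeps each merged segment inside the retention window. The topic name and bootstrap server below are placeholders:
# Hypothetical example: cap merged segment size below the compacted data
# volume expected within one retention interval
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name my-compacted-topic \
  --alter --add-config segment.bytes=104857600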