When working with Apache Kafka, we may run into a situation where we need to delete data from a topic, e.g. because junk data was sent during testing and we have not yet implemented handling for such errors. The result is the so-called “poison pill”: a record (or records) that causes our processing to fail every time we try to consume it from Kafka.
Method 1: delete and recreate the topic (not recommended)
We can simply delete the topic and create it again. Personally, I think it is better to use the second method, i.e. changing the retention.
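For completeness, method 1 would look roughly like this (a sketch, assuming the same topic name and ZooKeeper address used later in this post; note that the brokers must have delete.topic.enable=true, and topic deletion is asynchronous):

```shell
# Delete the topic (requires delete.topic.enable=true on the brokers)
kafka-topics --zookeeper kafka:2181 --delete --topic bigdata-etl-file-source

# Recreate it once the deletion has actually completed
kafka-topics --zookeeper kafka:2181 --create --topic bigdata-etl-file-source \
  --partitions 1 --replication-factor 1
```

The downside is that consumers lose their committed offsets and any topic-level configuration overrides, which is why the retention-based method below is usually preferable.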
Method 2: change the retention
The second way is to change the data retention on the topic to some low value, e.g. 1 second. The data will then be deleted automatically by Kafka’s internal log-cleanup processes; we don’t have to do anything else.
First, let’s check the current configuration of the topic. Here retention.ms=86400000, i.e. 1 day (86,400,000 ms = 24 hours).
kafka-topics --zookeeper kafka:2181 --topic bigdata-etl-file-source --describe
Topic:bigdata-etl-file-source PartitionCount:1 ReplicationFactor:1 Configs:retention.ms=86400000
Topic: bigdata-etl-file-source Partition: 0 Leader: 0 Replicas: 0 Isr: 0
We change the retention to 1 second:
kafka-configs --zookeeper <zookeeper>:2181 --entity-type topics --alter --entity-name bigdata-etl-file-source --add-config retention.ms=1000
Remember to wait a while for the data to be deleted. Kafka removes expired log segments only when its periodic retention check runs, which is controlled by the broker setting log.retention.check.interval.ms (5 minutes by default), so it may take a few minutes rather than a few seconds.
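One way to check whether the records are actually gone (a sketch, assuming a broker listening on kafka:9092) is to compare the earliest and latest offsets with the GetOffsetShell tool; once the two values match for a partition, that partition is empty:

```shell
# Earliest offset per partition (--time -2)
kafka-run-class kafka.tools.GetOffsetShell --broker-list kafka:9092 \
  --topic bigdata-etl-file-source --time -2

# Latest offset per partition (--time -1); equal values mean no records remain
kafka-run-class kafka.tools.GetOffsetShell --broker-list kafka:9092 \
  --topic bigdata-etl-file-source --time -1
```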
Once we have verified that the data has been removed from the topic, we can restore the previous settings.
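To restore the configuration, we can either set retention.ms back to its original value or remove the topic-level override entirely so the broker default applies again (a sketch, using the same ZooKeeper address as above):

```shell
# Option A: set the retention back to the original value (1 day)
kafka-configs --zookeeper kafka:2181 --entity-type topics --alter \
  --entity-name bigdata-etl-file-source --add-config retention.ms=86400000

# Option B: delete the topic-level override; the broker-wide
# log.retention.* settings then apply to this topic again
kafka-configs --zookeeper kafka:2181 --entity-type topics --alter \
  --entity-name bigdata-etl-file-source --delete-config retention.ms
```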
If you enjoyed this post, please add a comment below or share it on Facebook, Twitter, LinkedIn or another social media site.
Thanks in advance!