Apache Kafka: How to delete data from a Kafka topic?

When working with Apache Kafka, we may run into a situation where we need to delete data from a topic — for example, junk data was sent during testing and we have not yet implemented handling for such errors. The result is a so-called “poison pill”: a record (or records) that causes our processing to fail every time we try to consume it from Kafka.

Method 1 (not recommended)

We can simply delete the topic and create it again. Personally, I think it is better to use the second method.
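For completeness, this is roughly what the first method looks like, assuming the same topic and ZooKeeper address as in the examples below. Note that `delete.topic.enable=true` must be set on the brokers for the delete to actually take effect, and the partition/replication values here simply mirror the topic description shown later:

```shell
# Delete the topic (requires delete.topic.enable=true on the brokers)
kafka-topics --zookeeper kafka:2181 --delete --topic bigdata-etl-file-source

# Recreate it with the original partition and replication settings
kafka-topics --zookeeper kafka:2181 --create --topic bigdata-etl-file-source \
  --partitions 1 --replication-factor 1
```

Keep in mind that any consumer offsets and topic-level config overrides are lost along with the topic, which is part of why this method is not recommended.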

Method 2: change the retention

The second way is to change the data retention on the topic to some low value, e.g. 1 second. The data will then be deleted automatically by Kafka’s internal cleanup processes — we don’t have to do anything else.

First, let’s check the current configuration of the topic: retention.ms=86400000 (1 day)

kafka-topics --zookeeper kafka:2181 --topic bigdata-etl-file-source --describe

Topic:bigdata-etl-file-source	PartitionCount:1	ReplicationFactor:1	Configs:retention.ms=86400000
	Topic: bigdata-etl-file-source	Partition: 0	Leader: 0	Replicas: 0	Isr: 0

We change the retention to 1 second:

kafka-configs --zookeeper <zookeeper>:2181 --entity-type topics --alter --entity-name bigdata-etl-file-source --add-config retention.ms=1000

We check the configuration:

kafka-configs --zookeeper <zookeeper>:2181 --entity-type topics --describe --entity-name bigdata-etl-file-source

Remember to wait a while (about 1 minute) for the data to be deleted.

After we verify that the data has already been removed from the topic, we can restore the previous settings.
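Restoring the settings can be done in either of two ways, again assuming the same topic and ZooKeeper address as above. The first variant removes the per-topic override entirely (the topic falls back to the broker default); the second sets retention explicitly back to the value we saw earlier:

```shell
# Variant A: remove the override; the topic reverts to the broker's
# default log.retention.* settings
kafka-configs --zookeeper <zookeeper>:2181 --entity-type topics --alter \
  --entity-name bigdata-etl-file-source --delete-config retention.ms

# Variant B: set retention back to the previous value (86400000 ms = 1 day)
kafka-configs --zookeeper <zookeeper>:2181 --entity-type topics --alter \
  --entity-name bigdata-etl-file-source --add-config retention.ms=86400000
```

Variant A is the cleaner choice if the topic never had an explicit override in the first place.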

If you enjoyed this post, please leave a comment below or share it on Facebook, Twitter, LinkedIn or another social media site.
Thanks in advance!
