Let’s say you’re working on a program to load a Kafka topic and you mess up and want to start over. There are two good ways of doing this. Both of these methods involve connecting to the name node and running shell scripts in /usr/hdp/[version]/kafka/bin (for the Hortonworks Data Platform; for some other distro, I leave it as an exercise to the reader to find the appropriate directory…mostly because I wouldn’t know where it was).
Method One: Delete And Re-Create
The method that I’ve shown already is the delete and re-create method. This one is pretty simple: we delete the existing topic and then generate a new one with the same name.
./kafka-topics.sh --delete --zookeeper localhost:2181 --topic test
When you delete the topic, you’ll see a warning that the topic has only been marked for deletion, and that this has no impact unless delete.topic.enable is set to true.
You can check that setting in Ambari under Kafka -> Configs.
Then, once we’ve deleted the topic, we can re-create it.
./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
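If you do this often, the two commands above can be wrapped in a small helper. The following is a minimal sketch, not a polished tool: KAFKA_BIN, ZOOKEEPER, MAX_TRIES, POLL_SECONDS, and the function names are all made up for illustration. It assumes that a topic which is still pending deletion continues to appear in the --list output (as far as I know, with a "marked for deletion" suffix), so it polls until the name disappears before re-creating the topic.

```shell
#!/usr/bin/env bash
# Sketch only: KAFKA_BIN, ZOOKEEPER, MAX_TRIES, POLL_SECONDS and the function
# names below are illustrative, not part of Kafka.
KAFKA_BIN="${KAFKA_BIN:-.}"
ZOOKEEPER="${ZOOKEEPER:-localhost:2181}"

# Thin wrapper so the ZooKeeper address lives in one place.
kafka_topics() {
  "$KAFKA_BIN/kafka-topics.sh" --zookeeper "$ZOOKEEPER" "$@"
}

# Succeeds if the topic name is the first field of any line in --list
# (assumption: a pending deletion still lists the topic name first).
topic_exists() {
  kafka_topics --list | awk -v t="$1" '$1 == t { found = 1 } END { exit !found }'
}

# Usage: recreate_topic <topic> [partitions] [replication-factor]
recreate_topic() {
  local topic="$1" partitions="${2:-1}" replication="${3:-1}" tries=0
  kafka_topics --delete --topic "$topic" || return 1
  # Deletion is asynchronous, so poll until the topic is really gone.
  while topic_exists "$topic"; do
    tries=$((tries + 1))
    if [ "$tries" -ge "${MAX_TRIES:-30}" ]; then
      echo "topic $topic is still present; is delete.topic.enable true?" >&2
      return 1
    fi
    sleep "${POLL_SECONDS:-2}"
  done
  kafka_topics --create --replication-factor "$replication" \
    --partitions "$partitions" --topic "$topic"
}
```

After sourcing this from the Kafka bin directory, recreate_topic test 1 1 would mirror the two commands shown above. If delete.topic.enable is off, the loop gives up after MAX_TRIES polls instead of waiting forever.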
Method Two: Retention Policy Shenanigans
The first method works fine for non-production scenarios where you can stop all of the producers and consumers, but let’s say that you want to flush the topic while leaving your producers and consumers up (but maybe you have a downtime window where you know the producers aren’t pushing anything). In this case, we can change the retention period to something very short, let the queue flush, and bring it back to normal, all using the kafka-configs shell script.
First, let’s check out our current configuration settings for the topic called test:
./kafka-configs.sh --zookeeper localhost:2181 --describe --entity-type topics --entity-name test
The output might look odd at first because it lists no configuration values, but that’s just the Kafka configuration script’s way of saying that you’re using the default settings. Incidentally, our default setting has a retention period of 168 hours (seven days), as we can see in Ambari.
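Since the per-topic retention.ms setting is expressed in milliseconds rather than hours, it can help to convert the default for comparison. Here is a tiny sketch; hours_to_ms is a hypothetical helper name, not something that ships with Kafka.

```shell
# Convert a retention period in hours to milliseconds.
# hours_to_ms is an illustrative helper, not part of Kafka.
hours_to_ms() {
  echo $(( $1 * 60 * 60 * 1000 ))
}

hours_to_ms 168   # prints 604800000
```

So the 168-hour default corresponds to 604800000 milliseconds, which puts a one-second retention period in perspective.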
Now that we know the topic is using the defaults, we can run the following command to set our retention policy to something a bit shorter:
./kafka-configs.sh --zookeeper localhost:2181 --alter --entity-type topics --entity-name test --add-config retention.ms=1000
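If we describe the topic again, we can see that the retention period is now 1000 milliseconds, or one second. Give that a minute or two to take hold (possibly longer, depending on the broker’s log.retention.check.interval.ms setting, which controls how often it looks for expired log segments) and then we can run the following to remove the special configuration: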
./kafka-configs.sh --zookeeper localhost:2181 --alter --entity-type topics --entity-name test --delete-config retention.ms
And we’re back, with no real downtime. As long as the producers were temporarily paused, we didn’t lose any incoming data, and our producers can go about their business like nothing happened.
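The whole retention dance can be scripted as well. This is a rough sketch under a few assumptions: KAFKA_BIN, ZOOKEEPER, and the function names are illustrative, the default wait of 120 seconds is just my reading of "a minute or two" and may need to be longer on your cluster, and the topic is assumed to have had no retention.ms override beforehand (the last step deletes the override outright rather than restoring an earlier value).

```shell
#!/usr/bin/env bash
# Sketch only: KAFKA_BIN, ZOOKEEPER and the function names are illustrative.
KAFKA_BIN="${KAFKA_BIN:-.}"
ZOOKEEPER="${ZOOKEEPER:-localhost:2181}"

# Thin wrapper so the ZooKeeper address lives in one place.
kafka_configs() {
  "$KAFKA_BIN/kafka-configs.sh" --zookeeper "$ZOOKEEPER" "$@"
}

# Usage: flush_topic <topic> [wait-seconds]
flush_topic() {
  local topic="$1" wait_seconds="${2:-120}"
  # Step 1: shrink retention to one second so existing messages expire.
  kafka_configs --alter --entity-type topics --entity-name "$topic" \
    --add-config retention.ms=1000 || return 1
  # Step 2: give the broker time to notice and delete the old segments.
  sleep "$wait_seconds"
  # Step 3: drop the override so the topic falls back to the default.
  kafka_configs --alter --entity-type topics --entity-name "$topic" \
    --delete-config retention.ms
}
```

Running flush_topic test 300 would then replace the manual sequence above with a five-minute wait. Make sure the producers are paused first, because anything written during the short-retention window is subject to the same one-second policy.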
There are at least two different methods for clearing out a Kafka topic. Before you break out the hammer, see if monkeying with the retention period will solve your problem without as much disruption.