Today’s Kafka post will be a relatively simple one, where we use the built-in shell scripts to create a new topic, add some records, and view those records. We’ll wrap it up by creating the topics I need for the rest of the process.
I’m going to use the Hortonworks Data Platform 2.4 sandbox for this. You can use other versions, but I have this one readily available (thanks to my Polybase + Docker issues).
Is Kafka Actually On?
By default, Kafka is in maintenance mode on the Hortonworks sandbox. I modified my local hosts file to make sandbox.hortonworks.com point to my sandbox, so to connect to Ambari on port 8080, I go to http://sandbox.hortonworks.com:8080:
Now that we know Kafka is on, I can check the Configs tab to see how to connect to Kafka:
There are three important things here: first, our Zookeeper port is 2181. Zookeeper is great for centralized configuration and coordination; if you want to learn more, check out this Sean Mackrory post.
The second bit of important information is how long our retention period is. Right now, it’s set to 7 days, and that’s our default. Remember that messages in a Kafka topic don’t go away simply because some consumer somewhere accessed them; they stay in the log until we say they can go.
Finally, we have a set of listeners. For the sandbox, the only listener is on port 6667. We connect to listeners from our outside applications, so knowing those addresses and ports is vital.
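Pulling those three pieces together, the corresponding entries in the broker configuration (server.properties; the values below match the sandbox defaults, but check your own Configs tab) look something like this:

```properties
# Zookeeper connection string the broker uses for coordination
zookeeper.connect=sandbox.hortonworks.com:2181

# How long messages stay in a topic's log before deletion (7 days)
log.retention.hours=168

# Listener that client applications connect to
listeners=PLAINTEXT://sandbox.hortonworks.com:6667
```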
Creating A Topic
Now that we know where everything is, let’s connect to our sandbox and start using some Kafka shell scripts. There are three scripts that I’ll use today: kafka-topics.sh, kafka-console-producer.sh, and kafka-console-consumer.sh.
The first script I plan to use is kafka-topics.sh.
./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
The response is pretty simple: Created topic "test".
If we want to know what topics already exist, we can use the list command:
./kafka-topics.sh --zookeeper localhost:2181 --list
You can see my “test” topic as well as a few others.
Publish To A Topic
We want to add a message to a topic, and the quickest way to do that is to use the built-in console producer:
./kafka-console-producer.sh --broker-list sandbox.hortonworks.com:6667 --topic test
If you did it right, you’ll get a blinking cursor and the ability to enter text. You publish a message by hitting the enter key, and you can quit with control + c.
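Typing interactively is fine for poking around, but you can also pipe a file into the console producer to publish a batch of messages, one per line (messages.txt here is just a hypothetical sample file):

```shell
# Publish every line of messages.txt as a separate record on the test topic
./kafka-console-producer.sh --broker-list sandbox.hortonworks.com:6667 --topic test < messages.txt
```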
Consume From A Topic
We have something to send messages, so it makes sense that we’d have something to receive them. That’s what the console consumer does:
./kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
This shell script reads from a topic. The optional --from-beginning flag lets you start from the very first message in the log; if you leave it off, the consumer picks the topic up mid-stream and shows whatever producers are currently pushing, without going back into history.
It’s hard to capture in screenshots, but if you have the publisher and consumer both up at the same time, you’ll see messages appear on the consumer almost instantaneously.
Cleanup: Remove A Topic
When we don’t have any further use for a topic, we can remove it:
./kafka-topics.sh --delete --zookeeper localhost:2181 --topic test
The message we get back is interesting:
The delete.topic.enable configuration setting is false by default, so we have to flip it to true before we can actually delete a topic. It’s easy enough to do in Ambari:
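Under the covers, Ambari is just setting a broker property; in server.properties the change would look like this:

```properties
# Allow kafka-topics.sh --delete to actually remove topics (default: false)
delete.topic.enable=true
```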
Once you change that configuration setting, you will need to restart the service.
Final Prep Work: Creating Relevant Topics
Okay, we played around with a test topic a little bit, so let’s create the real topics that we’ll use for the rest of this journey. I want to create two topics: one for raw flight data and the second for enriched flight data. The commands are straightforward:
./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic Flights
./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic EnrichedFlights
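If you want to double-check that both topics came out the way we wanted, kafka-topics.sh also supports a --describe flag, which shows partition counts and replica assignments:

```shell
# Show partition and replica details for our new topics
./kafka-topics.sh --zookeeper localhost:2181 --describe --topic Flights
./kafka-topics.sh --zookeeper localhost:2181 --describe --topic EnrichedFlights
```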
You’ll note that we already had these topics, almost like I’ve actually done the work first and am not winging it…
Today’s post was about understanding the basics of Kafka—the very basics. There are a lot of other things to learn, but I’m going to hold off on those, as we’ve got the bare minimum of what we need to get going. We’ll start fresh tomorrow on the .NET side of the house.