Latency Versus Throughput

While working on my Kafka series, I was looking at ways of improving performance when using the RdKafka .NET provider. One consistent message throughout this is that the default configuration leads to relatively high latency. This has been true for a while and there’s even a question in the FAQ about this. This is certainly not something limited to RdKafka, and I want to think about it in a general sense rather than specific to RdKafka.

The Trade-Off

The primary method by which data moves from one process to another is through buffers. We break up data into smaller portions and push them to their destination. In Integration Services, we have buffers. When passing data through TCP, we use packets.

Okay, so what’s the trade-off? The trade-off is between latency and throughput. Let’s take TCP packets as an example. Say you have a series of 50-byte messages you want to send from a source to a destination. We have two primary options: push messages as fast as possible, or hold off until you have the most data you can store in a packet and send it along. For simplicity’s sake, we’ll say that we can fit about 1350 bytes in a packet, so we can store 27 messages in a packet. We’ll also assume that it takes 10 milliseconds to send a packet from the source to the destination (regardless of packet size, as we’re using powerful connections) and 1 millisecond to produce a message.

Let’s look a little deeper at what this trade-off entails.

What Do I Care About?

There are two reasonable options here for what we want to optimize: getting messages to the destination fastest and pushing data the fastest (by which I mean moving the greatest number of bytes in the smallest amount of time).

Minimizing Latency

When we talk about getting individual messages to their destination the fastest, what we’re saying is that we want to minimize latency. In this case, I care more about making sure that a message gets to the destination as quickly as possible.

Let’s look at what it takes to send 10,000 messages. We’ll assume that the source can keep sending packets out without getting TCP delays (to hold off on sending messages) or dropped packets.

Knowing that it takes 1 millisecond to produce a message and 10 milliseconds to push the message to the destination, what we see is an 11-millisecond spin-up period (1 ms to produce the first message plus 10 ms in transit), after which point the destination receives one message each millisecond for roughly the next 10 seconds.

Here’s a quick summary of stats:

10,000 packets received in 10,010 milliseconds
First message reaches destination 11 milliseconds in
Average throughput: about 49,950 bytes/sec (500,000 bytes in 10.01 seconds)

The formula we can use is (N * L) + W, where N is the number of messages, L is the message generation latency (1 ms), and W is the network latency (10 ms).
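As a rough check on that formula, here is a minimal Python sketch. The function name and parameters are made up for illustration, and it assumes, as above, that no packet is ever delayed or dropped.

```python
def per_message_stats(n_messages, gen_ms, network_ms, message_bytes):
    """Send each message in its own packet as soon as it is produced."""
    # The first message is produced after gen_ms, then spends network_ms in transit.
    first_arrival_ms = gen_ms + network_ms
    # (N * L) + W: the last message is produced at N * L and arrives W later.
    total_ms = n_messages * gen_ms + network_ms
    # Bytes delivered divided by elapsed seconds.
    bytes_per_sec = n_messages * message_bytes / (total_ms / 1000)
    return first_arrival_ms, total_ms, bytes_per_sec


first, total, rate = per_message_stats(10_000, 1, 10, 50)
print(f"first arrival: {first} ms, total: {total} ms, throughput: {rate:.0f} bytes/sec")
```

Changing `gen_ms` or `network_ms` is a quick way to see how each term contributes to the total.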

Fill Some Packets

Our other alternative is to fill packets to the brim before pushing them out. We need to build 27 messages before we push a packet to the destination, so it takes 27ms to build a packet and another 10ms to get it to the destination. In those 10ms, we’ll have the next packet 10/27 of the way full, so the next packet will go out at 54ms and arrive at 64ms.

We will need to build 371 packets in total, of which 370 will be full and one will be partially full (the last 10 messages). But we’ll assume that the sender knows when the stream ends and can send out the last packet as soon as the final message is produced.

10,000 messages (in 371 packets) received in 10,010 milliseconds
First message received 37 milliseconds in
Average throughput: about 49,950 bytes/sec

The formula is the same as above, and so are the results, except for the latency of the first message: 37 ms rather than 11 ms.
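To see where those numbers come from, here is a small Python sketch of the fill-the-packet approach. The function name is hypothetical, and it bakes in the assumption that the sender flushes the final partial packet the moment the last message is produced.

```python
def packed_arrivals(n_messages, gen_ms, network_ms, per_packet):
    """Return the arrival time (ms) of each packet when packets are filled before sending."""
    arrivals = []
    # Step through the index of the last message in each packet: 27, 54, ...
    for last in range(per_packet, n_messages + per_packet, per_packet):
        last_msg = min(last, n_messages)  # the final packet may be partial
        send_ms = last_msg * gen_ms       # sent once its last message exists
        arrivals.append(send_ms + network_ms)
    return arrivals


arrivals = packed_arrivals(10_000, 1, 10, 27)
print(len(arrivals), arrivals[0], arrivals[1], arrivals[-1])
```

With the numbers from this example, that gives 371 packets, with the first two arriving at 37 ms and 64 ms and the last at 10,010 ms.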

What If We Change Message Speed?

If we make generation of messages faster or slower, the comparison doesn’t change. For example, let’s make it so that we can generate a message every microsecond, or 1,000 per millisecond. Now, the message-per-packet approach fires off one packet every microsecond and each takes 10 milliseconds to arrive. In a steady state, we’ll have 10,000 packets traveling over the wire at any particular moment. Meanwhile, filling a packet takes 27 microseconds, so in a steady state, the packed approach will have 370 or 371 packets traveling over the wire at any moment. Either way, the total time is (10,000 * (1/1000) ms) + 10 ms = 20 ms, or 20,000 microseconds.
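The same arithmetic can be written out in a few lines of Python. The constant names here are just illustrative, and everything is expressed in microseconds.

```python
# All times in microseconds; the values mirror the example above.
N_MESSAGES = 10_000
GEN_US = 1            # one message per microsecond
NETWORK_US = 10_000   # 10 ms on the wire
PER_PACKET = 27

# Total time is the same (N * L) + W for both approaches.
total_us = N_MESSAGES * GEN_US + NETWORK_US

# Packets on the wire in a steady state = transit time / time between sends.
in_flight_per_message = NETWORK_US / GEN_US
in_flight_packed = NETWORK_US / (PER_PACKET * GEN_US)

print(total_us, in_flight_per_message, round(in_flight_packed, 2))
```

The packed figure comes out a little over 370, which is why the count on the wire alternates between 370 and 371.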

When It Benefits To Wait

So far, we’ve seen that the two approaches are effectively equivalent. So why would we ever wait for a packet to fill before sending it out? The answer is that most packet-based systems have a maximum queue length and the destination can tell the source to back off.

So now, let’s say that the maximum queue length is 500 packets, meaning that we can have up to 500 packets in flight at a time. This has no effect on our packed packets, as we never have more than 371 of them in flight, but it certainly does affect our packet-per-message scenario. The first 500 messages can go out in the first 500 microseconds, but message 501 has to wait until the first message arrives, 10 ms and 1 microsecond in, before it can go out. This limits us to 500 messages every 10 milliseconds, meaning that we won’t get through message 10,000 until about 200,000 microseconds in. This is 10x slower than the packed packet scenario!
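Here is a minimal Python simulation of that back-off effect. It is only a sketch: the function name is invented, the limit is assumed to count packets in flight, and a slot is assumed to free up the instant a packet arrives (no acknowledgement delay).

```python
def last_arrival_us(ready_times_us, network_us, max_in_flight):
    """Arrival time of the final packet when at most max_in_flight packets are on the wire."""
    arrivals = []
    for i, ready in enumerate(ready_times_us):
        send = ready
        if i >= max_in_flight:
            # The window is full until the packet max_in_flight positions back arrives.
            send = max(send, arrivals[i - max_in_flight])
        arrivals.append(send + network_us)
    return arrivals[-1]


NETWORK_US = 10_000
WINDOW = 500

# One packet per message: a packet is ready every microsecond.
per_message_ready = list(range(1, 10_001))
# Packed: a packet is ready every 27 microseconds, plus the final partial one.
packed_ready = list(range(27, 10_000, 27)) + [10_000]

per_message_done = last_arrival_us(per_message_ready, NETWORK_US, WINDOW)
packed_done = last_arrival_us(packed_ready, NETWORK_US, WINDOW)
print(per_message_done, packed_done)
```

Under these assumptions the packet-per-message run finishes at 200,500 microseconds and the packed run at 20,000 microseconds, which lines up with the roughly 10x gap described above.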

The Real Answer: It Depends

In practice, the difference probably won’t be this extreme, but I think the example helps us understand why we might want to wait for packets to fill. When we are sending a significant amount of data, we typically want to maximize the data in a packet before sending it out, as network latency tends to be significantly higher than message generation latency.

But there are three scenarios in which we might not want to wait to see if packets get filled:

1. Latency is more important than throughput. If we need to minimize latency, even if it’s going to take us longer overall, then we want to send messages as soon as we receive them rather than waiting for a bunch of messages.
2. Message generation is infrequent. If we only generate one 50-byte message per second, it’s a better experience to have each packet go out individually, as we’ll get a steady stream of packets (the first at 10ms, then one every second thereafter). Even though there’s no difference in total time, it feels better to the end user.
3. Queue length is not relevant. This is the general form of #2. If we never saturate the queue (either due to message generation frequency or relative message generation time versus network latency time), then minimizing latency is a better experience. Otherwise, we get better throughput by waiting for packets to fill.


36 Chambers – The Legendary Journeys: Execution to the max!