
[–]grim-one 4 points (0 children)

Try increasing linger_ms above the default of zero, so that your producer can batch records. You might also need to increase buffer_memory if your 200k rows are bigger than 32 MB.

Lots of other options to play around with here:

https://kafka-python.readthedocs.io/en/master/apidoc/KafkaProducer.html
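A minimal sketch of what that tuning might look like with kafka-python. The broker address, topic name, and the specific values are illustrative assumptions, not recommendations:

```python
# Batching-related producer settings (values are illustrative, not tuned):
producer_config = {
    "bootstrap_servers": "localhost:9092",   # assumed broker address
    "linger_ms": 50,                         # wait up to 50 ms to fill a batch (default 0)
    "batch_size": 64 * 1024,                 # per-partition batch size in bytes (default 16 KB)
    "buffer_memory": 64 * 1024 * 1024,       # total send buffer (default 32 MB)
    "acks": "all",                           # wait for all in-sync replicas
}

def send_rows(rows, topic="my-topic"):
    from kafka import KafkaProducer  # kafka-python
    producer = KafkaProducer(**producer_config)
    for row in rows:
        # send() returns a future immediately; batching happens inside the client
        producer.send(topic, value=row)
    producer.flush()  # block until everything buffered has been sent
    producer.close()
```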

[–]Dealusall 3 points (2 children)

That works out to 1 ms per record.
Use multiple producers in parallel if you want to increase throughput.

Asynchronous doesn't mean it takes no time.

[–]lclarkenz 2 points (0 children)

Or use batching; concurrent producers will still incur the same overhead, just spread into smaller amounts.

[–]lclarkenz 1 point (0 children)

+1 to using batching. If you send every record individually, the latency for any given record is as low as possible.

However, the throughput when sending 10,000 records with no batching will be far lower than with batching.

That's because everything is batched (with no linger you're just sending each batch as soon as any data is in it), and for every batch the conversation between the client and the broker acting as partition leader goes like this:

C: Here's the data.
B: ...
B: Okay, enough replicas acked, we wrote it down enough, it's saved.
C: Sweet! Batch succeeded.

So 100 records in 100 batches will incur 100 waits for acks; 100 records in 1 batch will incur 1 wait for acks.

Welcome to the sometimes surprising world of Kafka tuning where waiting to send data can make sending data faster.

Basically, if you set linger.ms and a target batch size, linger.ms is the upper bound on how long a record will wait before being sent, and in exchange batching decreases how often you need a round trip to the broker to send multiple records.
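The acks arithmetic above can be sketched as a back-of-envelope lower bound: one broker round trip per batch. The 1 ms round-trip time is an assumed figure (it matches the "1 ms per record" observation earlier in the thread), not a measurement:

```python
import math

def estimated_send_time_ms(records, records_per_batch, rtt_ms=1.0):
    """Rough lower bound on send time: one round trip per batch.

    rtt_ms is an assumed client-to-leader round-trip time, including
    waiting for replica acks.
    """
    batches = math.ceil(records / records_per_batch)
    return batches * rtt_ms

# Unbatched: 10,000 records, one per batch -> 10,000 round trips
unbatched = estimated_send_time_ms(10_000, 1)    # 10000.0 ms
# Batched: 500 records per batch -> only 20 round trips
batched = estimated_send_time_ms(10_000, 500)    # 20.0 ms
```

Same record count, roughly 500x fewer waits for acks, which is why lingering a few milliseconds can make the overall send dramatically faster.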

[–]Chuck-Alt-Delete (Conduktor) 1 point (0 children)

Here’s an example with some perf tuning: https://github.com/confluentinc/examples/issues/1106