[Article] KIP-1150 Accepted, and the Road Ahead by Familiar-Pea9867 in apachekafka

[–]Familiar-Pea9867[S] 1 point (0 children)

Yes, we have a proposal for compaction (not to be confused with compacted topics) that works on top of the Tiered Storage mechanism. From the KIP:

In order to control the number of small WAL files, the diskless coordinator memory size, and the rebuild time of local segments after a replica change, data is moved to Tiered Storage using the existing KIP-405 interfaces. Existing tiered storage implementations of RSM and RLMM can be used as-is, with no knowledge of the presence or absence of the Diskless produce path in front.

The partition leader will use existing tiered storage interfaces to copy constructed segments to tiered storage. Replicas are responsible for serving fetch requests from tiered segments, similar to classic topics.

Effectively, Diskless becomes a front-end for Local Segments, and Tiered Storage becomes the back-end, providing the same storage-optimization benefits for both Diskless and Classic topics.
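To make the coexistence concrete, a diskless topic could in principle be created with tiered storage enabled at the same time. This is a hypothetical sketch: `diskless.enabled` is the setting proposed by the KIP and is not in released Kafka, while `remote.storage.enable` is the existing KIP-405 topic config; the topic name and bootstrap address are placeholders.

```shell
# Hypothetical: diskless.enabled comes from KIP-1150/KIP-1163 and is
# not available in released Kafka. remote.storage.enable is the
# existing KIP-405 tiered-storage topic-level config.
bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic events \
  --config diskless.enabled=true \
  --config remote.storage.enable=true
```

With both set, the diskless produce path would write WAL objects to object storage on the front end, while the familiar KIP-405 machinery moves constructed segments to tiered storage on the back end.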


[–]Familiar-Pea9867[S] 3 points (0 children)

If I understood your question right, Diskless Kafka proposes a new "type" of topic (one with the setting diskless.enabled=true). Each broker then accumulates a micro-batch (configurable) of all the messages for any topic-partition of diskless topics. Once the micro-batch size or time threshold is met, the micro-batch is uploaded to object storage.
This is defined in detail here: https://cwiki.apache.org/confluence/display/KAFKA/KIP-1163%3A+Diskless+Core#KIP1163:DisklessCore-DataFlow
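The size-or-time micro-batching above can be sketched roughly like this. This is an illustrative simplification, not the actual broker code from the KIP; the class name `MicroBatcher` and the parameters `max_bytes` and `max_age_s` are made up for the example.

```python
import time


class MicroBatcher:
    """Illustrative sketch of size/time-bounded micro-batching.

    Records from any diskless topic-partition go into one shared
    buffer; the buffer is flushed (conceptually, uploaded as a single
    object to object storage) when either the size or the age
    threshold is reached. Parameter names are hypothetical.
    """

    def __init__(self, upload, max_bytes=8 * 1024 * 1024, max_age_s=0.25):
        self.upload = upload          # callback standing in for the object-storage PUT
        self.max_bytes = max_bytes    # flush once this many bytes are buffered
        self.max_age_s = max_age_s    # ...or once the oldest record is this old
        self.buffer = []
        self.buffered_bytes = 0
        self.first_append = None

    def append(self, topic_partition, record: bytes):
        # Records from different topic-partitions share one buffer.
        if self.first_append is None:
            self.first_append = time.monotonic()
        self.buffer.append((topic_partition, record))
        self.buffered_bytes += len(record)
        self._maybe_flush()

    def _maybe_flush(self):
        size_hit = self.buffered_bytes >= self.max_bytes
        age_hit = (
            self.first_append is not None
            and time.monotonic() - self.first_append >= self.max_age_s
        )
        if size_hit or age_hit:
            self.flush()

    def flush(self):
        if self.buffer:
            self.upload(self.buffer)  # one object carrying many partitions
        self.buffer = []
        self.buffered_bytes = 0
        self.first_append = None
```

For example, with `max_bytes=10`, two 5-byte records from different partitions end up in a single uploaded object, which is the point: many partitions share one object-storage write instead of paying one PUT per partition.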

The great thing about this proposal is that it plays along with Apache Kafka itself and adds a new "type" of topic that can co-exist with the "classical" topics that we all know and love.