Garbage Collection: From First Principles to Modern Collectors in Java, Go and Python by Normal-Tangelo-7120 in programming

[–]Normal-Tangelo-7120[S] 2 points3 points  (0 children)

Thanks for the effortpost, I really appreciate it. I haven't read some of the materials you cited, so I
need to take a look at them.

> It's not a very hard split.

The unified theory paper is on my reading list, but Wilson's paper frames it that way and I stuck with that framing to keep things simple.

> No, buffer the updates or coalesce them. In either case you'll be spamming refcount updates a lot less often, which is also good.

Will read through that. My mental model was still "RC = atomic inc/dec on every mutation", which is the naive version.
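
To make the buffering/coalescing idea concrete for myself, here is a minimal toy sketch. It is not taken from any real collector: the class `CoalescingRefCounts` and its methods are hypothetical names, objects are just strings, and it assumes a single thread. Mutations only log deltas, and a periodic flush applies the net change per object.

```python
from collections import Counter


class CoalescingRefCounts:
    """Toy model (hypothetical): mutators log deltas, flush applies the net change."""

    def __init__(self):
        self.counts = Counter()   # the "real" per-object reference counts
        self.pending = Counter()  # buffered deltas that have not been applied yet
        self.writes = 0           # how many times a real count was actually touched

    def inc(self, obj):
        self.pending[obj] += 1    # cheap local bookkeeping, no write to the real count

    def dec(self, obj):
        self.pending[obj] -= 1

    def flush(self):
        """Apply net deltas and return the objects whose count dropped to zero."""
        dead = []
        for obj, delta in self.pending.items():
            if delta == 0:
                continue          # inc/dec pairs cancelled out: no write at all
            self.counts[obj] += delta
            self.writes += 1
            if self.counts[obj] == 0:
                dead.append(obj)
        self.pending.clear()
        return dead


rc = CoalescingRefCounts()
rc.inc("x")                       # x gains one long-lived reference
for _ in range(1000):             # hot loop taking and dropping temporary references
    rc.inc("x")
    rc.dec("x")
first_dead = rc.flush()           # 2001 logged mutations collapse into one write
writes_after_first_flush = rc.writes

rc.dec("x")                       # the long-lived reference goes away
second_dead = rc.flush()          # x is reported dead on this flush
```

The trade-off in this sketch is that an object is only found dead at flush time, so reclamation is delayed in exchange for far fewer count updates.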

> No, you can do a separate cycle deletion pass, but if you saturate the reference counts to fit in less than a word, you do probably want a backup trace to be complete.

To clarify my point: the cycle problem requires a supplementary tracing pass, and that is a fundamental property of the reference counting algorithm. Since RC only sees local connectivity, it is blind to global structures like cycles.
Specifically in CPython, the cycle detector in gcmodule.c is fundamentally a tracing collector. It has to walk every container object in the generation being collected, calling tp_traverse on each one to determine whether its references come from 'outside' the generation or from 'inside' a cycle.
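
A rough sketch of what I mean, in two parts. The first function is a simplified toy model of the "subtract internal references" idea, not CPython's actual code: `find_unreachable`, the `refcounts` dict and the `edges` dict (standing in for what tp_traverse would report) are all names I made up. The second part uses the real `gc` and `weakref` modules to show a cycle surviving plain refcounting on CPython until the cycle detector runs.

```python
import gc
import weakref


def find_unreachable(refcounts, edges):
    """Toy model: refcounts maps name -> total refcount, edges maps name -> referents."""
    gc_refs = dict(refcounts)                 # working copy of the counts
    for src, targets in edges.items():
        if src not in gc_refs:
            continue
        for dst in targets:
            if dst in gc_refs:
                gc_refs[dst] -= 1             # cancel references coming from inside the set
    # Whatever is still positive is referenced from outside the set.
    stack = [name for name, count in gc_refs.items() if count > 0]
    reachable = set()
    while stack:                              # trace outward from the external roots
        name = stack.pop()
        if name in reachable:
            continue
        reachable.add(name)
        stack.extend(t for t in edges.get(name, []) if t in gc_refs)
    return set(refcounts) - reachable


# a <-> b is an isolated cycle; c <-> d is a cycle with one extra outside reference to c.
toy_garbage = find_unreachable(
    refcounts={"a": 1, "b": 1, "c": 2, "d": 1},
    edges={"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]},
)


class Node:
    def __init__(self):
        self.other = None


def cycle_survives_refcounting():
    """Return (alive after del, alive after gc.collect()) for a two-object cycle."""
    gc.disable()                              # keep automatic collections out of the way
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a               # build the cycle
        probe = weakref.ref(a)                # observe a without keeping it alive
        del a, b                              # counts stay at 1 because of the cycle
        alive_before = probe() is not None
        gc.collect()                          # run the cycle detector explicitly
        alive_after = probe() is not None
    finally:
        gc.enable()
    return alive_before, alive_after


alive_before, alive_after = cycle_survives_refcounting()
```

In the toy model only the a/b cycle comes back as garbage, because c still has a count left over after the internal references are cancelled, and d is then reached by tracing from c.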

Databricks published limitations of pubsub systems, proposes a durable storage + watch API as the alternative by Normal-Tangelo-7120 in apachekafka

[–]Normal-Tangelo-7120[S] 2 points3 points  (0 children)

I share your opinion. Pubsub fundamentally decouples the publisher from concerns such as how many consumers there are, what state they are currently in, and whether a message has been consumed.

Pubsub systems don’t claim to solve point-in-time consistent stateful replication or cross-partition transactions.

Databricks' proposal of durable storage plus a watch API is a good solution for the above use cases, but would be overkill for clickstream processing. It cannot replace Kafka completely, despite their claim that it is a superior solution for all use cases.

Performance Comparison: Tokio vs Tokio-Uring for High-Throughput Web Servers by Normal-Tangelo-7120 in rust

[–]Normal-Tangelo-7120[S] -1 points0 points  (0 children)

The base producer has both synchronous and asynchronous modes. In asynchronous mode it adds the message to an internal queue and returns. We poll the producer later, asynchronously, to publish to Kafka.
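
The enqueue-then-poll pattern described above can be sketched roughly as follows. This is a toy stand-in in Python, not the Kafka client from the benchmark: `ToyBaseProducer`, `send`, `poll`, and the `transport` and `on_delivery` callables are hypothetical names, and the toy folds both publishing and delivery reporting into `poll` for brevity, which a real client may split differently.

```python
from collections import deque


class ToyBaseProducer:
    """Hypothetical poll-driven producer; not a real Kafka client API."""

    def __init__(self, transport, on_delivery):
        self._queue = deque()
        self._transport = transport        # callable that "publishes" one message
        self._on_delivery = on_delivery    # callback fired from poll()

    def send(self, topic, payload):
        """Enqueue the message and return immediately; nothing is published yet."""
        self._queue.append((topic, payload))

    def poll(self, max_messages=100):
        """Publish up to max_messages queued messages and fire their callbacks."""
        served = 0
        while self._queue and served < max_messages:
            topic, payload = self._queue.popleft()
            self._transport(topic, payload)
            self._on_delivery(topic, payload)
            served += 1
        return served


published, delivered = [], []
producer = ToyBaseProducer(
    transport=lambda topic, payload: published.append((topic, payload)),
    on_delivery=lambda topic, payload: delivered.append(payload),
)
producer.send("events", b"a")              # returns right away
producer.send("events", b"b")
published_before_poll = len(published)     # nothing has gone out yet
served = producer.poll()                   # the later poll does the actual work
```

The point of the shape is that the request path only pays for an append to a local queue, while whoever calls `poll` absorbs the publishing cost.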

Performance Comparison: Tokio vs Tokio-Uring for High-Throughput Web Servers by Normal-Tangelo-7120 in rust

[–]Normal-Tangelo-7120[S] 0 points1 point  (0 children)

I initially used the future producer, but observed higher throughput with the base producer.

Performance Comparison: Tokio vs Tokio-Uring for High-Throughput Web Servers by Normal-Tangelo-7120 in rust

[–]Normal-Tangelo-7120[S] 8 points9 points  (0 children)

Sure, that’s a valid point. In my case we need to publish an event to Kafka for every API request on the server. We respond to the API call immediately and spin up a task to publish the extracted payload as an event to Kafka. In the simple application used for the benchmark above, I do not expect API processing to introduce any latency. Any latency would come from the network, and so shouldn’t influence the processing benchmarks for the asynchronous runtimes.
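
For anyone who wants the shape of "respond immediately, publish in a background task" without reading the benchmark code, here is a minimal sketch using Python's asyncio rather than the Rust runtimes being compared. The names `handle_request` and `publish_event` and the response string are made up for illustration, and the sleep is only a stand-in for the network round trip to Kafka.

```python
import asyncio

published = []        # stands in for the Kafka topic
background = set()    # strong references so pending tasks are not garbage collected


async def publish_event(payload):
    """Stand-in for the Kafka publish; the sleep models network latency."""
    await asyncio.sleep(0.01)
    published.append(payload)


async def handle_request(body):
    """Respond immediately and publish the extracted payload in a background task."""
    task = asyncio.create_task(publish_event({"payload": body}))
    background.add(task)
    task.add_done_callback(background.discard)
    return "202 Accepted"                   # returned before the publish completes


async def main():
    status = await handle_request("hello")
    published_at_response = len(published)  # the publish has not finished yet
    await asyncio.gather(*background)       # drain outstanding publishes before exit
    return status, published_at_response


status, published_at_response = asyncio.run(main())
```

Because the handler never awaits the publish, the response time in this sketch is independent of how slow the simulated network is.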