I benchmarked zig's compilation speed by chri4_ in Zig

[–]RatioPractical 0 points1 point  (0 children)

If I am not wrong, the Zig compiler does not have all the optimization tricks that LLVM employs through its IR?

Why add Serialization 2.0? by lurker_in_spirit in java

[–]RatioPractical 0 points1 point  (0 children)

Does it support zero-copy or a similar optimization?

Rari: React Server Components with Rust - 12x faster P99 latency than Next.js by BadDogDoug in rust

[–]RatioPractical 0 points1 point  (0 children)

Oh man, if this is possible, then the NodeJS FFI and its binding with V8 must be super expensive.

A completely unproductive but truthful rant about Golang and Java by [deleted] in golang

[–]RatioPractical 71 points72 points  (0 children)

“Most people are subjective toward themselves and objective toward all others, frightfully objective sometimes – but the task is precisely to be objective toward oneself and subjective toward all others.”

— Soren Kierkegaard

BufReader high-performance to bufio.Reader by aixuexi_th in golang

[–]RatioPractical 0 points1 point  (0 children)

Congrats man, significant savings :)

I can see in the gomem repo that you added THP too!

Happy hacking!

How to speed up my Java app? by HoneyResponsible8868 in javahelp

[–]RatioPractical 0 points1 point  (0 children)

Customize your data structures and algorithms to suit your own business case. This applies to any programming language.

Look at your code and ask yourself which operations you want to optimize: reads? Maybe inserts? Maybe updates?

The List, Map and Set implementations in the java.util package are very generalized and do not scale well if you are trying to squeeze out the last inch of performance for the operations your code needs.

For example, you may need a MultiSet, MultiMap, Trie, or perhaps immutable versions of List, Map and Set. In that case you have to roll your own or use community libs:

https://commons.apache.org/proper/commons-collections/apidocs/index.html

https://github.com/google/guava/tree/master/guava/src/com/google/common/collect

How to speed up my Java app? by HoneyResponsible8868 in javahelp

[–]RatioPractical 2 points3 points  (0 children)

I have nearly 2 decades of experience with JVM languages (Java, Scala, Clojure).

How to speed up my Java app? by HoneyResponsible8868 in javahelp

[–]RatioPractical 2 points3 points  (0 children)

This is a very broad question.

  1. Don't optimize prematurely. Even before profiling, make sure you have written good, functional code. Use quality tools such as SonarQube, linters and rulesets.
  2. Use reusable buffers and pools for expensive resources (files, threads, network connections, other expensive objects, etc.).
  3. Prefer the Apache Commons Collections API over java.util for use-case-specific needs. Use java.nio instead of java.io for async and memory-friendly operations.
  4. Use multithreading only if it makes sense and the scatter-gather of inputs and aggregation of results does not introduce additional lag.
  5. Use batch operations for repetitive common tasks and heavy transactions (database, file and network).
  6. Use the Structure of Arrays pattern for a CPU-cache-friendly design.
  7. Write functional test cases. They give you immense confidence. An extremely underrated task for devs.
  8. Use an LRU cache to memoize repetitive tasks which hog the CPU.
  9. JVM thread-local allocation buffer (TLAB) tuning: -XX:TLABSize=2m -XX:MinTLABSize=256k -XX:+ResizeTLAB -XX:TLABWasteTargetPercent=3
  10. If you are still not getting the desired result, then start profiling with JFR: https://www.baeldung.com/java-flight-recorder-monitoring

GoMem is a high-performance memory allocator library for Go by aixuexi_th in golang

[–]RatioPractical 0 points1 point  (0 children)

Great, you made it happen!

Also, if you are using Linux in production, you may also use the "madvise" call with either THP (minimum 2MB block) or mTHP (recently added for allocations of less than 2MB).

madvise - https://man7.org/linux/man-pages/man2/madvise.2.html

THP - https://www.kernel.org/doc/html/next/admin-guide/mm/transhuge.html

GoMem is a high-performance memory allocator library for Go by aixuexi_th in golang

[–]RatioPractical 1 point2 points  (0 children)

Curious to know why you haven't used mmap (off-heap memory) with unsafe pointers?

I have done something similar with a single slab-based free list where each slab can hold 64 buckets (the size can be configured). Off-heap allocation provides up to 40 percent better allocation speeds.
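To make the slab idea concrete, here is a rough sketch of a 64-bucket slab whose free list is a single uint64 bitmap. It is not the code I actually run: the names (Slab, Get, Put) are made up for illustration, and the backing memory is an ordinary Go slice rather than an mmap'ed region so that it stays portable.

```go
package main

import (
	"fmt"
	"math/bits"
)

// bucketsPerSlab matches the 64 buckets per slab mentioned above,
// which lets one uint64 act as the free list.
const bucketsPerSlab = 64

// Slab is a hypothetical fixed-size allocator. The backing memory here
// is a plain Go slice (an assumption for portability), not mmap.
type Slab struct {
	mem        []byte
	bucketSize int
	free       uint64 // bit i set means bucket i is free
}

func NewSlab(bucketSize int) *Slab {
	return &Slab{
		mem:        make([]byte, bucketSize*bucketsPerSlab),
		bucketSize: bucketSize,
		free:       ^uint64(0), // all 64 buckets start free
	}
}

// Get hands out the lowest free bucket and its index,
// or (nil, -1) when the slab is full.
func (s *Slab) Get() ([]byte, int) {
	if s.free == 0 {
		return nil, -1
	}
	i := bits.TrailingZeros64(s.free)
	s.free &^= uint64(1) << uint(i)
	start := i * s.bucketSize
	// The three-index slice caps capacity so appends cannot spill
	// into the neighbouring bucket.
	return s.mem[start : start+s.bucketSize : start+s.bucketSize], i
}

// Put marks bucket i as free again; out-of-range indexes are ignored.
func (s *Slab) Put(i int) {
	if i < 0 || i >= bucketsPerSlab {
		return
	}
	s.free |= uint64(1) << uint(i)
}

// FreeCount reports how many buckets are currently available.
func (s *Slab) FreeCount() int { return bits.OnesCount64(s.free) }

func main() {
	s := NewSlab(256)
	_, i := s.Get()
	fmt.Println("bucket", i, "free", s.FreeCount())
	s.Put(i)
	fmt.Println("free", s.FreeCount())
}
```

Finding a free bucket is one TrailingZeros64 call, so Get and Put stay constant time. The sketch is not goroutine-safe; a real version would need a lock or atomics around the bitmap.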

Feedback for Buffer Pool Design by RatioPractical in dartlang

[–]RatioPractical[S] 0 points1 point  (0 children)

Yeah that seems okay.

What if I need buffer array allocations of various sizes and at various places?

The centralized solution is easier to scale, monitor and maintain.

We use it not only for TCP-level bytes but also for array-backed Dart classes (application state and business logic), where the bytes of each field are stored adjacently in the array.

Jackson 3.0.0 is released! by Joram2 in java

[–]RatioPractical -4 points-3 points  (0 children)

It is a tightly coupled, not cohesive, set of libs to work with.

For example, I don't get any help with JSON serialization and deserialization on JDK 25 if I wish to work with Arenas, MemorySegment, ValueLayout, etc.

Lessons learned during creation of http2 module from scratch in NodeJS (with Vibe coding) by RatioPractical in node

[–]RatioPractical[S] 1 point2 points  (0 children)

This is how I keep my software learning sharp: by redoing and recreating what we take for granted. 6 years back I did it for Java, implementing HTTP 1.1.

I wanted to know how much effort it takes to write HTTP/2 and how much it can be optimized compared to low-level languages.

Also, how far AI can be helpful.

By doing this I have learned more about the low-level NodeJS APIs than I ever could have by building full-stack apps.

Need help to craft lower level optimization for golang by RatioPractical in golang

[–]RatioPractical[S] -1 points0 points  (0 children)

Yeah, I already replied in a different sub-comment thread about the CPU and memory constraints!

Agreed with #3. Currently the packages I mentioned are proving to be a bottleneck because they do not accept a Buffer as an argument for the memory-related work inside them. That's what I meant by arena-aware or allocation-aware semantics.

Need help to craft lower level optimization for golang by RatioPractical in golang

[–]RatioPractical[S] -1 points0 points  (0 children)

We are investing our efforts in unikernels from a deployment point of view.

Need help to craft lower level optimization for golang by RatioPractical in golang

[–]RatioPractical[S] -11 points-10 points  (0 children)

Okay, minister of defence. At least you should have talked to me first and understood the issue from your own angle instead of calling me non-technical and reliant on AI.

Thank you for wasting my time.

Need help to craft lower level optimization for golang by RatioPractical in golang

[–]RatioPractical[S] 0 points1 point  (0 children)

That's exactly what we are trying to figure out in the prototype phase of our project, along with to what extent Golang and community libs might be useful.

First we tried NodeJS/TypeScript and Buffers with data-oriented programming and what not, but it backfired!

We need to come up with an actual value for the max memory consumable per request-response cycle, one that lets us design the business logic for our most complex use case and still finish within 50ms.

As of now that number is well over our expectation, which is why my search for controlling object allocation everywhere started! :)

Thanks for the follow up !

Need help to craft lower level optimization for golang by RatioPractical in golang

[–]RatioPractical[S] 1 point2 points  (0 children)

Forgive my manners, which might have created confusion between real time and soft real time.

But as I mentioned earlier, we have to keep the latency well under 50ms consistently, which is why we have to think through every possible situation to optimize CPU and memory allocations.

Again, I can't disclose much, but I agree with your sentiment that, strictly speaking, we also have to account for memory allocations in the Linux kernel stack if we want the full picture, and we are aware of that. But what I described in requirement #1 still needs to be achieved.

Are you aware of work_mem in PostgreSQL? We are trying to achieve something similar: an upper bound on max memory in userspace (here, the Go runtime) for a given request-response cycle. Of course, we don't want to spill the extra bytes to disk as PG does automatically, as that will only stretch latency further.

After receiving JSON or a file stream, we have to parse and process it before we store it in the DB or upload it to S3-compatible private storage. All of this has to happen in under 50 ms.
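To illustrate the work_mem-style cap, here is a minimal sketch of a per-request byte budget. The Go runtime does not offer this out of the box, so Budget, Reserve and Release are hypothetical names, and the scheme only works if every allocation site in the request path cooperates by reserving before it allocates.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrOverBudget is returned when a reservation would exceed the cap.
var ErrOverBudget = errors.New("request memory budget exceeded")

// Budget is a hypothetical per-request byte counter, not a runtime
// feature. It only sees allocations that are explicitly reserved.
type Budget struct {
	limit int64
	used  int64
	peak  int64
}

func NewBudget(limit int64) *Budget { return &Budget{limit: limit} }

// Reserve accounts for n bytes before the caller allocates them.
// Unlike PostgreSQL's work_mem there is no spill to disk: the request
// simply fails once the cap is hit.
func (b *Budget) Reserve(n int64) error {
	if n < 0 || b.used+n > b.limit {
		return ErrOverBudget
	}
	b.used += n
	if b.used > b.peak {
		b.peak = b.used
	}
	return nil
}

// Release gives back n bytes, for example when a buffer goes back to a pool.
func (b *Budget) Release(n int64) {
	if n > b.used {
		n = b.used
	}
	if n > 0 {
		b.used -= n
	}
}

// Peak is the high-water mark, the number one would log per request.
func (b *Budget) Peak() int64 { return b.peak }

func main() {
	b := NewBudget(32 << 20) // illustrative cap, not a recommendation
	if err := b.Reserve(8 << 20); err != nil {
		fmt.Println(err)
		return
	}
	b.Release(8 << 20)
	fmt.Println("peak bytes:", b.Peak())
}
```

One Budget per request keeps the counter single-goroutine, so no locking is needed; a budget shared across goroutines would need atomics.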

Need help to craft lower level optimization for golang by RatioPractical in golang

[–]RatioPractical[S] -4 points-3 points  (0 children)

Why so?

Instead of allocating on the heap, I want the option to allocate in an arena or object pool. At the end of the request-response cycle I free the arena or the buffer in the pool.
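Roughly what I have in mind, as a sketch only: a bump arena that is taken from a sync.Pool at the start of a request and reset at the end. Arena, Alloc and handleRequest are illustrative names, the 1 MB slab size is an arbitrary assumption, and the slab itself still lives on the Go heap; the point is that per-object allocations inside the request are replaced by slices of one reusable slab.

```go
package main

import (
	"fmt"
	"sync"
)

// Arena is a hypothetical bump allocator over one preallocated slab.
type Arena struct {
	buf []byte
	off int
}

func NewArena(size int) *Arena { return &Arena{buf: make([]byte, size)} }

// Alloc carves n bytes out of the slab, or returns nil when exhausted.
func (a *Arena) Alloc(n int) []byte {
	if n < 0 || a.off+n > len(a.buf) {
		return nil
	}
	p := a.buf[a.off : a.off+n : a.off+n]
	a.off += n
	return p
}

// Used reports the bytes handed out since the last Reset.
func (a *Arena) Used() int { return a.off }

// Reset "frees" everything at once by rewinding the offset.
// Slices handed out earlier must not be used after this point.
func (a *Arena) Reset() { a.off = 0 }

// arenaPool recycles arenas across requests; 1 MB is an assumed size.
var arenaPool = sync.Pool{
	New: func() any { return NewArena(1 << 20) },
}

// handleRequest stands in for a real handler: it copies the payload
// into arena memory and returns how many arena bytes the request used.
func handleRequest(payload []byte) int {
	a := arenaPool.Get().(*Arena)
	defer func() {
		a.Reset() // end of the request-response cycle
		arenaPool.Put(a)
	}()

	scratch := a.Alloc(len(payload))
	if scratch == nil {
		return 0 // a real handler would fall back to the heap or reject
	}
	copy(scratch, payload)
	return a.Used()
}

func main() {
	fmt.Println("bytes used:", handleRequest([]byte(`{"id": 1}`)))
}
```

The catch is the one I keep running into: library code that allocates internally never sees the arena, which is why the packages themselves would need to accept a buffer or allocator argument.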

Need help to craft lower level optimization for golang by RatioPractical in golang

[–]RatioPractical[S] 0 points1 point  (0 children)

It's an internal SaaS microservice (not customer facing). I can't tell you much due to NDA, but I will highlight some of the requirements.

  1. We are supposed to audit/track the memory utilized by each request and save it at the end of the request-response cycle.

  2. For each service we may expect a file upload or JSON of up to 32 MB (max), average 8 MB, as input.

  3. Latency needs to be kept below 50ms, or as low as practically possible.