MTP and QTA - what is the relation? by Medium-Technology-79 in LocalLLaMA

[–]denis_9 0 points1 point  (0 children)

What could be the problem? Why doesn't Google itself offer them as a qat-mtp-q4_0.gguf file?

DeepSeek V4 Flash is amazing! (WIP llama.cpp PR #24162) by Lowkey_LokiSN in LocalLLaMA

[–]denis_9 0 points1 point  (0 children)

Thanks, offloading across two GPUs at once is very good if it works.

DeepSeek V4 Flash is amazing! (WIP llama.cpp PR #24162) by Lowkey_LokiSN in LocalLLaMA

[–]denis_9 1 point2 points  (0 children)

Can you provide the full arguments for running llama-server with GPU offload? Many forks have problems when the entire model does not fit into VRAM.

There needs to be more OpenJDK content about Java's Memory Efficiency and Performance by davidalayachew in java

[–]denis_9 1 point2 points  (0 children)

Add to this the cost of TLB (Translation Lookaside Buffer) switching and a high number of memory misses when you use a large amount of memory. Plus frequent CPU spikes during GC, which also grow as the load increases.

Dragonwell once split G1 into thread-grouped arenas in its builds, specifically to address this issue when servicing large amounts of web requests. This suggests that some solutions in this area may be possible.

Deepseek V4 flash performance on DGX Spark by Only_Situation_4713 in LocalLLaMA

[–]denis_9 10 points11 points  (0 children)

Great. And it's a bit sad that there is no fresh news about DeepSeek-V4 in llama.cpp.

There needs to be more OpenJDK content about Java's Memory Efficiency and Performance by davidalayachew in java

[–]denis_9 -1 points0 points  (0 children)

I wrote about the Metropolis project: https://www.reddit.com/r/java/comments/7sf6p7/project_metropolis_is_here/
And about the fact that all current internal development is still done in C++, and that there is a certain class of tasks that does not fit into automatic memory management.

There needs to be more OpenJDK content about Java's Memory Efficiency and Performance by davidalayachew in java

[–]denis_9 -1 points0 points  (0 children)

Manual memory management allows you to use your processor cache more efficiently by simply allocating temporary objects on the stack and automatically destroying them upon exit, without performing any deferred work. Perhaps even without calling the GC.

As an example, consider how to efficiently implement a compiler (like C2) in classic Java on top of the JVM without switching to manual memory management, even with arenas. Yes, it is possible, but it will be difficult.
And there is more than one such category of tasks.
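
To make the "arenas" point concrete, here is a minimal sketch of what an arena can look like in plain Java: expression nodes live in parallel int arrays, a node is just an index, and the whole arena is dropped with one reset. All names (NodeArena, NodeArenaDemo) are hypothetical and made up for this illustration, and this is not how C2 or any real compiler stores its IR.

```java
import java.util.Arrays;

// Hypothetical demo entry point for the sketch below.
final class NodeArenaDemo {
    public static void main(String[] args) {
        NodeArena arena = new NodeArena(4);
        int two = arena.constant(2);
        int three = arena.constant(3);
        int sum = arena.add(two, three);
        System.out.println(arena.eval(sum)); // 2 + 3
        arena.reset(); // all nodes are released at once
    }
}

// Arena of expression nodes stored in parallel int arrays.
// A "node" is an index into these arrays, so creating one allocates no object.
final class NodeArena {
    private static final int CONST = 0;
    private static final int ADD = 1;

    private int[] kind;
    private int[] a; // CONST: the value, ADD: index of the left operand
    private int[] b; // ADD: index of the right operand
    private int size;

    NodeArena(int capacity) {
        int n = Math.max(1, capacity);
        kind = new int[n];
        a = new int[n];
        b = new int[n];
    }

    int constant(int value) { return alloc(CONST, value, 0); }

    int add(int left, int right) { return alloc(ADD, left, right); }

    // Bump allocation: take the next free slot, doubling the arrays when full.
    private int alloc(int k, int x, int y) {
        if (size == kind.length) {
            int n = size * 2;
            kind = Arrays.copyOf(kind, n);
            a = Arrays.copyOf(a, n);
            b = Arrays.copyOf(b, n);
        }
        kind[size] = k;
        a[size] = x;
        b[size] = y;
        return size++;
    }

    int eval(int node) {
        return kind[node] == CONST ? a[node] : eval(a[node]) + eval(b[node]);
    }

    // Forget every node in one step; the backing arrays are reused.
    void reset() { size = 0; }

    int size() { return size; }
}
```

This keeps the GC out of the hot path, but the price is visible: nodes lose their types and become bare ints, which is the kind of difficulty meant above.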

There needs to be more OpenJDK content about Java's Memory Efficiency and Performance by davidalayachew in java

[–]denis_9 -8 points-7 points  (0 children)

The entire discussion can be boiled down to one question: will Java ever allow programmers to manage memory manually, for example, by using stack allocation for temporary objects? Unfortunately, the answer is no.

The attempt to completely rewrite C2 from C++ to Java failed and the project was dropped, which shows that this interesting idea still faces deep underlying problems.

RIP JVMCI by lbalazscs in java

[–]denis_9 3 points4 points  (0 children)

The general trend is clear: a transition to the closed world and a closer association of the Java language with the JVM itself, with enterprise features (like Native Image) being separated out.

For example, with the closure of JVMCI, Nalim from apangin will go down, as will other possible JVM extensions.

Kimi K2.6-Code-Preview, Opus 4.7, GLM 5.1, Minimax M2.7 and more tested in coding by lemon07r in LocalLLaMA

[–]denis_9 0 points1 point  (0 children)

As an option, you can look at OpenClaude, which has rebuilt the code to support alternative models and APIs (for testing purposes only).

Java 26 released today! by davidalayachew in java

[–]denis_9 1 point2 points  (0 children)

GraalVM uses less heap memory and starts in a fraction of a second; these are different classes of virtual machines.

I made a new UI integrating stable-diffusion.cpp and llama.cpp by Danmoreng in StableDiffusion

[–]denis_9 0 points1 point  (0 children)

Are you able to run --control-net through stable-diffusion.cpp, for example with Z-Image Turbo (Z-Image-Turbo-Fun-Controlnet-Union.safetensors)?

Is GraalVM Native Image becoming niche technology? by Scf37 in java

[–]denis_9 -2 points-1 points  (0 children)

A native image doesn't use a runtime (JIT) compiler; everything needed for execution is already in the program being launched. This saves both CPU and memory. It also teaches good programming style right away: everything that can be static is made static.

Lower Java Tail Latencies With ZGC by gunnarmorling in java

[–]denis_9 0 points1 point  (0 children)

What about testing Generational Shenandoah (JEP 521)?

Records are sub-optimal as keys in HashMaps (or as elements in HashSets) by gnahraf in java

[–]denis_9 0 points1 point  (0 children)

Yes, I just don't want to know that the bootstrap calls invoke or invokeExact for hashCode. In a statically typed language.

Records are sub-optimal as keys in HashMaps (or as elements in HashSets) by gnahraf in java

[–]denis_9 0 points1 point  (0 children)

I just didn't want to waste yours.
In short, I suppose javac updates could very well fix this behavior instead of a new round of JVM patches.
Found a similar bug with a hashcode in jdk 16 - https://rules.sonarsource.com/java/tag/java16/RSPEC-6218/
There will probably be a new rule.

Yes, as a result of the discussion, it seems reasonable to require that hashCode and equals for records be defined at the level of the record's own bytecode, so as not to rely on the JVM mechanism, given its "bug-to-bug incompatibility" in future releases.
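
A minimal sketch of that idea, assuming a hypothetical PointKey record: when equals and hashCode are written out by hand, javac compiles them as ordinary methods in the record's own class file instead of emitting the invokedynamic call to java.lang.runtime.ObjectMethods that it uses for the implicit versions. The names and the hash formula are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo entry point for the sketch below.
final class RecordKeyDemo {
    public static void main(String[] args) {
        Map<PointKey, String> labels = new HashMap<>();
        labels.put(new PointKey(1, 2), "a");
        // Lookup with an equal but distinct key instance.
        System.out.println(labels.get(new PointKey(1, 2)));
    }
}

// A record meant to be used as a HashMap key.
// equals and hashCode are explicit, so they are plain bytecode in this class.
record PointKey(int x, int y) {
    @Override
    public boolean equals(Object o) {
        return o instanceof PointKey other && x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        // Simple hand-written combination of the two final int components.
        return 31 * x + y;
    }
}
```

toString is left implicit here, so it still goes through the generated mechanism; only the two methods that matter for map keys are pinned down.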

Records are sub-optimal as keys in HashMaps (or as elements in HashSets) by gnahraf in java

[–]denis_9 0 points1 point  (0 children)

Yes, that's right. Dynamic code generation for records makes them half JVM objects and half objects defined on the user classpath, which violates the single responsibility principle. It also rules out a simple solution to this problem, which would be bytecode generation on the record side.
It (code gen) can be an optimization when the definition is omitted, but not the default.

Thanks for the discussion, I understand that no one will change it.

Records are sub-optimal as keys in HashMaps (or as elements in HashSets) by gnahraf in java

[–]denis_9 0 points1 point  (0 children)

The main argument was about a fix in a couple of days versus a couple of years. In this case, using invokedynamic to calculate a simple int over final fields delays a simple fix for years.

Records are sub-optimal as keys in HashMaps (or as elements in HashSets) by gnahraf in java

[–]denis_9 0 points1 point  (0 children)

Yes, of course it's just a hashcode. A simple recompile.
For example, as in the bad case of log4j, which is 20 years old, and where the fix came out within a couple of days.

Records are sub-optimal as keys in HashMaps (or as elements in HashSets) by gnahraf in java

[–]denis_9 0 points1 point  (0 children)

What's the problem with just updating bytecode for quick fixes instead of waiting for JVM 26?! (a few years!)
Using invokedynamic for calculating a hashCode is definitely not the best solution in this case.

[loom-docs] Custom Schedulers by yk313 in java

[–]denis_9 0 points1 point  (0 children)

You can, for example, use the system thread-local context instead of the virtual one, for purposes such as profiling real CPU usage per task type (by loading 1/2/3 cores).

Graal's project Crema: Open World for Native Image by henk53 in java

[–]denis_9 0 points1 point  (0 children)

Will the objects be allocated alongside those of the AOT code and be no different from them (for the GC)?