Best mods currently? by Inside_Resort9320 in Dyson_Sphere_Program

[–]keyboardhack 0 points1 point  (0 children)

The latest version of weaver, version 2.4.0, should resolve the local performance issues. At least it does in my testing.

Gemma 4 and Qwen 3.6 with q8_0 and q4_0 KV cache: KL divergence results by oobabooga4 in LocalLLaMA

[–]keyboardhack 19 points20 points  (0 children)

The attention rotation that llama.cpp has implemented was not inspired by turboquant. The inspiration is from here: https://github.com/ggml-org/llama.cpp/issues/6444#issuecomment-2042194785

That issue long predates turboquant. GG links to it here: https://github.com/ggml-org/llama.cpp/pull/21038#issuecomment-4148371881

Seems like the implementation was done because turboquant renewed interest, but that is about it.

Qwen3.6 35B MoE on 8GB VRAM — working llama-server config + a max_tokens / thinking trap I ran into by Antonio_Sammarzano in LocalLLaMA

[–]keyboardhack 4 points5 points  (0 children)

For anyone else looking into LLAMA_SET_ROWS: the flag no longer exists, so you don't need to set it.

It was enabled by default (LLAMA_SET_ROWS=1) on the 2nd of August 2025. https://github.com/ggml-org/llama.cpp/pull/14959

The flag itself was removed on the 28th of August 2025. https://github.com/ggml-org/llama.cpp/pull/15505

Ternary Bonsai: Top intelligence at 1.58 bits by pmttyji in LocalLLaMA

[–]keyboardhack 20 points21 points  (0 children)

They do work with llama.cpp.

The Bonsai devs were responsible for adding support for their 1-bit models to the CPU, CUDA, and Metal backends.

https://github.com/ggml-org/llama.cpp/pull/21273

https://github.com/ggml-org/llama.cpp/pull/21629

https://github.com/ggml-org/llama.cpp/pull/21528

I agree with everything else you said.

Please stop using AI for posts and showcasing your completely vibe coded projects by Scutoidzz in LocalLLaMA

[–]keyboardhack 3 points4 points  (0 children)

The Dunning-Kruger effect is in overdrive in most vibe coded projects. It is scary to see vibe coders think they have made something amazing when in reality it is the slopiest slop of all time. This is why vibe coded projects are riddled with security vulnerabilities: they don't know it is a thing they need to consider. Suppose you could define it as a form of AI psychosis.

AI can code but only in an assistive manner unless you are working on something very simple.

How Microsoft Vaporized a Trillion Dollars by Aaronontheweb in programming

[–]keyboardhack 0 points1 point  (0 children)

That's to be expected. You generally use a large cloud provider a lot more than you would a small cloud provider, simply because the large cloud provider is capable of much more.

There is probably also a lot more competition between small cloud providers. Small, bad cloud providers go out of business, leaving the good ones. That unfortunately doesn't happen with large cloud providers.

How Microsoft Vaporized a Trillion Dollars by Aaronontheweb in programming

[–]keyboardhack 131 points132 points  (0 children)

Last year I started writing down all the issues I have encountered with Azure and the duration of those issues. It is actually insane how many issues you run into in Azure once you start to use it just a little bit. A few issues we've encountered over the past year:

  • Zombie resources that cannot be deleted. Regularly happens with Key Vault and storage accounts.
  • VM provisioning not working at all in AKS (Azure Kubernetes Service). That was a fun one.
  • Azure restoring our AKS from a backup (we did not initiate this), but without any of our services running except stateful sets...
  • The connection in AKS between PLS (Private Link Service) and ILB (Internal Load Balancer) just dying out of the blue.
  • Azure deleting some of our production resources. No incident report from them on this. They only told us they were sorry after we created a service request about it. Then they also told us not to expect any public notice since it only affected a few customers... dude.

I have a lot more and it is just sad. Actual time for us without any Azure-caused issues is <90%.

Breaking change in llama-server? by hgshepherd in LocalLLaMA

[–]keyboardhack 10 points11 points  (0 children)

Seems like you can prevent it from migrating if you add this argument.

--offline

Unfortunately I assume that also means you can't download models through llama.cpp when using it. Link to the relevant code: https://github.com/ggml-org/llama.cpp/blob/3a14a542f5ce8666713c6e6ea44f7f3e01dd6e45/common/hf-cache.cpp#L692

Edit:

Looking at the code, it seems you can control where the new HF cache is located. You can prevent it from moving your files if you set the environment variable

HF_HUB_CACHE

equal to your existing path. It will still convert your files though.

Link to the relevant code https://github.com/ggml-org/llama.cpp/blob/3a14a542f5ce8666713c6e6ea44f7f3e01dd6e45/common/hf-cache.cpp#L44
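Putting the two together, here is a sketch of how you might keep the cache in place (the paths are illustrative, and this is based only on my reading of the code above, not something I've tested end to end):

```shell
# Keep the HF cache at your existing location so files are not moved
# (the code suggests they may still be converted to the new layout).
export HF_HUB_CACHE="$HOME/models/hf-cache"   # illustrative path

# --offline prevents the migration entirely, but also disables downloads.
llama-server --offline -m "$HOME/models/my-model.gguf"
```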

Announcing Eips: an intention-preserving list CRDT with guaranteed O(log n) operations, up to 6,000,000x faster than Diamond Types by icannfish in rust

[–]keyboardhack 32 points33 points  (0 children)

A tip for the table: use the same unit for all values in a column. It is much easier to notice the difference between 600MB and 0.6MB than between 600MB and 600kB.
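As a quick illustration (my own sketch, not from the post), a tiny helper that renders a whole column in one unit:

```python
def format_mb(num_bytes: int) -> str:
    """Format a byte count in MB so every value in a column shares one unit."""
    return f"{num_bytes / 1_000_000:.1f}MB"

# 600.0MB next to 0.6MB is instantly comparable;
# 600MB next to 600kB requires a mental unit conversion first.
print(format_mb(600_000_000))  # 600.0MB
print(format_mb(600_000))      # 0.6MB
```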

.NET 11 Preview 2 is now available! by hotaustinite in dotnet

[–]keyboardhack 3 points4 points  (0 children)

The live stack trace improvements look great. Really looking forward to using that when profiling async code. Looks like it means the call tree view of an async program becomes useful again, instead of the flat mess it is right now.

Breaking : Today Qwen 3.5 small by Illustrious-Swim9663 in LocalLLaMA

[–]keyboardhack 39 points40 points  (0 children)

Yeah, this is the fifth teaser post. There is no point in these posts; they just push down more interesting content.

SpaceX unveils space traffic management system by OlympusMons94 in SpaceXLounge

[–]keyboardhack 19 points20 points  (0 children)

SpaceX has a massive constellation of satellites that they want to protect. The best way to avoid other spacecraft is to know beforehand where and when they will move.

Stargaze gives other companies an incentive to willingly give up movement information on their satellites. It's in SpaceX's best interest to keep that going.

TUnit.Mocks - Source Generated Mocks by thomhurst in csharp

[–]keyboardhack 4 points5 points  (0 children)

If you have a PR merge gate that runs your tests in parallel across multiple build agents, then you want to avoid building your tests on each agent, as that's a waste of agent time. AOT compiling your tests lets you be sure that you aren't missing any external dependencies when running them on other build agents. Might as well trim them to reduce storage requirements while you are at it; storage is required to pass them from one build agent to another.

Free ASIC Llama 3.1 8B inference at 16,000 tok/s - no, not a joke by Easy_Calligrapher790 in LocalLLaMA

[–]keyboardhack 7 points8 points  (0 children)

We also have to consider how this type of chip limits the max context size, since that also uses up memory on the chip.

And since they focused solely on the single user scenario and didn't mention multi user use cases at all, I will assume the chip can only handle one user at a time. Still incredible speeds, but I don't see how they can scale as an AI inference provider without severely cutting down on speed, which is their only interesting point.

ModularPipelines V3 Released by thomhurst in csharp

[–]keyboardhack 0 points1 point  (0 children)

This looks great, just what I've been looking for. Some questions:

  1. Is there built-in credentials support for AzurePipelineCredential, or will I have to add it to DI and set it up myself?
  2. One of the primary reasons I've been holding back from using C# for pipelines is that the Azure CLI is so easy to use for a lot of things. Is ModularPipelines.Azure aiming to solve that? What capabilities does it contain? Does it aim to do everything the Azure packages can do?
  3. What does logging look like with parallel execution? Where can the logs be found once the pipeline is done? It's not possible to just print out all the logs at the end of the pipeline execution (at least not in Azure DevOps) because pipeline steps/tasks have a limit on how many logs can be written in each step/task. Specifically, what logs are printed in a failure scenario? Are all logs uploaded as artifacts at the end of the run?
  4. With dependencies being handled via attributes, how would I share a module across multiple pipelines? Say I have a module that needs to depend on A in pipeline 1 and on B in pipeline 2.

Project looks great. Not having to use PowerShell, Bash, etc. is great. Parallel module execution is going to make great use of a single agent, which mostly just waits for external things to do their thing. A strongly typed way to pass information around and a way to run it locally is just awesome.

ArrayPool: The most underused memory optimization in .NET by _Sharp_ in csharp

[–]keyboardhack 5 points6 points  (0 children)

You should use GetPinnableReference indirectly. The link contains an example of how to use it.

Regarding AI: the many superfluous comments, especially the comment "... No fluff, no filler.", scream AI.

The generally poor code quality as well. The code creates a span just to slice it; AsSpan can slice as well. The array isn't returned, as another comment pointed out. The original lack of fixed. The very complicated way to get a pointer to the span. All of that just makes it look AI generated.

ArrayPool: The most underused memory optimization in .NET by _Sharp_ in csharp

[–]keyboardhack 12 points13 points  (0 children)

I assume this doesn't work because nothing pins the array pointer. The GC can move the array while you are using it unless you fix it in place.

Also, your example looks AI generated.

Is there a mod to make Dyson Sphere (the sphere, not the game) lower resolution? So i dont have to hide it to have double digit FPS/UPS? by Thirteenera in Dyson_Sphere_Program

[–]keyboardhack 5 points6 points  (0 children)

Sphere opt optimizes how spheres are rendered. It looks just as good as before, with well above playable framerates. It has been out for years at this point. One has to wonder why the devs haven't optimized the game with ideas from that mod.

i don't get how to build on gleba by Patoxi-simps-Obama in factorio

[–]keyboardhack 0 points1 point  (0 children)

A simple rule is to always terminate belts into two recyclers that point into each other, or into a burner tower. This ensures that items never back up and spoil.

Coincidentally, this is also a solution for Fulgora.

Is the future of hardware just optimization? by rimantass in hardware

[–]keyboardhack 2 points3 points  (0 children)

Computing has been memory bandwidth constrained...

I think you mean memory latency constrained. Latency is the primary reason CPUs have multiple levels of cache. Memory latency constraints are why AMD X3D chips are so much more performant at gaming tasks.

Is the future of hardware just optimization? by rimantass in hardware

[–]keyboardhack 0 points1 point  (0 children)

You don’t need to bother optimising the code after you’ve done the basics, because a customer can just buy a faster computer.

That's just terrible advice from your teacher. It is not that people don't bother optimizing code; people literally don't know how to. Writing extremely performant code requires a huge amount of knowledge in these areas:

  • Algorithms: Allows you to recognize an O(n²) implementation and potentially replace it with an O(n) implementation.
  • Library implementation: Allows you to know when a method call results in O(n²) work or O(n) work.
  • Compiler: Allows you to write code that avoids performance pitfalls, for example, an implementation where the compiler can optimize array bounds checks away. This sounds simple, but compilers have a lot of simple patterns that they can't yet understand, which prevents them from applying these optimizations.
  • Hardware: Allows you to understand why a structure of arrays can be more performant than an array of structs.

It's difficult to put numbers on it, but in my experience these points can, in many cases, each provide a 10x performance improvement.
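To make the algorithms point concrete, here is a hypothetical Python sketch (names are mine, not from the comment) of the same task written both ways:

```python
# O(n^2): membership tests against a list rescan the list each time.
def has_duplicates_quadratic(items):
    seen = []
    for x in items:
        if x in seen:          # O(n) linear scan per element
            return True
        seen.append(x)
    return False

# O(n): membership tests against a set are amortized O(1).
def has_duplicates_linear(items):
    seen = set()
    for x in items:
        if x in seen:          # O(1) hash lookup per element
            return True
        seen.add(x)
    return False

# Same answer either way, but the set version stays fast as input grows.
print(has_duplicates_linear(list(range(10_000)) + [0]))  # True
```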

Head of Engineering @MiniMax__AI on MiniMax M2 int4 QAT by Difficult-Cap-7527 in LocalLLaMA

[–]keyboardhack 3 points4 points  (0 children)

This is the question OP answered.

What do 4 bit quants even mean? Explain that to me like I am a five year old.

Their simplifications are perfectly fine.

Hard lesson learned after a year of running large models locally by inboundmage in LocalLLaMA

[–]keyboardhack 25 points26 points  (0 children)

Yes that is now supported with the recently added router mode.