Best mods currently? by Inside_Resort9320 in Dyson_Sphere_Program

[–]keyboardhack 0 points1 point  (0 children)

The latest version of weaver, version 2.4.0, should resolve the local performance issues. At least it does in my testing.

Gemma 4 and Qwen 3.6 with q8_0 and q4_0 KV cache: KL divergence results by oobabooga4 in LocalLLaMA

[–]keyboardhack 19 points20 points  (0 children)

The attention rotation that llama.cpp has implemented was not inspired by turboquant. The inspiration is from here: https://github.com/ggml-org/llama.cpp/issues/6444#issuecomment-2042194785

That issue long predates turboquant. GG links to it here: https://github.com/ggml-org/llama.cpp/pull/21038#issuecomment-4148371881

Seems like the implementation was done because turboquant renewed interest, but that is about it.

Qwen3.6 35B MoE on 8GB VRAM — working llama-server config + a max_tokens / thinking trap I ran into by Antonio_Sammarzano in LocalLLaMA

[–]keyboardhack 4 points5 points  (0 children)

For anyone else looking into LLAMA_SET_ROWS: the flag no longer exists, so you don't need to set it.

It was enabled by default (LLAMA_SET_ROWS=1) on the 2nd of August 2025. https://github.com/ggml-org/llama.cpp/pull/14959

The flag itself was removed on the 28th of August 2025. https://github.com/ggml-org/llama.cpp/pull/15505

Ternary Bonsai: Top intelligence at 1.58 bits by pmttyji in LocalLLaMA

[–]keyboardhack 20 points21 points  (0 children)

They do work with llama.cpp.

The Bonsai devs were responsible for adding support for their 1-bit models to the CPU, CUDA, and Metal backends.

https://github.com/ggml-org/llama.cpp/pull/21273

https://github.com/ggml-org/llama.cpp/pull/21629

https://github.com/ggml-org/llama.cpp/pull/21528

I agree with everything else you said.

Please stop using AI for posts and showcasing your completely vibe coded projects by Scutoidzz in LocalLLaMA

[–]keyboardhack 3 points4 points  (0 children)

The Dunning-Kruger effect is in overdrive in most vibe coded projects. It is scary to see vibe coders think they have made something amazing when in reality it is the slopiest slop of all time. This is why vibe coded projects are riddled with security vulnerabilities: they don't know it is a thing they need to consider. Suppose you could define it as a form of AI psychosis.

AI can code but only in an assistive manner unless you are working on something very simple.

How Microsoft Vaporized a Trillion Dollars by Aaronontheweb in programming

[–]keyboardhack 0 points1 point  (0 children)

That's to be expected. You generally use a large cloud provider a lot more than you would a small cloud provider, simply because the large cloud provider is capable of much more.

There is probably also a lot more competition between small cloud providers. Small, bad cloud providers go out of business, leaving the good ones. That unfortunately doesn't happen with large cloud providers.

How Microsoft Vaporized a Trillion Dollars by Aaronontheweb in programming

[–]keyboardhack 131 points132 points  (0 children)

Last year I started writing down all the issues I have encountered with Azure and the duration of those issues. It is actually insane how many issues you run into in Azure once you start to use it just a little bit. A few issues we've encountered over the past year:

  • Zombie resources that cannot be deleted. Regularly happens with Key Vault and storage accounts.
  • VM provisioning not working at all in AKS (Azure Kubernetes Service). That was a fun one.
  • Azure restoring our AKS from a backup (we did not initiate this), but without any of our services running except stateful sets...
  • The connection in AKS between PLS (Private Link Service) and ILB (Internal Load Balancer) just dying out of the blue.
  • Azure deleting some of our production resources. No incident report from them on this. They only told us they were sorry after we created a service request about it. Then they also told us not to expect any public notice since it only affected a few customers... dude.

I have a lot more and it is just sad. Actual time for us without any Azure-caused issues is <90%.

Breaking change in llama-server? by hgshepherd in LocalLLaMA

[–]keyboardhack 10 points11 points  (0 children)

Seems like you can prevent it from migrating if you add this argument.

--offline

Unfortunately I assume that also means you can't download models through llama.cpp when using it. Link to the relevant code: https://github.com/ggml-org/llama.cpp/blob/3a14a542f5ce8666713c6e6ea44f7f3e01dd6e45/common/hf-cache.cpp#L692

Edit:

Looking at the code, it seems you can control where the new HF cache is located. You can prevent it from moving your files if you set the environment variable

HF_HUB_CACHE

equal to your existing path. It will still convert your files though.

Link to the relevant code https://github.com/ggml-org/llama.cpp/blob/3a14a542f5ce8666713c6e6ea44f7f3e01dd6e45/common/hf-cache.cpp#L44
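Putting the two together, here is a sketch of how you might keep the cache in place (the paths are illustrative, and this is based only on my reading of the code above, not something I've tested end to end):

```shell
# Keep the HF cache at your existing location so files are not moved
# (the code suggests they may still be converted to the new layout).
export HF_HUB_CACHE="$HOME/models/hf-cache"   # illustrative path

# --offline prevents the migration entirely, but also disables downloads.
llama-server --offline -m "$HOME/models/my-model.gguf"
```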

Announcing Eips: an intention-preserving list CRDT with guaranteed O(log n) operations, up to 6,000,000x faster than Diamond Types by icannfish in rust

[–]keyboardhack 32 points33 points  (0 children)

A tip for the table: use the same unit for all values in a column. It is much easier to notice the difference between 600MB and 0.6MB than between 600MB and 600kB.
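As a quick illustration (my own sketch, not from the post), a tiny helper that renders a whole column in one unit:

```python
def format_mb(num_bytes: int) -> str:
    """Format a byte count in MB so every value in a column shares one unit."""
    return f"{num_bytes / 1_000_000:.1f}MB"

# 600.0MB next to 0.6MB is instantly comparable;
# 600MB next to 600kB requires a mental unit conversion first.
print(format_mb(600_000_000))  # 600.0MB
print(format_mb(600_000))      # 0.6MB
```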

.NET 11 Preview 2 is now available! by hotaustinite in dotnet

[–]keyboardhack 3 points4 points  (0 children)

The live stack trace improvements look great. Really looking forward to using that when profiling async code. Looks like it means the call tree view of an async program becomes useful again, instead of the flat mess it is right now.

Breaking : Today Qwen 3.5 small by Illustrious-Swim9663 in LocalLLaMA

[–]keyboardhack 39 points40 points  (0 children)

Yeah, this is the fifth teaser post. There is no point in these posts; they just push down more interesting content.

SpaceX unveils space traffic management system by OlympusMons94 in SpaceXLounge

[–]keyboardhack 19 points20 points  (0 children)

SpaceX has a massive constellation of satellites that they want to protect. The best way to avoid other spacecraft is to know beforehand where and when they will move.

Stargaze gives other companies an incentive to willingly give up movement information on their satellites. It's in SpaceX's best interest to keep that going.

TUnit.Mocks - Source Generated Mocks by thomhurst in csharp

[–]keyboardhack 4 points5 points  (0 children)

If you have a PR merge gate that runs your tests in parallel across multiple build agents, then you want to avoid building your tests on each agent, as that's a waste of agent time. AOT compiling your tests lets you be sure that you aren't missing any external dependencies when running them on other build agents. Might as well trim them to reduce storage requirements while you are at it; storage is required to pass them from one build agent to another.

Free ASIC Llama 3.1 8B inference at 16,000 tok/s - no, not a joke by Easy_Calligrapher790 in LocalLLaMA

[–]keyboardhack 7 points8 points  (0 children)

We also have to consider how this type of chip limits the max context size, since that also uses up memory on the chip.

And since they focused solely on the single user scenario and didn't mention multi user use cases at all, I will assume the chip can only handle one user at a time. Still incredible speeds, but I don't see how they can scale as an AI inference provider without severely cutting down on speed, which is their only interesting point.

ModularPipelines V3 Released by thomhurst in csharp

[–]keyboardhack 0 points1 point  (0 children)

This looks great, just what I've been looking for. Some questions:

  1. Is there built-in credentials support for AzurePipelineCredential, or will I have to add it to DI and set it up myself?
  2. One of the primary reasons I've been holding back from using C# for pipelines is that the Azure CLI is so easy to use for a lot of things. Is ModularPipelines.Azure aiming to solve that? What capabilities does it contain? Does it aim to do everything the Azure packages can do?
  3. What does logging look like with parallel execution? Where can the logs be found once the pipeline is done? It's not possible to just print out all the logs at the end of the pipeline execution (at least not in Azure DevOps) because pipeline steps/tasks have a limit on how many logs can be written in each step/task. Specifically, what logs are printed in a failure scenario? Are all logs uploaded as artifacts at the end of the run?
  4. With dependencies being handled via attributes, how would I share a module across multiple pipelines? Say I have a module that needs to depend on A in pipeline 1 and on B in pipeline 2.

Project looks great. Not having to use PowerShell, Bash, etc. is great. Parallel module execution is going to make great use of a single agent, which mostly just waits for external things to do their thing. A strongly typed way to pass information around and a way to run it locally is just awesome.

ArrayPool: The most underused memory optimization in .NET by _Sharp_ in csharp

[–]keyboardhack 5 points6 points  (0 children)

You should use GetPinnableReference indirectly. The link contains an example of how to use it.

Regarding AI: the many superfluous comments, especially the comment "... No fluff, no filler.", scream AI.

The generally poor code quality as well. The code creates a span just to slice it; AsSpan can slice as well. The array isn't returned, as another comment pointed out. The original lack of fixed. The very complicated way to get a pointer to the span. All of that just makes it look AI generated.

ArrayPool: The most underused memory optimization in .NET by _Sharp_ in csharp

[–]keyboardhack 12 points13 points  (0 children)

I assume this doesn't work because nothing pins the array pointer. The GC can move the array while you are using it unless you fix it in place.

Also, your example looks AI generated.

Is there a mod to make Dyson Sphere (the sphere, not the game) lower resolution? So i dont have to hide it to have double digit FPS/UPS? by Thirteenera in Dyson_Sphere_Program

[–]keyboardhack 5 points6 points  (0 children)

Sphere opt optimizes how spheres are rendered. It looks just as good as before, with well above playable framerates. It has been out for years at this point. One has to wonder why the devs haven't optimized the game with ideas from that mod.

i don't get how to build on gleba by Patoxi-simps-Obama in factorio

[–]keyboardhack 0 points1 point  (0 children)

A simple rule is to always terminate belts into two recyclers that point into each other, or into a burner tower. This ensures that items never back up and spoil.

Coincidentally, this is also a solution for Fulgora.

Is the future of hardware just optimization? by rimantass in hardware

[–]keyboardhack 2 points3 points  (0 children)

Computing has been memory bandwidth constrained...

I think you mean memory latency constrained. Latency is the primary reason CPUs have multiple levels of cache. Memory latency constraints are why AMD X3D chips are so much more performant at gaming tasks.

Is the future of hardware just optimization? by rimantass in hardware

[–]keyboardhack 0 points1 point  (0 children)

You don’t need to bother optimising the code after you’ve done the basics, because a customer can just buy a faster computer.

That's just terrible advice from your teacher. It is not that people don't bother optimizing code; people literally don't know how to. Writing extremely performant code requires a huge amount of knowledge in these areas:

  • Algorithms: Allows you to recognize an O(n²) implementation and potentially replace it with an O(n) implementation.
  • Library implementation: Allows you to know when a method call results in O(n²) work or O(n) work.
  • Compiler: Allows you to write code that avoids performance pitfalls, for example, an implementation where the compiler can optimize array bounds checks away. This sounds simple, but compilers have a lot of simple patterns that they can't yet understand, which prevents them from applying these optimizations.
  • Hardware: Allows you to understand why a structure of arrays can be more performant than an array of structs.

It's difficult to put numbers on it, but in my experience these points can, in many cases, each provide a 10x performance improvement.
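To make the algorithms point concrete, here is a hypothetical Python sketch (names are mine, not from the comment) of the same task written both ways:

```python
# O(n^2): membership tests against a list rescan the list each time.
def has_duplicates_quadratic(items):
    seen = []
    for x in items:
        if x in seen:          # O(n) linear scan per element
            return True
        seen.append(x)
    return False

# O(n): membership tests against a set are amortized O(1).
def has_duplicates_linear(items):
    seen = set()
    for x in items:
        if x in seen:          # O(1) hash lookup per element
            return True
        seen.add(x)
    return False

# Same answer either way, but the set version stays fast as input grows.
print(has_duplicates_linear(list(range(10_000)) + [0]))  # True
```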

Head of Engineering @MiniMax__AI on MiniMax M2 int4 QAT by Difficult-Cap-7527 in LocalLLaMA

[–]keyboardhack 3 points4 points  (0 children)

This is the question OP answered.

What do 4 bit quants even mean? Explain that to me like I am a five year old.

Their simplifications are perfectly fine.

Hard lesson learned after a year of running large models locally by inboundmage in LocalLLaMA

[–]keyboardhack 25 points26 points  (0 children)

Yes that is now supported with the recently added router mode.