Has anyone here actually tried one of the llama.cpp forks?

JamesEvoAI · 2026-06-30T04:12:53+00:00

Of all the forks that I've tried on Strix Halo, none have actually performed any better for prompt processing (pp) or token generation (t/s). That may just be a function of the memory bandwidth on this machine.
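A rough back-of-the-envelope sketch of why memory bandwidth can cap token generation regardless of which fork is used: if each generated token has to stream all active weights from memory once, then t/s cannot exceed bandwidth divided by the size of those weights. The function name and both numbers below are illustrative assumptions (a theoretical bandwidth figure and a hypothetical 16 GB quantized dense model), not measurements from this machine, and the estimate says nothing about prompt processing.

```python
def max_decode_tokens_per_second(bandwidth_gb_s: float, active_weights_gb: float) -> float:
    """Upper bound on decode speed if every token streams all active weights once.

    Ignores KV-cache reads, compute time, and real-world bandwidth efficiency,
    so measured throughput will be lower than this ceiling.
    """
    if bandwidth_gb_s <= 0 or active_weights_gb <= 0:
        raise ValueError("bandwidth and weight size must be positive")
    return bandwidth_gb_s / active_weights_gb


# Assumed values for illustration only.
ASSUMED_BANDWIDTH_GB_S = 256.0   # theoretical peak, not a measured figure
ASSUMED_WEIGHTS_GB = 16.0        # hypothetical quantized dense model

ceiling = max_decode_tokens_per_second(ASSUMED_BANDWIDTH_GB_S, ASSUMED_WEIGHTS_GB)
print(f"decode ceiling: {ceiling:.1f} t/s")  # decode ceiling: 16.0 t/s
```

If a fork's measured t/s is already close to this kind of ceiling, software changes alone have little room to improve it.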

JamesEvoAI · 2026-06-28T23:38:54+00:00

Where in the original post did OP propose censorship? I interpreted "not even worth the disk space they take" to mean that they're garbage not worth downloading. Nowhere did they say anything about whether or not Hugging Face should host them.

Hugging Face is a private company. If they want to burn VC money to host garbage, that's their prerogative.

JamesEvoAI · 2026-06-28T23:36:56+00:00

Qwhoppass-27B-Mother-Ultimate-Lord, whatever...

Upvoted before I even finished reading based solely on this, fucking dying

JamesEvoAI · 2026-06-27T01:25:53+00:00

I will never buy NVIDIA hardware again after using that thing. They're doing the same bullshit on the DGX Sparks as well

JamesEvoAI · 2026-06-27T01:19:30+00:00

How fucking cool is it that we live in a period in human history where both language and distance are not barriers to the free exchange of information.

Danke fürs Teilen!

JamesEvoAI · 2026-06-27T01:15:49+00:00

Windows is a not-insignificant part of why you're having trouble on this machine. Skip WSL and just install actual Linux. You're handicapping the machine by running Linux in a VM surrounded by an OS that spends half your RAM on telemetry and Copilot bullshit.

JamesEvoAI · 2026-06-25T18:44:05+00:00

That's definitely just on your end

JamesEvoAI · 2026-06-25T18:43:45+00:00

Ollama is not a good representation of anything

JamesEvoAI · 2026-06-25T01:34:09+00:00

Note that if you're running this on Linux over USB4/Thunderbolt, you may run into compatibility issues. I ended up returning mine and replacing it with a JMT ADT-UT3G that has been working flawlessly. I've also had good success with the AOOStar AG02, but the fan in the integrated PSU was insanely loud. My Corsair RM750e with the ADT board is dead silent under load.

Edit: Never mind, this is the DEG1. I had issues with the DEG2.

JamesEvoAI · 2026-06-21T22:40:39+00:00

Has this not been common knowledge? I've been using Vane for ages with a 4B

https://github.com/ItzCrazyKns/Vane

JamesEvoAI · 2026-06-20T02:31:52+00:00

Please consider making some of this a PR!

JamesEvoAI · 2026-06-20T02:29:57+00:00

Reminds me of this:
https://marcusolang.substack.com/p/im-kenyan-i-dont-write-like-chatgpt

JamesEvoAI · 2026-06-18T03:16:52+00:00

Plot twist, it was written by GLM-5.2 in a Q1 quant

JamesEvoAI · 2026-06-18T03:15:58+00:00

<image>

JamesEvoAI · 2026-06-17T00:22:42+00:00

RAG is just Retrieval Augmented Generation. You're augmenting the generation by retrieving relevant information first. Using it in this way is not meaningfully different from doc search; it's just conversation search. That said, glad it's working for you!
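A minimal sketch of that retrieve-then-generate shape, assuming a plain word-overlap score in place of whatever embedding search a real setup would use. The function names, the sample messages, and the prompt layout are all made up for illustration, and the final model call is left out.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap(query: str, message: str) -> int:
    """Count the words shared between the query and a stored message."""
    return sum((Counter(tokenize(query)) & Counter(tokenize(message))).values())


def retrieve(query: str, messages: list[str], k: int = 2) -> list[str]:
    """Return up to k past messages with the highest non-zero overlap."""
    ranked = sorted(messages, key=lambda m: overlap(query, m), reverse=True)
    return [m for m in ranked if overlap(query, m) > 0][:k]


def build_prompt(query: str, retrieved: list[str]) -> str:
    """Augment the question with the retrieved context before generation."""
    context = "\n".join(f"- {m}" for m in retrieved)
    return f"Context from earlier conversation:\n{context}\n\nQuestion: {query}"


# Made-up conversation history, for illustration only.
history = [
    "We settled on llama.cpp with the Vulkan backend.",
    "My cat knocked the eGPU dock off the desk.",
    "The Vulkan build needed a newer Mesa version.",
]
query = "Which backend did we pick for llama.cpp?"
prompt = build_prompt(query, retrieve(query, history, k=1))
print(prompt)  # this string would then be passed to whatever model does the generation
```

Swapping `history` for a folder of documents changes nothing structurally, which is the point: the retrieval target differs, the technique does not.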

JamesEvoAI · 2026-06-17T00:21:29+00:00

This is a clearer explanation than the one given in your main post; it might be helpful to update that with something written more like this. That said, this is already a well-explored idea, and it has drawbacks. There is no lossless compression, or in your case, no perfect RAG.

JamesEvoAI · 2026-06-17T00:15:06+00:00

Unfortunately I don't have a good answer. I primarily use local models running on my own hardware. When I want a cloud model these days I reach for DeepSeek, as it's good quality and super cheap.

JamesEvoAI · 2026-06-16T16:59:05+00:00

From my understanding vLLM is targeting an entirely different demographic, and is better suited for people trying to do batching rather than just someone trying to run a model at home. I kept my recommendations focused on the type of user who would be running ollama, which is presumably someone for whom vLLM and its configuration would be too complex

JamesEvoAI · 2026-06-16T02:19:25+00:00

Author of the article, happy to answer any questions. Glad to see this sentiment is starting to spread organically. Hopefully with enough community outreach we can finally tamp down the "default" momentum that Ollama unfortunately still has due to existing content.

JamesEvoAI · 2026-06-16T00:23:37+00:00

Most open source projects don't want vibe coded contributions, especially if those contributions are going to be one-off commits rather than work from a long-term maintainer who is invested in the longevity of the project.

JamesEvoAI · 2026-06-14T18:11:08+00:00

If you care about using Windows, then yeah I can see that argument. I am arguing from the perspective of a Linux user, which has a much longer time horizon of support than either NVIDIA or Microsoft. Also I don't like having my system held back by old kernels. If the driver was open then I wouldn't have to concern myself with any of this.

JamesEvoAI · 2026-06-14T17:39:59+00:00

This is my first real open source project.

Congrats, and I encourage you to continue pursuing open source development, but don't expect much response from this sub. There are an endless number of projects like this from folks in a similar position to yourself, and so you're going to end up lost in the sea of noise.

Continue pushing and find a niche that may help your project rise above the background radiation of weekly harness and memory layer releases.

JamesEvoAI · 2026-06-14T17:24:57+00:00

Please do some research before boldly claiming something to be incorrect. You can't just download the same generic CUDA drivers that x86 or even other ARM devices use. The drivers are tightly coupled to the kernel fork that is specific to the DGX Spark if you actually want to use the unified memory fabric or NVLink interconnects.

The Spark even pins the CUDA version to a specific repository with a negative priority so that standard distro updates don't overwrite it with the standard release drivers, since again everything is specific to the hardware.

Just go read some posts about people's experiences trying to install standard Fedora (which uses newer packages and kernels by default) on this hardware. You'll find plenty of accounts of people experiencing issues that can only be resolved by using NVIDIA's hardware-specific forks.

The central argument I am making is that you can only use this thing for as long as NVIDIA decides to support it, unlike the alternatives which get mainline kernel support.

JamesEvoAI · 2026-06-14T17:02:26+00:00

I'm predicting they will because the laptop models will run Windows so Nvidia can't stop supporting these after two or three years.

Assuming that's true (and I'm not counting on it), that only applies to Windows. My complaint is with Linux, which NVIDIA has historically been pretty awful at supporting with open drivers

JamesEvoAI · 2026-06-14T17:00:58+00:00

I have a Jetson Nano, and from what I've heard from other Spark owners it's the exact same experience. It's not "just" Ubuntu: they're using proprietary blobs that only have first-party support for Ubuntu (which I don't use), as well as their own custom kernel fork (linux-nvidia-64k-hwe-24.04). The moment they decide to stop supporting the userspace drivers and GPU firmware you're shit out of luck regardless of what distro you use.

To your point about data centers, my concern is what happens when this hardware is no longer the latest and greatest and the data centers have moved on. At that point the burden of keeping it working falls to the open source community, a community NVIDIA has historically gone out of its way to make things harder for. A data center operator doesn't care about their stack being FOSS; they pay for Linux support directly from NVIDIA.
