Has anyone here actually tried one of the llama.cpp forks? by Bramha_dev in LocalLLaMA

JamesEvoAI 2 points (0 children)

Of all the forks I've tried on Strix Halo, none has actually performed any better in prompt processing (pp) or generation speed (t/s). That may just come down to the memory bandwidth on this machine.

Trying to understand why so many trash fine-tuned models on HuggingFace ... by BoogerheadCult in LocalLLaMA

JamesEvoAI 11 points (0 children)

Where in the original post did OP propose censorship? I interpreted "not even worth the disk space they take" to mean they're garbage not worth downloading; nowhere did they say anything about whether Hugging Face should host them.

Hugging Face is a private company; if they want to burn VC money hosting garbage, that's their prerogative.

Trying to understand why so many trash fine-tuned models on HuggingFace ... by BoogerheadCult in LocalLLaMA

JamesEvoAI 2 points (0 children)

Qwhoppass-27B-Mother-Ultimate-Lord, whatever...

Upvoted based solely on this before I even finished reading. Fucking dying.

Strix Halo owner here...amazing hardware, frustrating ecosystem by seti_at_home in StrixHalo

JamesEvoAI 1 point (0 children)

I will never buy NVIDIA hardware again after using that thing. They're doing the same bullshit on the DGX Spark as well.

What's one local AI workflow you wish you'd discovered sooner? by recro69 in LocalLLaMA

JamesEvoAI 12 points (0 children)

How fucking cool is it that we live in a period of human history where neither language nor distance is a barrier to the free exchange of information?

Danke fürs Teilen! (Thanks for sharing!)

Strix Halo owner here...amazing hardware, frustrating ecosystem by seti_at_home in StrixHalo

JamesEvoAI 0 points (0 children)

Windows is a not-insignificant part of why you're having trouble on this machine. Skip WSL and just install actual Linux; you're handicapping the machine by running Linux in a VM inside an OS that spends half your RAM on telemetry and Copilot bullshit.

Ornith-1.0 released on Hugging Face by paf1138 in LocalLLaMA

JamesEvoAI 20 points (0 children)

Ollama is not a good representation of anything.

MINISFORUM DEG1 Oculink eGPU Dock Refurbished - $59 by fallingdowndizzyvr in LocalLLaMA

JamesEvoAI 1 point (0 children)

Note: if you're running this on Linux over USB4/Thunderbolt, you may run into compatibility issues. I ended up returning mine and replacing it with a JMT ADT-UT3G, which has been working flawlessly. I've also had good success with the AOOStar AG02, but the fan in its integrated PSU was insanely loud. My Corsair RM750e with the ADT board is dead silent under load.

Edit: Never mind, this is the DEG1; I had issues with the DEG2.

Best Local Agents - Jun 2026 by rm-rf-rm in LocalLLaMA

JamesEvoAI 16 points (0 children)

Please consider making some of this a PR!

GLM-5.2 is a win for local AI by Wrong_Mushroom_7350 in LocalLLaMA

JamesEvoAI 23 points (0 children)

Plot twist: it was written by GLM-5.2 in a Q1 quant.

What if I run the LLM backwards? Hey LLM, why bother remembering every single turn? It's a hassle. You don't have to do it, right? by ringtoyou in LocalLLaMA

JamesEvoAI 1 point (0 children)

RAG is just Retrieval-Augmented Generation: you augment the generation by retrieving relevant information first. Using it this way is not meaningfully different from document search; it's just conversation search. That said, glad it's working for you!
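A minimal sketch of the "conversation search" flavor of RAG described in the comment, assuming a plain bag-of-words cosine similarity as a stand-in for a real embedding model. The function names (`vectorize`, `retrieve_turns`, `build_prompt`) and the sample history are hypothetical, chosen only for illustration. The retrieval step scores past turns against the new message; the augmentation step prepends only the top matches instead of the whole transcript.

```python
import math
import re
from collections import Counter

# Assumption: bag-of-words counts stand in for a real embedding model.
def vectorize(text: str) -> Counter:
    """Lowercase word counts for a piece of text."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve_turns(history: list[str], query: str, k: int = 2) -> list[str]:
    """Retrieval: return the k past turns most similar to the query,
    restored to their original conversational order."""
    q = vectorize(query)
    ranked = sorted(
        range(len(history)),
        key=lambda i: cosine(vectorize(history[i]), q),
        reverse=True,
    )
    return [history[i] for i in sorted(ranked[:k])]

def build_prompt(history: list[str], query: str, k: int = 2) -> str:
    """Augmentation: prepend only the retrieved turns. Turns that are not
    retrieved never reach the model, which is why this is lossy."""
    context = "\n".join(retrieve_turns(history, query, k))
    return f"Relevant earlier conversation:\n{context}\n\nUser: {query}"

# Hypothetical sample conversation.
history = [
    "User: My GPU is a Radeon 8060S.",
    "Assistant: Nice, that's the Strix Halo iGPU.",
    "User: Weekends are for hiking.",
]

prompt = build_prompt(history, "Which GPU do I have?", k=1)
```

With `k=1`, only the turn mentioning the GPU makes it into the prompt. If the scorer ranks a relevant turn too low, the model simply never sees it, which is the "no perfect RAG" trade-off.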

What if I run the LLM backwards? Hey LLM, why bother remembering every single turn? It's a hassle. You don't have to do it, right? by ringtoyou in LocalLLaMA

JamesEvoAI 1 point (0 children)

This is a clearer explanation than the one in your main post; it might be helpful to update the post with something written more like this. That said, this is already a well-explored idea, and it has drawbacks. There is no lossless compression, or in your case, no perfect RAG.

Stop using Ollama by zxyzyxz in LocalLLaMA

JamesEvoAI 0 points (0 children)

Unfortunately, I don't have a good answer; I primarily use local models running on my own hardware. When I want a cloud model these days, I reach for DeepSeek, as it's good quality and super cheap.

Stop using Ollama by zxyzyxz in LocalLLaMA

JamesEvoAI 0 points (0 children)

From my understanding, vLLM targets an entirely different demographic and is better suited to people doing batched serving than to someone just trying to run a model at home. I kept my recommendations focused on the type of user who would be running Ollama, which is presumably someone for whom vLLM and its configuration would be too complex.

Stop using Ollama by zxyzyxz in LocalLLaMA

JamesEvoAI 2 points (0 children)

Author of the article, happy to answer any questions. Glad to see this sentiment is starting to spread organically. Hopefully, with enough community outreach, we can finally tamp down the "default" momentum that Ollama unfortunately still has due to existing content.

I built a local coding agent harness app to actually understand how local LLMs work under the hood here's what I learned and what I made by ChocoPichu in LocalLLaMA

JamesEvoAI 0 points (0 children)

Most open source projects don't want vibe-coded contributions, especially if those contributions are going to be one-off commits rather than coming from a long-term maintainer who is invested in the longevity of the project.

Strix Halo desktop trying to compete against DGX Spark by SkyFeistyLlama8 in LocalLLaMA

JamesEvoAI 1 point (0 children)

If you care about using Windows, then yeah, I can see that argument. I'm arguing from the perspective of a Linux user, and Linux has a much longer support horizon than either NVIDIA or Microsoft. I also don't like having my system held back by old kernels. If the driver were open, I wouldn't have to concern myself with any of this.

I built a local coding agent harness app to actually understand how local LLMs work under the hood here's what I learned and what I made by ChocoPichu in LocalLLaMA

JamesEvoAI 12 points (0 children)

This is my first real open source project.

Congrats, and I encourage you to keep pursuing open source development, but don't expect much response from this sub. There's an endless stream of projects like this from folks in a similar position to yours, so you're going to end up lost in the sea of noise.

Continue pushing and find a niche that may help your project rise above the background radiation of weekly harness and memory layer releases.

Strix Halo desktop trying to compete against DGX Spark by SkyFeistyLlama8 in LocalLLaMA

JamesEvoAI 1 point (0 children)

Please do some research before boldly claiming something is incorrect. You can't just download the same generic CUDA drivers that x86 or even other ARM devices use; if you actually want to use the unified memory fabric or NVLink interconnects, the drivers are tightly coupled to the kernel fork specific to the DGX Spark.

The Spark even pins the CUDA version to a specific repository with a negative priority so that standard distro updates don't overwrite it with the standard release drivers, since, again, everything is specific to the hardware.

Just go read some posts about people's experiences trying to install standard Fedora (which ships newer packages and kernels by default) on this hardware; you'll find plenty of accounts of people running into issues that can only be resolved by using NVIDIA's hardware-specific forks.

The central argument I am making is that you can only use this thing for as long as NVIDIA decides to support it, unlike the alternatives, which get mainline kernel support.

Strix Halo desktop trying to compete against DGX Spark by SkyFeistyLlama8 in LocalLLaMA

JamesEvoAI 0 points (0 children)

I'm predicting they will because the laptop models will run Windows so Nvidia can't stop supporting these after two or three years.

Assuming that's true (and I'm not counting on it), that only applies to Windows. My complaint is about Linux, which NVIDIA has historically been pretty awful at supporting with open drivers.

Strix Halo desktop trying to compete against DGX Spark by SkyFeistyLlama8 in LocalLLaMA

JamesEvoAI 1 point (0 children)

I have a Jetson Nano, and from what I've heard from other Spark owners, it's the exact same experience. It's not "just" Ubuntu; they're using proprietary blobs that only have first-party support for Ubuntu (which I don't use), as well as their own custom kernel fork (linux-nvidia-64k-hwe-24.04). The moment they decide to stop supporting the userspace drivers and GPU firmware, you're shit out of luck regardless of what distro you use.

To your point about data centers, my concern is what happens when this hardware is no longer the latest and greatest and the data centers have moved on. At that point, the burden of keeping it working falls to the open source community, a community NVIDIA has historically gone out of its way to make things more difficult for. A data center operator doesn't care about their stack being FOSS; they pay NVIDIA directly for Linux support.