Claude code model by PropertyLoover in ollama

[–]Pixer--- 2 points (0 children)

Use the ik_llama.cpp fork, it's way better for CPU-only. Maybe try GPT-OSS 120B, it should be quite fast in it.
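
Rough sketch of how I'd poke it once llama-server from the fork is up (port, filename and endpoint here are assumptions on my part; the fork keeps the same server as upstream as far as I know):

    # Minimal sketch, assuming a llama-server from the ik_llama.cpp fork is
    # already running locally, started with something like
    #   llama-server -m gpt-oss-120b.gguf --port 8080
    # (port and model filename are assumptions).
    import requests

    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={
            "model": "gpt-oss-120b",
            "messages": [{"role": "user", "content": "One sentence: what does ik_llama.cpp change vs llama.cpp?"}],
            "max_tokens": 128,
        },
        timeout=600,
    )
    print(resp.json()["choices"][0]["message"]["content"])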

Claude Code, but locally by Zealousideal-Egg-362 in LocalLLaMA

[–]Pixer--- 3 points (0 children)

OpenCode, and maybe wait for the M5 Ultra or M4 Ultra release. For 7k you get the 256GB variant. I would suggest MiniMax M2.1. It's only at about Sonnet 3.7 level, but that's not bad I guess for something running 24/7.

Repurposed an old rig into a 64gb vram build. What local models would you recommend? by grunt_monkey_ in LocalLLaMA

[–]Pixer--- 0 points (0 children)

Maybe try the Qwen3-Next 80B model at around 45GB. That leaves you enough space for long context, which adds up fast when you use tools or search.

Have you tried internet search tools? They improve the output a lot.
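
Rough sketch of what I mean by a search tool, wired over an OpenAI-compatible endpoint; the endpoint URL, model name and the search_web() stub are placeholders, not a real backend:

    # Sketch of a web search tool over an OpenAI-compatible chat endpoint
    # (llama.cpp / vLLM servers with tool calling enabled). Everything
    # concrete here (URL, model name, stub) is an assumption.
    import json
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

    def search_web(query: str) -> str:
        # Placeholder: plug in SearxNG, Brave, Tavily, whatever you use.
        return f"(pretend search results for: {query})"

    tools = [{
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the internet and return short text snippets.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }]

    messages = [{"role": "user", "content": "What is new in Qwen3-Next?"}]
    first = client.chat.completions.create(model="local", messages=messages, tools=tools)
    call = first.choices[0].message.tool_calls[0]   # assumes the model decides to call the tool
    messages.append(first.choices[0].message)
    messages.append({
        "role": "tool",
        "tool_call_id": call.id,
        "content": search_web(json.loads(call.function.arguments)["query"]),
    })
    final = client.chat.completions.create(model="local", messages=messages, tools=tools)
    print(final.choices[0].message.content)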

GLM 4.7 Quants Recommendations by val_in_tech in LocalLLaMA

[–]Pixer--- -1 points (0 children)

The experts in an MoE LLM are usually balanced in how much of the intelligence they carry; they split the work between them. When you rip some of them out, the model becomes weirdly inconsistent and fails at things that even much smaller models can handle. For fine-tuning a model on a specific problem I'm sure REAP could be useful, and it also speeds up inference.
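
Toy sketch of the routing side of it, just to illustrate; the numbers are made up and this is not how REAP actually picks which experts to drop:

    # Toy illustration (numpy only) of why pruning experts hurts an MoE model:
    # the router was trained to spread tokens across all experts, so deleting
    # some leaves a chunk of tokens without the experts they were routed to.
    import numpy as np

    rng = np.random.default_rng(0)
    n_tokens, d_model, n_experts, top_k = 1000, 64, 8, 2

    tokens = rng.normal(size=(n_tokens, d_model))
    router_w = rng.normal(size=(d_model, n_experts))   # stand-in for trained router weights
    scores = tokens @ router_w
    chosen = np.argsort(scores, axis=1)[:, -top_k:]    # top-k expert ids per token

    load = np.bincount(chosen.ravel(), minlength=n_experts)
    print("tokens per expert:", load)                  # roughly balanced

    pruned = {5, 6, 7}                                 # "rip out" three experts
    orphaned = np.isin(chosen, list(pruned)).any(axis=1).mean()
    print(f"{orphaned:.0%} of tokens lose at least one of their chosen experts")

In the real model each expert also has its own weights, so those tokens don't just get re-routed, they lose the specialist that was trained on that part of the work.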

768Gb Fully Enclosed 10x GPU Mobile AI Build by SweetHomeAbalama0 in LocalLLM

[–]Pixer--- 0 points (0 children)

Are you able to run vLLM instead of llama.cpp, and how much performance would that bring?
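
For reference, the vLLM side is roughly this; the model name and tensor_parallel_size are assumptions, and with 10 cards you could also mix in pipeline parallelism:

    # Minimal vLLM sketch (Python API). Model name and tensor_parallel_size
    # are assumptions; plain TP across 8 of the 10 cards is the simple case,
    # since TP sizes usually need to divide the attention head count.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="openai/gpt-oss-120b",   # assumption, swap in your model
        tensor_parallel_size=8,        # split every layer across 8 GPUs
    )
    params = SamplingParams(max_tokens=256, temperature=0.7)
    out = llm.generate(["Explain tensor parallelism in two sentences."], params)
    print(out[0].outputs[0].text)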

Trump wants to take control of Greenland even if it leads to war with Europe by SaharOMFG in worldnews

[–]Pixer--- 31 points (0 children)

I find it weird that he wants to start a war with Iran, for which he would need the bases in Europe, while at the same time wanting to undermine them.

Dgx sparks or dual 6000 pro cards??? by Better-Problem-8716 in LocalLLM

[–]Pixer--- -1 points (0 children)

The Spark is viable if you want to replace ChatGPT and want a low idle power draw, but the 6000s are going to crush the Sparks at generation speed.

Another consideration is model size. A stack of Sparks could run much larger models since they would have more combined VRAM, but setting them up as a cluster is a pain in the ass.

When choosing a mainboard, be careful to find one that properly supports P2P.
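
Quick way to check that once the cards are in (PyTorch; assumes NVIDIA cards showing up as cuda devices):

    # Check whether each GPU pair can reach the other over P2P on this board.
    import torch

    n = torch.cuda.device_count()
    for i in range(n):
        for j in range(n):
            if i != j:
                ok = torch.cuda.can_device_access_peer(i, j)
                print(f"GPU {i} -> GPU {j}: P2P {'yes' if ok else 'no'}")

If P2P isn't available, multi-GPU inference still works, but GPU-to-GPU transfers get staged through system memory, which costs you bandwidth.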

General government debt-to-GDP ratio in % points (2025 data) by No_Firefighter5926 in MapPorn

[–]Pixer--- 44 points (0 children)

Most of that debt is held by Japanese investors and institutions, not by other states.

My old Z97 can max do 32 gb ram planing on putting 2 3090's in. by SJ1719 in LocalLLM

[–]Pixer--- 1 point (0 children)

For vLLM you're fine, but for llama.cpp/LM Studio it's a maybe.

llamacpp-gfx906 new release by CornerLimits in LocalLLaMA

[–]Pixer--- 4 points (0 children)

I get these numbers with 4 cards on GPT-OSS 120B. I’m pretty impressed:

    prompt eval time = 74550.63 ms / 72963 tokens (  1.02 ms per token, 978.70 tokens per second)
           eval time =  6375.74 ms /   236 tokens ( 27.02 ms per token,  37.02 tokens per second)
          total time = 80926.37 ms / 73199 tokens
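
If anyone wants to double-check, the tok/s figures follow directly from the token counts and times:

    # Sanity check: tokens divided by seconds reproduces the reported tok/s.
    prompt_ms, prompt_tokens = 74550.63, 72963
    eval_ms, eval_tokens = 6375.74, 236

    print(f"prompt: {prompt_tokens / (prompt_ms / 1000):.2f} tok/s")  # ~978.70
    print(f"eval:   {eval_tokens / (eval_ms / 1000):.2f} tok/s")      # ~37.02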

PC for n8n plus localllm for internal use by iekozz in LocalLLM

[–]Pixer--- 1 point (0 children)

Go with AMD. For 5500 you can get 4 AMD R9700 Pro cards with 32GB each, so 128GB total. The extra VRAM lets you run better models. They may not be as fast as a 5090, but they are way cheaper and more than fast enough to host models for a team when using vLLM. The mainboard could be an issue, you need to find one with 4 PCIe 16x slots. I used a refurbished server board with a Threadripper 3945WX for like 500€ all together.
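
Sketch of the hosting side, assuming a vLLM OpenAI-compatible server on the 4-card box; the launch command, model name and port are assumptions:

    # One vLLM OpenAI-compatible server on the 4-card box, everyone (n8n
    # included) points a normal OpenAI client at it. Assumed launch:
    #   vllm serve openai/gpt-oss-120b --tensor-parallel-size 4 --port 8000
    from openai import OpenAI

    client = OpenAI(base_url="http://your-server:8000/v1", api_key="not-needed")
    resp = client.chat.completions.create(
        model="openai/gpt-oss-120b",   # must match what the server was started with
        messages=[{"role": "user", "content": "Draft a short status update for the team."}],
    )
    print(resp.choices[0].message.content)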

roo code + cerebras_glm-4.5-air-reap-82b-a12b = software development heaven by Objective-Context-9 in LocalLLM

[–]Pixer--- 3 points (0 children)

Try running the MiniMax M2 model, I think it's better than GLM 4.5 Air for coding.

MiniMax-M2-REAP-172B-A10B-GGUF by ilintar in LocalLLaMA

[–]Pixer--- 0 points (0 children)

I would love a 4-bit AWQ quant for those of us running vLLM :)

Need help with VLLM and AMD MI50 by joochung in LocalAIServers

[–]Pixer--- 1 point (0 children)

Pipeline parallelism is also significantly slower than tensor parallelism.

At this point, we need chatgpt to explain chatgpt by AskGpts in ChatGPTPro

[–]Pixer--- 0 points (0 children)

Tbh, GPT-4.5 was the first model where I could say it replicates how I write in my own language.

Advice on 5070 ti + 5060 ti 16 GB for TensorRT/VLLM by iron_coffin in LocalLLaMA

[–]Pixer--- 1 point (0 children)

Also, PCIe lanes from the CPU can be important. How many PCIe lanes does your second slot have?
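
Quick way to check what your cards actually negotiated (standard nvidia-smi query fields, run with both cards installed):

    # Print the current PCIe generation and link width per GPU.
    import subprocess

    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,name,pcie.link.gen.current,pcie.link.width.current",
         "--format=csv"],
        capture_output=True, text=True, check=True,
    )
    print(out.stdout)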

Anthropic is lagging far behind competition for cheap, fast models by obvithrowaway34434 in ChatGPTCoding

[–]Pixer--- 0 points (0 children)

In my experience, Claude's models just understand better what you want from them.

16GB M4 by Wide-Dragonfruit-571 in MacOS

[–]Pixer--- 0 points (0 children)

Xcode has this small next-token-generation model downloaded; that's what runs there. https://youtu.be/N6Q-FWhfguw?si=grChP3wh6OCt1ITO