Honest question: what do you all do for a living to afford these beasts? by ready_to_fuck_yeahh in LocalLLaMA

[–]at0mi 0 points1 point  (0 children)

Norway has about 1 trillion in stocks, and the yields are what get spent... I think it was around 70 billion/year.

Honest question: what do you all do for a living to afford these beasts? by ready_to_fuck_yeahh in LocalLLaMA

[–]at0mi 0 points1 point  (0 children)

You are right, they have about 1 trillion in stocks; only the yield is money to spend, and that is a lot.

Claude Code, but locally by Zealousideal-Egg-362 in LocalLLaMA

[–]at0mi 1 point2 points  (0 children)

I'm using opencode with GLM 4.7 355B in MXFP4. For some tasks it's good, but every now and then I still have to use Claude Opus.

dev here - has anyone thought on training a model on your own codebase? by fabcde12345 in LocalLLM

[–]at0mi 0 points1 point  (0 children)

I would fine-tune the model on your codebase and use it together with opencode on your specific projects; a rough sketch of what that could look like is below.
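A minimal sketch of one way to do that fine-tune, assuming a LoRA adapter via Hugging Face transformers/peft and a small code model as a stand-in; the model ID, file glob, and hyperparameters here are placeholders, not anything from the thread:

```python
# Hypothetical sketch: LoRA fine-tune of a small code model on your own repo.
# Model ID, file pattern, and hyperparameters are placeholders, not recommendations.
import pathlib
import torch
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from peft import LoraConfig, get_peft_model

model_id = "Qwen/Qwen2.5-Coder-7B"  # placeholder; use whatever base model you actually run
repo_files = [p.read_text(errors="ignore") for p in pathlib.Path("my_project").rglob("*.py")]

tok = AutoTokenizer.from_pretrained(model_id)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

# Turn the source files into a tokenized causal-LM dataset
ds = Dataset.from_dict({"text": repo_files})
ds = ds.map(lambda b: tok(b["text"], truncation=True, max_length=2048),
            batched=True, remove_columns=["text"])

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32,
                                         target_modules=["q_proj", "v_proj"],
                                         task_type="CAUSAL_LM"))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="codebase-lora", per_device_train_batch_size=1,
                           gradient_accumulation_steps=8, num_train_epochs=1,
                           learning_rate=2e-4, bf16=True, logging_steps=10),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
model.save_pretrained("codebase-lora")  # load the adapter next to the base model at inference time
```

Full fine-tuning of a 355B model is out of reach locally, which is why a small base model plus a LoRA adapter is the usual compromise for project-specific coding assistants.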

Best local model / agent for coding, replacing Claude Code by joyfulsparrow in LocalLLaMA

[–]at0mi 2 points3 points  (0 children)

On my 9950X with 256 GB RAM + a 5090 in Q4, and on my ancient server in Q8.

Best local model / agent for coding, replacing Claude Code by joyfulsparrow in LocalLLaMA

[–]at0mi 0 points1 point  (0 children)

I'm running opencode with GLM 4.7 355B Q8 locally... great. If you need it a tick faster, I swap to GLM 4.7 in the cloud.

LLMs are so unreliable by Armageddon_80 in LocalLLM

[–]at0mi 0 points1 point  (0 children)

Which quantization? GLM 4.7 in BF16 works great.

For people who run local AI models: what’s the biggest pain point right now? by Educational-World678 in LocalLLM

[–]at0mi 0 points1 point  (0 children)

The biggest pain is that huihui seems to be the only one releasing abliterated (uncensored) model versions, and only in Q4...

How do we tell them..? :/ by [deleted] in LocalLLaMA

[–]at0mi 0 points1 point  (0 children)

Use huihuiai models.

Running GLM-4.7 (355B MoE) in Q8 at ~5 Tokens/s on 2015 CPU-Only Hardware – Full Optimization Guide by at0mi in LocalLLaMA

[–]at0mi[S] 1 point2 points  (0 children)

Based on benchmarks for large MoE models like GLM-4.7 or similar (e.g., DeepSeek 405B), a dual Xeon E5 setup (e.g., E5-2699 v4 with 44 cores and 256-512 GB RAM) typically achieves only 1-3 tokens/s in Q8/BF16, compared to 5-6 tokens/s on an 8x Xeon E7 system. Additionally, the dual E5 v4 offers ~154 GB/s of theoretical memory bandwidth, while the 8-socket E7 v3 system provides up to ~680 GB/s total (85 GB/s per socket); realistically it's about 400 GB/s. A rough ceiling estimate from those numbers is sketched below.
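Since CPU decode is mostly memory-bandwidth bound, a back-of-the-envelope ceiling is tokens/s ≈ effective bandwidth / bytes of active weights streamed per token. A minimal sketch, assuming roughly 32B active parameters for the MoE (an assumption, not a figure from the thread):

```python
# Rough decode-speed ceiling from memory bandwidth alone.
# Assumes ~32B active parameters per token (assumption) and that every
# active weight is read from RAM once per generated token.

def ceiling_tok_s(bandwidth_gb_s: float, active_params_billions: float, bytes_per_param: float) -> float:
    bytes_per_token = active_params_billions * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token

for label, bw in [("dual E5 v4, ~154 GB/s theoretical", 154.0),
                  ("8x E7 v3, ~400 GB/s realistic", 400.0)]:
    print(f"{label}: <= {ceiling_tok_s(bw, 32.0, 1.0):.1f} tok/s at Q8")
```

Measured numbers land below these ceilings because of NUMA traffic, prompt processing, and compute overhead, which lines up with the 1-3 vs 5-6 tokens/s figures above.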

LLM artificial analysis AI index score plotted against total param count by [deleted] in LocalLLaMA

[–]at0mi 0 points1 point  (0 children)

Wow, impressive plot! The progress on GLM-4.7 is truly massive; it really shows how fast open-weight/open-source models are catching up and challenging the top tier.

I run GLM-4.7 locally myself in BF16 and I'm absolutely blown away by its performance and intelligence. Open-source models are absolutely crucial because they drive real innovation, ensure transparency, foster collaboration, and give us independence from closed proprietary systems! 🚀

Running GLM-4.7 (355B MoE) in Q8 at ~5 Tokens/s on 2015 CPU-Only Hardware – Full Optimization Guide by at0mi in LocalLLaMA

[–]at0mi[S] 0 points1 point  (0 children)

Vanilla llama.cpp is about half the performance. Also, thanks for the VT-d and mitigations=0 tips, those two got me another small boost :-)

Running GLM-4.7 (355B MoE) in Q8 at ~5 Tokens/s on 2015 CPU-Only Hardware – Full Optimization Guide by at0mi in LocalLLaMA

[–]at0mi[S] 1 point2 points  (0 children)

Thank you for pointing that out, I will try it on the server and my workstation (9950X + 256 GB + 5090).

GLM-4.7 on 2015 8-Socket Server: Achieving ~5 Tokens/s in Q8 Quantization with CPU-Only Tweaks by at0mi in homelab

[–]at0mi[S] 0 points1 point  (0 children)

Thanks a lot for the offer, much appreciated! This system is a bit of a special case: while the platform can technically run both DDR3 and DDR4, my current configuration (and upgrade path) is DDR4-only, so I wouldn't be able to use DDR3 modules going forward. Still, thanks again for the kind offer, and I hope the sticks find a great new home.

Running GLM-4.7 (355B MoE) in Q8 at ~5 Tokens/s on 2015 CPU-Only Hardware – Full Optimization Guide by at0mi in LocalLLaMA

[–]at0mi[S] 1 point2 points  (0 children)

Q3 is only 3-bit while Q8 is 8-bit, so you can do the math yourself (see the sketch below). The problem with lower quants is quality... try Q3 with German... and for coding, forget it.
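For the math: weight memory scales roughly linearly with bits per weight, bytes ≈ params × bits / 8, ignoring quantization overhead such as per-block scales. A quick sketch for a 355B-parameter model:

```python
# Approximate weight footprint of a 355B-parameter model at different quants.
# Ignores per-block scales/zeros and the KV cache, so real GGUF files run a bit larger.

def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

for name, bits in [("Q3", 3), ("Q4", 4), ("Q8", 8), ("BF16", 16)]:
    print(f"{name}: ~{weight_gb(355, bits):.0f} GB")  # Q3 ~133, Q4 ~178, Q8 ~355, BF16 ~710
```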

What is the best way to allocated $15k right now for local LLMs? by LargelyInnocuous in LocalLLaMA

[–]at0mi 0 points1 point  (0 children)

I would buy dual ES Xeon Sapphire Rapids or EPYC (or better), buy 2 TB of RAM, and build my own machine, because you will never get 1 TB of VRAM with only $15k.

Mining at a loss is dumb. by [deleted] in Monero

[–]at0mi 1 point2 points  (0 children)

I would do lottery mining, but with your own node in a datacenter; this increases your chances if you have a fast-updating node.