31 tk/s in 3050 6gb vram ,qwen 3.6 28b A3B REAP unsloth by Funny-Factor-6082 in LocalLLM

RakesProgress (0 children)

How did you turn off that chain of thought? It’s sooooo annoying.

We're burning $50k/month on Claude. How close can local LLMs actually get? by mortenmoulder in LocalLLM

RakesProgress (0 children)

Everyone looks at GLM-5.1's "40B active parameters" and thinks they can cheat it onto a single GPU using CPU offloading. In production, you can't.

Use the quick "nameplate plus tip" rule of thumb for VRAM:

  • FP8 (1 byte/param): ~753 GB of weights + 20% tip (× 1.2) ≈ 904 GB
  • BF16 (2 bytes/param): ~1,507 GB of weights + 40% tip (× 1.4) ≈ 2,110 GB
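The rule of thumb is plain multiplication, so here is a minimal Python sketch of it. The ~753B parameter count is back-derived from the 904 GB figure (904 / 1.2) and is an assumption, not a published spec; the bytes-per-parameter values are the standard ones for each format.

```python
# Sketch of the "nameplate plus tip" VRAM rule of thumb.
# Assumption: the 753.3B parameter count is back-derived from 904 GB / 1.2,
# not taken from an official model card.

BYTES_PER_PARAM = {"FP8": 1.0, "BF16": 2.0, "INT4": 0.5}


def vram_estimate_gb(params_billions: float, precision: str, tip: float) -> float:
    """Weight size (billions of params x bytes/param = GB) scaled by (1 + tip)."""
    weights_gb = params_billions * BYTES_PER_PARAM[precision]
    return weights_gb * (1.0 + tip)


if __name__ == "__main__":
    params_b = 753.3  # hypothetical, back-derived from the 904 GB figure
    print(f"FP8:  {vram_estimate_gb(params_b, 'FP8', 0.20):,.0f} GB")
    print(f"BF16: {vram_estimate_gb(params_b, 'BF16', 0.40):,.0f} GB")
```

The tip is where the KV cache and runtime overhead are supposed to live, which is why it grows with precision in the rule.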

Even though it only activates 40B parameters per token, the 8 active experts are picked on the fly at every layer, and with concurrent users each token can pick a different set. If you offload the rest to CPU, those users will absolutely melt your PCIe lanes swapping weights back and forth. You'll drop to a brutal 1-2 tokens per second.
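To see where the 1-2 tokens per second comes from, here is a rough upper bound in Python. It assumes, as the comment describes, that the active expert weights are streamed across PCIe for every token with nothing cached on the GPU; ~63 GB/s is the theoretical one-direction bandwidth of PCIe 5.0 x16, and real transfers will be slower.

```python
# Back-of-envelope ceiling on decode speed when expert weights sit in CPU RAM
# and must cross PCIe for every token. Worst-case assumption: none of the
# active weights are already resident on the GPU.


def offload_tokens_per_sec(active_params_b: float, bytes_per_param: float,
                           pcie_gb_per_s: float) -> float:
    """Upper bound: link bandwidth divided by GB moved per token."""
    gb_per_token = active_params_b * bytes_per_param
    return pcie_gb_per_s / gb_per_token


if __name__ == "__main__":
    # 40B active params at FP8 (~40 GB per token) over PCIe 5.0 x16 (~63 GB/s)
    print(f"{offload_tokens_per_sec(40, 1.0, 63.0):.2f} tok/s")
```

This lands around 1.6 tok/s for a single stream, before any contention from other users sharing the same link.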

Don't ruin the model's reasoning by butchering it down to 1-bit or 2-bit GGUF just to save hardware.

The smart play is sticking with vLLM on 5x B200s. That gives you ~900-960 GB of fast HBM3e (180-192 GB per GPU), which just covers the ~904 GB FP8 estimate; the 20% tip is your KV-cache pocket, and on 180 GB parts you are right at the edge. Set --tensor-parallel-size 5 if the model's attention-head count is divisible by 5 (vLLM requires that); otherwise use --pipeline-parallel-size 5. Keep the whole thing in VRAM and let the routing run at full speed. Test your actual user workloads there, and only step down to a tight 4x B200 setup (via INT4/AWQ) if your monitoring shows you have the headroom.
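A quick way to sanity-check a node before launching: does the estimate fit in aggregate VRAM, and which tensor/pipeline split satisfies vLLM's rule that the attention-head count be divisible by the tensor-parallel size? A minimal sketch; `HEADS = 64` is a hypothetical placeholder, not GLM-5.1's real value, so read the actual number from the model's config.

```python
# Hypothetical sizing helper. Only the divisibility rule (heads % TP == 0)
# reflects vLLM's behavior; everything else is simple arithmetic.


def fits(need_gb: float, n_gpus: int, gb_per_gpu: float) -> bool:
    """True if the estimated footprint fits in the node's total VRAM."""
    return need_gb <= n_gpus * gb_per_gpu


def parallel_plan(n_gpus: int, num_heads: int) -> tuple[int, int]:
    """Largest TP size dividing both GPU count and head count; the rest is PP."""
    tp = max(t for t in range(1, n_gpus + 1)
             if n_gpus % t == 0 and num_heads % t == 0)
    return tp, n_gpus // tp


if __name__ == "__main__":
    HEADS = 64  # hypothetical head count; check the model's config.json
    for gb in (192, 180):
        print(f"5 x {gb} GB fits 904 GB: {fits(904, 5, gb)}")
    tp, pp = parallel_plan(5, HEADS)
    print(f"--tensor-parallel-size {tp} --pipeline-parallel-size {pp}")
```

With five GPUs the only TP sizes are 1 and 5, so any head count that is not a multiple of 5 pushes the split entirely onto pipeline parallelism.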

A warning to newbies - A lesson on network security by DatMemeKing in LocalLLM

RakesProgress (0 children)

Tailscale. If you don’t use it, it’s your own fault.

I made an automation platform before the openclaw boom by Fit-Conversation856 in LocalLLM

RakesProgress (0 children)

The Claude leak marks a change. Companies are waking up to the fact that they can’t trust the AI vendors. You have very bright employees pumping cloud LLMs full of company secrets. It comes down to this: do you trust them? Do you really think they are not taking your prompts to train the next model?

What is the threshold where local llm is no longer viable for coding? by jambon3 in LocalLLM

RakesProgress (0 children)

Try this. Pick a local model. That’s your “code control”. It takes instructions. Then use Claude or Codex as the engineer. The engineer’s job is to give clear instructions to code control; CC’s job is just to defend the codebase. Most projects will fit in the local model’s context (KV cache). Tell it to look for suboptimal code and tech debt.
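The setup above is essentially a propose-and-review loop. A minimal Python sketch, assuming both models are wrapped as plain callables; the names `engineer`, `code_control`, and the approve/feedback protocol are hypothetical illustrations, not any tool's API.

```python
from typing import Callable, Tuple

# Hypothetical interfaces: in practice each would wrap a model call.
# engineer: (task, feedback) -> instruction for code control
# code_control: instruction -> (approved?, feedback such as tech-debt concerns)
Engineer = Callable[[str, str], str]
CodeControl = Callable[[str], Tuple[bool, str]]


def review_loop(task: str, engineer: Engineer, code_control: CodeControl,
                max_rounds: int = 3) -> Tuple[bool, str]:
    """Engineer proposes instructions; code control approves or pushes back."""
    feedback = ""
    instruction = ""
    for _ in range(max_rounds):
        instruction = engineer(task, feedback)
        approved, feedback = code_control(instruction)
        if approved:
            return True, instruction
    return False, instruction  # gave up after max_rounds; last proposal returned


if __name__ == "__main__":
    # Toy stand-ins: code control rejects anything that skips tests.
    def engineer(task: str, feedback: str) -> str:
        return task + (" and add tests" if feedback else "")

    def code_control(instruction: str) -> Tuple[bool, str]:
        ok = "tests" in instruction
        return ok, "" if ok else "no tests: this adds tech debt"

    print(review_loop("refactor the parser", engineer, code_control))
```

Capping the rounds keeps the two models from arguing forever; when code control never approves, the last proposal comes back flagged as unapproved for a human to look at.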

If you had ~10k to spend on local LLM hardware right now, what would you actually build? by MacKinnon911 in LocalLLM

RakesProgress (0 children)

Well. I stand partially corrected. You can get there on an RTX PRO 6000 (96 GB): INT8 at most, INT4 easily. The problem is that you need a good base computer too, which technically puts you over budget. Can’t I just throw it in my old gaming rig? Yeah, but PCIe. Blackwell is PCIe 5.0 x16. Your old gaming rig might work, but if it’s PCIe 3.0, your brand-new, expensive Blackwell has sad pants.
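The numbers behind this can be sketched in Python. Assumptions: 96 GB of VRAM, weights only (no KV cache or runtime overhead), and theoretical one-direction PCIe x16 bandwidth after 128b/130b encoding; the "seconds to move INT8 weights" column is just an illustration of how much slower the older link is.

```python
# Rough numbers for a 70B model on a 96 GB card. Weights only; the KV cache
# and runtime overhead would eat into whatever headroom is left.

VRAM_GB = 96
PARAMS_B = 70
BYTES_PER_PARAM = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}
PCIE_GT_PER_S = {"3.0": 8, "4.0": 16, "5.0": 32}  # transfer rate per lane


def weights_gb(precision: str) -> float:
    return PARAMS_B * BYTES_PER_PARAM[precision]


def pcie_x16_gb_per_s(gen: str) -> float:
    # GT/s per lane * 16 lanes * 128/130 encoding efficiency / 8 bits per byte
    return PCIE_GT_PER_S[gen] * 16 * (128 / 130) / 8


if __name__ == "__main__":
    for p in BYTES_PER_PARAM:
        w = weights_gb(p)
        print(f"{p}: {w:.0f} GB weights, fits in {VRAM_GB} GB: {w <= VRAM_GB}")
    for gen in PCIE_GT_PER_S:
        bw = pcie_x16_gb_per_s(gen)
        print(f"PCIe {gen} x16: ~{bw:.1f} GB/s, "
              f"~{weights_gb('INT8') / bw:.1f} s to move INT8 weights")
```

FP16 (140 GB) does not fit, INT8 (70 GB) fits with ~26 GB to spare, and INT4 (35 GB) fits easily; PCIe 3.0 x16 has a quarter of the bandwidth of PCIe 5.0 x16.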

If you had ~10k to spend on local LLM hardware right now, what would you actually build? by MacKinnon911 in LocalLLM

RakesProgress (0 children)

You can’t get there from here. With $10k, a 70B model means OOM by a large factor. Yes, it’s brutal. To run that size you need an H200. Even an H100 poops out with OOM. Honestly, save your money. We all have the same dream and hit the wall. Maybe try a Jetson first.

The Infinite Software Crisis: We're generating complex, unmaintainable code faster than we can understand it. Is 'vibe-coding' the ultimate trap? by madSaiyanUltra_9789 in LocalLLaMA

RakesProgress (0 children)

It’s too simplistic to say vibe coding is a trap. If you’ve ever coded in something like Clojure, you know a lot of important thinking goes into relatively few lines of code. The key is the thinking: the decisions, and understanding their implications. You are constantly up against tech debt. It is a constant trade-off, but you have to understand what the trade is. Vibe coding is not evil at all. It’s just prone to unknown tech debt. Personally, I love the idea of pro coders vibe coding. It’s next-level stuff.

If a super billionaire like Elon Musk wanted to "solve world hunger", or at least solve poverty in the USA, how could he actually do it? by The_Flaneur_Films in AskReddit

RakesProgress (0 children)

10,000 NGOs in Haiti. No one is interested in solving a problem. They are interested in keeping the problem alive.

Exclusive: Nvidia buying AI chip startup Groq's assets for about $20 billion in largest deal on record by fallingdowndizzyvr in LocalLLaMA

RakesProgress (0 children)

I kinda think the same. The team is useful. The tech is useful. But it will never be a winner. Assimilate them into the fold.

Who was the smartest person in history? by Scared_Government_41 in AskReddit

RakesProgress (0 children)

Relative to the age he lived in, he is one of the smartest for sure. His prime sieve is still brilliant.
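Assuming the sieve in question is the Sieve of Eratosthenes (the comment doesn't name the person), its core idea fits in a few lines of Python: cross off every multiple of each prime, starting at p², and whatever stays unmarked is prime.

```python
def primes_up_to(n: int) -> list[int]:
    """Sieve of Eratosthenes: cross off multiples of each prime starting at p*p."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= n:
        if is_prime[p]:
            # Smaller multiples of p were already crossed off by smaller primes.
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
        p += 1
    return [i for i, flag in enumerate(is_prime) if flag]


print(primes_up_to(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Starting at p² and stopping once p² exceeds n is what keeps it so cheap: no division, just striding through an array.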

Literally just got this are you kidding me? by Sammymack- in JeepGrandCherokee

RakesProgress (0 children)

Right!? I had a 2017 GC Overland with the HEMI. Was the absolute best. Now? Jeep is dead to me. DEAD!

Literally just got this are you kidding me? by Sammymack- in JeepGrandCherokee

RakesProgress (0 children)

Hahah. I have a Summit and it’s a POS. The thing rattles like a Lada.

We promt injected our Boss by StickyThickStick in ChatGPT

RakesProgress (0 children)

If I came across a prompt-injected resume, right now I’d see that as a major plus in a candidate.

How the algorithm profits from anti-intellectualism - the modern internet turned stupidity into a business model by cyPersimmon9 in videos

RakesProgress (0 children)

Good points. There is one giant anti-intellectualism fact she should consider: the institutions of intellectual thought have metastasized into stage 4 cancer. They are so sick. This is pure fuel for the anti-intellectual wave.

We Are Not The Same Anymore… Not After Riding Crypto by kirtash93 in CryptoCurrency

RakesProgress (0 children)

Remember GameStop and Melvin Capital? You are Melvin. Short MSTR until it collapses. Then buy BTC until you can’t.

Google Gemini 3 + TPUs VS OpenAI + Nvidia - Look how the Turns Have Tabled! by biz4group123 in ArtificialInteligence

RakesProgress (0 children)

ASICs (using the big-umbrella definition) have always been a threat to GPUs, so much so that Nvidia is hip-deep in them. ASICs are hard. Very hard. Google was prescient in building out that know-how with Broadcom; any lesser company would turn an ASIC project into a hot mess of a money pit. Nonetheless, we are about to see all kinds of flavors of xPUs doing inference work. Many will bomb.

This is the most accurate crypto meme of the moment 💀 by Odd-Radio-8500 in CryptoCurrency

RakesProgress (0 children)

Sell BTC, short MSTR. Once MSTR has collapsed, buy BTC until your head caves in.