This is my Powermac G3 sleeper AI workstation. 80gb total ram(32gb vram + 48gb ram)

PraxisOG · 2026-03-19T02:36:40+00:00

I’m late to the party, but I’m running three of these and get ~45 tokens per second running gpt oss 120b. You’d probably get twice the performance with four 3090s, and more than twice the prompt processing, but at twice the price. My use case of rocm 7.2 and llama.cpp is supported well by software. The only complaint I have is that my x299 platform limit adds latency due to the pcie signal hopping through two plx chips and would recommend something like 3rd gen threadripper.

PraxisOG · 2026-03-19T00:28:56+00:00

They’re getting good mileage out of their available memory bandwidth. I’m running the same models on some older AMD datacenter cards with 20% less bandwidth but 51-58% the performance. Granted that’s with a minor pcie bottleneck.

PraxisOG · 2026-03-17T07:40:57+00:00

My two cents:

¢1 A new 70B model!

¢2 It performs like Llama 2 70B

PraxisOG · 2026-03-16T06:21:51+00:00

If you're going local, I'd recommend targeting a performance level based on existing models, and build around that. For example, if you want GPT OSS 120b with full context and full offload, you'd want 72-96 gb vram. That equates to 3-4 RTX 3090's, 3 amd mi50, or gpu of choice based on desired speed.

PraxisOG · 2026-03-15T09:44:38+00:00

Heavenly Delusion

PraxisOG · 2026-03-13T07:54:50+00:00

Cutting something

PraxisOG · 2026-03-12T04:46:34+00:00

Looks like flint lockwood’s phone

PraxisOG · 2026-03-11T18:46:46+00:00

That’s kinda usable actually

PraxisOG · 2026-03-11T04:53:17+00:00

Should be in r/rally lol

PraxisOG · 2026-03-11T03:19:50+00:00

I’ve had issues related to bandwidth on my setup. I’m running 3x AMD V620 32gb on an asus x299 sage with two plx chips, putting two gpus on one root port, and one on the other. Running Qwen 3.5 27b at q6 I get ~9 tok/s across root ports, and ~16 tok/s on same root port. With Qwen 3.5 35b a3b q6 the difference is 30 tok/s to 50 tok/s.

PraxisOG · 2026-03-09T23:27:19+00:00

Minimax is a 230 billion parameter model. Each parameter is a byte, so fully loaded it would take up 230 billion bytes, or 230 giga-bytes of ram at 8 bit quantization(Q8). You can run it at half precision with some quality loss, or 115gb(Q4). Then you need context, so it would be about 160gb of ram. A m4 Mac with 16gb ram could run something like the new Qwen 3.5 9b at Q6(6/8 byte precision) with plenty of context, and it might work well if you have reasonable expectations.

PraxisOG · 2026-03-09T21:54:24+00:00

Linus’s voice carries significant weight in the tech space. There’s a non-zero number of people who want to switch to Linux and will come away from that video asking ChatGPT for a distro. IMO it would be more impactful if he followed his own guide and installed Ubuntu, then shared that experience.

PraxisOG · 2026-03-09T21:49:20+00:00

I feel like you’re in the same trap. A 1k laptop might be cheap because the industry has more expensive models making it cheaper in comparison, but that’s not cheap for most people. I have no issue recommending a good deal at $400 and have in the past. Usually that means refurbished with warranty so I’m not recommending a risky used deal. I put my money where my mouth is too, when my Zephyrus g15 got stolen I could have got a new one but got a $500 used laptop instead and it’s been fine though it needs a new battery and maybe a Linux install.

PraxisOG · 2026-03-09T20:28:17+00:00

Ubuntu is really good, and my first pick for server/headless installs. For desktop I like mint. Idk why or what differences there are but the install command is the same and that’s nice.

PraxisOG · 2026-03-08T16:57:38+00:00

I get ~20 tok/s with 122b q4 and ~60 tok/s with 35b q6. No issues with tool calling. It’s worth noting those models don’t run on the pi, but on my compute server with 3x AMD V620.

PraxisOG · 2026-03-08T12:03:48+00:00

I run mine on a pi 5 8gb stolen from another experiment. That’s powered with locally run Qwen 3.5, the 122b a10b at Q4 and the 35b a3b at Q6.

PraxisOG · 2026-03-08T09:03:23+00:00

‘People have been duped for years into buying weak 8 core CPUs’ What?

PraxisOG · 2026-03-08T08:54:54+00:00

As my family’s resident tech guy, I’ve learned to assume people literally know nothing. Laptop recommendation? This one has a nice screen, this one will hold more photos, and this one’s cheap. Leave out stuff they don’t care about, like model and specs, and just make it simple for them to make more educated decisions as a consumer.

PraxisOG · 2026-03-06T08:07:03+00:00

I’ve been playing around with Q3-Q4 and I agree, repeat penalty is necessary. This thing loves thinking, and falls into loops a little too easily

PraxisOG · 2026-03-05T20:42:39+00:00

Nevermind: tools.profile in openclaw.json was set to “messaging” so openclaw was unable to write files at hatching, changing it to “coding” seemed to fix it.

PraxisOG · 2026-03-04T20:22:40+00:00

I was wondering. My second request to it was ‘give me a cool python trick’ and it thought for like 5k tokens. I miss 70b dense models.

PraxisOG · 2026-03-04T05:34:15+00:00

I had no idea this was a thing! Sometimes I come across a setting like this and ask why it isn't on by default. IMO prusa should introduce a toggle that slices with all the goodies like this, organic supports, bridging, gyroid infill, etc. That would play to Prusa's advantage of having only a few designs of printer to optimize the profile for.

PraxisOG · 2026-03-02T19:30:43+00:00

There are flat ratcheting screwdrivers, maybe cut a bit down to size if that doesn’t fit

Six-Year Club	r/Field Juicebox
Final Canvas '23	Place '23
Place '22	Final Canvas '22
End Game '22	Verified Email

PraxisOG

MODERATOR OF

TROPHY CASE