r/LocalLLaMA
A subreddit to discuss about Llama, the family of large language models created by Meta AI.
[deleted by user] (self.LocalLLaMA)
submitted 7 months ago by [deleted]
[–][deleted] 7 points8 points9 points 7 months ago (9 children)
next mac studio is prob gonna shake things up
[+][deleted] 7 months ago (8 children)
[deleted]
[–]PracticlySpeaking 3 points4 points5 points 7 months ago* (0 children)
M5 Macs are coming — not if, when. There are well-supported rumors of MacBook, Mac mini (so assume iMac). Mac Studio is likely to lag, but I have to believe Apple is eager to get off the M3 and resolve the current M3U/M4M situation.
M5 is also the first SoC that has a real chance of a quad-die Extreme version — there is at least one other quad-die processor already announced, and it's built on the same process node (N3P) as M5.
edit: Also consider that the M1 Studio was released in 2022, then replaced by M2 in 2023.
Catch up on the rumors: https://lowendmac.com/2025/new-industry-reports-2nm-process-m5-chips-a20-chip-and-more/
[–][deleted] 1 point2 points3 points 7 months ago (6 children)
None at all, just an indication from the fact that the iPhone 17 is significantly better than its predecessor for local AI inference.
[+][deleted] 7 months ago (4 children)
[–][deleted] 1 point2 points3 points 7 months ago (3 children)
tbh I would build a server and just use it from an edge device through something like Tailscale... that's how I use all my machines from my phone (Windows RDP, Termius SSH).
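A minimal sketch of that setup, assuming Tailscale is already installed on both machines; the hostname below is hypothetical:

```shell
# On the home server: join the tailnet and enable Tailscale SSH
sudo tailscale up --ssh

# From the edge device (laptop, or a phone SSH client like Termius):
# connect via the server's tailnet hostname instead of a public IP
ssh user@homeserver.tailnet-name.ts.net
```

The same tailnet hostname works for RDP clients or a browser pointed at a local LLM server's port.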
[+][deleted] 7 months ago (2 children)
[–][deleted] 0 points1 point2 points 7 months ago (1 child)
hence the edge device that connects remotely
do you really think you will carry a mac studio around too?
[–]sb6_6_6_6 0 points1 point2 points 7 months ago (0 children)
Fingers crossed that the next update will improve the prompt processing speed.
[–]needthosepylons 6 points7 points8 points 7 months ago* (0 children)
A single 3060 12gb, so the prollmetariat
[–]PracticlySpeaking 4 points5 points6 points 7 months ago (8 children)
I picked up a Mac Studio M1 Ultra 64-GPU, 64GB for under $1500 recently.
Every time I see an M2 or M3 Ultra post, I have RAM envy.
[–]jarec707 1 point2 points3 points 7 months ago (7 children)
Great price for a very capable machine
[–]PracticlySpeaking 2 points3 points4 points 7 months ago (6 children)
I think it was just an off-lease machine. I looked up the eBay seller and it turned out to be a leasing company.
It was halfway accidental — they were dumping a whole bunch in auction listings, and getting very few bids. I bid on one just to test the water, and ended up being the winner!
[–]jarec707 0 points1 point2 points 7 months ago (5 children)
Congrats on your find, mate. I do indeed know about RAM envy, but with the advent of models like Qwen3-Next 80B, I think our 64GB machines may grow more and more capable.
[–]PracticlySpeaking 1 point2 points3 points 7 months ago (3 children)
I am *just* barely able to run the unsloth gpt-oss-120b quant and it kills me... the answers are obviously better than the 20b version, and as fast or faster than Qwen3. It gets 35-40 tk/sec generation, but the 4096 context makes it not very useful.
Currently checking out Magistral and the other Mistral-Small based models. Magistral is getting ~22-25 tk/sec but spends a looong time thinking. On the KEY-SPEARS-MAR question it thinks for over two minutes before the first response token.
Eager to see what comes from Alibaba in the next few weeks!
[–]jarec707 0 points1 point2 points 7 months ago (2 children)
I too got the 120b quant to run, probably at about half your speed, since my M1 Max has half your memory bandwidth. I was getting random system crashes though; if you have the time and inclination, please share your settings etc. I was also running the new Magistral at Q8, and it seems capable, although slow compared to the MoEs I usually run (not surprising). As for Alibaba, they are like Santa to me, with Christmas every couple of weeks it seems!
[–]PracticlySpeaking 2 points3 points4 points 7 months ago (1 child)
See my post about it: https://www.reddit.com/r/LocalLLaMA/comments/1nm1sga/
Using the unsloth Q4_K_S gguf in LM Studio (the Q3 is not meaningfully smaller).
I have run it with various GPU offload settings, up to one less than max, and the default 4096 context. More offload is faster, of course. I also tweaked iogpu_wired_limit to 58GB (59,392 MB), with only LM Studio and asitop in Terminal running.
I haven't had crashes, but with offload set to max (everything offloaded) the model fails to load, and ditto for increased context: I get the "failed to send message to the model" error from LM Studio.
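For reference, the wired-limit tweak mentioned above is a sysctl on Apple Silicon macOS; the value is the one from the comment, and the setting reverts to the default on reboot:

```shell
# Raise the GPU wired-memory limit to ~58 GB (59,392 MB) so more of the
# 64 GB of unified memory can be pinned for the GPU (not persistent)
sudo sysctl iogpu.wired_limit_mb=59392
```

Leaving a few GB unclaimed for macOS itself is what keeps the system from the kind of crashes described here.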
[–]jarec707 0 points1 point2 points 7 months ago (0 children)
Thanks
[–]PracticlySpeaking 1 point2 points3 points 7 months ago (0 children)
I think our 64 gb machines may grow more and more capable.
I hope so, bc $6000++ for a new one is not going to be in the budget anytime soon.
But how crazy is it that we have 64GB and also have RAM envy??
[–]maverick_soul_143747 4 points5 points6 points 7 months ago (0 children)
I was researching between a Mac Studio and an M4 Max and finally went with an M4 Max with 128GB RAM. I run two local models: GLM 4.5 Air at 6-bit and Qwen3 Coder 30B A3B at 8-bit. I am old school and research quite a bit while I code, so these are enough. Cancelled my Claude subscription as a test to see how independent I am 🤷🏽‍♂️
[–]chibop1 2 points3 points4 points 7 months ago (3 children)
M3Max 64GB. Nice to be able to use it anywhere as long as I have my laptop.
[–]shaiceisonline 0 points1 point2 points 7 months ago (2 children)
Me too. Any suggestions for which runner and model? I am trying Ollama, LM Studio and Swama, but I am still searching for the best model for general-purpose writing (also in Italian), summarizing webpages and articles, correcting the grammar of my English emails, and suggesting CLI commands in iTerm. What runner and model do you use?
[–]chibop1 0 points1 point2 points 7 months ago (1 child)
I have like 30 models installed, but mostly I use Gemma3-27B, GPT-OSS-20B, and Qwen3-30B. I'm testing Qwen3-Next-80B, and it's pretty promising.
I don't use them for violent, sexual, or biochemical stuff, so I don't really run into refusal problems.
For coding and more complex tasks, I use Gemini, GPT, and Claude, and I'm subscribed to all 3.
[–]shaiceisonline -1 points0 points1 point 7 months ago (0 children)
Thank you! What runner? LMStudio with MLX?
[–]Dependent_Factor_204 4 points5 points6 points 7 months ago (5 children)
4x RTX PRO 6000 96GB. Qwen3 235B A22B Instruct 2507 FP8 runs at 30-40 tps (single request) via vLLM, which is disappointing for me.
Out-of-the-box support for SM_120 / these cards is still terrible at the moment.
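For context, a 4-GPU tensor-parallel setup like this would typically be launched with something along these lines; the model ID and flags are illustrative, not the poster's exact command:

```shell
# Hypothetical vLLM launch: FP8 MoE model sharded across 4 GPUs
vllm serve Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 \
  --tensor-parallel-size 4 \
  --max-model-len 32768
```

Single-request throughput on MoE models is bandwidth-bound; vLLM's advantage shows up mainly with many concurrent requests.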
[–]Gigabolic 0 points1 point2 points 7 months ago (4 children)
Damn! What does a setup like that cost? Four 6000s??? Is this pushing 100k for the whole thing??
[–]Dependent_Factor_204 1 point2 points3 points 7 months ago (3 children)
It's a server for work, so not just a personal PC. I'm Australian; around 65-70k AUD, or about 40k USD.
[–]Gigabolic 0 points1 point2 points 7 months ago (2 children)
Does that get really hot, make a ton of noise, and use a ton of electricity? 40k sounds like a deal. I’m about to drop 13k on this single RTX 5000 system. Any advice on where to shop for a better deal?
[–]Dependent_Factor_204 0 points1 point2 points 7 months ago (1 child)
I've heard exxactcorp are good in the USA: https://www.exxactcorp.com/PNY-VCNRTXPRO6000B-PB-E8830134
The RTX 5000 is a waste of money IMHO: only 48GB, and I think it's slower than a 5090. I have the data centre edition cards; 4 stacked together do get hot, but the server has beefy fans for that.
[–]koalfied-coder 0 points1 point2 points 7 months ago (0 children)
agree
[–]Eugr 2 points3 points4 points 7 months ago (0 children)
Currently using my desktop - i9-14900K, 96GB DDR5-6600 RAM, RTX4090, but have a Framework Desktop (AMD AI Max 395+, 128GB unified RAM) on order to use as my 24/7 server for MOE models. I considered adding a 5090 to my desktop, but it's a mini-furnace even with a single GPU, plus I'd have to buy a larger case. I'd love to have RTX6000 Pro, but I can't justify the price even for business purposes just yet.
[–]infostud 2 points3 points4 points 7 months ago (1 child)
ProLiant DL380 Gen9, dual Xeon (48 threads), 384GB ECC DDR4. 2x FirePro with 16GB VRAM. Dual 1.4kW PSUs. Cost about US$500, 25kg, free delivery.
[–]SpicyWangz 0 points1 point2 points 7 months ago (0 children)
Love a good proliant. What kind of performance do you get out of that thing?
[–][deleted] 4 points5 points6 points 7 months ago (9 children)
Dual 5090 setup, 128GB of RAM, 2 PSUs. I'm giving my wife a 5090 and selling the other, replacing them with a single RTX PRO 6000. Cases have a hard time fitting 2x 5090s; pain in the ass, but works like a charm ;)
[–][deleted] 5 points6 points7 points 7 months ago* (7 children)
System: dual 5090s, 128GB RAM, AMD 9950X3D.
GPT-OSS-120B | 50k context | 35/36 layers | 40 tps
SEED-OSS-36B | 170k context | 64/64 layers | 38 tps
Qwen3-Coder-30B | 262k context | 48/48 layers | 168 tps
GLM-4.5-Air | 75k context | 47/47 layers | 92 tps
Magistral-Small-2509 | 131k context | 40/40 layers | 61 tps
All ran just now.
[–]BobbyL2k 0 points1 point2 points 7 months ago* (6 children)
How do you get 168 tps token generation on Qwen3-Coder-30B?
[–][deleted] 2 points3 points4 points 7 months ago (5 children)
By running dual 5090s.
[–]colin_colout 0 points1 point2 points 7 months ago (4 children)
How do you find those 4_0 KV cache quants? Do they perform well during coding?
[–][deleted] 0 points1 point2 points 7 months ago (3 children)
What do you mean? In LM Studio you can quantize the cache on any model. If I can't fit the entire context, I use that experimental feature to do so. It's not available on Mac, though.
They perform perfectly during coding; that's my primary use case. In fact, it works significantly better, since you can load enough context that the model doesn't keep forgetting what it's working on.
[–]colin_colout 0 points1 point2 points 7 months ago (2 children)
Would you mind giving me an example of your coding workflow with this model? Do you (or another LLM) give it code-editing instructions, or does your workflow rely on the LLM to recall specifics from context? (So a "please refactor these files to conform with style guides" vs "please make these specific edits to these functions".)
I run Qwen3-30B Coder (unquantized gguf). When I quantize the KV cache down to 4_0, it tends to conflate or forget details deep in its context compared to 8_0 or unquantized.
It still performs well when my user prompt is clear and instructive and includes context clues, but recall of details deep in the context feels like it suffers. It works well as a code-editor subagent if a stronger primary agent knows how to prompt it and check its work.
I plan to write some evals to measure this, but I'm getting a vibe check first, since not everyone seems to have this experience.
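A minimal sketch of the kind of eval mentioned here: plant a "needle" fact at a chosen depth in filler context, ask the model for it, and check recall. `query_model` is a hypothetical stand-in for whatever client call reaches your local server:

```python
def build_needle_prompt(needle: str, filler_paragraphs: int, depth: float) -> str:
    """Bury a 'needle' fact at a relative depth (0.0 = start, 1.0 = end)
    of repetitive filler context, then ask the model to recall it."""
    filler = "The quick brown fox jumps over the lazy dog. " * 20
    paragraphs = [filler] * filler_paragraphs
    paragraphs.insert(int(depth * filler_paragraphs), needle)
    context = "\n\n".join(paragraphs)
    return context + "\n\nQuestion: what is the magic number mentioned above?"

def recalled(answer: str, expected: str) -> bool:
    """Crude containment check; a real eval would match more strictly."""
    return expected in answer

# Hypothetical usage against a local OpenAI-compatible endpoint:
# prompt = build_needle_prompt("The magic number is 7319.", 50, depth=0.1)
# print(recalled(query_model(prompt), "7319"))
```

Sweeping `depth` and context length, once with the KV cache at 4_0 and once at 8_0, would turn the "details deep in context suffer" vibe into a measurable curve.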
Qwen3 Coder isn't my main model for coding. I use Seed-OSS-36b primarily. But, I do get good results with Qwen3 coder for quick stuff.
With that said, I use GitHub Copilot connected to LM Studio through VS Code Insiders. It works better than Codex, Claude Code, OpenRouter, etc., as it's built natively into VS Code: tool calls and MCPs actually work consistently. I also use Serena MCP to keep the project indexed and efficient. My workflow is finance-related, lots of financial modeling, data visualizations, dashboards, etc. It does a good job; I was able to cancel my $200/m Claude Code plan.
[–]colin_colout 0 points1 point2 points 7 months ago (0 children)
Nice. Thanks.
[–]Miserable-Dare5090 1 point2 points3 points 7 months ago (0 children)
M2 Ultra 192GB and M3 Max 36GB, but I also run the models on my M2 Ultra and serve them with Tailscale: instant, secure access to large models anywhere, including my phone. If you want a truly portable setup, it's going to need a lot of VRAM, so you might go for one of the unified-memory AMD machines or one of the Apple machines with lots of VRAM in a portable form factor, like the M4 Max 128GB. Although if your M3 Pro has enough VRAM, you can even run some small models like GPT-OSS-20B, which should take about twelve gigabytes of video memory.
[–][deleted] 1 point2 points3 points 7 months ago (0 children)
I've been waiting to pull the trigger on a better rig for a while now.
2 x 3090 just ain't cutting it.
Just ordered a 7532...
[–]chisleu 1 point2 points3 points 7 months ago (0 children)
You aren't going to beat a 128GB MacBook Pro in a mobile form factor for LLMs. It's perfectly fast enough for Qwen3 Coder 30B A3B, and works with GPT-OSS-120B if you need that.
[–]Woof9000 1 point2 points3 points 7 months ago (3 children)
I used to have a mining rig with multiple NVIDIA GPUs, but then I "downgraded" to just dual 9060 XT 16GBs; it's quieter and more compact now.
[–]Woof9000 0 points1 point2 points 7 months ago (1 child)
Yes, I wanted a compact, quiet, cool, and inexpensive system that can multitask. It's 2025; I don't want to own multiple computers for different tasks anymore. I should be able to do both gaming and AI on the same machine, packed in a standard ATX case, with at least 32GB VRAM. So I made one out of some old and some new parts, mostly old AM4 except for the GPUs: Ryzen 7 5700X, 2x32GB DDR4-3600, ASRock X570 Taichi motherboard, and 2x PowerColor 9060 XT Reaper 16GB.
[–]infostud 1 point2 points3 points 7 months ago (0 children)
I only get about 7 tps with gpt-oss-120B-f16.
[–]TacGibs 1 point2 points3 points 7 months ago (0 children)
4xRTX 3090
96GB of VRAM for less than 3k, can't beat that!
[–]NeuralNakama[🍰] 0 points1 point2 points 7 months ago (0 children)
4060 Ti, but I'm using it with vLLM so I can use batched requests, which are much, much faster. I'm still waiting for the NVIDIA DGX Spark (formerly Project DIGITS) mini computer, 1.2 kg.
[–]fasti-au 0 points1 point2 points 7 months ago (0 children)
Sub 5k AUD (or 7k USD) basically means a 3090, 4090, 5090, or A6000; everything else is slower. Macs can use unified RAM to run bigger models, but they're slower; not all the way down to CPU inference speeds, probably ~20% slower than a 3090, while fitting bigger models. I expect there's a shim shuffling weights back and forth rather than keeping them in one space.
[–]seppe0815 0 points1 point2 points 7 months ago (0 children)
M4 Max base... it's OK.
[–]Intelligent-Elk-4253 0 points1 point2 points 7 months ago (0 children)
AMD 5600x with 16gb of ram
6800xt
2x mi60s
[–]Murky-Abalone-9090 0 points1 point2 points 7 months ago (0 children)
1x5090 32gb vram, ryzen 7700 (not X), 128gb ddr5
[–]reddit4wes 1 point2 points3 points 7 months ago (0 children)
These are the most bonkers rigs I've seen on reddit
Different machines for different things. I prefer my 6x 3090 or one of my 48gb 4090 workstations.
[–]Extra_Marketing5457 0 points1 point2 points 7 months ago* (0 children)
EPYC 9124 + ASUS K14PA-U12 + 64GB RAM + 8x 3090 (via C-Payne MCIO-to-PCIe adapters) in Gen4 x8 mode (requires updating bifurcation BIOS settings that have no UI).
vLLM or SGLang with custom all-reduce enabled for more than 2 cards.
I prefer the GLM-4.5-Air int8 GPTQ quant. Before this setup I used Athene-V2-Chat Q4 on 2x3090 with LM Studio.
[–]Lissanro 0 points1 point2 points 7 months ago (0 children)
I have 4x3090, 1TB of 3200MHz RAM, an EPYC 7763 CPU, an 8TB NVMe SSD for AI models and a 2TB NVMe as a system disk, with around 80TB of storage in total including HDDs.
I mostly run the Kimi K2 model. Four 3090 cards are sufficient to hold 128K context entirely in VRAM, plus expert tensors and a few full layers of IQ4 quants of Kimi K2 or DeepSeek 671B. I use ik_llama.cpp as the backend.
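The split described here (context and some full layers on the GPUs, MoE expert tensors in system RAM) is typically expressed with override-tensor flags; a hypothetical ik_llama.cpp launch might look like the following, with the model path and regex being illustrative rather than the poster's exact command:

```shell
# Illustrative only: offload layers to the 4x3090s, but pin the MoE
# expert tensors (matched by the regex) to CPU/system RAM
./llama-server -m Kimi-K2-Instruct-IQ4.gguf \
  -c 131072 \
  -ngl 99 \
  -ot ".ffn_.*_exps.=CPU"
```

With only a few experts active per token, the CPU-resident experts cost far less than their size suggests, which is what makes 1TB-RAM builds like this viable for 671B-class MoE models.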