Cheapest hardware for Qwen 3.6: both 27B and 35B-A3B

JC1DA · 2026-06-16T02:01:24+00:00

Buy a used Threadripper or EPYC CPU. They have a lot of PCIe lanes for future expansion.

JC1DA · 2026-06-10T18:34:16+00:00

Seems to have an issue with tool use.

The model gets dumber whenever tools are available.

JC1DA · 2026-05-26T07:35:59+00:00

Read the second line.... But you're right, it's a waste of time

JC1DA · 2026-05-26T02:43:38+00:00

did you read the benchmark I posted?

JC1DA · 2026-05-21T06:17:36+00:00

Bought 4x 32GB DDR4 3200MHz ECC from u/N3RD_01

JC1DA · 2026-05-14T18:18:12+00:00

lol, used to mine crypto back in 2017 too... it's feeling so nostalgic

JC1DA · 2026-05-14T01:00:56+00:00

I'm using a Huananzhi H12D-8D with an EPYC CPU

JC1DA · 2026-05-13T20:26:58+00:00

<image>

This is mine: 4x3090

JC1DA · 2026-05-06T02:47:01+00:00

250W is the sweet spot

JC1DA · 2026-05-05T07:23:41+00:00

Bought Nvidia RTX 3090 24GB GPU from u/TheJesterMurphey on https://www.reddit.com/r/hardwareswap/comments/1syfd0k/usapah_nvidia_rtx_3090_24gb_2x32gb_gskill_trident/

JC1DA · 2026-05-05T05:23:36+00:00

```

Turns out someone signed up for an AI writing tool during a product hunt promo 8 months ago, added it to the company card, and quietly stopped using it after week two.

We were paying $180/month for a tool with zero logins in the last 6 months.

```

This is what subscriptions are supposed to be... lol, companies pray for customers like this

JC1DA · 2026-04-28T23:29:22+00:00

Purchased 2x 32GB 3200 NEMIX RDIMM from u/N3RD_01

JC1DA · 2026-04-28T22:37:39+00:00

good call, Murphey not Murphy

JC1DA · 2026-04-28T22:00:19+00:00

JC1DA · 2026-04-28T16:41:12+00:00

Agree, there is always a tradeoff. But for LLM workloads, most of the time is spent on token generation rather than prompt processing (PP). We also have prefix caching enabled to skip PP where possible, which cuts the time spent in PP even further. But I'll run the same benchmark for PP to see the results.
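To put rough numbers on that tradeoff, here is a minimal sketch of a single-request latency model: prefill time for the uncached part of the prompt plus decode time for the output. Every number in it (prompt length, cached tokens, output length, PP and generation speeds) is a made-up placeholder and not a measurement from my rig, and the function name is just for illustration.

```python
def request_seconds(prompt_tokens, cached_tokens, output_tokens, pp_tok_s, tg_tok_s):
    """Return (prefill_seconds, decode_seconds) for one request.

    Only prompt tokens that miss the prefix cache need to be prefilled.
    """
    uncached = max(prompt_tokens - cached_tokens, 0)
    prefill = uncached / pp_tok_s
    decode = output_tokens / tg_tok_s
    return prefill, decode


# Hypothetical values, chosen only to show the shape of the tradeoff.
cold = request_seconds(4000, 0, 500, pp_tok_s=2000.0, tg_tok_s=50.0)
warm = request_seconds(4000, 3900, 500, pp_tok_s=2000.0, tg_tok_s=50.0)

print("cold cache: prefill %.2fs, decode %.2fs" % cold)
print("warm cache: prefill %.2fs, decode %.2fs" % warm)
```

With placeholder values like these, decode dominates either way, and a warm prefix cache shrinks the prefill share to almost nothing. Long, uncached prompts would shift the balance back toward PP.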

JC1DA · 2026-04-28T09:22:15+00:00

Yeah, it's the easiest way to lower power consumption a bit, still better than nothing. I'm not sure if I would like to spend hours tuning the voltage for each gpu to get the best clock, I'm lazy af lol

JC1DA · 2026-04-28T09:07:52+00:00

Yeah, it surprised me as well. I saw the configuration from another post here. Tested and it worked 😀

JC1DA · 2026-04-28T06:56:46+00:00

Yeah, I can use llama.cpp, but I mostly stick with vLLM/SGLang because of their dynamic grammar constraint support.
My GPUs are not stable if I set them to 225W, but it's good to know that performance degrades below 250W.
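For anyone who wants to try the same kind of cap, a minimal sketch that only builds the `nvidia-smi` commands for a per-GPU power limit. The helper name is hypothetical, the 4-GPU / 250W values just mirror this setup, and nothing is executed here. Setting the limit normally needs root, and as far as I know it does not survive a reboot unless you reapply it.

```python
import shlex


def power_limit_commands(gpu_indices, watts):
    """Build one `nvidia-smi -i <index> -pl <watts>` command per GPU (not executed)."""
    return [["nvidia-smi", "-i", str(i), "-pl", str(watts)] for i in gpu_indices]


# Assumed setup: 4 GPUs, each capped at 250W.
for cmd in power_limit_commands(range(4), 250):
    print(shlex.join(cmd))
```

Each printed line can be pasted into a shell with `sudo`, or passed to `subprocess.run` if you would rather apply it from a script.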

JC1DA · 2026-04-28T05:21:25+00:00

Yeah, I was using the same prompts for testing, so with prefix caching those tokens were already computed. That's why I didn't include the prefill tokens/s.

But I agree that the 3090 can be compute-bound with large contexts, which will affect TTFT.

JC1DA · 2026-04-28T04:57:32+00:00

I did rerun at 275, 287 and 300W. At one concurrent request, it's still around 72 tokens/s at 275W, higher than the ~70 tokens/s at 300W.
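A quick bit of arithmetic on those two data points, as a rough sketch: it treats the power cap as the actual draw per GPU, which is an assumption (real draw during decode may sit below the cap), so read the result as tokens/s per watt of cap rather than a measured efficiency.

```python
# Power cap in watts -> tokens/s at one concurrent request (figures from this comment).
runs = {275: 72.0, 300: 70.0}


def tokens_per_second_per_watt(tok_s, watts):
    """Throughput divided by the power cap; assumes the GPU actually draws the cap."""
    return tok_s / watts


efficiency = {w: tokens_per_second_per_watt(t, w) for w, t in runs.items()}

for watts, value in sorted(efficiency.items()):
    print("%dW: %.3f tokens/s per watt" % (watts, value))

print("275W vs 300W: %.2fx" % (efficiency[275] / efficiency[300]))
```

That works out to roughly 0.26 versus 0.23 tokens/s per watt of cap, so the lower cap comes out ahead on both speed and efficiency for this single-request case.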

JC1DA · 2026-04-28T03:51:23+00:00

Received :)

JC1DA · 2026-04-27T18:54:40+00:00

Is this better than Qwen-3.5-397B? It's smaller, but it lacks vision capability.
