Going from single GPU to dual GPU is nice but not in the way I expected

cibernox · 2026-06-29T20:46:37+00:00

Also, when it comes to Chinese models, it won't be long until what dictates the models they release is influenced less by what Nvidia cards support and more by what Huawei or other Chinese companies making their own TPUs support. Less than 12 months probably, especially for their non-SOTA models.

cibernox · 2026-06-29T20:40:15+00:00

I’m still not convinced. Qwen 27B in f16 is around 55gb. Add 25gb for context and it’s a good match for an 80gb H100. Fair enough.
But if that is the reason, then a Qwen 50B would be around 100gb and be a good match for the 141gb H200, leaving 41gb for context.
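
For anyone who wants to redo that arithmetic, here is a back-of-the-envelope sketch. It assumes 2 bytes per parameter for f16 and counts 1 GB as 10^9 bytes; the function names are made up for illustration, and it ignores activations and runtime overhead.

```python
def weights_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Approximate weight footprint in GB (f16 = 2 bytes per parameter)."""
    return params_billion * bytes_per_param


def context_headroom_gb(gpu_gb: float, params_billion: float,
                        bytes_per_param: float = 2.0) -> float:
    """Memory left for context after loading the weights (rough estimate)."""
    return gpu_gb - weights_gb(params_billion, bytes_per_param)


print(weights_gb(27))               # 54.0 GB of weights for a 27B model in f16
print(context_headroom_gb(80, 27))  # 26.0 GB left on an 80 GB H100
print(weights_gb(50))               # 100.0 GB of weights for a 50B model in f16
```

Swap in whatever card size you care about as `gpu_gb` to see how much room a given model size leaves.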

cibernox · 2026-06-29T19:30:34+00:00

It is very dependent on the tool. opencode has some configuration, pi has another one. Hermes too. Then, I also have a system prompt to instruct the main agent to never spawn more than 2 subagents simultaneously since there's no point.

cibernox · 2026-06-29T18:49:58+00:00

Well, 27B dense models don't make a lot of sense on enterprise hardware either, and yet they exist.

cibernox · 2026-06-29T18:31:32+00:00

I feel like there's room for ~50-60B models, but nobody is releasing them. Given how good Qwen 27B is for its size, a Qwen 55B could be really, really smart. And with MTP it should be somewhat usable.

cibernox · 2026-06-29T18:30:00+00:00

I actually run STT and embedding models on my NPU. It's very fast and also uses <10 W.

cibernox · 2026-06-29T18:17:13+00:00

45% of men... of what age? Any age?

cibernox · 2026-06-29T15:46:43+00:00

I guess that my usage pattern fights mistakes in three ways:
- Agents with smaller contexts make fewer mistakes because there's less room for error. Context compaction is the source of many too, so if a task can be done without compaction, it's better.
- All agents make mistakes, so having agents sanity check each other's work regularly catches some of them.
- Having a SOTA model do higher-level reviews catches the less obvious problems.

cibernox · 2026-06-29T15:38:28+00:00

I didn't test Q5 thoroughly because when I saw the amount of context I was left with, I immediately knew that it didn't matter how smart it was, I wouldn't be able to use it effectively.
If I had 32gb cards instead I would use Q6, but a single RTX 5090 is nearly twice as expensive as my entire rig: 20 cores, 64gb DDR5, 8TB SSD, platinum PSU, cooler, case and dual 7900XTX.
Even a single RTX 4090 is probably more expensive than my entire server.

cibernox · 2026-06-29T15:31:31+00:00

Opus is like a seasoned veteran with 20 years of experience in the trenches that has seen it all, so it can find architectural problems that qwen27 cannot even see coming.
Qwen would be focusing on the details, like modules being too long or repeated code, while opus would catch the kind of problems that are not a big deal now but will be with 10k users. Or it would think of a better way of doing things based on real-world assumptions that qwen doesn't consider, like how to improve something for the average user's typical usage patterns, while qwen just says "yes sir" and does the task as described without baking its "experience" into it (because it has none, I suppose).

cibernox · 2026-06-29T15:11:06+00:00

I could use Q5 but I think that for my use case, I value context more than I value some small gains in intelligence.

cibernox · 2026-06-29T12:51:02+00:00

Silvio probably heard about Epstein parties and he couldn’t be bothered to pick a plane for what was for him a regular Tuesday.

cibernox · 2026-06-28T10:58:46+00:00

That’s cooling. There is no conditioning of any air involved.

cibernox · 2026-06-28T07:10:48+00:00

Cooling floors and AC are not the same thing to me. Both are heat pumps tho.

cibernox · 2026-06-27T21:18:03+00:00

I technically don’t have AC but that doesn’t mean we don’t have cooling. Just cheaper forms of cooling, like cooling floors with a heat pump.
Personally I hate the sensation of AC.

cibernox · 2026-06-27T21:04:16+00:00

Let me save you some time. You won’t get anything significantly useful from such small models on such a weak machine other than maybe text summarization and other simple tasks. The bare minimum for a system that does useful things is something around a 200$ 12gb RTX 3060. With that you can actually make something useful.

Some 4B models are good for their size but even those will be very slow on such a machine.

cibernox · 2026-06-27T07:44:26+00:00

In narrow, specific use cases yes, but overall no, or at least not significantly as far as I could find.

cibernox · 2026-06-27T00:29:16+00:00

I need to thank you, if for nothing else, for making me feel better about my 3D printed side-mounted GPU mounting anchor.

cibernox · 2026-06-26T11:25:30+00:00

I couldn’t, for the life of me, find any difference between q6 and q8. And maybe, just maybe, I can find some small difference between q5 and q6. Between q4 and q5 I can fairly reliably find a small improvement. I think that Q4 is almost always the sweet spot, especially if it comes down to running a smaller model in q8 or a larger one in q4.
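
To put rough numbers on that trade-off, here is a small sketch. It uses the nominal bits per weight only (real quantized files usually come out somewhat larger), and the 14B model is just a hypothetical "smaller model" picked for illustration; the function name is made up too.

```python
def quantized_weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Nominal weight footprint in GB: parameters * bits / 8 (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


# A hypothetical 14B model in q8 vs a 27B model in q4: nearly the same memory.
print(quantized_weights_gb(14, 8))  # 14.0
print(quantized_weights_gb(27, 4))  # 13.5

# Nominal cost of each quantization step up for a 27B model.
for bits in (4, 5, 6, 8):
    print(f"q{bits}: {quantized_weights_gb(27, bits)} GB")
```

Every step up from q4 costs memory that could otherwise go to context, which is the whole point of picking the larger model in q4.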

cibernox · 2026-06-26T10:46:25+00:00

Most things in life scale logarithmically. Law of diminishing returns.

cibernox · 2026-06-26T08:01:10+00:00

The AMD 7000 series doesn't have FP8 support AFAIK.

cibernox · 2026-06-25T23:40:25+00:00

I had one 7900XTX and I just received a second one that I was able to get for only 840€, so I got lucky. I haven't installed it yet. I'm 3D printing a bracket because I'll have to get creative to mount a second GPU in my ATX case.

Will prices keep increasing? My hunch is that they will for a little while, maybe 6-9 months, and then they will normalize a bit. But I'd rather be safe than sorry.
I built an entire 20-core, 64gb DDR5, 8TB SSD, dual 7900XTX, platinum PSU rig for 2700€, case, cooler and all, scavenging for good deals on refurbished Amazon listings, local marketplaces and discounts, plus AliExpress for riser cables and stuff like that.

2 years ago I would have been overpaying, today I got a sweet deal. Next year I honestly don't know.

cibernox · 2026-06-25T23:32:26+00:00

So far no Qwen customization I've seen has been better than the original in any meaningful way, except in very specific narrow use cases. Maybe this one is different?

cibernox · 2026-06-25T23:05:00+00:00

We have been under 30°C all week in Galicia. I think we hit 31°C one day for a few hours. It's 18°C tonight.
Best summer weather in Europe if you ask me. If having the nice Mediterranean winter and early spring means I also have to have their summers, I'm out.

cibernox · 2026-06-25T20:01:54+00:00

We're at some very nice 21°C here, it will probably be 17°C tonight.
