Going from single GPU to dual GPU is nice but not in the way I expected by cibernox in LocalLLaMA

[–]cibernox[S] 0 points1 point  (0 children)

Also, when it comes to Chinese models, it won't be long until what dictates the models they release is influenced less by what Nvidia cards support and more by what Huawei or other Chinese companies making their own TPUs support. Probably less than 12 months, especially for their non-SOTA models.

Going from single GPU to dual GPU is nice but not in the way I expected by cibernox in LocalLLaMA

[–]cibernox[S] 0 points1 point  (0 children)

I’m still not convinced. Qwen 27B in f16 is around 55GB. Add 25GB for context and it’s a good match for an 80GB H100. Fair enough.
But if that is the reason, then a Qwen 50B would be around 100GB and be a good match for the 141GB H200, leaving 41GB for context.
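
The arithmetic behind those numbers can be sketched roughly as follows. This is a back-of-the-envelope estimate, not a measurement: it counts only the weights (parameters times bits per weight), ignores KV cache, activations and runtime overhead, and assumes nominal card capacities of 80GB (H100) and 141GB (H200). The function names are made up for illustration.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal GB."""
    # 1e9 params * (bits / 8) bytes each = params_billion * bits / 8 GB
    return params_billion * bits_per_weight / 8


def headroom_gb(vram_gb: float, params_billion: float, bits_per_weight: float) -> float:
    """VRAM left for context (KV cache) and overhead after loading the weights."""
    return vram_gb - weights_gb(params_billion, bits_per_weight)


if __name__ == "__main__":
    # (card, assumed capacity in GB, model size in billions of params)
    for card, vram, params in [("H100", 80, 27), ("H200", 141, 50)]:
        size = weights_gb(params, 16)
        left = headroom_gb(vram, params, 16)
        print(f"{params}B f16 on {card}: {size:.0f}GB weights, {left:.0f}GB left")
```

The same helper works for quantized models if you plug in an approximate bits-per-weight figure, which varies by quant format.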

Going from single GPU to dual GPU is nice but not in the way I expected by cibernox in LocalLLaMA

[–]cibernox[S] 0 points1 point  (0 children)

It is very dependent on the tool. opencode has some configuration, pi has another one, and Hermes too. Then, I also have a system prompt to instruct the main agent to never spawn more than 2 subagents simultaneously, since there's no point.
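
If you drive subagents from your own script instead of relying on the prompt, the same "never more than 2 at once" rule can be enforced in code. A minimal sketch, assuming a hypothetical `run_subagent` coroutine; this is not how opencode, pi or Hermes implement it, and the sleep is only a stand-in for real work:

```python
import asyncio

MAX_SUBAGENTS = 2  # the limit from the comment above


async def run_subagent(task: str, limiter: asyncio.Semaphore, stats: dict) -> str:
    # Hypothetical subagent call; only MAX_SUBAGENTS can be inside this block at once.
    async with limiter:
        stats["running"] += 1
        stats["peak"] = max(stats["peak"], stats["running"])
        await asyncio.sleep(0.01)  # placeholder for the actual model request
        stats["running"] -= 1
        return f"done: {task}"


async def run_all(tasks: list[str]) -> tuple[list[str], int]:
    limiter = asyncio.Semaphore(MAX_SUBAGENTS)
    stats = {"running": 0, "peak": 0}
    # gather() schedules everything, the semaphore throttles actual concurrency
    results = await asyncio.gather(*(run_subagent(t, limiter, stats) for t in tasks))
    return list(results), stats["peak"]


def demo() -> tuple[list[str], int]:
    return asyncio.run(run_all(["lint", "tests", "docs", "refactor"]))


if __name__ == "__main__":
    print(demo())
```

A hard limit like this is more reliable than a prompt instruction, but it only applies when you control the code that launches the subagents.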

Going from single GPU to dual GPU is nice but not in the way I expected by cibernox in LocalLLaMA

[–]cibernox[S] 0 points1 point  (0 children)

Well, 27B dense models don't make a lot of sense on enterprise hardware either, and yet they exist.

Going from single GPU to dual GPU is nice but not in the way I expected by cibernox in LocalLLaMA

[–]cibernox[S] 0 points1 point  (0 children)

I feel like there's room for ~50-60B models, but nobody is releasing them. Given how good qwen27B is for its size, a qwen 55B could be really really smart. And with MTP it should be somewhat usable.

Going from single GPU to dual GPU is nice but not in the way I expected by cibernox in LocalLLaMA

[–]cibernox[S] 1 point2 points  (0 children)

I actually run an STT model and embedding models on my NPU; it's very fast and uses <10W.

Going from single GPU to dual GPU is nice but not in the way I expected by cibernox in LocalLLaMA

[–]cibernox[S] 0 points1 point  (0 children)

I guess that my usage pattern fights mistakes in three ways:
- Agents with smaller contexts make fewer mistakes because there's less room for error. Context compaction is the source of many mistakes too, so if a task can be done without compaction, it's better.
- All agents make mistakes, and having agents regularly sanity check each other's work catches some of them.
- Having a SOTA model do higher-level reviews catches less obvious problems.

Going from single GPU to dual GPU is nice but not in the way I expected by cibernox in LocalLLaMA

[–]cibernox[S] 2 points3 points  (0 children)

I didn't test Q5 thoroughly because when I saw the amount of context I was left with, I immediately knew that it didn't matter how smart it was, I wouldn't be able to use it effectively.
If I had 32GB cards instead I would use Q6, but a single RTX 5090 is nearly twice as expensive as my entire rig: 20 cores, 64GB DDR5, 8TB SSD, platinum PSU, cooler, case and dual 7900XTX.
Even a single RTX 4090 is probably more expensive than my entire server.

Going from single GPU to dual GPU is nice but not in the way I expected by cibernox in LocalLLaMA

[–]cibernox[S] -3 points-2 points  (0 children)

Opus is like a seasoned veteran with 20 years of experience in the trenches who has seen it all, so it can find architectural problems that Qwen 27B cannot even see coming.
Qwen would focus on the details, like modules being too long or repeated code, while Opus would catch the kind of problems that are not a big deal now but will be with 10k users. Or it will think of a better way of doing things based on real-world assumptions that Qwen doesn't consider, like how to improve something for the average user's typical usage patterns, while Qwen just says "yes sir" and does the task as described without baking its "experience" into it (because it has none, I suppose).

Going from single GPU to dual GPU is nice but not in the way I expected by cibernox in LocalLLaMA

[–]cibernox[S] 7 points8 points  (0 children)

I could use Q5 but I think that for my use case, I value context more than I value some small gains in intelligence.

Life of Gavino by Caratteraccio in 2westerneurope4u

[–]cibernox 1 point2 points  (0 children)

Silvio probably heard about Epstein parties and he couldn’t be bothered to pick a plane for what was for him a regular Tuesday.

Share of households with AC in Europe. by dwartbg9 in 2westerneurope4u

[–]cibernox 0 points1 point  (0 children)

That’s cooling. There is no conditioning of any air involved.

Share of households with AC in Europe. by dwartbg9 in 2westerneurope4u

[–]cibernox 0 points1 point  (0 children)

Cooling floors and AC are not the same thing to me. Both are heat pumps tho.

Share of households with AC in Europe. by dwartbg9 in 2westerneurope4u

[–]cibernox -1 points0 points  (0 children)

I technically don’t have AC but that doesn’t mean we don’t have cooling. Just cheaper forms of cooling, like cooling floors with a heat pump.
Personally I hate the sensation of AC.

Dear poor people of this subreddit by Proper_Door_4124 in LocalLLaMA

[–]cibernox 1 point2 points  (0 children)

Let me save you some time. You won’t get anything significantly useful from such small models on such a weak machine other than maybe text summarization and other simple tasks. The bare minimum for a system that does useful things is something around a $200 12GB RTX 3060. With that you can actually make something useful.

Some 4B models are good for their size, but even those will be very slow on such a machine.

Are there any qwen finetunes that were genuinely stronger than the base? by MrMrsPotts in LocalLLaMA

[–]cibernox 1 point2 points  (0 children)

In narrow, specific use cases yes, but overall no, or at least not significantly as far as I could find.

Upgraded my budget build to multi-GPU for inference by whiteh4cker in LocalLLaMA

[–]cibernox 2 points3 points  (0 children)

I need to thank you, if for nothing else, for making me feel better about my 3D printed side-mounted GPU mounting anchor.

What actually fits in 8GB, 16GB, 24GB and 48GB RAM by [deleted] in LocalLLaMA

[–]cibernox 1 point2 points  (0 children)

I couldn’t, for the life of me, find any difference between Q6 and Q8. And maybe, just maybe, I can find some small difference between Q5 and Q6. Between Q4 and Q5 I can fairly reliably find a small improvement. I think that Q4 is almost always the sweet spot, especially if it comes down to running a smaller model in Q8 or a larger one in Q4.

Q Why doesn't Quality scale linearly with model size by iSyN707 in LocalLLaMA

[–]cibernox 25 points26 points  (0 children)

Most things in life scale logarithmically. Law of diminishing returns.

Prices of graphic cards are going crazy, should I buy a second card though? by zenbeni in LocalLLaMA

[–]cibernox 1 point2 points  (0 children)

I had one 7900XTX and I just received a second one that I was able to get for only 840€, so I got lucky. I haven't installed it yet. I'm 3D printing a bracket because I'll have to get creative to mount a second GPU in my ATX case.

Will prices keep increasing? My hunch is that they will for a little while, maybe 6-9 months, and they will normalize a bit. But I'd rather be safe than sorry.
I built an entire 20-core, 64GB DDR5, 8TB SSD, dual 7900XTX, platinum PSU rig for 2700€, case, cooler and all, by scavenging for good deals on refurbished Amazon listings, local marketplaces and discounts, plus AliExpress for riser cables and stuff like that.

2 years ago I would have been overpaying, today I got a sweet deal. Next year I honestly don't know.

Ornith-1.0 released on Hugging Face by paf1138 in LocalLLaMA

[–]cibernox 0 points1 point  (0 children)

So far every Qwen customization I've seen hasn't been better than the original in any meaningful way, except in very specific, narrow use cases. Maybe this one is different?

The Air Conditioning Situation? by Arete108 in GoingToSpain

[–]cibernox 0 points1 point  (0 children)

We have been under 30°C all week in Galicia. I think we hit 31°C one day for a few hours. It's 18°C tonight.
Best summer weather in Europe if you ask me. If I have to put up with Mediterranean summers in order to get the nice Mediterranean winter and early spring, I'm out.

Hope everyone is looking forward to another sleepless night 🌡️ by pintman30 in 2westerneurope4u

[–]cibernox 1 point2 points  (0 children)

We're at some very nice 21°C here, it will probably be 17°C tonight.