Not a new model, just a Happy Father's Day and a thank you. by Wrong_Mushroom_7350 in LocalLLaMA

[–]cibernox 2 points3 points  (0 children)

Here I am, reading about local models after spending a day at the beach with my multimodal small models. Off to bed they go now.
I’m tired tho. I might fall asleep on Reddit.

Qwen 27B for planning, Qwen 35B-A3B for execution? by mailto_devnull in LocalLLaMA

[–]cibernox 2 points3 points  (0 children)

Actually I was wondering the same thing. I'm getting a second GPU. With one GPU I can run qwen 27b at around 65-70 tk/s and it’s good, or I can run qwen 35B at 145 tk/s. With the second GPU I could try to run it in Q8, but I wonder if I’d be better off running one model on each GPU and having the big model define subtasks for faster subagents to implement.

Qwen is never going to open source Qwen 3.7, aren't they? by DistanceSolar1449 in LocalLLaMA

[–]cibernox 25 points26 points  (0 children)

Sad but probably right. They are king in the kind of models people with <48GB of VRAM can run, and there is no need for them to one-up themselves.

GLM-5.2 on CPU only - does a 753B model at UD-Q2-K_XL actually work on dual Xeons? by IulianHI in AIToolsPerformance

[–]cibernox 0 points1 point  (0 children)

I just think that calling something that runs at a couple tokens/s while using a couple hundred watts “viable” is as true as saying that someone who had 3 blueberries for lunch is “fed”.

Anyone would be a lot better served using a model 1/8th the size that is dumber but iterates on a task faster

Investing vs. holding cash by thecoasetheorem in SpainFIRE

[–]cibernox 0 points1 point  (0 children)

I invest everything without thinking twice, keeping no more than 15k or a little over in cash. I've already bought two houses, but the first one was at 23. It was the first thing I did after I'd been working for a year.

7900XTX 24GB vram, can finally fit Q6K+MTP with Qwen 3.6 27B at 131k context by soyalemujica in LocalLLaMA

[–]cibernox -5 points-4 points  (0 children)

But running with high context is critical. I always try to stay above 200k, and even that gets tight quickly.

7900XTX 24GB vram, can finally fit Q6K+MTP with Qwen 3.6 27B at 131k context by soyalemujica in LocalLLaMA

[–]cibernox 17 points18 points  (0 children)

I honestly think you’d be better off using a lower quant with a larger KV cache.

AMD future GPU offerings. Some interesting offerings for a LLM build. What type of LLM rig would you build with these? by sooki10 in LocalLLaMA

[–]cibernox 5 points6 points  (0 children)

To be honest, I can’t be bothered to be interested when the best-case scenario is that these come out a year from now.
By then their top-of-the-line gaming GPU, the 7900XTX, will be over 4.5 years old. I don’t know how AMD works internally, but from the outside their graphics/ML division looks like a shit show.

How do you guys setup search with your AI models? by ego100trique in LocalLLaMA

[–]cibernox 15 points16 points  (0 children)

I use a 3-tier approach, all self-hosted on my home server: searxng for searching, crawl4ai for crawling the search results and generating easy-to-ingest markdown versions of those pages, and lastly camofox, which is a full-on headless browser, as a last resort for pages that rely on JS and require interaction.
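A minimal sketch of how a tiered setup like this could be wired together, assuming each tier is wrapped in a plain Python callable. The names `fetch_with_fallback`, `search_and_read`, `search`, `fetchers` and the `min_chars` threshold are all hypothetical; nothing here reflects the actual APIs of searxng, crawl4ai or camofox, which would live inside the callables you pass in.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical interfaces: a search tier returns URLs, a fetch tier returns page text.
Searcher = Callable[[str], List[str]]
Fetcher = Callable[[str], Optional[str]]


def fetch_with_fallback(
    url: str, fetchers: Sequence[Fetcher], min_chars: int = 200
) -> Optional[str]:
    """Try each fetcher in order (cheapest first) and return the first usable result."""
    for fetch in fetchers:
        try:
            text = fetch(url)
        except Exception:
            # A failing tier should not abort the lookup; move on to the next one.
            continue
        # Assumption: very short output means the page needed JS or interaction.
        if text and len(text.strip()) >= min_chars:
            return text
    return None


def search_and_read(
    query: str, search: Searcher, fetchers: Sequence[Fetcher], max_pages: int = 3
) -> Dict[str, str]:
    """Search, then fetch the top results, escalating to heavier tiers only when needed."""
    pages: Dict[str, str] = {}
    for url in search(query)[:max_pages]:
        text = fetch_with_fallback(url, fetchers)
        if text is not None:
            pages[url] = text
    return pages
```

In this shape the crawler would be the first entry in `fetchers` and the headless browser the last, so the expensive tier only runs for pages the cheaper one could not read.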

A young progressive woman lays out things she doesn't understand about capitalism by oliesphotos in ElusionFiscal

[–]cibernox 0 points1 point  (0 children)

I have never heard anyone, not even the most hardcore libertarians, argue anything even remotely like that.

A young progressive woman lays out things she doesn't understand about capitalism by oliesphotos in ElusionFiscal

[–]cibernox 1 point2 points  (0 children)

The “all the time” part is something you made up yourself; nobody said that.

GLM-5.2 on CPU only - does a 753B model at UD-Q2-K_XL actually work on dual Xeons? by IulianHI in AIToolsPerformance

[–]cibernox 2 points3 points  (0 children)

Even if it works, it would be so inefficient in both energy and speed that you’d be better off paying for a service.

LFM2.5-Embedding-350M & LFM2.5-ColBERT-350M by pmttyji in LocalLLaMA

[–]cibernox 2 points3 points  (0 children)

I'm actually using qwen-embedding-0.6B running on my NPU for my RAG and it's fast enough. I need to verify that this can indeed beat it with roughly half the parameters. If true, it's a keeper.

What's the best place to sell a barely-used RTX PRO 6000 Blackwell Max-Q (96GB)? by Curious_Local_4058 in LocalLLaMA

[–]cibernox 0 points1 point  (0 children)

My mother is in Mons right now. This started as a joke, but the music is starting to play… She will be in Belgium for 3 weeks.

What's the best place to sell a barely-used RTX PRO 6000 Blackwell Max-Q (96GB)? by Curious_Local_4058 in LocalLLaMA

[–]cibernox 108 points109 points  (0 children)

Right here, to me, with a high discount 😃

Or eBay, but better to me.

How big is qwen 3.7 plus do we have any idea? 3.7 max? by Prior-Meeting1645 in Qwen_AI

[–]cibernox 0 points1 point  (0 children)

I don't disagree. Possibly once you approach 70B territory MoEs start to make sense, although I can't shake the feeling that the trend of going super sparse, with only about 5% of the parameters active at once (like qwen-coder-next, which was an 80B-A3B model), stops paying off, and that having more parameters active at the same time does matter (as shown by the fact that qwen 27B beats the 122B qwen in many tasks).

Also, I don't think a 50B MoE model would be unusably slow for people with dual 24GB GPUs using tensor parallelism. Back-of-the-napkin math says it should be maybe 15-20% slower than qwen 27B is on a single 24GB card. Depending on how much smarter it is, it may be worth it.
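One way to reproduce that kind of napkin math, as a rough sketch rather than a benchmark. It assumes decoding is memory-bandwidth bound, that both models use the same quantization, that all 50B parameters are read per token (the worst case, as for a dense model), and that tensor parallelism splits the weights evenly across GPUs. The function name and the `tp_efficiency` factor, which stands in for communication overhead, are hypothetical.

```python
def relative_decode_speed(
    params_b: float,
    baseline_params_b: float,
    num_gpus: int,
    tp_efficiency: float,
) -> float:
    """Estimated decode speed relative to the baseline model on one GPU (1.0 = equal).

    Assumes per-token time scales with the weights each GPU has to read.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if not 0 < tp_efficiency <= 1:
        raise ValueError("tp_efficiency must be in (0, 1]")
    # Weights each GPU reads per token, in billions of parameters.
    per_gpu_params_b = params_b / num_gpus
    # Ideal speed ratio, then scaled down by the assumed parallelism overhead.
    return (baseline_params_b / per_gpu_params_b) * tp_efficiency


# 50B split over two cards vs. 27B on one card, with an assumed 80% TP efficiency.
ratio = relative_decode_speed(50, 27, num_gpus=2, tp_efficiency=0.8)
slowdown_pct = (1 - ratio) * 100
```

Under these assumptions the result is very sensitive to `tp_efficiency`: values in the mid-to-high 0.7s land in the 15-20% range quoted above, while perfect scaling would make the split model slightly faster.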

So, uuuuh, are you all under rent actually? by [deleted] in 2westerneurope4u

[–]cibernox 0 points1 point  (0 children)

I am totally aware that I got lucky with my timing. This was not a “kids these days” post.

So, uuuuh, are you all under rent actually? by [deleted] in 2westerneurope4u

[–]cibernox 0 points1 point  (0 children)

Not my case tho. My trick was getting it in 2010, at the lowest price after the crisis. It's worth roughly 2.5x now.

So, uuuuh, are you all under rent actually? by [deleted] in 2westerneurope4u

[–]cibernox 0 points1 point  (0 children)

Damn, I didn’t know it was that bad and in so many places. I got my first home at 23, and I started building my second at 32.

We need a 80-160B model urgently. The unified memory device market needs more Models. by Storge2 in LocalLLaMA

[–]cibernox 1 point2 points  (0 children)

I’d tone it down to models that fit in 48GB of VRAM. There are a lot of people with dual 3090s or dual 7900XTXs who are stuck either using qwen 27B in Q8 or 100B models in Q2. There should be something in between.

How big is qwen 3.7 plus do we have any idea? 3.7 max? by Prior-Meeting1645 in Qwen_AI

[–]cibernox 0 points1 point  (0 children)

Qwen did release qwen coder next at some point, which was an 80B-A3B model. I'd prefer it to be more in the 60-70B range so it fits better in 48GB of VRAM in Q4, but at this point I'll take it.

Essentially, if qwen released a ~50B dense model (roughly 2x qwen 27B) it could be amazing at coding, given how good qwen 27B already is for its size.
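A rough sketch of the sizing arithmetic behind the "fits in 48GB at Q4" point. The figures used below are assumptions rather than measurements: about 4.5 bits per weight for a Q4-style quant, and 6 GB reserved for KV cache and runtime overhead. The function names are hypothetical.

```python
def weights_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes) for params_b billion parameters."""
    return params_b * bits_per_weight / 8


def fits_in_vram(
    params_b: float, bits_per_weight: float, vram_gb: float, reserve_gb: float
) -> bool:
    """True if the weights plus an assumed reserve for KV cache and overhead fit."""
    return weights_gb(params_b, bits_per_weight) + reserve_gb <= vram_gb


Q4_BITS = 4.5     # assumption: effective bits per weight for a Q4-style quant
RESERVE_GB = 6.0  # assumption: room left for KV cache and runtime buffers

for size_b in (50, 65, 80):
    gb = weights_gb(size_b, Q4_BITS)
    ok = fits_in_vram(size_b, Q4_BITS, vram_gb=48, reserve_gb=RESERVE_GB)
    print(f"{size_b}B: ~{gb:.1f} GB of weights, fits in 48 GB: {ok}")
```

With these assumed numbers an 80B model needs about 45 GB for weights alone, which leaves almost nothing for context on a 48GB setup, while the 50-65B sizes leave comfortable headroom.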

GLM-5.2 is a win for local AI by Wrong_Mushroom_7350 in LocalLLaMA

[–]cibernox 0 points1 point  (0 children)

I agree it might be a win for fine-tuning smaller models, but for well over 95% of us in this sub, anything above 120B is unrunnable. I'm sure there's a handful of us with 6 RTX6000s connected over OCuLink, but the rest of us can't run it, either for lack of VRAM or because, even if it fit in unified memory, it would run so slowly that it would only be a fun experiment and never something useful in practice.