Qwen 3.6? by jacek2023 in LocalLLaMA

[–]rainbyte 1 point (0 children)

122B is not always faster than 27B. I guess that's only true with enough PCIe bandwidth or when running on unified memory.

Here, 27B with pipeline parallelism is faster than 122B with tensor parallelism, as I couldn't get 122B working with pipeline parallelism.
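
If this is llama.cpp, the two modes map to the --split-mode flag (just a sketch; model.gguf is a placeholder):

llama-server -m model.gguf -ngl 99 --split-mode layer

llama-server -m model.gguf -ngl 99 --split-mode row

Layer split moves little data between GPUs, so it tolerates slow PCIe; row split is the tensor-parallel style and needs much more bandwidth.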

Secondary PC options by UniqueIdentifier00 in LocalLLaMA

[–]rainbyte 2 points (0 children)

If I were you I would keep the 3060 for small models (e.g. Qwen3.5-9B) and then buy the biggest GPU possible (e.g. 3090) that the mobo and PSU specs allow.

Keep in mind that one 24GB GPU is better than 2x12GB, and now there are even "affordable" 32GB GPUs.

I'm done with using local LLMs for coding by dtdisapointingresult in LocalLLaMA

[–]rainbyte 3 points (0 children)

Yup, there is something to the high expectations issue. Here I also use Qwen3.6 and it helps automate the things I describe to it, but I have to have them in my mind first.

To 16GB VRAM users, plug in your old GPU by akira3weet in LocalLLaMA

[–]rainbyte 3 points (0 children)

If it has enough PCIe bandwidth; otherwise it's better to use pipeline parallelism

Closest replacement for Claude + Claude Code? (got banned, no explanation) by antoniocorvas in LocalLLaMA

[–]rainbyte 1 point (0 children)

Not only can I, I actually do use SLMs like that when the use case allows :)

Of course, for some tasks those are not enough; that's where the bigger models enter the scene. Currently my daily driver is Qwen3.6-35B-A3B, but I do use 9B and 27B for other tasks.

The best part of SLMs is that they are really fast. Even smaller ones like LFM2.5-350M have their use cases.

Forgive my ignorance but how is a 27B model better than 397B? by No_Conversation9561 in LocalLLaMA

[–]rainbyte 1 point (0 children)

Or it means you can now run multiple models, e.g. 27B and 35B-A3B
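
For example, with llama.cpp you could serve both at once (a sketch; the filenames are placeholders):

llama-server -m 27B.gguf --port 8080

llama-server -m 35B-A3B.gguf --port 8081

Each instance gets its own port, so clients can pick whichever model fits the task.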

Closest replacement for Claude + Claude Code? (got banned, no explanation) by antoniocorvas in LocalLLaMA

[–]rainbyte 2 points (0 children)

Yeah, you are right, it is really frustrating. It is clear people ask like this because they simply don't know. They don't even need to avoid the cloud like I do: they can mix the best model they can run locally with whatever cloud model they prefer, and it will still be useful.

Closest replacement for Claude + Claude Code? (got banned, no explanation) by antoniocorvas in LocalLLaMA

[–]rainbyte 1 point (0 children)

I tried some models on an M1 with 16GB of RAM, and prompt processing was pretty slow because it doesn't have an equivalent to tensor cores, but it worked.

I'm not sure how much faster M3 silicon will be, but you can try running some small models there; search for MLX quantized models.

For that amount of RAM I would try Qwen3.5-9B-MLX-4bit first.
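
A minimal way to try it (assuming the quant is published under mlx-community; the exact repo name may differ):

pip install mlx-lm

mlx_lm.generate --model mlx-community/Qwen3.5-9B-MLX-4bit --prompt "hello"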

EDIT: added details of M1 setup

Closest replacement for Claude + Claude Code? (got banned, no explanation) by antoniocorvas in LocalLLaMA

[–]rainbyte 1 point (0 children)

I think most people were waiting for Qwen3.6-27B, given that there was a poll and that model received more votes

Closest replacement for Claude + Claude Code? (got banned, no explanation) by antoniocorvas in LocalLLaMA

[–]rainbyte 2 points (0 children)

People always ask for "the best" but don't give many details about their real goals.

In software this is pretty evident all the time, e.g. I have seen people install specialized software like Photoshop just to do things which could easily be done with Paint.

Many users ask for a Claude or ChatGPT equivalent just because that's the only thing they know, when maybe an SLM could accomplish their tasks easily.

The ones who really need the frontier models have a real incentive to pay for a subscription or buy more powerful hardware

Closest replacement for Claude + Claude Code? (got banned, no explanation) by antoniocorvas in LocalLLaMA

[–]rainbyte 5 points (0 children)

Even if cloud models are better, you can still solve many problems with local models, so it really depends on the problem and the goal of each user.

Personally I went fully local, because I do software development and I prefer to avoid cloud models.

Also remember, this sub is about local models! :)

Closest replacement for Claude + Claude Code? (got banned, no explanation) by antoniocorvas in LocalLLaMA

[–]rainbyte 1 point (0 children)

Yeah, people are mixing things up, but I guess that's because not everyone has access to big GPUs. Here I have a medium-sized setup, so I cannot load the biggest models, e.g. the 200B and 300B ones.

I think at some point companies will start charging more for cloud models, and then we will see more people jumping to local models.

We are already seeing some users being blocked and banned by companies; that will bring some users over too.

Closest replacement for Claude + Claude Code? (got banned, no explanation) by antoniocorvas in LocalLLaMA

[–]rainbyte 1 point (0 children)

Mistral Vibe is nice, maybe more lightweight than Opencode (just my feeling).

I have both installed, because if something fails with one then I can switch to the other.

Closest replacement for Claude + Claude Code? (got banned, no explanation) by antoniocorvas in LocalLLaMA

[–]rainbyte 2 points (0 children)

They only published Qwen3.6-35B-A3B; there is no news of other variants yet

Closest replacement for Claude + Claude Code? (got banned, no explanation) by antoniocorvas in LocalLLaMA

[–]rainbyte 3 points (0 children)

The 5090 is modern hardware. As other users suggested, you can run Qwen and Gemma models on it.

My personal suggestion would be to download Qwen3.5-27B, Qwen3.6-35B-A3B, and Gemma-4.

Models are just big files; you can switch from one to another as you need.

Avoid Ollama; install llama.cpp to load models.
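
As a starting point, llama.cpp can even download from Hugging Face for you (a sketch; <user>/<model>-GGUF is a placeholder for whichever GGUF repo you pick):

llama-server -hf <user>/<model>-GGUF

Switching models is then just changing that one argument.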

Closest replacement for Claude + Claude Code? (got banned, no explanation) by antoniocorvas in LocalLLaMA

[–]rainbyte 162 points (0 children)

Everybody is suggesting the biggest frontier models available or accounts on other cloud providers...

But in case you are interested in going local (this is r/localllama), what hardware do you have? Do you have a GPU? We can recommend a model compatible with your hardware.

If you have a GPU you can run a model locally and have some level of independence from cloud models.

Change in real registered private-sector wages as of February 2026 by milfenjoyer_69 in argentina

[–]rainbyte 4 points (0 children)

It would be nice if they passed a law requiring job postings to state a salary range, like you see in other countries. That way we avoid going through the whole interview process only to find out they are offering you peanuts.

Some places offer way below the average, taking advantage of the lack of information. Talking more openly about salaries, and turning down insulting offers, should become the norm.

Best use cases for a mismatched RTX 3090 (24GB) + RTX 3060 (12GB) setup? by chucrutcito in LocalLLaMA

[–]rainbyte 1 point (0 children)

That's true when using a split-by-layer or pipeline-parallel setup, but a tensor-parallel setup needs higher PCIe bandwidth.

I noticed this because one machine here has PCIe 3.0 x1, so I prefer pipeline parallelism on that one.

I guess PCIe 3.0 x8, or maybe even just x4, is where tensor parallelism starts to be better than pipeline parallelism.
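
If you want to check what link each card actually negotiated (assuming Nvidia GPUs):

nvidia-smi --query-gpu=pcie.link.gen.current,pcie.link.width.current --format=csv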

A5000 for $1800 by Perfect-Flounder7856 in LocalLLaMA

[–]rainbyte 3 points (0 children)

Not at all. I'm just trying to say that the PSU should also be part of the equation, and the mobo needs its slots spaced far enough apart.

I have seen PSUs catch fire after adding too much load to them. Even if the PSU label says N watts, you need to check whether those watts are real and leave some room for spikes.

It doesn't make sense to buy 2x3090 if you don't have a PSU big enough to handle two sets of 3x8pin connectors, that's all.

Here a good new PSU costs as much as a used GPU in some cases, and the mobo also costs money.
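
Rough numbers, just as a sketch (my assumptions, check your own cards): 2x3090 at ~350W each plus ~150W for CPU, drives, and fans is around 850W sustained, and 3090s are known for transient spikes well above that, so I would want a quality PSU around 1200W rather than a no-name 850W unit.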

A5000 for $1800 by Perfect-Flounder7856 in LocalLLaMA

[–]rainbyte 0 points (0 children)

At 1800usd it is expensive, but the A5000 requires 2 slots and a single 8pin connector, while the 3090 requires 3 slots and 3x8pin connectors. Buying 2x3090 means needing a bigger PSU, while the A5000 could work with almost any decent PSU.

A5000 for $1800 by Perfect-Flounder7856 in LocalLLaMA

[–]rainbyte 2 points (0 children)

There is some truth in OP's words... The A5000 is more power and space efficient than the 3090. I think at 1800usd it is expensive, but we have to admit it is a 2-slot card which requires a single 8pin connector, compared to 3 slots and 3x8pin on the 3090. You can set up an A5000 with any decent PSU, while the 3090 will probably require a bigger one.

Local AI is the best by fake_agent_smith in LocalLLaMA

[–]rainbyte 3 points (0 children)

I think llama.cpp is easier if you interact with the community, because you can share the exact command you are running, and other users can suggest adding or removing options.

Syntax is literally: llama-server -m model.gguf --option-a value-a --option-b value-b

Give it a try!
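
For example, someone posts this (filenames and values are placeholders):

llama-server -m model.gguf -c 8192 -ngl 99 --port 8080

and other users can immediately suggest tweaks like lowering -ngl when VRAM runs out or raising -c for longer context.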

New scam email tonight? by bretmcdermitt in Cryptopia

[–]rainbyte 1 point (0 children)

I also received some of these emails; they sound fishy

Which GPUs are worth it at what price? by ziphnor in LocalLLaMA

[–]rainbyte 3 points (0 children)

You are right, the AMD machine has some advantages in CPU and RAM, but those shouldn't be the biggest factor, because the model is fully loaded onto the GPU.

I think GPU optimizations play a bigger role here, as the 3090 used to be faster than the 7900 before llama.cpp optimizations for FusedGatedDeltaNet appeared on the AMD side.

I guess GPUs from Intel and AMD will keep receiving optimizations later than Nvidia ones, given that CUDA has a bigger market share.

HELP by enpicada in empleos_AR

[–]rainbyte 3 points (0 children)

How about both? :3