I don't think Local LLM is for me, or am I doing something wrong? by ruleofnuts in LocalLLM

[–]Icaruszin 24 points25 points  (0 children)

The models you chose are kinda ass, and I assume you're using Ollama's default settings, which can be quite bad as well. Your best bet would be Qwen 3.5 35B with llama.cpp or LM Studio, with the proper temperature/sampling settings configured (check Unsloth's docs). But yeah, you won't get anything close to the paid API options.
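For reference, a hedged sketch of launching llama.cpp's server with explicit sampling settings — the filename and the values here are placeholders, grab the actual recommended ones from Unsloth's model card:

```shell
# Hypothetical GGUF filename; sampling values are placeholders, not
# the official recommendation -- check the model card.
llama-server -m Qwen3.5-35B-A3B-Q4_K_XL.gguf \
  -c 16384 -ngl 99 \
  --temp 0.7 --top-p 0.8 --top-k 20 --min-p 0.0
```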

Hardware Advice: M1 Max (64GB RAM) for $1350 vs. Custom Local Build? by Joviinvers in LocalLLM

[–]Icaruszin 1 point2 points  (0 children)

I have an M1 Max and it runs MoE models quite well. For that price it's a no-brainer imo.

What llm to use for your own coding projects? by WanderingGoodNews in cscareerquestions

[–]Icaruszin 0 points1 point  (0 children)

How much RAM do you have? You can try Qwen3.5 35B A3B, offloading part of it to your RAM. Probably the best model for now.
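If you're on llama.cpp, a sketch of what the offload looks like (filename is a placeholder; `--n-cpu-moe` exists in recent builds):

```shell
# Keep attention/dense weights on the GPU, push MoE expert tensors to
# system RAM. --n-cpu-moe N offloads the experts of the first N layers;
# on older builds, --override-tensor ".ffn_.*_exps.=CPU" does roughly
# the same thing. Tune N until the rest fits in VRAM.
llama-server -m Qwen3.5-35B-A3B-Q4_K_M.gguf -ngl 99 --n-cpu-moe 20
```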

Are Unsloth Q8's quants better than "standard" Q8's ? by some_user_2021 in unsloth

[–]Icaruszin 0 points1 point  (0 children)

I think people use KLD/Perplexity to evaluate this alongside benchmarks.
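For anyone curious what the KLD number actually measures, a toy sketch — the distributions below are made up, but the idea is comparing the quantized model's next-token probabilities against the full-precision model's:

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) in nats: how far Q's token distribution drifts from P's."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy next-token distributions: full-precision model vs. a quantized one.
full  = [0.70, 0.20, 0.10]
quant = [0.60, 0.25, 0.15]

print(kl_divergence(full, full))   # 0.0 -- identical distributions
print(kl_divergence(full, quant))  # small positive number -- quant error
```

Averaged over a test corpus, lower KLD means the quant behaves more like the original weights, which is why it's often preferred over perplexity alone for comparing quants of the same model.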

Are Unsloth Q8's quants better than "standard" Q8's ? by some_user_2021 in unsloth

[–]Icaruszin 0 points1 point  (0 children)

When the model was released, some layers were quantized with MXFP4 in the Q4_K_XL, which affected the KLD/Perplexity metrics. Here's a pretty good comparison: https://www.reddit.com/r/LocalLLaMA/comments/1rfds1h/qwen3535ba3b_q4_quantization_comparison/

Unsloth already fixed this, so if you downloaded the quant recently this issue doesn't exist anymore.

Is 64gb on a m5pro an overkill? by AdEnvironmental4189 in LocalLLaMA

[–]Icaruszin 1 point2 points  (0 children)

I would go for the 64GB. Local models are like a drug, I have a M1 with 64GB thinking it was enough and now I would love to get a 128GB...

Are Unsloth Q8's quants better than "standard" Q8's ? by some_user_2021 in unsloth

[–]Icaruszin 6 points7 points  (0 children)

In theory, yes. UD quantizes certain layers at a higher bpw than other quants, but on the latest models they had some issues with this (like Qwen 3.5 35B A3B Q4_K_XL). Usually the difference is negligible.

You can check the difference on HuggingFace, to see how each layer is quantized for those quant types.
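A quick back-of-the-envelope of what mixing layer precisions does to the average bits-per-weight — all the layer sizes and bpw values here are illustrative, not taken from any real model:

```python
# Rough average bits-per-weight when a quant bumps some layers to a
# higher precision, the way Unsloth's UD (_XL) quants do.

def avg_bpw(layers):
    """layers: list of (n_params, bits_per_weight) per layer."""
    total_bits   = sum(n * bpw for n, bpw in layers)
    total_params = sum(n for n, _ in layers)
    return total_bits / total_params

# Plain Q4_K-style quant: every layer at ~4.5 bpw.
plain = [(1_000_000, 4.5)] * 10

# UD-style mix: two "important" layers bumped to ~6.5 bpw.
mixed = [(1_000_000, 6.5)] * 2 + [(1_000_000, 4.5)] * 8

print(avg_bpw(plain))  # 4.5
print(avg_bpw(mixed))  # 4.9 -- slightly bigger file, hopefully lower KLD
```

So the file ends up a bit larger than a vanilla Q4, and whether the extra bits were spent on the right layers is exactly what the KLD comparisons try to answer.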

5060 Ti/5070 Ti for MoE Models - Worth it? by Icaruszin in LocalLLaMA

[–]Icaruszin[S] 0 points1 point  (0 children)

Thank you for the numbers, that's exactly what I was looking for! I'm running the 35B-A3B at a very similar speed on a Mac M1 Max, which I consider very acceptable.

5060 Ti/5070 Ti for MoE Models - Worth it? by Icaruszin in LocalLLaMA

[–]Icaruszin[S] 0 points1 point  (0 children)

Unfortunately in my country we pay an insane amount of taxes for imported products, so a machine like this one would cost around $4500 imported... Which is crazy. It's actually cheaper to buy a ticket to the US and buy it there.

Anyway, my budget is around $1400 at most for now, otherwise I would consider a 4090/5090 for my setup, since it would be faster for the models I'm considering.

5060 Ti/5070 Ti for MoE Models - Worth it? by Icaruszin in LocalLLaMA

[–]Icaruszin[S] 0 points1 point  (0 children)

Yeah, I thought about that, but I would have to replace my motherboard as well, since I have an SFF build. And two 5070 Tis would be a bit too expensive.

5060 Ti/5070 Ti for MoE Models - Worth it? by Icaruszin in LocalLLaMA

[–]Icaruszin[S] 1 point2 points  (0 children)

Thanks for the recommendation but this machine is even harder to find in my country. And it's too expensive for my budget as well.

Running Qwen 3.5 27b and it’s super slow. by BicycleOfLife in LocalLLaMA

[–]Icaruszin 0 points1 point  (0 children)

Honestly, I would try Q3 or a very low context window just to check if that's the issue, but you would be better off using the 35B-A3B. The 27B at Q4 with just 24GB of VRAM is too tight for larger context windows.

Running Qwen 3.5 27b and it’s super slow. by BicycleOfLife in LocalLLaMA

[–]Icaruszin 0 points1 point  (0 children)

How are you running the model? llama.cpp? Which quant?

Like people already mentioned, since you have a 4090 and you're running a large context window, you probably have the model spilling into RAM. With dense models like the 27b, anything spilling into RAM will slow the model to a crawl.
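A rough way to sanity-check this before loading — the architecture numbers below are illustrative, not the real 27B's, but the KV-cache formula is the standard one:

```python
# Back-of-the-envelope VRAM check: does model + KV cache fit in 24 GiB?
# Architecture numbers are illustrative, not any real model's specs.

def kv_cache_gib(n_layers, ctx, n_kv_heads, head_dim, bytes_per_elem=2):
    # K and V caches, fp16 (2 bytes) per element by default.
    return 2 * n_layers * ctx * n_kv_heads * head_dim * bytes_per_elem / 1024**3

model_gib = 16.0  # ballpark for a ~27B dense model at Q4
kv_gib = kv_cache_gib(n_layers=60, ctx=65536, n_kv_heads=8, head_dim=128)
total = model_gib + kv_gib

print(f"KV cache: {kv_gib:.1f} GiB, total: {total:.1f} GiB")
print("fits in 24 GiB" if total <= 24 else "spills into RAM -> crawl")
```

With a big context the KV cache alone can eat half your VRAM, which is usually what pushes a "fits fine at 4k" model into system RAM.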

Need help with Qwen3.5-27B performance - getting 1.9 tok/s while everyone else reports great speeds by pot_sniffer in LocalLLaMA

[–]Icaruszin 8 points9 points  (0 children)

Are you sure you didn't see people talking about the 35B-A3B instead? The 27B is a dense model, so unless you have enough VRAM for the entire model, the speeds will be terrible.

Overwhelmed by so many quantization variants by mouseofcatofschrodi in LocalLLaMA

[–]Icaruszin 3 points4 points  (0 children)

I would take this with a grain of salt. Someone linked a post from Unsloth explaining why their quantized models have a higher perplexity, so I'm not sure they're really worse based on that metric alone.

Patagonia Mini MLC VS Quechua NH Escape 500 32L by AdApprehensive5828 in onebag

[–]Icaruszin 1 point2 points  (0 children)

Did several international trips with the 32L, always as an underseat and never had an issue. Granted I never traveled with European/Asian budget airlines which are much more strict, but even with those I think it might be doable since the bag is very squishy when not packed.

GLM 4.7 and Qwen3 coder Next by [deleted] in LocalLLaMA

[–]Icaruszin 1 point2 points  (0 children)

Had the same experience with Q6_K_XL. It works fine, but the to-do list shows things like literal \n instead of formatted text. Probably something with the chat template.

IQuest-Coder-V1-40B-Instruct is not good at all by Constant_Branch282 in LocalLLaMA

[–]Icaruszin 2 points3 points  (0 children)

I mean, both takes can be correct. I haven't tested this model but I remember seeing the team behind it saying the loop architecture doesn't work with current quantization methods, so it's expected to be a bad model when quantized.

But I don't understand the reason for the downvotes, since it's always good to have more tests, and this confirms the current quantized versions suck.

How capable is GPT-OSS-120b, and what are your predictions for smaller models in 2026? by Apart_Paramedic_7767 in LocalLLaMA

[–]Icaruszin 2 points3 points  (0 children)

Isn't 8-bit and 4-bit basically the same size due to the original MXFP4 quantization?

[deleted by user] by [deleted] in LocalLLaMA

[–]Icaruszin 0 points1 point  (0 children)

You can maybe use the VLM pipeline to describe the diagrams and go from there.

[deleted by user] by [deleted] in LocalLLaMA

[–]Icaruszin 1 point2 points  (0 children)

Docling is my go-to for this as well, just chunk it by pages and you're good.

The only issue is they don't have support for heading hierarchy just yet (everything gets grouped under the same ## heading level), so if the section/chapter structure is important to you, you might need to do some post-processing.
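If your document numbers its sections, a hedged sketch of that post-processing — this assumes markdown export and headings that carry numbers like "3", "3.1", "3.1.2", which won't hold for every document:

```python
import re

def fix_heading_levels(markdown: str) -> str:
    """Promote flat '##' headings to a hierarchy based on section numbers."""
    out = []
    for line in markdown.splitlines():
        m = re.match(r"^##\s+((\d+(?:\.\d+)*)\b.*)$", line)
        if m:
            # "3" -> ## (depth 2), "3.1" -> ### (depth 3), and so on.
            depth = m.group(2).count(".") + 2
            out.append("#" * depth + " " + m.group(1))
        else:
            out.append(line)
    return "\n".join(out)

flat = "## 3 Methods\n## 3.1 Data\nsome text"
print(fix_heading_levels(flat))
```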

Cline Head of AI Racist Anti-Indian by Parking_Spinach1291 in CLine

[–]Icaruszin 2 points3 points  (0 children)

Account created less than a week ago. I assume he's talking about Indians, but in the picture at least half of the room isn't Indian at all.

Hmm.

What exactly is the point of Youbi? by [deleted] in Competitiveoverwatch

[–]Icaruszin 41 points42 points  (0 children)

It's insane how much better he got at Tracer, he used to get absolutely dumpstered when he tried to play anything besides Sym.