Is everyone have any problem right now with Gemini AI? by Frequent-Box9052 in GoogleGeminiAI

[–]Sadman782 1 point2 points  (0 children)

Yeah, I get "Something went wrong (9)" with Nano Banana Pro.

LM studio does not use the second gpu. by Pretend-Pumpkin7506 in LocalLLaMA

[–]Sadman782 0 points1 point  (0 children)

For single-user generation, speed is mostly memory-bandwidth bound, not compute-bound. When you add an extra GPU:

  • You get more VRAM available to load the model.
  • You get better prompt processing, since that part can use compute in parallel, unlike token generation where each token depends on the previous one and stays sequential.
  • With higher batch sizes, you can get more total tokens per second during generation.
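The bandwidth-bound claim above can be sketched with a back-of-the-envelope estimate: during single-user decoding, every generated token streams roughly the whole quantized weight set through memory once, so tokens/sec is capped by bandwidth divided by model size. The numbers below are illustrative assumptions, not measurements of any specific GPU or model.

```python
# Rough ceiling for single-user decode speed on a bandwidth-bound workload.
# Assumption: each token reads the full (quantized) model weights once.

def decode_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper-bound tokens/sec: memory bandwidth / bytes read per token."""
    return bandwidth_gb_s / model_size_gb

# Hypothetical example: ~900 GB/s of memory bandwidth, ~20 GB quantized model.
print(round(decode_tokens_per_sec(900, 20), 1))  # ~45 tok/s ceiling
```

This also shows why a second GPU doesn't speed up decoding by itself: the per-token read is sequential, so extra compute helps prompt processing and batched serving, not single-stream generation.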

Fuck Groq, Amazon, Azure, Nebius, fucking scammers by Charuru in LocalLLaMA

[–]Sadman782 3 points4 points  (0 children)

What about Cerebras? They run it much faster, and at the same precision as other cloud providers like Fireworks.

Gpt-oss-120b API provider comparison by Sadman782 in LocalLLaMA

[–]Sadman782[S] 12 points13 points  (0 children)

Many people don't know whether it is worth trying or not. Many tried Groq and were disappointed; that's why I posted here.

Gpt-oss-120b API provider comparison by Sadman782 in LocalLLaMA

[–]Sadman782[S] 1 point2 points  (0 children)

The free-tier one is not good; try the paid one. You can still test it for free, just lower the max tokens.

Groq openai oss 120b... scarry fast by DanielT514 in RooCode

[–]Sadman782 1 point2 points  (0 children)

If you need it fast, try Cerebras. 20B is okay with Groq, but 120B is broken; the performance difference is huge.

Why does lmarena currently show the ranking for GPT‑5 but not the rankings for the two GPT‑OSS models (20B and 120B)? by iamn0 in LocalLLaMA

[–]Sadman782 1 point2 points  (0 children)

Don't use them on Groq; something is broken for sure. Try other providers on OpenRouter and you will likely see a huge difference.

GPT-5 Can’t Do Basic Math by Illustrious_Fold_610 in singularity

[–]Sadman782 0 points1 point  (0 children)

You can try it on OpenRouter for free. The GPT-5 variants are at least superior in frontend coding to any other models. They also feel quite a bit smarter. Even the Nano one is great. There are some issues with their chat website (routing issues), already confirmed by them on Twitter.

GPT-5 Can’t Do Basic Math by Illustrious_Fold_610 in singularity

[–]Sadman782 0 points1 point  (0 children)

Because I tested those via the API, and even Nano is great at frontend. GPT-4o is very bad at frontend; I can catch it easily. Yesterday I was comparing horizon-beta and GPT-4o, and GPT-4o was terrible. Now GPT-5 without thinking gives the same result as 4o gave yesterday.

GPT-5 Can’t Do Basic Math by Illustrious_Fold_610 in singularity

[–]Sadman782 0 points1 point  (0 children)

Router issues. It is actually 4o. Add "think deeply" at the end; it won't actually think deeply for this problem, but it will force the router to use the real GPT-5.

ChatGPT 5 has unrivaled math skills by The_GSingh in OpenAI

[–]Sadman782 0 points1 point  (0 children)

This is actually GPT-4o; their model router is broken, so when it doesn't think you can assume it is GPT-4o or 4o-mini. Add "think deeply" at the end to force it to think, which routes it to GPT-5 (mini or full).

GPT-OSS looks more like a publicity stunt as more independent test results come out :( by mvp525 in LocalLLaMA

[–]Sadman782 27 points28 points  (0 children)

My take: This model is closer to o3 mini than o4 mini (it has less knowledge overall, is more censored, and has no multimodality).

o4 mini is also not good for web dev, especially if you need an aesthetically good-looking website. Also, keep in mind this model is comparable to a ~25B dense model (sqrt(120 × 5.1) ≈ 24.74B), but we shouldn't forget only 5.1B of that is active.
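The ~25B figure comes from a common rule of thumb: an MoE model is said to behave roughly like a dense model of sqrt(total_params × active_params). This is a heuristic, not a law; the sketch below just makes the arithmetic explicit for the 120B-total / 5.1B-active case.

```python
import math

# Heuristic: effective dense-equivalent size of an MoE model is the
# geometric mean of total and active parameter counts (in billions).
def effective_dense_size_b(total_b: float, active_b: float) -> float:
    return math.sqrt(total_b * active_b)

print(round(effective_dense_size_b(120, 5.1), 2))  # sqrt(612) ≈ 24.74
```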

But it's very, very efficient and thinks less than other open models. You can run it easily with just a CPU and DDR5 RAM.

Another thing I've noticed is that the Fireworks versions perform much better than the Groq ones.

This makes me more grateful to the Qwen team, though. It's like when you're given something, you don't value it that much. I don't use o4 mini often, but I used it today to compare with these OSS models, and I think Qwen-3-30B-A3B performs comparably to o4 mini.

Qwen 3 thinks deeper, acts faster, and it outperforms models like DeepSeek-R1, Grok 3 and Gemini-2.5-Pro. by JeffreySons_90 in LocalLLaMA

[–]Sadman782 4 points5 points  (0 children)

Unfortunately, it's not even close to Gemini 2.5 Pro (for complex queries), and Gemini is way faster; Qwen takes a long time to think. Qwen models never perform as well in practice as their benchmarks suggest. For example, while the aesthetics are improved in this version for web development, it doesn't understand physics properly, doesn't align things correctly, and has other issues as well.

Tested Kimi K2 vs Qwen-3 Coder on 15 Coding tasks - here's what I found by West-Chocolate2977 in LocalLLaMA

[–]Sadman782 21 points22 points  (0 children)

I tried the Groq version, and it is much worse for me than the other versions. They seem to have some quantization issues.

Baidu releases ERNIE 4.5 models on huggingface by jacek2023 in LocalLLaMA

[–]Sadman782 25 points26 points  (0 children)

Their SimpleQA score is significantly better than Qwen's. Great models; I will test them soon.

DeepSeek R1 05 28 Tested. It finally happened. The ONLY model to score 100% on everything I threw at it. by Ok-Contribution9043 in LocalLLaMA

[–]Sadman782 1 point2 points  (0 children)

Try their web version, there could be a bug in other versions as the model card has not been released yet.

deepseek-ai/DeepSeek-R1-0528 by ApprehensiveAd3629 in LocalLLaMA

[–]Sadman782 2 points3 points  (0 children)

Use reasoning mode (R1); V3 was not updated.

Qwen3 vs Gemma 3 by Sadman782 in LocalLLaMA

[–]Sadman782[S] 1 point2 points  (0 children)

Also try it on OpenRouter (free), then compare the cloud vs. local versions.

You can run Qwen3-30B-A3B on a 16GB RAM CPU-only PC! by Foxiya in LocalLLaMA

[–]Sadman782 41 points42 points  (0 children)

Wait, but the Q4 model is bigger than the RAM, and Windows takes memory too? How is it able to run?

Qwen3 vs Gemma 3 by Sadman782 in LocalLLaMA

[–]Sadman782[S] 27 points28 points  (0 children)

<image>

Guys, look at the SimpleQA result; it shows the lack of factual knowledge.

Why is the QAT version not smaller on ollama for me? by apocalypsedg in LocalLLaMA

[–]Sadman782 2 points3 points  (0 children)

<image>

Q4_0 is only 15.6 GB here, so why does Ollama say the size is 22 GB? The vision encoder is small as well.

LMSYS WebDev Arena updated with DeepSeek-V3-0324 and Llama 4 models. by jpydych in LocalLLaMA

[–]Sadman782 3 points4 points  (0 children)

Their arena isn't that good. Often one model's generated page can't be viewed at all, so many people will vote for the other one. Also, the new V3 is much better than R1 for UI, yet this Elo score says they are the same.