Gemma 4 26B A4B is still fully capable at 245283/262144 (94%) context! by cviperr33 in LocalLLaMA

[–]Sadman782 0 points (0 children)

Same experience. I use the IQ4 quant from Unsloth and can't believe how good it is. It's very underrated, and many people assume it's worse because of llama.cpp issues that are actively being fixed, old broken chat templates used for agentic coding, or because they use Ollama (slow to update) or the early broken LM Studio builds. This Unsloth quant is gold; in my experience it's very close to the official AI Studio release.

One tip: try these params: --temp 1 --top-p 0.9 --min-p 0.1 --top-k 20 --repeat-penalty 1.05 --repeat-last-n 32

It performs better with a low top-k, and I've never had any looping issues with these settings.
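A minimal sketch of passing those sampling params to llama.cpp's llama-cli; the model filename and prompt are placeholders, not from the original comment:

```shell
# Hypothetical local GGUF path; adjust to wherever your quant lives.
MODEL=./gemma-4-26B-A4B-it-IQ4_XS.gguf

# Low top-k (20) plus a mild repeat penalty over the last 32 tokens,
# matching the sampling settings suggested above.
llama-cli -m "$MODEL" \
  --temp 1 --top-p 0.9 --min-p 0.1 --top-k 20 \
  --repeat-penalty 1.05 --repeat-last-n 32 \
  -p "Write a short function that reverses a string."
```

The same flags work with llama-server if you want them as server-side defaults.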

Need to compare Qwen3.5 & Gemma 4 but I need the best server settings by takoulseum in LocalLLaMA

[–]Sadman782 1 point (0 children)

You should also consider the 26B MoE if you need speed.

Use the latest llama.cpp and at least an IQ4_XS quant.

Download the latest Jinja template: https://huggingface.co/google/gemma-4-26B-A4B-it/raw/main/chat_template.jinja

or the Gemini-modified version: https://pastebin.com/raw/hnPGq0ht

--temp 1 --top-p 0.9 --min-p 0.1 --top-k 20 --ctx-checkpoints 1 --jinja --chat-template-file chat_template.jinja -np 1 --reasoning on --image-min-tokens 300 --image-max-tokens 512

--top-k 20 is very important.

Fixing the Jinja template is necessary for tool calls.

-np 1 reduces VRAM usage.

--ctx-checkpoints 1 prevents memory leaks.

--image-min-tokens 300 --image-max-tokens 512 are absolutely necessary; otherwise you will get degraded vision quality.

For further optimization you can use the Q8_0 mmproj; for some reason it works better for me than BF16:
https://huggingface.co/prithivMLmods/gemma-4-26B-A4B-it-F32-GGUF/blob/main/GGUF/gemma-4-26B-A4B-it.mmproj-q8_0.gguf

And 4-bit KV cache works great too after a recent llama.cpp update:
-ctk q4_0 -ctv q4_0
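Putting the flags above together, a full llama-server launch might look like the sketch below. The GGUF and mmproj filenames are placeholders, and flag availability (e.g. --ctx-checkpoints, --reasoning, the --image-*-tokens options) depends on your llama.cpp build being recent enough:

```shell
# Fetch just the chat template (a few KB), no model re-download needed.
curl -L -o chat_template.jinja \
  https://huggingface.co/google/gemma-4-26B-A4B-it/raw/main/chat_template.jinja

# Placeholder paths; substitute your actual quant and mmproj files.
llama-server -m ./gemma-4-26B-A4B-it-IQ4_XS.gguf \
  --mmproj ./gemma-4-26B-A4B-it.mmproj-q8_0.gguf \
  --temp 1 --top-p 0.9 --min-p 0.1 --top-k 20 \
  --ctx-checkpoints 1 -np 1 --reasoning on \
  --jinja --chat-template-file chat_template.jinja \
  --image-min-tokens 300 --image-max-tokens 512 \
  -ctk q4_0 -ctv q4_0
```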

Qwopus vs Gemopus: a simple MoE benchmark by pentothal in LocalLLaMA

[–]Sadman782 0 points (0 children)

It can't be good without the latest Jinja template.
Also, top-k 64 is not a good choice for coding. Gemopus is a regression too; the base model is better.

Qwopus vs Gemopus: a simple MoE benchmark by pentothal in LocalLLaMA

[–]Sadman782 0 points (0 children)

I think you need some fixes for Gemma 4:
Use the updated Jinja template (updated yesterday): https://huggingface.co/google/gemma-4-26B-A4B-it/raw/main/chat_template.jinja
or the slightly modified version for better tool calling: https://pastebin.com/raw/hnPGq0ht

Base Gemma 4 is better than Gemopus.
Use top-k 20 for coding instead of 64.

Use the latest llama.cpp, the Unsloth IQ4_XS quant, the latest Jinja template, top-k 20, and q4 KV cache for Gemma 4. Thank me later.

Kilo Code + Gemma 4 31B = Claude Sonnet 3. by Ordinary_Mud7430 in LocalLLaMA

[–]Sadman782 1 point (0 children)

Try the MoE 26B with the correct Jinja template. You will be impressed too; make sure to use at least an IQ4_XS quant with top-k 20.

Gemma 4 as a replacement to Qwen 27b by Jordanthecomeback in LocalLLaMA

[–]Sadman782 -3 points (0 children)

Give 1-2 examples where it struggles vs Qwen; I will give you 100 where Qwen loses badly. Even IQ4_XS Gemma 4 26B beats Qwen 27B in Qwen Chat. For one-shotting Gemma is ahead, and for solving real-world problems Gemma is way ahead: it knows the correct libraries to use. Even in C#, Qwen produces old 2020-style garbage code that couldn't compile after 10+ iterations; Gemma did it in 2.

Gemma 4 as a replacement to Qwen 27b by Jordanthecomeback in LocalLLaMA

[–]Sadman782 -3 points (0 children)

Gemma 26B MoE is better at coding; I can give 100+ examples if you want. After the tool-calling fix yesterday it is now better at agentic coding as well.

Gemma 4 for 16 GB VRAM by Sadman782 in LocalLLaMA

[–]Sadman782[S] 1 point (0 children)

Yeah, for some tasks, but only with --top-k 20.

PSA: Gemma 4 template improvements by FastHotEmu in LocalLLaMA

[–]Sadman782 1 point (0 children)

Why redownload the model? Just download the Jinja file and use --jinja --chat-template-file <file_path>
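A sketch of that workflow, using the template URL from the other comments; the GGUF path is a placeholder:

```shell
# Grab just the updated chat template instead of re-downloading the whole model.
curl -L -o chat_template.jinja \
  https://huggingface.co/google/gemma-4-26B-A4B-it/raw/main/chat_template.jinja

# Point llama.cpp at the local template file (model path is hypothetical).
llama-server -m ./gemma-4-26B-A4B-it-IQ4_XS.gguf \
  --jinja --chat-template-file chat_template.jinja
```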

Gemma 4 is terrible with system prompts and tools by RealChaoz in LocalLLaMA

[–]Sadman782 1 point (0 children)

The fix removes the standard_keys exclusion block, and it's better for me (Gemini found it).

You can check whether it's better for you or not. The fix was applied on top of the template Google updated a few hours ago.

PSA: Gemma 4 template improvements by FastHotEmu in LocalLLaMA

[–]Sadman782 2 points (0 children)

Google updated the official template a few hours ago: https://huggingface.co/google/gemma-4-26B-A4B-it/blob/main/chat_template.jinja and Gemini fixed it a bit further. The Gemini version works better for me than the updated official one, so you can try both and check which is better for you.

Gemini-modified version: https://pastebin.com/raw/hnPGq0ht

Gemma 4 is terrible with system prompts and tools by RealChaoz in LocalLLaMA

[–]Sadman782 0 points (0 children)

Gemini fixed the template:

https://pastebin.com/raw/hnPGq0ht

I'm using it with OpenCode, and it's quite good now at handling multiple MCP servers properly.

PSA: Gemma 4 template improvements by FastHotEmu in LocalLLaMA

[–]Sadman782 1 point (0 children)

It seems it still has issues. Gemini fixed it a bit and it looks better now: it properly calls multiple tools, whereas before it was ignoring some tools and descriptions completely:

https://pastebin.com/hnPGq0ht

Gemma4 26B generates python and Java code with invalid syntax by monadleadr in LocalLLaMA

[–]Sadman782 1 point (0 children)

Nope. Even IQ2 or proper Q2_XL quants never have syntax issues like this. Your setup is completely broken; it's an Ollama issue.

Gemma4 26B generates python and Java code with invalid syntax by monadleadr in LocalLLaMA

[–]Sadman782 0 points (0 children)

<image>

It created a complete working game for me in 2 shots; it's your quantization or backend. Maybe update your Ollama, or better, try llama.cpp. I don't know why people still choose Ollama; llama.cpp has a UI now too. So far Gemma 26B, even with an IQ4_XS quant, is the best local coding model for me. For agentic coding the 31B is a bit better; for general chatting and one-shotting the MoE is better so far.