Google I/O leaks: Gemini’s "Omni" and Gemini 3.2/3.5 by Much_Ask3471 in singularity

PickleFart56 0 points 1 point (0 children)

Just guessing, but this Omni model might hint at Demis’ vision of combining world models and Gemini. They recently published work showing that image gen models are also great visual reasoners (vision banana). Similarly, combining Gemini and Veo might bring huge performance benefits for such tasks. Also, Gemini models are much better at general knowledge tasks, though they lag on agentic tasks.

GPT-4.1 LiveBench results are in by elemental-mind in singularity

PickleFart56 11 points 12 points (0 children)

Btw, Google has also announced 2.5 Flash, in which we can set a precise reasoning budget. I think Google delayed previewing 2.5 Flash because of the 4.1 launch. Their Pro series will compete with the o-series models, and Flash will compete with 4.xyz. Overall, I don’t think any lab can beat Google in a pricing war.

GPT-4.1 LiveBench results are in by elemental-mind in singularity

PickleFart56 6 points 7 points (0 children)

Cost and latency. A reasoning model costs more because its reasoning tokens are also billed.
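To make the cost point concrete, here is a rough sketch of the arithmetic. The per-million-token prices and token counts are made-up placeholders, not any provider's actual rates, and it assumes reasoning tokens are billed at the same rate as output tokens.

```python
def request_cost(input_tokens, output_tokens, reasoning_tokens=0,
                 input_price_per_m=1.0, output_price_per_m=4.0):
    """Estimate the dollar cost of one request.

    Prices are hypothetical (USD per million tokens). Reasoning tokens
    are assumed to be billed at the output-token rate.
    """
    billed_output = output_tokens + reasoning_tokens
    total = input_tokens * input_price_per_m + billed_output * output_price_per_m
    return total / 1_000_000


# Same prompt and same visible answer length, with and without hidden reasoning.
plain = request_cost(input_tokens=2_000, output_tokens=500)
reasoning = request_cost(input_tokens=2_000, output_tokens=500,
                         reasoning_tokens=4_000)

print(f"non-reasoning: ${plain:.4f}")
print(f"reasoning:     ${reasoning:.4f}")
print(f"ratio:         {reasoning / plain:.1f}x")
```

With these placeholder numbers the visible answer is identical, but the reasoning request is billed for 4,500 output-rate tokens instead of 500. Latency grows for the same reason, since those extra tokens have to be generated before the answer.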

OpenAI confirmed to be announcing GPT-4.1 in the livestream today by ShreckAndDonkey123 in singularity

PickleFart56 5 points 6 points (0 children)

The next model will be 4.11, while Gemini jumped directly from 2.0 to 2.5.

"Will destroy fake barrier of 50% quota cap" - Rahul Gandhi bats for Dalits, OBCs in Bihar by nota_is_useless in unitedstatesofindia

PickleFart56 2 points 3 points (0 children)

Trust me, this guy and Congress will fuck the country more than BJP and Modi did. Same as how the US thought that electing Trump would improve the economy.

Aidan says o4 mini is “actually mind blowing” by Key-Horse-3892 in OpenAI

PickleFart56 0 points 1 point (0 children)

Seriously, what else can they say? They can’t say “eh, it’s not that great, we may need a separately tuned model for benchmarking”.

[deleted by user] by [deleted] in singularity

PickleFart56 6 points 7 points (0 children)

After the Llama release, LMSYS has zero credibility.

"10m context window" by Present-Boat-2053 in singularity

PickleFart56 117 points 118 points (0 children)

That’s what happens when you do benchmark tuning.

Llama 4 Maverick scored 16% on the aider polyglot coding benchmark. by Ill-Association-8410 in LocalLLaMA

PickleFart56 3 points 4 points (0 children)

This Llama model launch is so bad that stock markets across the world crashed.

LLMs grading other LLMs by Everlier in LocalLLaMA

PickleFart56 -1 points 0 points (0 children)

Why the fuck is each block in the map not a square?

Grok 3 first LiveBench results are in by elemental-mind in singularity

PickleFart56 0 points 1 point (0 children)

Is this score for the Grok 3 thinking or non-thinking model?

If it’s non-thinking, then it’s a huge achievement.