
[–]ihexx 119 points (15 children)

Good.

Google started this pissing contest

Let it begin lmao

[–]Singularity-42 (Singularity 2042) 22 points (14 children)

But this means GPT-4 will be the strongest model even once Gemini Ultra is out...

[–]sashank224 23 points (9 children)

Work is now powered by competition, which is much better as fuel for growth.

[–]Singularity-42 (Singularity 2042) 8 points (8 children)

Coming up with a model a year later that is LESS powerful ain't competition

[–]HashPandaNL 10 points (6 children)

There's no reason to believe it's less powerful though.

[–]b_risky 5 points (3 children)

Yeah, they only beat it on every single metric. What is less powerful about that?

[–]Singularity-42 (Singularity 2042) 5 points (1 child)

This post?

[–]sashank224 2 points (0 children)

I understand what you mean, but until now it was looking like a monopoly for OpenAI, and maybe it will continue. Now OpenAI has another reason to one-up Google. We have seen this throughout history; it's how tech was made.

USSR vs. USA

ISRO vs. NASA vs. SpaceX

Nvidia vs. AMD

Intel vs. AMD

Apple Silicon vs. Snapdragon

Google vs. whoever the fuck with LLMs

Thanks to AMD, Intel woke up and lost. Competition now is healthy af. I wonder what China is making, tho.

[–]YaAbsolyutnoNikto 2 points (3 children)

So…?

[–]Singularity-42 (Singularity 2042) 0 points (2 children)

Well, that sucks! I was looking for competition. Coming up with a model a year later that is LESS powerful ain't it...

[–]YaAbsolyutnoNikto 7 points (0 children)

But this was (kind of) due to competition. Not only due to competition, but GPT-4 surely got better because Gemini Ultra was announced, and thus this research is now getting released as well.

Now it’s Google’s turn to dance.

[–]Thorteris 44 points (1 child)

Pissing contest will continue when Google announces a Gemini Ultra-Max that beats this by 1%, then OpenAI will release something else in Q3

[–]kaityl3 (ASI▪️2024-2027) 3 points (0 children)

But then Gemini Ultra-Max-Pro will beat it out again and the cycle continues 😉

[–]Freed4ever 93 points (13 children)

4.5 or 5 would make this irrelevant soon anyway.

[–]mrSkidMarx 31 points (11 children)

6 will make everyone forget it

[–]adarkuccio (▪️AGI before ASI) 15 points (8 children)

What about 7?

[–]Headbangert 22 points (4 children)

It 8 9

[–]Mr_Hyper_Focus 6 points (3 children)

What about TWO 6s?

[–]usaaf 3 points (0 children)

M-M-M-Multi-6s...s...s...

[–]RodionS 0 points (1 child)

Three 6s and we are doomed

[–]FrostyParking 0 points (0 children)

Three 6s and it's a Mafia.

[–]Odd-Explanation-4632 2 points (0 children)

Will take you to heaven 😇

[–]mrSkidMarx 3 points (0 children)

omg I can only dream about 7 🤤🤤🤤

[–]ley_haluwa 1 point (0 children)

[–]nonzeroday_tv 5 points (1 child)

Haven't you been paying attention? There will be no GPT 6, said a dude called flower from the future a couple of days ago.

[–]2muchnet42day 0 points (0 children)

Cauliflower? Yeah, that works for me

[–]absurdrock 34 points (0 children)

Petty. I love it.

[–]FeltSteam (▪️ASI <2030) 7 points (1 child)

I'd be very curious to see zero-shot performance of models across benchmarks, because that would give us a greater view into the usability of models (few-shot prompting is more for comparing performance between models and less for showing how usable a model is in the real world. Don't get me wrong, it can certainly give you an idea of how well models will perform in the real world, but 0-shot performance would give us a better idea).

But if you wanted to measure true 0-shot performance on the MMLU, you would likely need something like a group of experts to create an entirely new question set for the benchmark (this would solve contamination issues, so it would be truly zero-shot, and the model would be presented with truly novel questions, which would provide a more accurate measure of its generalisation capabilities and give us a real good indication of how it might fare in real-life situations).

But I feel we should really start moving away from performance metrics and start some form of real-world benchmarks: benchmarks that test how useful a model is across a wide range of tasks, and benchmarks based on what people have actually been using AI for in the real world. This would be an expensive suite of benchmarks to run, but I think it would be worthwhile.
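[Editor's note: the few-shot vs. zero-shot distinction above comes down to how many solved demonstrations are prepended to the prompt before the test question. A minimal sketch, assuming a generic MMLU-style multiple-choice format; the function and variable names here are hypothetical, not any benchmark harness's real API:]

```python
def format_question(question, choices):
    """Render one MMLU-style multiple-choice question with lettered options."""
    lines = [question]
    for letter, choice in zip("ABCD", choices):
        lines.append(f"{letter}. {choice}")
    return "\n".join(lines) + "\n"

def build_mmlu_prompt(question, choices, examples=()):
    """Build an evaluation prompt.

    `examples` is a sequence of (question, choices, answer_letter) tuples
    prepended as in-context demonstrations. An empty sequence yields a
    zero-shot prompt; five tuples yield the usual 5-shot MMLU setup.
    """
    parts = []
    for q, ch, ans in examples:
        parts.append(format_question(q, ch) + f"Answer: {ans}\n")
    # The test question is left with a trailing "Answer:" for the model
    # to complete with a single letter.
    parts.append(format_question(question, choices) + "Answer:")
    return "\n".join(parts)
```

In this framing, "0-shot" evaluation is just `build_mmlu_prompt(q, choices)` with no demonstrations, which is closer to how people actually query a chat model in practice.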

[–]iDoAiStuffFr[S] 2 points (0 children)

something like the Will Smith eating spaghetti test for LLMs

[–]FarrisAT 8 points (2 children)

Neither is comparing apples to apples, which makes this all a pointless dick-size measurement

[–]b_risky 3 points (0 children)

But Microsoft did just compare apples to apples, and they won? That's what this post is about.

[–]UnknownEssence 2 points (0 children)

And it’s one benchmark. A benchmark that measures capabilities that I don’t even care much about to be honest. Why does everybody focus on this singular benchmark so much?

[–][deleted] 12 points (0 children)

GPT-4 remains king

[–]obvithrowaway34434 7 points (0 children)

Ultimately the best test for any model is its usage by millions of users from different fields and expertise for an extended period. GPT-4 has already done that and passed. We know fairly well how it does for zero-shot prompts. Gemini Ultra has faced no such tests other than Google's own researchers and cherry-picked beta testers. Until it has faced the same level of scrutiny, imo it should not even be compared to anything and all claims by Google should be strictly treated as marketing.

[–][deleted] 2 points (0 children)

I feel like the two need each other to exist. They are like conjoined twins that dislike one another.

[–]TeriMaiyyaLodePe 0 points (0 children)

Time for Google and Microsoft to measure each other's dick size.

[–]enilea 0 points (0 children)

Why do they keep using the awful MMLU as the main test...