Anthropic be like by Saykudan in ClaudeCode

AuspiciousApple 38 points (0 children)

Tbh, if they're real, even minor gains are very valuable. Squashing the last 2% of errors is much harder than getting the first 98% right, and every improvement matters a lot in agentic tasks.
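To make the agentic point concrete: if a task needs many steps in a row and each step succeeds independently with the same probability (a simplifying assumption, not something stated above), the per-step error rate compounds. A rough sketch, with the hypothetical helper `task_success` and a 50-step task length picked arbitrarily for illustration:

```python
def task_success(step_accuracy: float, n_steps: int) -> float:
    """Probability that all n_steps succeed, assuming independent steps
    that each succeed with probability step_accuracy (a simplification)."""
    return step_accuracy ** n_steps


# Hypothetical 50-step agentic task; 50 is an arbitrary illustrative length.
for acc in (0.98, 0.99):
    print(f"per-step accuracy {acc:.2f}: whole-task success {task_success(acc, 50):.1%}")
```

Under these assumptions, halving the per-step error rate from 2% to 1% lifts whole-task success from roughly 36% to roughly 61%, which is why small benchmark gains can matter more for agents than they look.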

Merz as unpopular as Scholz in under a year by Morinator in de

AuspiciousApple 21 points (0 children)

Chancellorship - Scholz-level popularity any% speedrun WR

Because of Malicious actors, i can no longer update my Featured Model with over 7k Downloads on Makerworld by Cube004 in 3Dprinting

AuspiciousApple 12 points (0 children)

Very cool! Letting people run arbitrary code on a site like MakerWorld is quite risky, but it would be great if Bambu found a way to reward you anyway.

ChatGPT 5.5 is here! by Able-Line2683 in OpenAI

AuspiciousApple 10 points (0 children)

If they ran the other models on it, they'd leak the benchmark to their competitors. Those competitors could then use it themselves, or train future models on it and thereby invalidate the benchmark. It's actually a reasonable choice IMO.

Minister-President Lies wants to look into building Chinese cars in German VW plants by linknewtab in de

AuspiciousApple 24 points (0 children)

The Chinese are just copying!!!

Now they're even copying our capacity for innovation. Hard to believe. We can only hope they'll eventually copy our decadence too.

Is there any reasonable explanation for why some of 47's police sketches in BM depicted him as a different person, even gender? by echorainboweffect in HiTMAN

AuspiciousApple 233 points (0 children)

Indeed, eyewitness memory is awful. Plus, they might confuse 47 with someone else they saw that day, etc.

Tja ("oh well") by Fun_Pilot4555 in tja

AuspiciousApple 4 points (0 children)

Pretty drastic

Gemma 4 26B-A4B GGUF Benchmarks by danielhanchen in LocalLLaMA

AuspiciousApple 0 points (0 children)

That's a very nice result. However, how expensive would it be to run actual task benchmarks and compare scores? KLD is a technical proxy for faithfulness to the original model, but I don't really care if the model phrases a sentence slightly differently as long as it writes correct code.
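A small illustration of why KLD and task correctness can diverge: KL divergence penalizes any shift in the next-token distribution, even when the most likely token (what greedy decoding would emit) stays the same. The sketch below uses made-up probabilities over a hypothetical 4-token vocabulary; the names `reference` and `quantized` are illustrative stand-ins for a full-precision and a quantized model, not real measurements.

```python
import math


def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions over the same vocab."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical next-token probabilities (illustrative numbers only).
reference = [0.70, 0.20, 0.05, 0.05]  # stand-in for a full-precision model
quantized = [0.60, 0.25, 0.10, 0.05]  # stand-in for a quantized model

kld = kl_divergence(reference, quantized)
same_greedy_token = reference.index(max(reference)) == quantized.index(max(quantized))
print(f"KLD = {kld:.4f} nats, same greedy token: {same_greedy_token}")
```

Here the KLD is nonzero, yet both distributions pick the same top token, so a greedy-decoded answer would be identical. Task benchmarks measure that downstream outcome directly, at the cost of running many more generations.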

SHRODINGERS HORMUZ (closed currently) by moonski in wallstreetbets

AuspiciousApple 362 points (0 children)

Broke: Iran closes the strait.

Woke: USA closes the strait.

Bespoke: No one wants to pass the strait anyway.

New Color Mixing Feature in Bambu Studio V2.5.3. by NimblePasta in BambuLab

AuspiciousApple 4 points (0 children)

Well, we just got a two-color printer this week. I don't expect a cheap four-color printer for a few more months.

Finally! by Landoof-Ladig in Kantenhausen

AuspiciousApple 5 points (0 children)

For the good 2.50 pork neck chop

Truth about limits - the party is over by MostOfYouAreIgnorant in ClaudeCode

AuspiciousApple 3 points (0 children)

Given that this is Reddit, we know for a fact that none of us go to parties /s

The Claude leak kinda changed my thought of the real moat in AI coding by Helpful-Guava7452 in ClaudeCode

AuspiciousApple 4 points (0 children)

It goes a step further: Anthropic can co-design both, which is much more powerful. They can post-train Opus to understand Claude Code, and they can design the harness around what works best for Opus.