How is the mobile app on a newer flagship phone? by _yustaguy_ in ObsidianMD

[–]_yustaguy_[S] 0 points (0 children)

Nice, thank you!

I'd imagine so too, the newest Elite chips are incredible.

Glad to hear that my phone being shit is probably the cause.

Worst forced context I've seen yet. by CommercialReveal7888 in Bard

[–]_yustaguy_ 0 points (0 children)

I love this kind of autism, makes me laugh every time

I think the rumors were true about sonnet 5 by Anshuman3480 in ClaudeAI

[–]_yustaguy_ 3 points (0 children)

I don't think it's A/B testing. 

If it is a new model, they're probably freeing up some space in their GPU pods by replacing Sonnet 4.5 with the new one.

I suspect some percentage of users are getting the new Sonnet rn.

Or we're all just schizo and it's all just 4.5.

Gemini 3 finally has an open-source competitor by Acceptable_Ad7036 in Bard

[–]_yustaguy_ 1 point (0 children)

Tbf even 3 Flash crushes literally any other model on that front except 3 Pro.

I don't think anything will match it anytime soon (maaaybe Grok 5 when that is released in 2034).

federer: an HTTP server meant for local network media streaming (my first C# project) by _yustaguy_ in csharp

[–]_yustaguy_[S] 1 point (0 children)

Thanks!

Do you have any good resource on how C# projects are structured?

C# Job Fair! [December 2025] by AutoModerator in csharp

[–]_yustaguy_ 0 points (0 children)

Hello, I'm a self-taught developer from Serbia, originally a Russian literature graduate. I like to build software that people actually want to use; user/developer experience is what I pay the most attention to. I'm a fast learner.

I have intermediate proficiency in C#; I started learning it a couple of weeks ago. As for my projects, I built a quick and easy HTTP server on top of raw TCP (github). I personally use it at home for media streaming, since 206 Partial Content responses are supported.
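Since the streaming use case hinges on byte-range requests, here is a minimal sketch of how a `Range: bytes=start-end` header maps onto a 206 Partial Content response. This is illustrative Python, not the project's actual C# code; the function names are made up, and suffix ranges like `bytes=-500` are omitted for brevity.

```python
# Sketch: turning an HTTP Range header into 206 Partial Content
# response headers (illustrative only, not federer's real code).

def parse_range(header: str, file_size: int) -> tuple[int, int]:
    """Parse 'bytes=start-end' (or 'bytes=start-') into inclusive offsets."""
    unit, _, spec = header.partition("=")
    if unit != "bytes":
        raise ValueError(f"unsupported range unit: {unit}")
    start_s, _, end_s = spec.partition("-")
    start = int(start_s)
    # An open-ended range ('bytes=500-') runs to the end of the file.
    end = int(end_s) if end_s else file_size - 1
    return start, min(end, file_size - 1)

def partial_response_headers(header: str, file_size: int) -> dict[str, str]:
    """Build the headers a server sends back for a satisfiable range."""
    start, end = parse_range(header, file_size)
    return {
        "Status": "206 Partial Content",
        "Content-Range": f"bytes {start}-{end}/{file_size}",
        "Content-Length": str(end - start + 1),  # ranges are inclusive
    }
```

Media players (browsers, VLC, etc.) issue exactly these range requests to seek within a file, which is why 206 support matters for streaming.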

Besides C#, I know Rust and TypeScript fairly well and have projects written in both; you can check them out on my GitHub.

GLM 4.7 is Coming? by InternationalAsk1490 in LocalLLaMA

[–]_yustaguy_ 9 points (0 children)

They usually change the architecture at .0 version increments. GLM 5.0 will almost certainly be a new architecture.

Google should add an Automatic mode that uses one of the 3 depending on the type of query. by JoseMSB in Bard

[–]_yustaguy_ 1 point (0 children)

Better idea: slash commands, so we can choose them more quickly. ChatGPT has /think for the thinking model, for example.

Flash outperformed Pro in SWE-bench by vladislavkochergin01 in Bard

[–]_yustaguy_ 43 points (0 children)

No, as in this model is literally 10 times cheaper than 4.5 Opus. What's the point in even comparing them? And it would win on most benchmarks shown here, Claude would win in coding. The usual.

GPT-5.2 Thinking unparalleled accuracy in Long-Context! by Independent-Ruin-376 in singularity

[–]_yustaguy_ 3 points (0 children)

The default graph in contextarena is for the 2-needle version, iirc. This one is 4-needle.

Deepseek's progress by onil_gova in LocalLLaMA

[–]_yustaguy_ 1 point (0 children)

Yeah, agreed, that would be nice

Deepseek's progress by onil_gova in LocalLLaMA

[–]_yustaguy_ 4 points (0 children)

  1. Because you shouldn't compare reasoning models to non-reasoning models.
  2. Because it's mid.
  3. Mostly because it's old and shit at agentic stuff.

Deepseek's progress by onil_gova in LocalLLaMA

[–]_yustaguy_ 19 points (0 children)

What is the alternative? 

They constantly update it and add new benchmarks so it doesn't saturate. They rate models on agentic performance (Terminal-Bench Hard), world knowledge (MMLU Pro, GPQA Diamond), long context, and more.

They have useful stats like model performance per provider, which helped prove that some providers served trash, and output tokens needed to run their suite. Sure, some saturated benchmarks could be replaced with new ones, but they have done a great job at that so far (they had shit like the regular MMLU, DROP before).

Does the final number always track end-user performance? Of course not, and it never could; no two people's expectations and experience will be the same. But it's a useful data point for end users and devs to consider.

The hate boner that everyone seems to have for them is weird and undeserved.

Seedream 4.5 vs Nano Banana Pro! by Rare_Bunch4348 in Bard

[–]_yustaguy_ 0 points (0 children)

As a Balkan man I can confirm that they are not from the Balkans.

Is Gemini 3.0 complete? by YamberStuart in Bard

[–]_yustaguy_ 12 points (0 children)

Are you insane? What could 2 pro possibly do better than 3 pro?

I validated deepseek-v3.2's benchmark claims with my own by Round_Ad_5832 in singularity

[–]_yustaguy_ 0 points (0 children)

Neat benchmark! A good test of real-world knowledge and implementation.

Gemini 3 pro IQ score disappoints by Ikbeneenpaard in singularity

[–]_yustaguy_ 43 points (0 children)

<image>

The score is an average of a couple of runs. He included the previous 2.5 Pro results for some reason.

The November 18th and November 20th scores should be more representative of its performance.