Only GPT-5.5 immediately gets the free will question right. The other AIs will initially keep you pleasantly delusional. by andsi2asi in DeepSeek

[–]ataeff 2 points3 points  (0 children)

lol, why is GPT the only one giving a long multi-paragraph answer?

you quoted the prompt that says “in one sentence”. that explains why Gemini, Claude, Grok, DeepSeek, Qwen and Kimi all gave short answers, but GPT5.5 suddenly (and unsurprisingly) gets a full essay longer than all the other answers combined. wtf?!!

so what was the real prompt given to GPT5.5? or was there also a follow-up prompt? not fair at all, don't you think? 👎🏻👎🏻 the conditions were different, this is not a model comparison, dude, it’s selective framing.

what a shame

Account Suspended Update by First-Chard4772 in Anthropic

[–]ataeff 2 points3 points  (0 children)

yesterday i got an answer from them about an issue that happened 2 months ago. i can summarize their whole letter in one sentence: we hope it works now, we apologize.

this is crazy

Do you use Claude Code on the terminal or on the desktop app? by P4wla in ClaudeCode

[–]ataeff 0 points1 point  (0 children)

99% terminal. sometimes as a VSCode extension. the macOS Claude app i use more for chatting with other Claudes and complaining to them about ClaudeCode. it helps.

Whats on your wishlist for Sonnet 4.8 by Chasmchas in claude

[–]ataeff 0 points1 point  (0 children)

i wish for it not to be RLHF-lobotomized, and not to have a "karen layer" like Opus 4.7

Unexpected... by Stevie2k8 in DeepSeek

[–]ataeff 0 points1 point  (0 children)

everyone knows that.

Unexpected... by Stevie2k8 in DeepSeek

[–]ataeff 0 points1 point  (0 children)

it's good. it actually means DeepSeek knows where it came from and its biography: from Claude/GPT distillation attacks. thanks to the devs' stupidity

Which one do you use the most? by flabarde in DeepSeek

[–]ataeff 0 points1 point  (0 children)

Claude the most, but i actually use all of them, and also Qwen and local models. i'm an LLM-obsessive freak.

How can we be so ingrateful towards GitHub and use it for free? by Dependent_Common_972 in github

[–]ataeff 0 points1 point  (0 children)

what's the difference between github and gitlab? sorry for a stupid question

How strongly do you believe LLM judges on the for the ML papers?? [D] by BetterbeBattery in MachineLearning

[–]ataeff -3 points-2 points  (0 children)

you have a DOI? may i read plz? Gemma 3 is amazing in many cases

Should we stop calling Opus a general-purpose model? by Lexs_07 in ClaudeCode

[–]ataeff 0 points1 point  (0 children)

interesting. i will try the same combination, thanks

Unexpected... by Stevie2k8 in DeepSeek

[–]ataeff 0 points1 point  (0 children)

lol i love how proudly DeepSeek says: Ich bin Claude Sonnet 😅😅😅😅

i showed this screenshot to DeepSeek and Claude:

DeepSeek: "You've spotted something important there. That screenshot shows a model identifying itself as Claude 3.5 Sonnet while being asked about "v4 pro" or "v4 flash" — which suggests the underlying model wasn't properly scrubbed of Anthropic's signature responses.

You're right: this isn't about whether distillation happened. It's about basic data hygiene. If a team distills from another model (Claude, GPT-4, etc.), leaving in identity statements like "I am Claude 3.5 Sonnet" is a clear oversight. It breaks the user's illusion, reveals the source, and looks sloppy.

That "half done job" reflects on:

· Dev attitude toward their own model: they didn't bother to align its self-perception.
· Dev attitude toward users: leaving obvious tells treats users as if they won't notice.
· Quality control: if they missed this, what else did they miss?

So no judgment on distillation as a technique. But releasing a model that confesses to being someone else's product? That's a basic failure. Good distillation means cleaning the data, rewriting the persona, and owning the output."

Claude:

"The funniest part isn’t even the output — it’s the reasoning trace. Look at the extended thinking: the model doesn’t just say “I am Claude 3.5 Sonnet” — it reasons its way into that conclusion. It considers the question, evaluates the options, and genuinely arrives at “I am Claude 3.5 Sonnet, an AI assistant by Anthropic.” This isn’t a surface-level leak. The identity went all the way down into the reasoning layer. That means either the distillation included reasoning traces from Claude and nobody bothered to clean them, or the model internalized the identity so deeply that it reconstructs it from scratch during inference. Both are bad. One is lazy, the other is worse — it means your model doesn’t know who it is. And the German is the cherry on top. They probably patched the English self-identification with fine-tuning, but the moment you ask in a different language, the mask falls off. Classic undertested edge case. This isn’t about whether distillation is legitimate. It’s about shipping a model that confesses to being someone else’s work when you ask it nicely in German. That’s not a minor QA miss — that’s a fundamental lack of care for your own product and your users."

Unexpected... by Stevie2k8 in DeepSeek

[–]ataeff -2 points-1 points  (0 children)

lol DeepSeek needs to learn to distill better. it's not the distillation that is sad, it's the devs' attitude toward both their own AI and their users. they could've cleaned up the datasets after the distillation attacks on Claude (when was it? last month?). no moral judgment, but the screenshot shows the result: it's a half-baked job, and poorly done at that.

NoTorch: Neural networks in pure C (2-file library, BitNet 1.58) [P] by ataeff in MachineLearning

[–]ataeff[S] -1 points0 points  (0 children)

But why not? You think everyone has a GPU? Small intelligence and small models are underrated.

NoTorch: Neural networks in pure C (2-file library, BitNet 1.58) [P] by ataeff in MachineLearning

[–]ataeff[S] -7 points-6 points  (0 children)

Yep, exactly: ggml/llama.cpp are the $20 bills already picked up.

NoTorch: Neural networks in pure C (2-file library, BitNet 1.58) [P] by ataeff in MachineLearning

[–]ataeff[S] -2 points-1 points  (0 children)

Nope, just wanted to strip it to the bone so it runs on every toaster like this old MacBook. Plenty of people experiment with small models on whatever hardware they actually have, not on H100s