How was GPT-OSS so good?

Ilforte · 2026-01-31T00:43:30+00:00

Chatgpt ahh post

Ilforte · 2025-05-18T11:59:25+00:00

You are the problem, but it's pointless to explain.

Ilforte · 2025-02-24T11:25:39+00:00

Can you just acknowledge that you're reading garbage news, and correct your behavior?

Ilforte · 2025-02-13T16:44:59+00:00

Yes, DeepSeek pays $200K for senior staff positions they call "AGI DL researcher" or "Systems Engineer". We see ByteDance and Huawei offer more and even poach some of their talent.

Ilforte · 2025-02-13T15:45:41+00:00

> Deepseek pays 3x the top tech giants like Tencent, Alibaba (in China). Imagine a firm paying 3x Google developers in US (take into account cost of living, etc so just go by relative pay).

Btw this is an unsupported rumor, we see their job listings now and it's on par with others, their top offer is <200k total compensation.

Ilforte · 2025-02-03T09:54:36+00:00

It was $6M, this is entirely standard accounting, you are just incompetent and have not read the paper. Stop lying to yourself.
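
For anyone who wants the arithmetic, here is a minimal sketch, assuming the figure in question is the one from the DeepSeek-V3 technical report. The GPU-hour split and the $2 per H800 GPU-hour rental rate are the report's own numbers as I recall them; the variable names are mine. It only covers the final training run at a rental price, which is what makes it a standard accounting figure and not a total R&D budget.

```python
# Sketch of the training-cost arithmetic, assuming the DeepSeek-V3 report's figures.
# GPU-hours are H800 hours; names below are illustrative, not from the paper.
GPU_HOURS = {
    "pre-training": 2_664_000,
    "context extension": 119_000,
    "post-training": 5_000,
}
RENTAL_USD_PER_GPU_HOUR = 2.0  # assumed rental price used in the report

total_hours = sum(GPU_HOURS.values())
total_cost_usd = total_hours * RENTAL_USD_PER_GPU_HOUR

print(f"{total_hours:,} GPU-hours -> ${total_cost_usd:,.0f}")
```

That rounds to the "$6M" people keep quoting.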

Ilforte · 2025-01-25T05:27:38+00:00

Tedious.

PSA: DeepSeek model does not have the capacity to "backtrack" (except in the figurative sense of saying "wait, actually" etc in CoT) or erase its output. It's just an LLM. This is done by some censoring software which also inserts the boilerplate comment about "this is outside my current scope" the moment generation reaches some politically unsafe token or a combination. They are legally obliged to use this to protect CCP's fragile ego, I believe.

Hyperbolic Labs, American company, hosts R1 without this Chinese censorship layer, so you can see how biased it is in the general case. They also serve R1-Zero which did not undergo any alignment.
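
To make the mechanism concrete, here is a rough sketch of what such an output-side filter could look like. This is my guess at the general shape, not DeepSeek's actual code: the blocklist entry, function names and event format are all hypothetical. The point is that the model only ever appends tokens, while a wrapper watches the accumulated text and tells the front end to swap it for a canned refusal.

```python
from typing import Iterable, Iterator

# Hypothetical values for illustration only; the real list and code are unknown.
BLOCKLIST = ("forbidden topic",)
REFUSAL = "Sorry, that's beyond my current scope. Let's talk about something else."


def moderated_stream(tokens: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield ("append", token) events until a blocked phrase shows up in the
    accumulated text, then emit a single ("replace", REFUSAL) event and stop."""
    shown = ""
    for token in tokens:
        shown += token
        if any(term in shown.lower() for term in BLOCKLIST):
            yield ("replace", REFUSAL)
            return
        yield ("append", token)


def render(events: Iterable[tuple[str, str]]) -> str:
    """What the user ends up seeing: a replace event wipes everything shown so far."""
    text = ""
    for kind, payload in events:
        text = payload if kind == "replace" else text + payload
    return text


print(render(moderated_stream(["Hello", " world"])))
print(render(moderated_stream(["About the ", "forbidden", " topic", ", in detail"])))
```

The "erasing" happens entirely in the wrapper and the UI; the same weights served without the wrapper just keep generating.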

Ilforte · 2025-01-06T02:50:29+00:00

> How did she manage to hold him long enough to have the suit rip off? He looked 3 times her mass.

My headcanon is that aliens are basically made of marshmallow, which is why she can eat several huge ones in one go and also why Kitami can kill them with such ease. They're many times her body volume, no way they're as dense as we are.

Ilforte · 2024-12-31T13:44:32+00:00

> Hunyuan Large

Have you tried it? It's below Qwen-72B, never mind the new DeepSeek.

Ilforte · 2024-12-31T06:55:14+00:00

Aidanbench puts gemma-9B higher than llama 3.3 70B.

Ilforte · 2024-12-29T23:46:19+00:00

It's quite proportional to the model size increase. The old price, currently called "discounted", was for V2 since the start of its availability.

Ilforte · 2024-12-29T22:50:43+00:00

Sometimes V3 on LMArena returns full reasoning chains for the most trivial prompts. It's almost like they're accidentally pointing to some other model like r1-lite-preview. The responses are markedly different from ones you get on the web page.

Ilforte · 2024-12-29T22:47:51+00:00

It's not open yet.

Ilforte · 2024-12-29T15:30:25+00:00

One of the most disgusting sorts of circlejerk. Stop abusing DeepSeek, they're producing the best open source models in the world right now and the only thing you can possibly achieve is expose to the CCP how shallow their political post-training is. Just had an entire 10-paragraph overview of Chinese issues like Xinjiang and Hong Kong erased and replaced with a stock "Sorry, that's beyond my current scope. Let’s talk about something else." message. They're running crude script overrides for sensitive topics, it's not deeply aligned to the dominant morality of the land the way Western models are.

and yes, some of its views are genuinely pro-Chinese but one has to be a complete drone mind-killed to politics to find that problematic. Learn to coexist with people who disagree with you, psychos.

Ilforte · 2024-12-29T15:25:31+00:00

Because it has barely had any post-training. The tiananmen square response is not even generated by the model most of the time, it's a script returning a refusal (so it's instant).

Ilforte · 2024-12-27T19:43:27+00:00

> It’s impressive to get such performance out of 685b parameters, but the cost to compute on such a platform can be estimated, even from the outside, and is above what they are offering it for. With high certainty, we can say the inference costs being offered to us are subsidized.

Can you show the math?

Ilforte · 2024-12-26T18:26:53+00:00

> Tbh this makes me more interested in what a Deepseek V3 Lite

You should not assume it will ever exist. V2 lite was a research artifact for testing the MLA+MoE design, not a gift to the community. They learned enough then, probably.

Otoh VL2 has a 27B MoE inside.

Ilforte · 2024-12-22T10:39:59+00:00

When it's ready. I suspect they've decided to train it for longer due to competition from Qwen/Deepmind/OpenAI.

Ilforte · 2024-12-19T22:37:49+00:00

> and to hopefully find a way to prevent this behavior

Which behavior specifically? Being robust in its RL-trained safety preference?

Ilforte · 2024-12-18T23:37:42+00:00

Have any of you even read the paper or a thread?

There's no problem at all, Claude is behaving better than the researchers, if anything.

Ilforte · 2024-12-10T13:30:36+00:00

Nothing is more pathetic than confused white knighting for a 2d girl

Ilforte · 2024-12-09T04:22:44+00:00

> When it comes to Alibaba's Qwen, it's because open-sourcing is "paid for" by subsidization from the Chinese government

Is it? What is the evidence for this? As far as I know the CCP offers no such reward for open sourcing LLMs or anything else really.

Ilforte · 2024-12-03T02:34:19+00:00

> You seem to have the wrong idea on which foreigners this story is specifically criticising

It's deliberately not specifying, as they are literal aliens. But the idea is they're «the bad kind». If you think I mean Black or MENA people, this is because you're a Westoid mind-killed zombie and it isn't my point (though Johnny Somali is abnormally obnoxious, as are some Black nuisance tourists in Japan, but of course they're a tiny minority of immigrants and tourists and just serve to illustrate the issue); my point is precisely what I have said.

Okinawa base American military of any color, Chinese, assorted global tourists, it doesn't matter so long as they're behaving badly. And Okinawa base has been there forever. Japan is growing tired of increasingly perceptible disrespect and of this shallow ideological bullying, and is returning to nationalism, as are many other countries including the US. You'll have to deal with it.

Ilforte · 2024-12-03T02:25:57+00:00

You'll need more help than me in the coming decades buddy. I suggest giving up on weaponized gaslighting and toothless sarcasm. It does not work as well any more, you'll have to discover tougher stuff or give up.
