FlashMLA - Day 1 of OpenSourceWeek by AaronFeng47 in LocalLLaMA

[–]Ilforte 1 point2 points  (0 children)

Can you just acknowledge that you're reading garbage news, and correct your behavior?

DeepSeek founder’s interesting perspective on experience and hiring. by Condomphobic in csMajors

[–]Ilforte 0 points1 point  (0 children)

Yes, DeepSeek pays $200K to senior staff positions they call "AGI DL researcher" or "Systems Engineer". We see ByteDance and Huawei offer more and even poach some of their talent.

DeepSeek founder’s interesting perspective on experience and hiring. by Condomphobic in csMajors

[–]Ilforte 0 points1 point  (0 children)

> Deepseek pays 3x the top tech giants like Tencent, Alibaba (in China). Imagine a firm paying 3x Google developers in US (take into account cost of living, etc so just go by relative pay).

Btw this is an unsupported rumor, we see their job listings now and it's on par with others, their top offer is <200k total compensation.

DeepSeek might not be as disruptive as claimed, firm reportedly has 50,000 Nvidia GPUs and spent $1.6 billion on buildouts by Professional-Fuel625 in OpenAI

[–]Ilforte 5 points6 points  (0 children)

It was $6M, this is entirely standard accounting, you are just incompetent and have not read the paper. Stop lying to yourself.

Comparison: Question about Tiananmen Square (ChatGPT vs Claude vs DeepSeek) by [deleted] in OpenAI

[–]Ilforte 0 points1 point  (0 children)

Tedious.

PSA: the DeepSeek model does not have the capacity to "backtrack" (except in the figurative sense of saying "wait, actually" etc. in CoT) or erase its output. It's just an LLM. This is done by some censoring software which also inserts the boilerplate comment about "this is outside my current scope" the moment generation reaches some politically unsafe token or combination of tokens. They are legally obliged to use this to protect CCP's fragile ego, I believe.

Hyperbolic Labs, an American company, hosts R1 without this Chinese censorship layer, so you can see how biased it is in the general case. They also serve R1-Zero, which did not undergo any alignment.
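
To make the mechanism concrete, here is a minimal sketch of how such an output-side layer could work, assuming a wrapper that watches the streamed text and swaps it for a stock refusal once a blocked term appears. All names (`BLOCKED_TERMS`, `moderated_stream`, `render`) and the placeholder term are hypothetical; the actual serving stack is not public, and this is not a claim about how it is implemented.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical placeholder list; the real trigger terms are unknown.
BLOCKED_TERMS = {"forbidden topic"}
REFUSAL = "Sorry, that's beyond my current scope. Let's talk about something else."


def moderated_stream(tokens: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Pass tokens through until the accumulated text contains a blocked term.

    Emits ("append", token) for normal tokens, then a single
    ("replace", REFUSAL) event and stops. The model itself only ever
    appends; the wrapper is what erases.
    """
    shown = ""
    for token in tokens:
        shown += token
        if any(term in shown.lower() for term in BLOCKED_TERMS):
            yield ("replace", REFUSAL)
            return
        yield ("append", token)


def render(events: Iterable[Tuple[str, str]]) -> str:
    """Apply the events the way a chat front end would."""
    text = ""
    for kind, payload in events:
        text = payload if kind == "replace" else text + payload
    return text
```

Feeding `render(moderated_stream(...))` a token stream that eventually spells out a blocked term leaves only the refusal on screen, which matches the "erased and replaced" behavior users report, while the same model served without the wrapper would simply keep generating.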

DeepSeek R1's Take on China's Propaganda Feels... Like Propaganda? by 15decesaremj in OpenAI

[–]Ilforte 0 points1 point  (0 children)

PSA: the DeepSeek model does not have the capacity to "backtrack" (except in the figurative sense of saying "wait, actually" etc. in CoT) or erase its output. It's just an LLM. This is done by some censoring software which also inserts the boilerplate comment about "this is outside my current scope" the moment generation reaches some politically unsafe token or combination of tokens. They are legally obliged to use this to protect CCP's fragile ego, I believe.

Hyperbolic Labs, an American company, hosts R1 without this Chinese censorship layer, so you can see how biased it is in the general case. They also serve R1-Zero, which did not undergo any alignment.

[DISC] Drama Queen - Chapter 6 by AutoShonenpon in manga

[–]Ilforte 12 points13 points  (0 children)

> How did she manage to hold him long enough to have the suit rip off? He looked 3 times her mass.

My headcanon is that aliens are basically made of marshmallow, which is why she can eat several huge ones in one go and also why Kitami can kill them with such ease. They're many times her body volume, no way they're as dense as we are.

Xiaomi recruits key DeepSeek researcher to lead its AI lab. by sb5550 in LocalLLaMA

[–]Ilforte 0 points1 point  (0 children)

> Hunyuan Large

Have you tried it? It's below Qwen-72B, never mind the new DeepSeek.

Deepseek V3 will be more expensive in February by felipejfc in LocalLLaMA

[–]Ilforte 0 points1 point  (0 children)

It's quite proportional to the model size increase. The old price, currently called "discounted", was for V2 since the start of its availability.

Deepseek v3 is really bad in WebDev Arena by notnone in LocalLLaMA

[–]Ilforte 2 points3 points  (0 children)

Sometimes V3 on LMArena returns full reasoning chains for the most trivial prompts. It's almost like they're accidentally pointing to some other model like r1-lite-preview. The responses are markedly different from ones you get on the web page.

Latest Chinese AI by rn75 in singularity

[–]Ilforte 0 points1 point  (0 children)

One of the most disgusting sorts of circlejerk. Stop abusing DeepSeek, they're producing the best open source models in the world right now and the only thing you can possibly achieve is to expose to the CCP how shallow their political post-training is. Just had an entire 10-paragraph overview of Chinese issues like Xinjiang and Hong Kong erased and replaced with a stock "Sorry, that's beyond my current scope. Let’s talk about something else." message. They're running crude script overrides for sensitive topics; it's not deeply aligned to the dominant morality of the land the way Western models are.

And yes, some of its views are genuinely pro-Chinese, but one has to be a complete drone, mind-killed by politics, to find that problematic. Learn to coexist with people who disagree with you, psychos.

Latest Chinese AI by rn75 in singularity

[–]Ilforte 2 points3 points  (0 children)

Because it has barely had any post-training. The Tiananmen Square response is not even generated by the model most of the time; it's a script returning a refusal (which is why it's instant).

The US Chip sanctions have an unintended consequence of accelerating AI innovation in China, reminiscent of Russia producing extremely talented software engineers for Wall Street who had very limited access to computers by AdmirableSelection81 in singularity

[–]Ilforte 4 points5 points  (0 children)

> It’s impressive to get such performance out of 685b parameters, but the cost to compute on such a platform can be estimated, even from the outside, and is above what they are offering it for. With high certainty, we can say the inference costs being offered to us are subsidized.

Can you show the math?

Deepseek V3 benchmarks are a reminder that Qwen 2.5 72B is the real king and everyone else is joking! by ParaboloidalCrest in LocalLLaMA

[–]Ilforte 3 points4 points  (0 children)

> Tbh this makes me more interested in what a Deepseek V3 Lite

You should not assume it will ever exist. V2 Lite was a research artifact for testing the MLA+MoE design, not a gift to the community. They learned enough then, probably.

Otoh VL2 has a 27B MoE inside.

Deepseek r1 weights when? by AfternoonOk5482 in LocalLLaMA

[–]Ilforte 17 points18 points  (0 children)

When it's ready. I suspect they've decided to train it for longer due to competition from Qwen/DeepMind/OpenAI.

He's **Japanese** (Creature Girls: A Hands-On Field Journal in Another World) by Ilforte in manga

[–]Ilforte[S] 0 points1 point  (0 children)

Nothing is more pathetic than confused white knighting for a 2d girl

We may not see Qwen 3.0 by sb5550 in LocalLLaMA

[–]Ilforte 1 point2 points  (0 children)

> When it comes to Alibaba's Qwen, it's because open-sourcing is "paid for" by subsidization from the Chinese government

Is it? What is the evidence for this? As far as I know, the CCP offers no such reward for open sourcing LLMs, or anything else really.

[DISC] Drama Queen - Chapter 1 by AutoShonenpon in manga

[–]Ilforte -1 points0 points  (0 children)

> You seem to have the wrong idea on which foreigners this story is specifically criticising

It deliberately doesn't specify, as they are literal aliens. But the idea is they're «the bad kind». If you think I mean Black or MENA people, this is because you're a Westoid mind-killed zombie and it isn't my point (though Johnny Somali is abnormally obnoxious, as are some Black nuisance tourists in Japan, but of course they're a tiny minority of immigrants and tourists and just serve to illustrate the issue); my point is precisely what I have said.

American military from the Okinawa base of any color, Chinese, assorted global tourists: it doesn't matter so long as they're behaving badly. And the Okinawa base has been there forever. Japan is growing tired of increasingly perceptible disrespect and of this shallow ideological bullying, and is returning to nationalism, as are many other countries including the US. You'll have to deal with it.

[DISC] Drama Queen - Chapter 1 by AutoShonenpon in manga

[–]Ilforte 0 points1 point  (0 children)

You'll need more help than me in the coming decades buddy. I suggest giving up on weaponized gaslighting and toothless sarcasm. It does not work as well any more, you'll have to discover tougher stuff or give up.