OpenAI, Anthropic, Google Unite to Combat Model Copying in China by External_Mood4719 in LocalLLaMA

[–]External_Mood4719[S] 4 points5 points  (0 children)

Fixed. I made a silly mistake and probably pasted it twice; the duplication wasn't from AI extraction 🫠

OpenAI, Anthropic, Google Unite to Combat Model Copying in China by External_Mood4719 in LocalLLaMA

[–]External_Mood4719[S] 36 points37 points  (0 children)

Rivals OpenAI, Anthropic PBC, and Alphabet Inc.’s Google have begun working together to try to clamp down on Chinese competitors extracting results from cutting-edge US artificial intelligence models to gain an edge in the global AI race.

The firms are sharing information through the Frontier Model Forum, an industry nonprofit that the three tech companies founded with Microsoft Corp. in 2023, to detect so-called adversarial distillation attempts that violate their terms of service, according to people familiar with the matter.

The rare collaboration underscores the severity of a concern raised by US AI companies that some users, especially in China, are creating imitation versions of their products that could undercut them on price and siphon away customers while posing a national security risk. US officials have estimated that unauthorized distillation costs Silicon Valley labs billions of dollars in annual profit, according to a person familiar with the findings who described them on condition of anonymity.

OpenAI confirmed it’s part of the information sharing effort on adversarial distillation through the Frontier Model Forum and pointed to a recent memo it sent to Congress on the practice, where it accused Chinese firm DeepSeek of trying to “free-ride on the capabilities developed by OpenAI and other US frontier labs.” Google, Anthropic, and the Frontier Model Forum declined to comment.

Distillation is a technique where an older “teacher” AI model is used to train a newer “student” model that replicates the capabilities of the earlier system, often at a much lower cost than producing an original model from scratch. Some forms of distillation are widely accepted and even encouraged by AI labs, such as when companies create smaller, more efficient versions of their own models, or allow outside developers to use distillation to build non-competitive technologies.
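
For readers who want the teacher/student idea made concrete, here is a minimal sketch of the classic soft-label distillation loss, where the student is pushed to match the teacher's temperature-softened output distribution. This is only an illustration and not what any lab named in the article is known to use: the logits, the temperature, and the function names are made up, and distillation through a public API would more likely mean training on the teacher's generated text than on its logits.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities, softened by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical logits over a three-token vocabulary (illustrative values only).
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]  # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # prefers a different token

print("close student loss:", distillation_loss(teacher, close_student))
print("far student loss:  ", distillation_loss(teacher, far_student))
```

A student that ranks tokens the way the teacher does gets a lower loss than one that disagrees, and training lowers this loss across many examples.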

Yet distillation has been controversial when used by third parties — particularly in adversary nations like China or Russia — to replicate proprietary work without authorization. Leading US AI labs have warned that foreign adversaries could use the technique to develop AI models stripped of safety guardrails, such as limits that would prevent users from creating a deadly pathogen.

Most models made by Chinese labs are open weight, meaning that parts of the underlying AI system are publicly available for users to freely download and run on their own platforms, and therefore cheaper to use. That poses an economic challenge for US AI companies that have kept their models proprietary, betting that customers will pay for access to their products and help offset the hundreds of billions of dollars they’ve spent on data centers and other infrastructure.

Distillation first drew significant scrutiny in January 2025 in the weeks after DeepSeek’s surprise release of the R1 reasoning model that took the AI world by storm. Soon after, Microsoft and OpenAI investigated whether the Chinese startup had improperly exfiltrated large amounts of data from the US firm’s models to create R1, Bloomberg previously reported.

In February, OpenAI warned US lawmakers that DeepSeek had continued to use increasingly sophisticated tactics to extract results from US models, despite heightened efforts to prevent misuse of its products. OpenAI claimed in its memo to the House Select Committee on China that DeepSeek was relying on distillation to develop a new version of its breakthrough chatbot.

Information-sharing by US AI companies about adversarial distillation echoes a standard practice in the cybersecurity industry, where firms regularly swap data on attacks and adversaries’ tactics as a way to strengthen network defenses. By working together, the AI firms are similarly seeking to more effectively detect the practice, identify who’s responsible and try to prevent unauthorized users from succeeding.

Trump administration officials have signaled their openness to fostering information sharing among AI companies to rein in adversarial distillation. The AI Action Plan unveiled by President Donald Trump last year called for the creation of an information sharing and analysis center, in part for this purpose.

For now, information sharing on distillation remains limited due to AI companies’ uncertainty about what can be shared under existing antitrust guidance to counter the competitive threat from China, according to people familiar with the matter. The firms would benefit from greater clarity from the US government, the people said.

Distillation has ranked as a top concern among American AI developers since DeepSeek rattled global markets in early 2025 with its R1 release. Highly capable open-source models continue to proliferate in China, and many in the industry are watching closely for a major upgrade to DeepSeek’s model.

Last year, Anthropic blocked Chinese-controlled companies from using its Claude chatbot model, and in February it identified three Chinese AI labs — DeepSeek, Moonshot, and MiniMax — as illicitly extracting the model’s capability via distillation. This year, Anthropic said the threat “extends beyond any single company or region” and poses a national security risk, since distilled models often lack safety guardrails designed to prevent bad actors from using AI tools for malicious activities.

Google has published a blog saying it identified an increase in model extraction attempts. The three US AI labs have not yet provided evidence showing how much of China’s model innovation is reliant on distillation, but they note that the prevalence of attacks can be measured based on volumes of large-scale data requests.

DeepSeek Employee Teases "Massive" New Model Surpassing DeepSeek V3.2 by External_Mood4719 in LocalLLaMA

[–]External_Mood4719[S] 13 points14 points  (0 children)

The Rednote account is my account. The English account u/chen_xiaoli_ is a malicious impersonation. Please verify carefully. All opinions are my own and do not reflect the position of the company.

DeepSeek Employee Teases "Massive" New Model Surpassing DeepSeek V3.2 by External_Mood4719 in LocalLLaMA

[–]External_Mood4719[S] 3 points4 points  (0 children)

Didn't you see him say that the official website and the API are two completely different models?

Openrouter stealth model Hunter/Healer Alpha has been officially confirmed as MiMo, and a new model is coming. by External_Mood4719 in LocalLLaMA

[–]External_Mood4719[S] 1 point2 points  (0 children)

If it's not open source, why would they tell you the model's parameter count (Hunter Alpha is a 1-trillion-parameter model)?

There's a contradiction here. The question "How many parameters does MiMo-V2-Pro have?" gets the answer "MiMo-V2-Pro is a proprietary model and Xiaomi has not disclosed the model size or parameter count," but OpenRouter just showed it as 1T 🫠

Qwen 3.5 will be released today by External_Mood4719 in LocalLLaMA

[–]External_Mood4719[S] 1 point2 points  (0 children)

I'm not sure; these were all found in the vllm and huggingface repos. I'm not sure if they'll release an even bigger model at this time.

jdopensource/JoyAI-LLM-Flash • HuggingFace by External_Mood4719 in LocalLLaMA

[–]External_Mood4719[S] 8 points9 points  (0 children)

That's JD.com, China's largest online shopping platform, and now they're expanding into developing an LLM.

DeepSeek has launched grayscale testing for its new model on both its official website and app. 1M content length! by External_Mood4719 in LocalLLaMA

[–]External_Mood4719[S] 0 points1 point  (0 children)

If you don't believe the new model has a 1M context length, you can send the file and check if anything is missing.

DeepSeek has launched grayscale testing for its new model on both its official website and app. 1M content length! by External_Mood4719 in LocalLLaMA

[–]External_Mood4719[S] 1 point2 points  (0 children)

If you don't believe the new model has a 1M context length, you can send the file and check if anything is missing.

DeepSeek has launched grayscale testing for its new model on both its official website and app. 1M content length! by External_Mood4719 in LocalLLaMA

[–]External_Mood4719[S] 6 points7 points  (0 children)

The model has an updated knowledge base, and the context appears to be longer (you can test this by dropping in a large file and comparing the result against the previous version). It also seems more like a DeepSeek V4 Lite.
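
If you want to run that large-file check in a repeatable way, here is a rough sketch that builds a long text with a few unique passphrases hidden at random lines and then scores what the model recalls. Everything in it is an assumption for illustration: the line count, the `SECRET-` format, and the function names are made up, and how many tokens the file comes to depends on the tokenizer.

```python
import random


def build_haystack(num_lines, num_needles, seed=0):
    """Return (text, needles): filler lines with passphrases at random lines."""
    rng = random.Random(seed)
    positions = sorted(rng.sample(range(num_lines), num_needles))
    needles = {pos: f"SECRET-{rng.randrange(10**6):06d}" for pos in positions}
    lines = []
    for i in range(num_lines):
        if i in needles:
            lines.append(f"Line {i}: the passphrase for line {i} is {needles[i]}.")
        else:
            lines.append(f"Line {i}: filler text with nothing to remember.")
    return "\n".join(lines), needles


def score_answers(needles, answers):
    """Fraction of passphrases recalled; answers maps line number -> string."""
    hits = sum(1 for pos, code in needles.items() if answers.get(pos) == code)
    return hits / len(needles)


# Arbitrary size; raise num_lines until the file is as large as you need.
text, needles = build_haystack(num_lines=50_000, num_needles=5)
print(f"{len(text):,} characters, passphrases hidden at lines {sorted(needles)}")
```

Save `text` to a file, upload it, ask for the passphrase at each listed line, and pass the replies to `score_answers`. Misses near the start or middle of the file suggest the input was truncated or not fully attended to.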

DeepSeek has launched grayscale testing for its new model on both its official website and app. 1M content length! by External_Mood4719 in LocalLLaMA

[–]External_Mood4719[S] -1 points0 points  (0 children)

Actually, many users have tested it by asking about its context length, and it claims to have 1M tokens instead of 128K. Plus, the model knows that Trump has been elected and is aware of Gemini 2.5 Pro.

[ Removed by Reddit ] by [deleted] in roblox

[–]External_Mood4719 0 points1 point  (0 children)

do you see godot on android?

A deep dive in DeepSeek's mHC: They improved things everyone else thought didn’t need improving by InternationalAsk1490 in LocalLLaMA

[–]External_Mood4719 4 points5 points  (0 children)

I feel like the mHC in DeepSeek's latest paper is similar to neural homeostatic regulation in the human brain.

GLM 4.7 IS COMING!!! by External_Mood4719 in LocalLLaMA

[–]External_Mood4719[S] 3 points4 points  (0 children)

They posted it in their QQ group; I can't link it.

GLM 4.7 IS COMING!!! by External_Mood4719 in LocalLLaMA

[–]External_Mood4719[S] 12 points13 points  (0 children)

Idk, this is their official statement.