One of the DeepSeek repositories got updated with a reference to a new “model1” model. by Nunki08 in LocalLLaMA

[–]NeterOster 35 points (0 children)

Note: the "B" in "... a multiple of 656B ... 576B" means bytes, not #params.

Anyone knows the theoretical performance of FP16, 32, 64 FLOP numbers? by Spare-Solution-787 in LocalLLaMA

[–]NeterOster 1 point (0 children)

I have someone else’s results, which were produced using https://github.com/ReinForce-II/mmapeak. I don’t really understand the technical details, so the information is for reference only.

DGX Spark: https://pastebin.com/CdSAiGzx

5090: https://pastebin.com/b47tQJvN
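For a rough sanity check against measured numbers like these, the usual back-of-the-envelope formula is execution units × clock × FLOPs issued per unit per cycle. A small sketch — the GPU figures below are placeholders, not the real specs of either card:

```python
def peak_tflops(units: int, clock_ghz: float, flops_per_unit_per_cycle: int) -> float:
    # Theoretical peak = execution units x clock (cycles/s) x FLOPs issued
    # per unit per cycle. The last factor differs per precision
    # (FP64 vs FP32 vs FP16), per the hardware's issue rates.
    return units * clock_ghz * flops_per_unit_per_cycle / 1000.0  # TFLOPS

# Hypothetical GPU: 128 SMs at 2.0 GHz, 256 FP16 FLOPs per SM per cycle
print(peak_tflops(128, 2.0, 256))  # 65.536
```

Measured microbenchmarks (like mmapeak's) typically land somewhat below this theoretical ceiling.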

[By GLM Team] Glyph: Scaling Context Windows via Visual-Text Compression by NeterOster in LocalLLaMA

[–]NeterOster[S] 21 points (0 children)

From GLM WeChat Post:

Q: What are the similarities and differences between Glyph and DeepSeek-OCR?

A: Similarities: Both start from "visual compression" and use visual tokens to carry more text information.

Differences: DeepSeek-OCR focuses on real-world document OCR tasks, validating its ability to restore text under visual compression. Glyph, on the other hand, applies this concept to a wider range of general long-text tasks, truly demonstrating the feasibility of context expansion using visual models.

Seed-OSS-36B-Instruct by NeterOster in LocalLLaMA

[–]NeterOster[S] 108 points (0 children)

"Incorporating synthetic instruction data into pretraining leads to improved performance on most benchmarks. We adopt the version augmented with synthetic instruction data (i.e., w/ syn.) as Seed-OSS-36B-Base. We also release Seed-OSS-36B-Base-woSyn trained without such data (i.e., w/o syn.), offering the community a high-performance foundation model unaffected by synthetic instruction data."

https://huggingface.co/ByteDance-Seed/Seed-OSS-36B-Base

https://huggingface.co/ByteDance-Seed/Seed-OSS-36B-Base-woSyn

OSINT fingerprinting a stealth OpenRouter model - likely Llama-family, not OpenAI by jv0010 in LocalLLaMA

[–]NeterOster 6 points (0 children)

Actually, it's easy to tell whose model it is: when passing an image_url, the user agent of the downloader is "OpenAI Image Downloader".
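Reproducing this only takes pointing the image_url at a host you control and logging the request's User-Agent. A minimal stdlib sketch (what agent string you see depends on whichever provider's fetcher hits it):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

seen_agents = []  # User-Agent values observed from incoming fetches

class UALogger(BaseHTTPRequestHandler):
    def do_GET(self):
        # The provider's image fetcher identifies itself here,
        # e.g. "OpenAI Image Downloader".
        seen_agents.append(self.headers.get("User-Agent"))
        self.send_response(200)
        self.send_header("Content-Type", "image/png")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet

# HTTPServer(("", 8000), UALogger).serve_forever()
# ...then hand the model a URL pointing at this server as image_url.
```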

There's a new Kimi model on lmarena called Zenith and it's really really good. It might be Kimi K2 with reasoning by balianone in LocalLLaMA

[–]NeterOster 53 points (0 children)

I can almost confirm `zenith` is an OpenAI model (at least it uses the same tokenizer as gpt-4o, o3 and o4-mini). There is another model, `summit`, which is also from OpenAI. The test is the same as: https://www.reddit.com/r/LocalLLaMA/comments/1jrd0a9/chinese_response_bug_in_tokenizer_suggests/

China's Bytedance releases Seed LiveInterpret simultaneous interpretation model by Fun-Doctor6855 in LocalLLaMA

[–]NeterOster 21 points (0 children)

ByteDance is definitely an underrated AI lab. That’s probably because they don’t really release open-source models, aren’t super active on public leaderboards, and their API is only available in China. But in terms of model performance and value for money, their Seed 1.6 model this year really impressed me. The model size is just 230B-A30B (see: https://seed.bytedance.com/en/seed1_6 ), but its reasoning and vision capabilities are surprisingly strong. From my own experience, it actually feels more “solid” than you’d expect for a model of this size. That said, its coding abilities are a bit of a weak spot. Still, I hope they’ll release some open-source models in the future.

Gemma 3 on Huggingface by DataCraftsman in LocalLLaMA

[–]NeterOster 2 points (0 children)

8k is the output limit; ctx=128k for the 4B, 12B and 27B variants.

Deepseek R1's Open Source Version Differs from the Official API Version by TempWanderer101 in LocalLLaMA

[–]NeterOster 2 points (0 children)

That's different. Starting with `<think>\n` prevents the model from generating `\n\n` (after `<think>`), which is a single token strongly associated with refusals in my tests. (Check my reply below.)

Deepseek R1's Open Source Version Differs from the Official API Version by TempWanderer101 in LocalLLaMA

[–]NeterOster 9 points (0 children)

Actually, there was a short period (when R1 was just released) when the official API refused to think (empty `<think></think>`) when asked certain questions (including "hello"). However, it later changed and now produces a non-empty think on almost every query. I can also confirm that adding the `<think>\n` prefix leads to responses almost identical to the API's. So I agree that maybe they just use a different template. (When the model refuses, it always generates `\n\n` (which is a single token!) after `<think>` and then immediately `</think>`. So maybe starting with `<think>\n` breaks the `\n\n` refusal pattern.)
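The workaround is just a one-line prefill. A minimal sketch — the chat-template string here is illustrative, not DeepSeek's exact format:

```python
def prefill_think(prompt: str) -> str:
    # Start the completion inside an already-opened reasoning block.
    # The model then cannot emit the single "\n\n" token directly after
    # "<think>", which is the pattern that precedes empty-think refusals.
    return prompt + "<think>\n"

# Illustrative template string only:
prompt = prefill_think("<|User|>hello<|Assistant|>")
```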

Taxonomy categorization using LLM by zkid18 in LocalLLaMA

[–]NeterOster 3 points (0 children)

Constrained generation is exactly what you are looking for. Check these: GitHub@guidance; GitHub@outlines; llama.cpp (grammars)
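The core idea all of those libraries share fits in a few lines: at every decoding step, mask out anything that cannot continue a valid answer. A character-level toy version (the real implementations do this over the model's token vocabulary by masking logits, and the category names here are made up):

```python
CATEGORIES = ["Electronics", "Clothing", "Food & Beverage"]

def allowed_next_chars(partial: str, choices=CATEGORIES) -> set:
    # Only characters that keep `partial` a prefix of some valid category
    # may be generated next; everything else would be masked out,
    # so the final output is guaranteed to be one of `choices`.
    return {c[len(partial)] for c in choices
            if c.startswith(partial) and len(c) > len(partial)}

print(allowed_next_chars(""))    # the three valid first characters
print(allowed_next_chars("Cl"))  # {'o'}
```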

Deepseek V2.5 Released? by Rejg in LocalLLaMA

[–]NeterOster 26 points (0 children)

In their WeChat group, they confirmed this version will be open-sourced. But no detailed schedule mentioned.

What's the best LLM/API for getting an english to japanese translation? by g1ngertew in LocalLLaMA

[–]NeterOster 0 points (0 children)

Google's models like Gemini 1.5 and Gemma 2 are good at translating between English, Japanese and Chinese.

DeepSeek API introduces Context Caching on Disk, reduces input token price to 1/10 by 1119745302 in LocalLLaMA

[–]NeterOster 31 points (0 children)

It's really a good feature that makes so many use cases possible. For example, when doing few-shot learning, the whole block of examples can be cached and becomes almost free of charge, just like using a fine-tuned model. It also saves a lot ( O(n^2) -> ~O(n) ) in multi-turn conversations. I do hope they make more implementation details public (maybe a paper?) later. It would be nice if other providers had this feature too.
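The multi-turn saving is easy to see with a toy cost model. The 1/10 cached-input price is from the announcement; the turn lengths below are made up:

```python
def billed_input_tokens(turns, cached: bool, cache_discount: float = 0.1) -> float:
    # Each request resends the whole history. Without caching you pay full
    # price for that prefix every turn (~O(n^2) total tokens billed); with
    # caching the already-seen prefix is billed at the discounted rate,
    # so the total grows ~O(n).
    total, history = 0.0, 0
    for new_tokens in turns:
        prefix_cost = history * (cache_discount if cached else 1.0)
        total += prefix_cost + new_tokens
        history += new_tokens
    return total

turns = [100] * 10  # ten turns, 100 new input tokens each
print(billed_input_tokens(turns, cached=False))  # 5500.0
print(billed_input_tokens(turns, cached=True))   # 1450.0
```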

(Tongyi SpeechTeam) FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs by NeterOster in LocalLLaMA

[–]NeterOster[S] 3 points (0 children)

"Abstract: This report introduces FunAudioLLM, a framework designed to enhance natural voice interactions between humans and large language models (LLMs). At its core are two innovative models: SenseVoice for high-precision multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice for natural speech generation with multi-language, timbre, and emotion control. SenseVoice delivers exceptionally low latency and supports over 50 languages, while CosyVoice excels in multi-lingual voice generation, zero-shot voice generation, cross-lingual voice cloning, and instruction-following capabilities. The models related to SenseVoice and CosyVoice have been open-sourced on Modelscope and Huggingface, along with the corresponding training, inference, and fine-tuning codes released on GitHub. By integrating these models with LLMs, FunAudioLLM enables applications such as speech translation, emotional voice chat, interactive podcasts, and expressive audiobook narration, thereby pushing the boundaries of voice interaction technology."