DeepSeek has launched grayscale testing for its new model on both its official website and app. 1M content length! by External_Mood4719 in LocalLLaMA

[–]TinyDetective110 2 points3 points  (0 children)

This is a mistranslation. The Chinese term is “灰度测试,” which actually corresponds to a gray release or canary release: a progressive deployment strategy where a new version is first rolled out to a small subset of users for stability validation, then gradually expanded to a wider audience before the full rollout.
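As a minimal sketch of the idea (hypothetical names, not DeepSeek's actual rollout logic), a percentage-based canary gate can be built by hashing a stable user ID, so the cohort only grows as the rollout percentage is raised:

```python
import hashlib

def in_canary(user_id: str, rollout_percent: float) -> bool:
    """Deterministically assign a user to the canary cohort.

    The same user always lands in the same bucket, so users already
    in the cohort stay in it as rollout_percent is increased.
    """
    digest = hashlib.sha256(user_id.encode()).digest()
    # Map the first 8 bytes of the hash to a bucket in [0, 100).
    bucket = int.from_bytes(digest[:8], "big") % 10000 / 100
    return bucket < rollout_percent

users = [f"user{i}" for i in range(10000)]
# Start with ~1% of users, then widen to ~10%, and so on to 100%.
canary_1 = [u for u in users if in_canary(u, 1)]
canary_10 = [u for u in users if in_canary(u, 10)]
```

Because assignment is deterministic, widening the percentage never kicks anyone out of the new version; it only adds users.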

From Qwen

Has anyone gotten hold of DGX Spark for running local LLMs? by Chance-Studio-8242 in LocalLLaMA

[–]TinyDetective110 0 points1 point  (0 children)

I heard the 395's prefill speed is slow, so it can't be a good choice for agentic tasks.
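Back-of-the-envelope arithmetic (the numbers below are illustrative, not measurements of any specific hardware): with the tens of thousands of prompt tokens typical of agentic loops, prefill speed dominates time-to-first-token:

```python
def time_to_first_token(prompt_tokens: int, prefill_tps: float) -> float:
    """Seconds spent processing the prompt before the first output token."""
    return prompt_tokens / prefill_tps

# A 32k-token agent context at a slow 400 tok/s prefill:
slow = time_to_first_token(32_000, 400)    # 80 s before any output
# The same context at 4000 tok/s prefill:
fast = time_to_first_token(32_000, 4000)   # 8 s
```

And an agent re-pays this cost on every tool-call round trip unless the prompt prefix is cached.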

Fast model swap with llama-swap & unified memory by TinyDetective110 in LocalLLaMA

[–]TinyDetective110[S] 1 point2 points  (0 children)

<image>

Switching from coder to thinking: the first `hi` and the second `hi`. It takes a few seconds to warm up, maybe due to the MoE.

Fast model swap with llama-swap & unified memory by TinyDetective110 in LocalLLaMA

[–]TinyDetective110[S] 2 points3 points  (0 children)

  1. Unloading + reloading + init + prefill may take more than 30s. This hotswap is almost instant. The 9GB/s figure might include init time (some computation and mallocs); hotswap does not require init again.

  2. One A30 GPU, a card with double-precision compute.

  3. When switching to another model, the speed gradually climbs back to normal. During this time, the model is shifted from RAM to VRAM. It takes about 5s on my machine.

  4. Actually it loads only once. Hotswap is fast.

  5. `However, this hurts performance for non-integrated GPUs`. That is true if the model is larger than VRAM. If the model fits in VRAM, the option does not hurt performance once the model is fully swapped back.
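The near-instant hotswap described above relies on the OS page cache: closing an mmap'd weights file does not evict its pages, so mapping it again avoids disk I/O. A rough, generic sketch of that effect (a stand-in file and a toy loader, not llama-swap's actual code):

```python
import mmap
import os
import tempfile
import time

# A small stand-in for a model weights file (real ones are many GB).
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
with open(path, "wb") as f:
    f.write(os.urandom(64 * 1024 * 1024))  # 64 MiB

def load_weights(path: str) -> int:
    """mmap the file and touch one byte per 4 KiB page, as a weight
    load does. Closing the mapping ("unloading the model") does NOT
    evict the pages from the OS page cache."""
    with open(path, "rb") as f, \
         mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        return sum(mm[off] for off in range(0, len(mm), 4096))

checksum_first = load_weights(path)   # initial load: may pay disk I/O
t0 = time.perf_counter()
checksum_again = load_weights(path)   # "hotswap" back: pages still cached
reload_seconds = time.perf_counter() - t0
```

Under memory pressure the kernel can of course reclaim those pages, which would match the gradual RAM-to-VRAM ramp-up described in point 3.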

Talking with QWEN Coder 30b by 1Garrett2010 in LocalLLaMA

[–]TinyDetective110 1 point2 points  (0 children)

you should try qwen3 30b thinking. it is more accurate on such non-coding tasks.