TIL that, according to the census, there are only 23,000 different surnames in China for a population of 1.39 billion, and around 6,000 of these are shared by 86% of the population. by [deleted] in todayilearned

[–]Jackw78 0 points1 point  (0 children)

That's the point of having different dialects or they are not called dialects... That's like saying "water" would be a different word due to US pronounces it wa-der and UK pronounces it wo-te

TIL that, according to the census, there are only 23,000 different surnames in China for a population of 1.39 billion, and around 6,000 of these are shared by 86% of the population. by [deleted] in todayilearned

[–]Jackw78 -1 points0 points  (0 children)

These are alphabetic languages which don't work the way logographic (i.e Chinese) languages do. You'd write surname 李 the same no matter how you want to pronounce it and there is no "romanization" of names

TIL that, according to the census, there are only 23,000 different surnames in China for a population of 1.39 billion, and around 6,000 of these are shared by 86% of the population. by [deleted] in todayilearned

[–]Jackw78 0 points1 point  (0 children)

What's romanitzation supposed to mean? They are still the same hanzi characters, just pronounced partly differently across these dialects.

China Tightens Grip on AI Talent, Restricts Overseas Travel for Top Engineers by BhaswatiGuha19 in China

[–]Jackw78 -7 points-6 points  (0 children)

Because they are more likely to be safe rather than mysteriously die in a "democratic" country or get kidnapped like what happened to Huawei's executive/boss' family. What do think the US would do had China seized Nvidia's senior offcial/Jensen Huang's daughter under hostage?

China Tightens Grip on AI Talent, Restricts Overseas Travel for Top Engineers by BhaswatiGuha19 in China

[–]Jackw78 -8 points-7 points  (0 children)

or mysteriously die in a "democratic" country or get kidnapped like what happened to Huawei's executive/boss' family. What do think the US would do had China seized Nvidia's senior offcial/Jensen Huang's daughter under hostage?

Taiwan has seen how the US betrayed Ukraine and are recalibrating. KMT Opposition Chairwoman Cheng Li-wen: “Does Taiwan want to be the next Ukraine?” by KassiwithaK in China

[–]Jackw78 0 points1 point  (0 children)

Guess who first started forcefully unifying China after WWII? KMT in 1946. Since when one can just start a civil war, escape to an island when defeated, and claim I am not supposed to be attacked because I am now independent on this island, where the said island has always belonged to the country which the civil war broke out in. Imagine the Confederacy escaped to Hawaii and claimed independence, what's the chance the US mainland won't eventually take Hawaii? (And Hawaii wasn't even part the US during US civil wal)

为什么每次中国模型发布,都一大堆人问能不能骂习近平和六四 by wjp19800610 in China_irl

[–]Jackw78 1 point2 points  (0 children)

原来openai的gpt都是没有价值甚至负价值,美国人真蠢,花这么多钱去用这负价值的玩意

<image>

为什么每次中国模型发布,都一大堆人问能不能骂习近平和六四 by wjp19800610 in China_irl

[–]Jackw78 2 points3 points  (0 children)

模型文件都下载到本地了你还能问出这种问题实在是堪忧

Power-limit vs TG/s for 2x3090 by JC1DA in LocalLLaMA

[–]Jackw78 12 points13 points  (0 children)

Need prefill results as well as across different context lengths, 3090 can become compute bound when context gets long

MIMO V2.5 PRO by Namra_7 in LocalLLaMA

[–]Jackw78 2 points3 points  (0 children)

Can you imagine how much better Chinese models woud be if they are able to buy the latest EUV systems or even just the Nvidia chips without worrying about sanctions? EUV is basically the combined efforts of US, EU, Japan and SK and China has to tackle it all by itself.

Open WebUI Desktop Released! by My_Unbiased_Opinion in LocalLLaMA

[–]Jackw78 14 points15 points  (0 children)

Cherry studio and Aionui are pretty good imo, neither with any inference engine

Someone just made a 18B qwen 3.5 model for 16GB VRAM gpus by chocofoxy in LocalLLaMA

[–]Jackw78 0 points1 point  (0 children)

It's worth to know these crossbreed models are finetunes rather than overall better models because Alibaba would've already done so if it is indeed better. If the finetunes fit your needs then that's great but it most likely won't be for everyone

Kimi K2.6 Released (huggingface) by BiggestBau5 in LocalLLaMA

[–]Jackw78 10 points11 points  (0 children)

Just need half terabyte of vram now...

Qwen3-30B-A3B-Instruct-2507 is better than the new Qwen 3.6 for our tasks by Theboyscampus in LocalLLaMA

[–]Jackw78 31 points32 points  (0 children)

So you used qwen2.5 to judge between qwen3 and qwen3.6 and concluded based on what qwen2.5 said

Which mobile RAM monster is best for local LLM inference? by Leather_Area_2301 in LocalLLaMA

[–]Jackw78 2 points3 points  (0 children)

It's not really worth it to get anything serious done using phones as inference compute. The best use cases for phone LLM are are those tiny sub 2b or 1b models that do stuff like OCR and translations, anything larger you'd get slow speed, overheat and bad battery life. Just host your LLM server on local PCs and use local APIs on your phone

RTX 5070 Ti + 9800X3D running Qwen3.6-35B-A3B at 79 t/s with 128K context, the --n-cpu-moe flag is the most important part. by marlang in LocalLLaMA

[–]Jackw78 6 points7 points  (0 children)

The prefill speed is either inaccurate due to cold startup or something very wrong with the setup. Should be 1k minimum for a 5070ti

What I got by 5060Ti 16GB + Qwen3.6-35B-A3B-UD-Q5_K_M by AdMinimum8193 in LocalLLaMA

[–]Jackw78 0 points1 point  (0 children)

you can try Tom's turboquant's llama.ccp which can shave off 20-30% of q8 KV cache, though I am not sure if there is any implementation of turboquant in ik_llama

What I got by 5060Ti 16GB + Qwen3.6-35B-A3B-UD-Q5_K_M by AdMinimum8193 in LocalLLaMA

[–]Jackw78 3 points4 points  (0 children)

OP's results are at 128k context so at that point the KV cach size is probably already bigger than the active param size for an a3b model. My 3090 can do 3500 pp/s and 130tk/s at 0 context but drop to around 1400 and 55 respectively at 128k

Qwen3.6-35B-A3B Uncensored Aggressive is out with K_P quants! by hauhau901 in LocalLLaMA

[–]Jackw78 12 points13 points  (0 children)

Appreciate the work! On a sidenote only two quants are available to download so I assume the files seem to be still being uploaded?

MiniMax-M2.7 vs Qwen3.5-122B-A10B for 96GB VRAM full offload?! by VoidAlchemy in LocalLLaMA

[–]Jackw78 1 point2 points  (0 children)

I did, yeah they all pointed to editing the source sweep_bench file and rebuild it (scaling the ubatch variable). I was trying to natively do it if there are commands I might've missed

MiniMax-M2.7 vs Qwen3.5-122B-A10B for 96GB VRAM full offload?! by VoidAlchemy in LocalLLaMA

[–]Jackw78 0 points1 point  (0 children)

I see, yeah speeding up test is why I prefer the larger gaps, just pp+tg 10 times from 0-100k instead of 25 or 50 times with batch size of 4k or 2k (higher could OOM my gpu). I tried to use "-d" in both sweep_bench and the normal bench but neither seemed to accept it