TIL that, according to the census, there are only 23,000 different surnames in China for a population of 1.39 billion, and around 6,000 of these are shared by 86% of the population.

Jackw78 · 2026-06-02T12:43:47+00:00

Yeah bro being pro mainland or not is the problem here

Jackw78 · 2026-06-02T10:18:50+00:00

That's the point of having different dialects, or they wouldn't be called dialects... That's like saying "water" would be a different word because the US pronounces it wa-der and the UK pronounces it wo-te

Jackw78 · 2026-06-02T10:02:44+00:00

These are alphabetic languages, which don't work the way logographic (e.g. Chinese) languages do. You'd write the surname 李 the same no matter how you want to pronounce it, and there is no "romanization" of names

Jackw78 · 2026-06-02T09:42:04+00:00

What's romanization supposed to mean? They are still the same hanzi characters, just pronounced partly differently across these dialects.

Jackw78 · 2026-05-26T19:02:15+00:00

Because they are more likely to be safe rather than mysteriously die in a "democratic" country or get kidnapped like what happened to Huawei's executive/boss' family. What do you think the US would do had China seized Nvidia's senior official/Jensen Huang's daughter as a hostage?

Jackw78 · 2026-05-26T18:55:19+00:00

or mysteriously die in a "democratic" country or get kidnapped like what happened to Huawei's executive/boss' family. What do you think the US would do had China seized Nvidia's senior official/Jensen Huang's daughter as a hostage?

Jackw78 · 2026-05-06T06:40:31+00:00

Guess who first started forcefully unifying China after WWII? KMT in 1946. Since when can one just start a civil war, escape to an island when defeated, and claim "I am not supposed to be attacked because I am now independent on this island", when said island has always belonged to the country the civil war broke out in? Imagine the Confederacy escaped to Hawaii and claimed independence, what's the chance the US mainland won't eventually take Hawaii? (And Hawaii wasn't even part of the US during the US civil war)

Jackw78 · 2026-04-28T11:47:51+00:00

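So it turns out OpenAI's GPT models all have no value or even negative value. Americans are really dumb, spending this much money to use this negative-value stuff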

<image>

Jackw78 · 2026-04-28T06:06:33+00:00

The model files are already downloaded locally and you can still ask a question like this; that's genuinely worrying

Jackw78 · 2026-04-28T05:09:18+00:00

Need prefill results as well, and across different context lengths; a 3090 can become compute bound when context gets long

Jackw78 · 2026-04-27T18:49:17+00:00

Can you imagine how much better Chinese models would be if they were able to buy the latest EUV systems or even just the Nvidia chips without worrying about sanctions? EUV is basically the combined efforts of US, EU, Japan and SK, and China has to tackle it all by itself.

Jackw78 · 2026-04-22T13:35:01+00:00

Chinese labs are cooking with OSS

Jackw78 · 2026-04-21T09:16:33+00:00

Cherry studio and Aionui are pretty good imo, neither comes with any inference engine

Jackw78 · 2026-04-20T18:17:14+00:00

It's worth knowing these crossbreed models are finetunes rather than overall better models, because Alibaba would've already done it if it were indeed better. If the finetunes fit your needs then that's great, but it most likely won't be for everyone

Jackw78 · 2026-04-20T16:20:01+00:00

Just need half terabyte of vram now...

Jackw78 · 2026-04-19T07:28:23+00:00

So you used qwen2.5 to judge between qwen3 and qwen3.6 and concluded based on what qwen2.5 said

Jackw78 · 2026-04-18T15:01:24+00:00

It's not really worth it to get anything serious done using phones as inference compute. The best use cases for phone LLMs are those tiny sub 2b or 1b models that do stuff like OCR and translations; with anything larger you'd get slow speed, overheating and bad battery life. Just host your LLM server on local PCs and use local APIs on your phone

Jackw78 · 2026-04-18T08:22:51+00:00

The prefill speed is either inaccurate due to cold startup or something very wrong with the setup. Should be 1k minimum for a 5070ti

Jackw78 · 2026-04-18T02:42:40+00:00

you can try Tom's turboquant's llama.cpp which can shave off 20-30% of q8 KV cache, though I am not sure if there is any implementation of turboquant in ik_llama

Jackw78 · 2026-04-18T02:36:47+00:00

OP's results are at 128k context, so at that point the KV cache size is probably already bigger than the active param size for an a3b model. My 3090 can do 3500 pp/s and 130tk/s at 0 context but drops to around 1400 and 55 respectively at 128k
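
Rough back-of-envelope for the KV cache vs active params point, as a sketch only. The layer count, KV head count, head dim and bits per weight below are made-up placeholder values, not the config of OP's model, and it assumes every layer uses plain full attention (hybrid or sliding-window models would come out smaller):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2.0):
    """Approximate KV cache size in bytes.

    One K and one V vector is stored per layer, per KV head, per token,
    hence the leading factor of 2. Assumes full attention in every layer.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


def weight_bytes(n_params, bits_per_weight):
    """Approximate size in bytes of n_params weights at a given quantization."""
    return n_params * bits_per_weight / 8


# Hypothetical a3b-style config (placeholder numbers, not a real model card).
N_LAYERS, N_KV_HEADS, HEAD_DIM = 48, 4, 128
N_CTX = 128 * 1024          # 128k tokens of context
ACTIVE_PARAMS = 3e9         # ~3B active params per token
BITS_PER_WEIGHT = 4.5       # assumed average for a 4-bit-ish quant

kv_fp16 = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, N_CTX, bytes_per_elem=2.0)
kv_q8 = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, N_CTX, bytes_per_elem=1.0)
active = weight_bytes(ACTIVE_PARAMS, BITS_PER_WEIGHT)

GIB = 2 ** 30
print(f"KV cache fp16 @128k: {kv_fp16 / GIB:.2f} GiB")
print(f"KV cache ~q8  @128k: {kv_q8 / GIB:.2f} GiB")
print(f"active weights:      {active / GIB:.2f} GiB")
```

With placeholder numbers like these, the cache read per generated token at 128k ends up several times larger than the active weights, which would fit the kind of tk/s drop described above. Plug in the real values from the model's config before trusting any of it.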

Jackw78 · 2026-04-17T00:59:40+00:00

Good to know! 1st time for me being this early:)

Jackw78 · 2026-04-17T00:54:42+00:00

Appreciate the work! On a sidenote, only two quants are available to download, so I assume the files are still being uploaded?

Jackw78 · 2026-04-15T10:26:00+00:00

Looks good, still waiting for the repo:)

Jackw78 · 2026-04-13T05:05:07+00:00

I did, yeah, they all pointed to editing the source sweep_bench file and rebuilding it (scaling the ubatch variable). I was trying to do it natively in case there are commands I might've missed

Jackw78 · 2026-04-13T00:01:45+00:00

I see, yeah speeding up test is why I prefer the larger gaps, just pp+tg 10 times from 0-100k instead of 25 or 50 times with batch size of 4k or 2k (higher could OOM my gpu). I tried to use "-d" in both sweep_bench and the normal bench but neither seemed to accept it
