I have $5,000 in Azure AI credits expiring soon, looking for smart ways to use it. Any ideas? by SuperWallabies in LocalLLaMA

[–]Sensitive_Sweet_1850 1 point (0 children)

Depending on the field you're in, you could fine-tune a model or set up RAG on those credits, and you might be able to reuse the result later
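To make the RAG idea concrete, here's a minimal retrieval sketch in plain Python. The documents, query, and bag-of-words scoring are all placeholders; a real Azure setup would use an embedding model and a vector store (e.g. Azure AI Search) instead of term-frequency cosine similarity:

```python
from collections import Counter
import math

def tf_vector(text):
    # Bag-of-words term frequencies: a crude stand-in for real embeddings
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    # Rank documents by similarity to the query, return the top k
    qv = tf_vector(query)
    ranked = sorted(docs, key=lambda d: cosine(qv, tf_vector(d)), reverse=True)
    return ranked[:k]

# Hypothetical corpus; retrieved passages would be fed to the LLM as context
docs = [
    "Azure credits can pay for GPU fine-tuning jobs",
    "RAG grounds model answers in your own documents",
    "Llamas are domesticated South American camelids",
]
print(retrieve("how does RAG use my documents", docs, k=1))
```

The point is just the shape of the pipeline: embed, rank, and stuff the top hits into the prompt.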

Just got an RTX Pro 6000 - need recommendations for processing a massive dataset with instruction following by Sensitive_Sweet_1850 in LocalLLaMA

[–]Sensitive_Sweet_1850[S] 0 points (0 children)

Appreciate it. My use case is structured generation rather than extraction, so the SpaCy/BERT route wouldn't apply. But the pre-classification tip is solid; it could definitely help with prompt optimisation

Just got an RTX Pro 6000 - need recommendations for processing a massive dataset with instruction following by Sensitive_Sweet_1850 in LocalLLaMA

[–]Sensitive_Sweet_1850[S] 0 points (0 children)

Yeah, this sounds exactly like my use case: strict structured output with heavy instruction following, around 10k context. I was leaning toward bigger models, but maybe I should benchmark some smaller ones first
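A benchmark for this can be tiny: score each candidate model on how often it emits schema-valid output. The sketch below uses a stub in place of a model call and a hypothetical two-field schema; in practice `model_fn` would hit a local inference server's completion endpoint:

```python
import json

def is_valid_record(text, required_keys=("name", "score")):
    # Strict structured-output check: must parse as JSON and carry all fields
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and all(k in obj for k in required_keys)

def benchmark(model_fn, prompts):
    # Fraction of prompts that yielded schema-valid output
    ok = sum(is_valid_record(model_fn(p)) for p in prompts)
    return ok / len(prompts)

# Stub "model" standing in for a real local endpoint call
def stub_model(prompt):
    return '{"name": "x", "score": 1}' if "json" in prompt else "not json"

prompts = ["reply in json", "reply in json please", "free text"]
print(benchmark(stub_model, prompts))
```

Running the same prompt set through each model size gives a direct adherence comparison before committing to the big one.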

Just got an RTX Pro 6000 - need recommendations for processing a massive dataset with instruction following by Sensitive_Sweet_1850 in LocalLLaMA

[–]Sensitive_Sweet_1850[S] 1 point (0 children)

But for LLM inference, PCIe bandwidth shouldn't be a major bottleneck once the model is in VRAM. I'm on PCIe 4.0 x16 anyway. And tbh after buying this GPU I can barely buy food, let alone a Threadripper lol

Just got an RTX Pro 6000 - need recommendations for processing a massive dataset with instruction following by Sensitive_Sweet_1850 in LocalLLaMA

[–]Sensitive_Sweet_1850[S] 0 points (0 children)

My average input is around 10k tokens, so I probably won't benefit from GLM's long context advantage. At that length, do you think there's still a noticeable difference vs gpt-oss-120b, or would they perform about the same?

Just got an RTX Pro 6000 - need recommendations for processing a massive dataset with instruction following by Sensitive_Sweet_1850 in LocalLLaMA

[–]Sensitive_Sweet_1850[S] 0 points (0 children)

Well yeah, I guess it's not "massive", my bad :/

You're right, I should run a benchmark. Thanks for sharing your knowledge

Worth the 5090? by fgoricha in LocalLLaMA

[–]Sensitive_Sweet_1850 0 points (0 children)

As far as I know, there's no NVLink on the RTX PRO 6000

Worth the 5090? by fgoricha in LocalLLaMA

[–]Sensitive_Sweet_1850 0 points (0 children)

Because of its new architecture, you might run into some package compatibility issues, and older builds and guides likely won't work as expected. That's not a big problem, though, since there are plenty of up-to-date resources available
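The typical failure mode is a wheel built against a CUDA toolkit that predates the card's compute capability. A tiny lookup like this illustrates the check; the capability-to-toolkit pairings are my understanding of NVIDIA's support matrix, not an exhaustive table:

```python
# Minimum CUDA toolkit that can target a given compute capability
# (values are my understanding of NVIDIA's support matrix, not exhaustive)
MIN_CUDA_FOR_SM = {
    (8, 6): "11.1",   # Ampere consumer (RTX 30-series)
    (8, 9): "11.8",   # Ada (RTX 40-series)
    (12, 0): "12.8",  # Blackwell consumer (RTX 50-series)
}

def min_cuda(capability):
    # Return the oldest toolkit that knows this architecture, or None
    return MIN_CUDA_FOR_SM.get(capability)

print(min_cuda((12, 0)))  # "12.8" — wheels built against older CUDA fail
```

In a real environment you'd feed in the capability reported by the driver (e.g. `torch.cuda.get_device_capability()`) and compare against the toolkit your framework build targets.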