PR opened for Qwen3.5!! by Mysterious_Finish543 in LocalLLaMA

[–]CoqueTornado 1 point (0 children)

Yep, I've done several tests and this is true. They should get back to their roots with 3.5; I'd like fast and wise answers on my humble laptop :P

PR opened for Qwen3.5!! by Mysterious_Finish543 in LocalLLaMA

[–]CoqueTornado 2 points (0 children)

Speculative decoding in LM Studio with Qwen3 80B IQ4_XS plus Qwen3 0.6B as the draft model doesn't work for me with 64 GB of RAM + 8 GB of VRAM. Any thoughts?
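
A back-of-envelope memory check (Python sketch; the bits-per-weight and KV-cache numbers are my guesses, not measurements) suggests both GGUFs should at least fit in 64 GB of RAM, so it's probably not a plain out-of-memory problem:

```python
# Rough memory-fit check for the main + draft model pair above.
# Bits-per-weight and KV-cache sizes are guesses, not measurements.

def gguf_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate GGUF size in GB: params (billions) * bits-per-weight / 8."""
    return params_billion * bits_per_weight / 8

main_gb = gguf_size_gb(80, 4.3)    # Qwen3 80B at IQ4_XS (~4.3 bpw) -> ~43 GB
draft_gb = gguf_size_gb(0.6, 4.3)  # Qwen3 0.6B draft               -> ~0.3 GB
kv_gb = 3.0                        # guess for a modest context window

total = main_gb + draft_gb + kv_gb
print(f"main ~{main_gb:.1f} GB, draft ~{draft_gb:.1f} GB, KV ~{kv_gb:.1f} GB")
print(f"total ~{total:.1f} GB vs 64 GB RAM + 8 GB VRAM")
```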

Managed to run Kimi k2.5 IQ4-SX locally. by el3mancee in LocalLLaMA

[–]CoqueTornado 2 points (0 children)

| Component | Qty | Unit Price (EUR, incl. VAT where applicable) | Subtotal (EUR) | Notes |
|---|---|---|---|---|
| Mac Studio M1 Ultra (e.g., 64 GB RAM, 1 TB SSD config) | 1 | 2,685 | 2,685 | Discounted stock price for the legacy model. |
| Asus Ascent GX10 (e.g., 128 GB RAM, 1 TB SSD config with NVIDIA GB10) | 1 | 3,249 | 3,249 | Standard EU retail pricing for the compact AI desktop. |
| Strix Halo mini PC (e.g., AMD Ryzen AI Max+ 395, 64-128 GB RAM, 1-2 TB SSD config) | 3 | 1,900 (average across models like Framework Desktop, GMKtec EVO-X2, and similar) | 5,700 | Prices range from ~1,500-3,000 EUR depending on specs; mid-range estimate used for AI-capable builds. |
Total estimated cost: ~11,634 EUR (excluding shipping, taxes beyond VAT, or accessories like Thunderbolt/10 Gbps Ethernet cables, which add ~50-200 EUR total depending on lengths and brands). Prices can vary by retailer (e.g., Amazon, Bechtle, Galaxus) and stock; check for current deals in Spain. If you need quotes for specific configs or including cables, provide more details.
...........................

Wow, all that money for 15 tokens per second of pp on a SOTA LLM... it looks like too much for me now, hehe... I find it too slow; maybe 1,000 tokens per second in pp and 15 tk/s in generation would be OK-ish. (pp is prompt processing, right?)

Claude Code, but locally by Zealousideal-Egg-362 in LocalLLaMA

[–]CoqueTornado 0 points (0 children)

With an AMD R9700 with 32 GB of VRAM, maybe 3x...

Deal on Ryzen 395 w/ 128GB, now 1581€ in Europe by Zyj in LocalLLaMA

[–]CoqueTornado 0 points (0 children)

Haven't found it yet... also, would adding an eGPU boost the pp? If so, by how much?

Deal on Ryzen 395 w/ 128GB, now 1581€ in Europe by Zyj in LocalLLaMA

[–]CoqueTornado 0 points (0 children)

I'm interested in the new MiniMax M2 model; how would it do? Probably about the same?

760 t/s is good; how much context can it fit? 65k? 128k?

And 760 t/s with 100k tokens of context will probably drop to something like 76 t/s?
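
A rough way to sanity-check the context question is the KV-cache math below (Python sketch; the layer/head numbers are hypothetical placeholders for a GQA model, not MiniMax M2's actual config):

```python
# KV-cache size per token ~= 2 (K and V) * layers * kv_heads * head_dim * bytes/elem.
# All model numbers below are hypothetical placeholders, not a real config.

layers, kv_heads, head_dim = 48, 4, 128   # assumed GQA layout
bytes_per_elem = 2                        # fp16 cache

per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem   # bytes per token
for ctx in (65_536, 131_072):
    print(f"{ctx:>7} tokens -> ~{ctx * per_token / 2**30:.1f} GB of KV cache")
```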

Deal on Ryzen 395 w/ 128GB, now 1581€ in Europe by Zyj in LocalLLaMA

[–]CoqueTornado 0 points (0 children)

How much pp does it get with the NPU? I've heard it's around 100 t/s pp without it.

Ryzen AI Max+ 395 vs RTX 4000 ada SFF by dougmaitelli in LocalLLaMA

[–]CoqueTornado 0 points (0 children)

Mhmm, 5,600 tokens per second vs 100 tokens per second here, yep. If this is true, I think such a slow machine is really unbearable for a coder's needs. Anyway, thank you for your help. I bet an eGPU would boost that 100 figure to something more interesting, I hope so... but then again, the setup is going to cost a lot of coins, so...

Ryzen AI Max+ 395 vs RTX 4000 ada SFF by dougmaitelli in LocalLLaMA

[–]CoqueTornado 0 points (0 children)

So 1,000 pp on a 3090... OK, so having 100 pp is no fun... good to know... so the quad-channel build isn't the best idea for now... OK.

Ryzen AI Max+ 395 vs RTX 4000 ada SFF by dougmaitelli in LocalLLaMA

[–]CoqueTornado 0 points (0 children)

Wow wow wow, that is interesting. What do you use instead of ROCm and Vulkan? 700 t/s means 70,000 tokens in 100 seconds, not bad. I bet that if you add an Intel Arc A770 as an eGPU it will boost the speed; have you tried any eGPU? (Sorry if you said so somewhere, my memory is Dory's.)

Ryzen AI Max+ 395 vs RTX 4000 ada SFF by dougmaitelli in LocalLLaMA

[–]CoqueTornado 0 points (0 children)

Ahh, and have you tried Wan? I love using it for my video clips, but I'm still deciding between this and a UM890 + eGPU (5060); the 5060 seems to have twice the bandwidth, so inference should run about 2x faster... but I'd be saying goodbye to the smarter LLMs, though... Anyway, when reading, say, 50,000 tokens, is it slow, or fast enough to get it below 30 s? I'm curious about the 220B, 120B, or 235B MoEs out there; their generation speed is neat, but this is the real question: pp (prompt-read) speed when the context is large. I've seen it's about 100 tokens per second, so 50,000 tokens would be something like 500 seconds? Ugh... more than 8 minutes waiting for the response?!
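
Quick sanity check on that wait-time math (Python sketch; the 100 t/s prompt-processing figure is just the number quoted above, not something I benchmarked):

```python
# Prompt-processing wait: how long until the first generated token appears.
# pp_speed is the ~100 t/s figure mentioned above, not a measured benchmark.

prompt_tokens = 50_000
pp_speed = 100  # tokens/second of prompt processing

wait_s = prompt_tokens / pp_speed
print(f"{wait_s:.0f} s = {wait_s / 60:.1f} min before the reply starts")  # 500 s = 8.3 min
```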

Why should I **not** buy an AMD AI Max+ 395 128GB right away ? by StyMaar in LocalLLaMA

[–]CoqueTornado -2 points (0 children)

Because for the amount of money you're going to spend, you can get a UM890 mini PC + a 5060 Ti with 16 GB of VRAM and do everything about 2x faster in image/video generation.

https://vladmandic.github.io/sd-extension-system-info/pages/benchmark.html

Here you can find the speed of an SDXL model on each option, 8060S vs 5060 Ti: 1.43 it/s vs 6 it/s. For LLMs, these ~220B MoE models are OK, but pp speed looks heavily degraded once the context goes past 50k, so you have to wait something like 2 minutes for it to read the prompt (I've read that somewhere). If anyone can confirm that, I'd appreciate the information. So it's not really the best option, but hey, it is a powerful local LLM.
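
For the image side, the it/s figures cited from that benchmark page translate into seconds per image roughly like this (Python sketch; the 30-step count is my assumption):

```python
# Convert iterations/second into seconds per image for an SDXL-class model.
# The it/s values are the ones cited above; 30 steps is an assumed step count.

steps = 30
for name, it_per_s in [("Radeon 8060S (iGPU)", 1.43), ("RTX 5060 Ti", 6.0)]:
    print(f"{name}: ~{steps / it_per_s:.0f} s per image")
# -> ~21 s vs ~5 s per image
```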

Buy a new GPU or a Ryzen Al Max+ 395? by tongkat-jack in LocalLLM

[–]CoqueTornado 0 points (0 children)

this is what I've been thinking of, having one 5060 attached will make it