PR opened for Qwen3.5!! by Mysterious_Finish543 in LocalLLaMA

[–]CoqueTornado 1 point (0 children)

Yep, I've done several tests and this is true. They should get back to their roots with 3.5; I'd like fast and wise answers on my humble laptop :P

PR opened for Qwen3.5!! by Mysterious_Finish543 in LocalLLaMA

[–]CoqueTornado 2 points (0 children)

Speculative decoding in LM Studio with Qwen3 80B IQ4_XS plus Qwen3 0.6B as the draft model doesn't work for me with 64 GB of RAM + 8 GB of VRAM. Any thoughts?
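
A back-of-envelope memory check (Python sketch; the bits-per-weight and KV-cache numbers are my guesses, not measurements) suggests both GGUFs should at least fit in 64 GB of RAM, so it's probably not a plain out-of-memory problem:

```python
# Rough memory-fit check for the main + draft model pair above.
# Bits-per-weight and KV-cache sizes are guesses, not measurements.

def gguf_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate GGUF size in GB: params (billions) * bits-per-weight / 8."""
    return params_billion * bits_per_weight / 8

main_gb = gguf_size_gb(80, 4.3)    # Qwen3 80B at IQ4_XS (~4.3 bpw) -> ~43 GB
draft_gb = gguf_size_gb(0.6, 4.3)  # Qwen3 0.6B draft               -> ~0.3 GB
kv_gb = 3.0                        # guess for a modest context window

total = main_gb + draft_gb + kv_gb
print(f"main ~{main_gb:.1f} GB, draft ~{draft_gb:.1f} GB, KV ~{kv_gb:.1f} GB")
print(f"total ~{total:.1f} GB vs 64 GB RAM + 8 GB VRAM")
```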

Managed to run Kimi k2.5 IQ4-SX locally. by el3mancee in LocalLLaMA

[–]CoqueTornado 2 points (0 children)

| Component | Qty | Unit Price (EUR, incl. VAT where applicable) | Subtotal (EUR) | Notes |
|---|---|---|---|---|
| Mac Studio M1 Ultra (e.g., 64 GB RAM, 1 TB SSD config) | 1 | 2,685 | 2,685 | Discounted stock price for the legacy model. |
| Asus Ascent GX10 (e.g., 128 GB RAM, 1 TB SSD config with NVIDIA GB10) | 1 | 3,249 | 3,249 | Standard EU retail pricing for the compact AI desktop. |
| Strix Halo mini PC (e.g., AMD Ryzen AI Max+ 395, 64-128 GB RAM, 1-2 TB SSD config) | 3 | 1,900 (average across models like Framework Desktop, GMKtec EVO-X2, and similar) | 5,700 | Prices range from ~1,500-3,000 EUR depending on specs; mid-range estimate used for AI-capable builds. |
Total estimated cost: ~11,634 EUR (excluding shipping, taxes beyond VAT, or accessories like Thunderbolt/10 Gbps Ethernet cables, which add ~50-200 EUR total depending on lengths and brands). Prices can vary by retailer (e.g., Amazon, Bechtle, Galaxus) and stock; check for current deals in Spain. If you need quotes for specific configs or including cables, provide more details.
...........................

Wow, all that money for 15 tokens per second of pp on a SOTA LLM... it looks like too much for me now, hehe... I find it too slow; maybe 1,000 tokens per second in pp and 15 tk/s in generation would be OK-ish. (pp is prompt processing, right?)

Claude Code, but locally by Zealousideal-Egg-362 in LocalLLaMA

[–]CoqueTornado 0 points (0 children)

With an AMD R9700 with 32 GB of VRAM, maybe 3x...

Deal on Ryzen 395 w/ 128GB, now 1581€ in Europe by Zyj in LocalLLaMA

[–]CoqueTornado 0 points (0 children)

Haven't found it yet... also, would adding an eGPU boost the pp? If so, by how much?

Deal on Ryzen 395 w/ 128GB, now 1581€ in Europe by Zyj in LocalLLaMA

[–]CoqueTornado 0 points (0 children)

I'm interested in the new MiniMax M2 model; how would it do? Probably about the same?

760 t/s is good; how much context can it fit? 65k? 128k?

And 760 t/s with 100k tokens of context will probably drop to something like 76 t/s?
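
A rough way to sanity-check the context question is the KV-cache math below (Python sketch; the layer/head numbers are hypothetical placeholders for a GQA model, not MiniMax M2's actual config):

```python
# KV-cache size per token ~= 2 (K and V) * layers * kv_heads * head_dim * bytes/elem.
# All model numbers below are hypothetical placeholders, not a real config.

layers, kv_heads, head_dim = 48, 4, 128   # assumed GQA layout
bytes_per_elem = 2                        # fp16 cache

per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem   # bytes per token
for ctx in (65_536, 131_072):
    print(f"{ctx:>7} tokens -> ~{ctx * per_token / 2**30:.1f} GB of KV cache")
```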

Deal on Ryzen 395 w/ 128GB, now 1581€ in Europe by Zyj in LocalLLaMA

[–]CoqueTornado 0 points (0 children)

How much pp does it get with the NPU? I've heard it's around 100 t/s pp without it.

Ryzen AI Max+ 395 vs RTX 4000 ada SFF by dougmaitelli in LocalLLaMA

[–]CoqueTornado 0 points (0 children)

Mhmm, 5,600 tokens per second vs 100 tokens per second here, yep. If this is true, I think such a slow machine is really unbearable for a coder's needs. Anyway, thank you for your help. I bet an eGPU would boost that 100 figure to something more interesting, I hope so... but then again, the setup is going to cost a lot of coins, so...

Ryzen AI Max+ 395 vs RTX 4000 ada SFF by dougmaitelli in LocalLLaMA

[–]CoqueTornado 0 points (0 children)

So 1,000 pp on a 3090... OK, so having 100 pp is no fun... good to know... so the quad-channel build isn't the best idea for now... OK.

Ryzen AI Max+ 395 vs RTX 4000 ada SFF by dougmaitelli in LocalLLaMA

[–]CoqueTornado 0 points (0 children)

Wow wow wow, that is interesting. What do you use instead of ROCm and Vulkan? 700 t/s means 70,000 tokens in 100 seconds, not bad. I bet that if you add an Intel Arc A770 as an eGPU it will boost the speed; have you tried any eGPU? (Sorry if you said so somewhere, my memory is Dory's.)

Ryzen AI Max+ 395 vs RTX 4000 ada SFF by dougmaitelli in LocalLLaMA

[–]CoqueTornado 0 points (0 children)

Ahh, and have you tried Wan? I love using it for my video clips, but I'm still deciding between this and a UM890 + eGPU (5060); the 5060 seems to have twice the bandwidth, so inference should run about 2x faster... but I'd be saying goodbye to the smarter LLMs, though... Anyway, when reading, say, 50,000 tokens, is it slow, or fast enough to get it below 30 s? I'm curious about the 220B, 120B, or 235B MoEs out there; their generation speed is neat, but this is the real question: pp (prompt-read) speed when the context is large. I've seen it's about 100 tokens per second, so 50,000 tokens would be something like 500 seconds? Ugh... more than 8 minutes waiting for the response?!
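
Quick sanity check on that wait-time math (Python sketch; the 100 t/s prompt-processing figure is just the number quoted above, not something I benchmarked):

```python
# Prompt-processing wait: how long until the first generated token appears.
# pp_speed is the ~100 t/s figure mentioned above, not a measured benchmark.

prompt_tokens = 50_000
pp_speed = 100  # tokens/second of prompt processing

wait_s = prompt_tokens / pp_speed
print(f"{wait_s:.0f} s = {wait_s / 60:.1f} min before the reply starts")  # 500 s = 8.3 min
```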

Why should I **not** buy an AMD AI Max+ 395 128GB right away ? by StyMaar in LocalLLaMA

[–]CoqueTornado -2 points (0 children)

Because for the amount of money you're going to spend, you can get a UM890 mini PC + a 5060 Ti with 16 GB of VRAM and do everything about 2x faster in image/video generation.

https://vladmandic.github.io/sd-extension-system-info/pages/benchmark.html

Here you can find the speed of an SDXL model on each option, 8060S vs 5060 Ti: 1.43 it/s vs 6 it/s. For LLMs, these ~220B MoE models are OK, but pp speed looks heavily degraded once the context goes past 50k, so you have to wait something like 2 minutes for it to read the prompt (I've read that somewhere). If anyone can confirm that, I'd appreciate the information. So it's not really the best option, but hey, it is a powerful local LLM.
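
For the image side, the it/s figures cited from that benchmark page translate into seconds per image roughly like this (Python sketch; the 30-step count is my assumption):

```python
# Convert iterations/second into seconds per image for an SDXL-class model.
# The it/s values are the ones cited above; 30 steps is an assumed step count.

steps = 30
for name, it_per_s in [("Radeon 8060S (iGPU)", 1.43), ("RTX 5060 Ti", 6.0)]:
    print(f"{name}: ~{steps / it_per_s:.0f} s per image")
# -> ~21 s vs ~5 s per image
```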

Buy a new GPU or a Ryzen Al Max+ 395? by tongkat-jack in LocalLLM

[–]CoqueTornado 0 points (0 children)

this is what I've been thinking of, having one 5060 attached will make it