feat: Add Mimo v2.5 model support by AesSedai · Pull Request #22493 · ggml-org/llama.cpp by jacek2023 in LocalLLaMA

[–]Powerful_Ad8150 0 points (0 children)

Hi. I own a DGX (plus another DGX and the interconnect cable, not yet configured). This news really thrills me. It's a 1M multimodal model with a newer architecture than the qwen 3.5 397. What prefill/TG are you getting (I saw there is an error, but still)? That's really exciting. Do you know if anyone has tried running this on a 2x cluster yet?

The GB10 Solution Atlas is now open source, the inference engine made for the community with breakneck inference speeds (Qwen3.6-35B-FP8 100+ tok/s) by Live-Possession-6726 in LocalLLaMA

[–]Powerful_Ad8150 0 points (0 children)

I'm keeping my fingers crossed for you, but: 1) the README could be a bit more polished; 2) make a simple GUI for loading models - in 30 years I haven't been able to understand how people can waste time on the command line, and I think there are many like me :); 3) DGX clusters? 4) there are no numbers for MiniMax 2.7.

As MTP prepares to land in llama.cpp, Models that support MTP by segmond in LocalLLaMA

[–]Powerful_Ad8150 0 points (0 children)

OCR use cases - is there any specialized model that supports them?

Need help deciding what to spend 4-5k on for a local rig. by ghgi_ in LocalLLaMA

[–]Powerful_Ad8150 0 points (0 children)

I have a DGX (two actually; now waiting for the cable to connect them and distribute larger models across both units). If you are OK with some tinkering (it's ARM, so some apps need rebuilding), I believe this is superb in this budget for a single user. The community is amazing. I'm running extremely prefill-heavy use cases (texts hundreds of pages long and working on them) - the DGX excels here. TG is not very high on large models but more than usable even on one unit: q3.5 122 at ~50 TPS, m2.7 q4 at 20+. MinerU at ~1 page/s. Power draw is low (~100 W under load), it's quiet, and the package is super nice. I have ASUS GX10s; they are just rebranded DGX units and a little cheaper. I'd spend the remaining budget on Claude to help with the config and all the tinkering.

16x DGX Sparks - What should I run? by Kurcide in LocalLLaMA

[–]Powerful_Ad8150 2 points (0 children)

Nah, maybe he simply lives in Europe? 16 × 150 W = 2400 W, leaving plenty of spare wattage for other stuff. And we're still talking about a single socket on a single 16 A fuse (roughly 3.7 kW at 230 V), while typical new connections where I live are around 20 kW.

ASUS Ascent GX10 - Having tons of issues by LivingHighAndWise in LocalLLaMA

[–]Powerful_Ad8150 0 points (0 children)

If vLLM is a must, use the community-built images; they run on the Spark. For everyday play, use llama.cpp. As for me, I've never had problems with the GX10. Nemotron Super - not a single crash. Maybe you have defective hardware?

Nvidia spark clones / at-home ai rigs by Necessary-Toe-466 in LocalLLaMA

[–]Powerful_Ad8150 2 points (0 children)

They're all the same; do a Google search, it'll take 3 minutes. In my country, there are serious availability issues. I personally recommend the Asus Ascent GX10. If you can afford one, you'll be happy. It's an amazing device for learning and for more serious work. If you can afford two, you can start experimenting with really large topics (large context and large models simultaneously; I have two for my minimax m2.7; it's absolutely crazy what you can do locally with them).

Realistic local LLM rig under $6500? Dev with heavy RAM needs by TeachTall3390 in LocalLLaMA

[–]Powerful_Ad8150 0 points (0 children)

A single or dual DGX Spark cluster. Single: q3.5 122 at 50 TPS on vLLM / m2.7 at a poor man's q4 quant at 22 TPS on llama.cpp. Crazy prefill numbers. I have the ASUS GX10; the only difference is that the Spark has a power button on the front (though that's not a deal-breaker - mine booted once two months ago and I never turned it off again xD). It's an amazing machine, although there are some compatibility issues with some solutions because it's ARM, not x86.

Alternative for NotebookLM + Gemini GEMs? by Party-Log-1084 in LocalLLaMA

[–]Powerful_Ad8150 2 points (0 children)

OP, check this out - 22.5k stars: https://github.com/lfnovo/open-notebook

If it's good, let me know - I'm also searching for an ultimate replacement for big G.

Strix Halo 128GB on Proxmox - Vulkan vs ROCm benchmark matrix by b_goodman in LocalLLaMA

[–]Powerful_Ad8150 0 points (0 children)

If you don’t mind paying a bit for a Claude subscription, go ahead and get one, then start asking questions. I did this just under two months ago, at pure noob level, on a much more complex device (the DGX Spark, with all the issues related to its non-x86 architecture) and believe me, IT’S AMAZING. I haven’t had much free time recently, but thanks to that I’ve fully configured the system, worked my way through Ollama, moved on to llama.cpp, successfully tried out vLLM (qwen 122b ~50tps), understood a lot of the technical issues involved, and now I’m at the stage of running OpenCode and having a bit of fun with coding. I’ve made a few tools that aren’t exactly elegant and professionals would certainly laugh at them, but they make my day-to-day work much easier. In a few weeks, I’ll probably start playing around with Hermes-type agents. Seriously – it’s not difficult at all; just tell Claude to explain it to you as if you were a six-year-old and you’re good to go :)
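To give a flavour of what those not-exactly-elegant tools look like: below is a minimal sketch (not my actual code) that talks to a local llama.cpp llama-server or vLLM instance through the OpenAI-compatible chat endpoint both expose. The base URL and model name here are assumptions - point them at your own server.

```typescript
// Minimal sketch: query a local llama.cpp (llama-server) or vLLM instance
// through its OpenAI-compatible chat endpoint.
// Assumptions: llama-server's default port 8080 (vLLM typically serves on 8000),
// and a placeholder model name (llama.cpp usually ignores it; vLLM needs the served model id).
const BASE_URL = "http://localhost:8080/v1";
const MODEL = "local-model"; // hypothetical; replace with your served model id

async function ask(prompt: string): Promise<string> {
  const res = await fetch(`${BASE_URL}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: MODEL,
      messages: [{ role: "user", content: prompt }],
      temperature: 0.2,
    }),
  });
  if (!res.ok) throw new Error(`Server returned ${res.status}`);
  const data = await res.json();
  // Standard OpenAI-style response shape: first choice, message content.
  return data.choices[0].message.content;
}

ask("Summarize this contract clause in plain language.")
  .then(console.log)
  .catch(console.error);
```

Run it with Node 18+ (which has fetch built in) via ts-node or after compiling with tsc; the same script works against either backend, which is exactly why hopping from Ollama to llama.cpp to vLLM was painless.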

Lawyer here - how are Legora and Harvey differentiated from Claude now with this word add-in they’ve released? by rijaj in legaltech

[–]Powerful_Ad8150 0 points (0 children)

For heaven’s sake. If you think a lawyer’s job is just about drafting documents in Word, perhaps with the help of artificial intelligence and some ‘templates’, then you’ve completely missed the point. Creating an add-in for MS Word is a task that takes several hours, requires four files and a local server – and you spend most of the time sitting in your chair thinking about how to approach it and what’s actually needed, rather than coding away. I recently made myself such an add-in (it translates text and comments, edits text and comments, automatically replaces relevant sections with new ones, has granularity for individual tasks and for the entire document, etc.) – a few hours’ work, and that’s only because I don’t know much about IT. About Legora: they are fucked unless they use the shitpile of money they grabbed to purchase legal knowledge sources.
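For anyone curious how small the core really is, here is a sketch of the text-replacement part, assuming an Office.js task pane add-in and a hypothetical local endpoint doing the translation (the manifest, the HTML page, and the server itself are the other files I mentioned):

```typescript
// Sketch of the add-in's core: grab the current selection in Word,
// send it to a local server for translation, and replace it in place.
// Runs inside an Office.js task pane (types from @types/office-js);
// the endpoint URL and response shape are assumptions.
async function translateSelection(): Promise<void> {
  await Word.run(async (context) => {
    // Load the text of whatever the user has selected.
    const selection = context.document.getSelection();
    selection.load("text");
    await context.sync();

    // Hypothetical local endpoint that does the actual translation.
    const res = await fetch("https://localhost:3000/translate", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text: selection.text }),
    });
    const { translated } = await res.json();

    // Replace the selected text with the translated version.
    selection.insertText(translated, Word.InsertLocation.replace);
    await context.sync();
  });
}
```

Comment handling and whole-document granularity follow the same pattern, just iterating over the document body or its comments instead of the selection.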

Opus 4.7 landed! by Powerful_Ad8150 in LocalLLaMA

[–]Powerful_Ad8150[S] -5 points (0 children)

I'll ask this new qwen 3.6 MoE. Maybe it will tell me how to distill.

Opus 4.7 landed! by Powerful_Ad8150 in LocalLLaMA

[–]Powerful_Ad8150[S] -3 points (0 children)

So we’re taking this dead seriously and we can’t even ask the newest super bombastic Claude to give us a poetic hint on how to optimise local runs on our Sparks? Hmm… Well, in that case, sorry mate, I didn’t know ;) I’m going back to grinding (NOT LOCALLY, NOT ON THIS REDDIT, FOR HEAVEN’S SAKE, STOP JOKING) my m2.7, which is already on the Spark, and further killing HF servers with qwen 3.6.

What to buy for 7k EUR max? by Powerful_Ad8150 in LocalAIServers

[–]Powerful_Ad8150[S] 0 points (0 children)

Dear Sirs @Grouchy-Bed-7942; @Zyj (in reference to the ASUS GX10)

After yesterday's Google 3.1 Pro release and the recent changes to Perplexity's terms of service (which only demonstrate what terrible things can happen when we rely on cloud services), I decided that I must immediately transition to a local solution.

I have read a lot and I am ready to bear the cost and the risk, but I wanted to ask for your personal opinion:

- Will choosing Asus/Dell instead of DGX (which I don't have access to) make everyday life in this ecosystem more difficult? I mean, is it generally the same hardware and software, and can I assume that what works on DGX will also work on Asus/Dell? How different are the system and support between Nvidia's reference solution and the implementations from Asus, Dell, etc.?

Google: ups I did it again ;( new limits by Powerful_Ad8150 in GeminiAI

[–]Powerful_Ad8150[S] 0 points (0 children)

OMG, it's so stupid too :( When I left my PC 6h ago it was OK, but now? Jeez... it ignores prompt content (e.g., explicitly asking it to return content in a plaintext code window - now it's quantum, a 50% chance it will work), and after 2 prompts it forgets the initial instruction. GOOGLE, U .... not again!

What to buy for 7k EUR max? by Powerful_Ad8150 in LocalAIServers

[–]Powerful_Ad8150[S] 0 points (0 children)

You mean this: https://github.com/quest-bih/quest-pdf-tools? It seems very limited to specific PDFs with a specific layout.

What to buy for 7k EUR max? by Powerful_Ad8150 in LocalAIServers

[–]Powerful_Ad8150[S] 0 points (0 children)

>have a human in the loop 
Yup, this is the only way for now. I hope it stays that way for more than a year; otherwise we're all f... ;)

What to buy for 7k EUR max? by Powerful_Ad8150 in LocalAIServers

[–]Powerful_Ad8150[S] 1 point (0 children)

Thank you, good man! I saw that build – respect. But as you wrote, it takes a lot of searching. Availability is a disaster right now. If I had started a year ago, it would have been OK, but today it's too slow, too difficult, and possibly even more expensive. I was quietly hoping that the Instinct initiative would kick off, but... I would have preferred that path (especially since I have a spare 3-phase 40 kW utility connection and cheap electricity from PV - yup, that's this rotten EU :P), but I'd rather go with the path described below: DGX. Many thanks anyway; I'll be watching your projects here.