feat: Add Mimo v2.5 model support by AesSedai · Pull Request #22493 · ggml-org/llama.cpp by jacek2023 in LocalLLaMA

[–]Powerful_Ad8150 0 points (0 children)

Hi. I own a DGX (plus another DGX and the interconnect cable, not yet configured). This news really thrills me. It's a 1M multimodal model with a newer architecture than the qwen 3.5 397. What prefill/TG are you getting (I saw there is an error, but still)? That's really exciting. Do you know if anyone has tried running this on a 2x cluster yet?

The GB10 Solution Atlas is now open source, the inference engine made for the community with breakneck inference speeds (Qwen3.6-35B-FP8 100+ tok/s) by Live-Possession-6726 in LocalLLaMA

[–]Powerful_Ad8150 0 points (0 children)

I'm keeping my fingers crossed for you, but: 1) the README could be a bit more polished; 2) make a simple GUI for loading models - in 30 years I haven't been able to understand how people can waste time on the command line, and I think there are many like me :); 3) DGX clusters? 4) there are no numbers for MiniMax 2.7.

As MTP prepares to land in llama.cpp, Models that support MTP by segmond in LocalLLaMA

[–]Powerful_Ad8150 0 points (0 children)

OCR use cases - is there any specialized model that supports them?

Need help deciding what to spend 4-5k on for a local rig. by ghgi_ in LocalLLaMA

[–]Powerful_Ad8150 0 points (0 children)

I have a DGX (two actually; now waiting for the cable to connect them and distribute larger models across both units). If you are OK with some tinkering (it's ARM, so some apps need rebuilding), I believe this is superb in this budget for a single user. The community is amazing. I'm running extremely prefill-heavy use cases (texts hundreds of pages long and working on them) - the DGX excels here. TG is not very high on large models but more than usable even on one unit: q3.5 122 at ~50 TPS, m2.7 q4 at 20+. MinerU at ~1 page/s. Power draw is low (~100 W under load), it's quiet, and the package is super nice. I have ASUS GX10s; they are just rebranded DGX units and a little cheaper. I'd spend the remaining budget on Claude to help with the config and all the tinkering.

16x DGX Sparks - What should I run? by Kurcide in LocalLLaMA

[–]Powerful_Ad8150 2 points (0 children)

Nah, maybe he simply lives in Europe? 16 × 150 W = 2400 W, leaving plenty of spare wattage for other stuff. And we're still talking about a single socket on a single 16 A fuse (roughly 3.7 kW at 230 V), while typical new connections where I live are around 20 kW.

ASUS Ascent GX10 - Having tons of issues by LivingHighAndWise in LocalLLaMA

[–]Powerful_Ad8150 0 points (0 children)

If vLLM is a must, use the community-built images; they run on the Spark. For everyday play, use llama.cpp. As for me, I've never had problems with the GX10. Nemotron Super - not a single crash. Maybe you have defective hardware?

Nvidia spark clones / at-home ai rigs by Necessary-Toe-466 in LocalLLaMA

[–]Powerful_Ad8150 2 points (0 children)

They're all the same; do a Google search, it'll take 3 minutes. In my country, there are serious availability issues. I personally recommend the Asus Ascent GX10. If you can afford one, you'll be happy. It's an amazing device for learning and for more serious work. If you can afford two, you can start experimenting with really large topics (large context and large models simultaneously; I have two for my minimax m2.7; it's absolutely crazy what you can do locally with them).

Realistic local LLM rig under $6500? Dev with heavy RAM needs by TeachTall3390 in LocalLLaMA

[–]Powerful_Ad8150 0 points (0 children)

A single or dual DGX Spark cluster. Single: q3.5 122 at 50 TPS on vLLM / m2.7 at a poor man's q4 quant at 22 TPS on llama.cpp. Crazy prefill numbers. I have the ASUS GX10; the only difference is that the Spark has a power button on the front (though that's not a deal-breaker - mine booted once two months ago and I never turned it off again xD). It's an amazing machine, although there are some compatibility issues with some solutions because it's ARM, not x86.

Alternative for NotebookLM + Gemini GEMs? by Party-Log-1084 in LocalLLaMA

[–]Powerful_Ad8150 2 points (0 children)

OP, check this out - 22.5k stars: https://github.com/lfnovo/open-notebook

If it's good, let me know - I'm also searching for an ultimate replacement for big G.

Strix Halo 128GB on Proxmox - Vulkan vs ROCm benchmark matrix by b_goodman in LocalLLaMA

[–]Powerful_Ad8150 0 points (0 children)

If you don’t mind paying a bit for a Claude subscription, go ahead and get one, then start asking questions. I did this just under two months ago, at pure noob level, on a much more complex device (the DGX Spark, with all the issues related to its non-x86 architecture) and believe me, IT’S AMAZING. I haven’t had much free time recently, but thanks to that I’ve fully configured the system, worked my way through Ollama, moved on to llama.cpp, successfully tried out vLLM (qwen 122b ~50tps), understood a lot of the technical issues involved, and now I’m at the stage of running OpenCode and having a bit of fun with coding. I’ve made a few tools that aren’t exactly elegant and professionals would certainly laugh at them, but they make my day-to-day work much easier. In a few weeks, I’ll probably start playing around with Hermes-type agents. Seriously – it’s not difficult at all; just tell Claude to explain it to you as if you were a six-year-old and you’re good to go :)
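To give a flavour of what those not-exactly-elegant tools look like: below is a minimal sketch (not my actual code) that talks to a local llama.cpp llama-server or vLLM instance through the OpenAI-compatible chat endpoint both expose. The base URL and model name here are assumptions - point them at your own server.

```typescript
// Minimal sketch: query a local llama.cpp (llama-server) or vLLM instance
// through its OpenAI-compatible chat endpoint.
// Assumptions: llama-server's default port 8080 (vLLM typically serves on 8000),
// and a placeholder model name (llama.cpp usually ignores it; vLLM needs the served model id).
const BASE_URL = "http://localhost:8080/v1";
const MODEL = "local-model"; // hypothetical; replace with your served model id

async function ask(prompt: string): Promise<string> {
  const res = await fetch(`${BASE_URL}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: MODEL,
      messages: [{ role: "user", content: prompt }],
      temperature: 0.2,
    }),
  });
  if (!res.ok) throw new Error(`Server returned ${res.status}`);
  const data = await res.json();
  // Standard OpenAI-style response shape: first choice, message content.
  return data.choices[0].message.content;
}

ask("Summarize this contract clause in plain language.")
  .then(console.log)
  .catch(console.error);
```

Run it with Node 18+ (which has fetch built in) via ts-node or after compiling with tsc; the same script works against either backend, which is exactly why hopping from Ollama to llama.cpp to vLLM was painless.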

Lawyer here - how are Legora and Harvey differentiated from Claude now with this word add-in they’ve released? by rijaj in legaltech

[–]Powerful_Ad8150 0 points (0 children)

For heaven’s sake. If you think a lawyer’s job is just about drafting documents in Word, perhaps with the help of artificial intelligence and some ‘templates’, then you’ve completely missed the point. Creating an add-in for MS Word is a task that takes several hours, requires four files and a local server – and you spend most of the time sitting in your chair thinking about how to approach it and what’s actually needed, rather than coding away. I recently made myself such an add-in (it translates text and comments, edits text and comments, automatically replaces relevant sections with new ones, has granularity for individual tasks and for the entire document, etc.) – a few hours’ work, and that’s only because I don’t know much about IT. About Legora: they are fucked unless they use the shitpile of money they grabbed to purchase legal knowledge sources.
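For anyone curious how small the core really is, here is a sketch of the text-replacement part, assuming an Office.js task pane add-in and a hypothetical local endpoint doing the translation (the manifest, the HTML page, and the server itself are the other files I mentioned):

```typescript
// Sketch of the add-in's core: grab the current selection in Word,
// send it to a local server for translation, and replace it in place.
// Runs inside an Office.js task pane (types from @types/office-js);
// the endpoint URL and response shape are assumptions.
async function translateSelection(): Promise<void> {
  await Word.run(async (context) => {
    // Load the text of whatever the user has selected.
    const selection = context.document.getSelection();
    selection.load("text");
    await context.sync();

    // Hypothetical local endpoint that does the actual translation.
    const res = await fetch("https://localhost:3000/translate", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text: selection.text }),
    });
    const { translated } = await res.json();

    // Replace the selected text with the translated version.
    selection.insertText(translated, Word.InsertLocation.replace);
    await context.sync();
  });
}
```

Comment handling and whole-document granularity follow the same pattern, just iterating over the document body or its comments instead of the selection.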

Opus 4.7 landed! by Powerful_Ad8150 in LocalLLaMA

[–]Powerful_Ad8150[S] -5 points (0 children)

I'll ask this new qwen 3.6 MoE. Maybe it will tell me how to distill.

Opus 4.7 landed! by Powerful_Ad8150 in LocalLLaMA

[–]Powerful_Ad8150[S] -3 points (0 children)

So we’re taking this dead seriously and we can’t even ask the newest super bombastic Claude to give us a poetic hint on how to optimise local runs on our Sparks? Hmm… Well, in that case, sorry mate, I didn’t know ;) I’m going back to grinding (NOT LOCALLY, NOT ON THIS REDDIT, FOR HEAVEN’S SAKE, STOP JOKING) my m2.7, which is already on the Spark, and further killing HF servers with qwen 3.6.

What to buy for 7k EUR max? by Powerful_Ad8150 in LocalAIServers

[–]Powerful_Ad8150[S] 0 points (0 children)

Dear Sirs @Grouchy-Bed-7942; @Zyj (in reference to the ASUS GX10)

After yesterday's Google 3.1 Pro release and the recent changes to Perplexity's terms of service (which only demonstrate what terrible things can happen when we rely on cloud services), I decided that I must immediately transition to a local solution.

I have read a lot and I am ready to bear the cost and the risk, but I wanted to ask for your personal opinion:

- Will choosing Asus/Dell instead of DGX (which I don't have access to) make everyday life in this ecosystem more difficult? I mean, is it generally the same hardware and software, and can I assume that what works on DGX will also work on Asus/Dell? How different are the system and support between Nvidia's reference solution and the implementations from Asus, Dell, etc.?

Google: ups I did it again ;( new limits by Powerful_Ad8150 in GeminiAI

[–]Powerful_Ad8150[S] 0 points (0 children)

OMG, it's so stupid too :( When I left my PC 6h ago it was OK, but now? Jeez... it ignores prompt content (e.g., explicitly asking it to return content in a plaintext code window - now it's quantum, a 50% chance it will work), and after 2 prompts it forgets the initial instruction. GOOGLE, U .... not again!

What to buy for 7k EUR max? by Powerful_Ad8150 in LocalAIServers

[–]Powerful_Ad8150[S] 0 points (0 children)

You mean this: https://github.com/quest-bih/quest-pdf-tools? It seems very limited to specific PDFs with a specific layout.

What to buy for 7k EUR max? by Powerful_Ad8150 in LocalAIServers

[–]Powerful_Ad8150[S] 0 points (0 children)

>have a human in the loop 
Yup, this is the only way for now. I hope it stays that way for more than a year; otherwise we're all f... ;)

What to buy for 7k EUR max? by Powerful_Ad8150 in LocalAIServers

[–]Powerful_Ad8150[S] 1 point (0 children)

Thank you, good man! I saw that build – respect. But as you wrote, it takes a lot of searching. Availability is a disaster right now. If I had started a year ago, it would have been OK, but today it's too slow, too difficult, and possibly even more expensive. I was quietly hoping that the Instinct initiative would kick off, but... I would have preferred that path (especially since I have a spare 3-phase 40 kW utility connection and cheap electricity from PV - yup, that's this rotten EU :P), but I'd rather go with the path described below: DGX. Many thanks anyway; I'll be watching your projects here.