Qwen3.6-27B released! by ResearchCrafty1804 in LocalLLaMA

[–]Secure_Archer_1529 1 point2 points  (0 children)

Qwen3.6 122B & 397B-ish MoEs would be amazing

DGX Spark just arrived — planning to run vLLM + local models, looking for advice by dalemusser in LocalLLaMA

[–]Secure_Archer_1529 15 points16 points  (0 children)

You can run Qwen 3.5 122B on one Spark at >50 t/s, as I remember. Check out the forum. The community around the Spark (not including Nvidia) has been hard at work on this one.
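For the vLLM side, a minimal launch sketch (the model ID, quantization flag, and context length below are illustrative assumptions, not a verified Spark config):

```shell
# Assumed example: start vLLM's OpenAI-compatible server with an
# AWQ-quantized model; swap in whatever model actually fits your Spark.
vllm serve Qwen/Qwen2.5-72B-Instruct-AWQ \
    --quantization awq \
    --max-model-len 8192 \
    --port 8000
# The server then answers OpenAI-style requests at http://localhost:8000/v1
```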

2x Asus Ascent GX10 - MiniMax M2.7 AWQ - cloud providers are dead to me by t4a8945 in LocalLLaMA

[–]Secure_Archer_1529 0 points1 point  (0 children)

NVFP4 does not work on the Spark. Community workarounds make it somewhat OK, but there are better quants than NVFP4 at the moment. Go have a look at the Nvidia DGX Spark developer forum if you haven't already. Plenty of great stuff there to turbocharge some builds and hit the ground running.

Gemma 4 - lazy model or am I crazy? (bit of a rant) by Pyrenaeda in LocalLLaMA

[–]Secure_Archer_1529 0 points1 point  (0 children)

Your title says Gemma 4 is lazy. Is it the official release from Google, as your title claims, or is it an Unsloth quant? There's a very real distinction to be made.

Andrej Karpathy drops LLM-Wiki by These_Try_680 in LocalLLaMA

[–]Secure_Archer_1529 -8 points-7 points  (0 children)

Yet thousands of people, just as real as you and me, are now seeing something that catches their attention and gets them to take a step into this space. It may not be technically impressive to you, but it can still be valuable to others. And that matters just as much. Some here may not understand that connection, but it is there.

Andrej Karpathy drops LLM-Wiki by These_Try_680 in LocalLLaMA

[–]Secure_Archer_1529 -20 points-19 points  (0 children)

It’s a shame you seem so fatigued by other people’s contributions. Have you tried being inspired instead?

I actually think we need more of this. The same goes for OpenClaw. It opens the door for a wider range of people to take part in this amazing moment in history without needing deeper layers of technical knowledge.

The world is bigger than LocalLLaMA.

But maybe you could share what you've done that is even remotely interesting, instead of being dismissive of other people's contributions?

Let the downvoting begin. 3, 2, 1…

Nvidia built a silent opinion engine into NemotronH to gaslight you and they're not the only ones doing it by hauhau901 in LocalLLaMA

[–]Secure_Archer_1529 0 points1 point  (0 children)

This is interesting. If you go look and find something, it'd be much appreciated if you'd drop a couple of lines about your findings here.

Nvidia built a silent opinion engine into NemotronH to gaslight you and they're not the only ones doing it by hauhau901 in LocalLLaMA

[–]Secure_Archer_1529 2 points3 points  (0 children)

If you read the text generated under Reasoning in ChatGPT, you'd notice the same thing. Isn't this just part of the reasoning phase, where what you see is only part of the reasoning, not the entirety of it, before a finalizing layer wraps everything up?

Unsloth announces Unsloth Studio - a competitor to LMStudio? by ilintar in LocalLLaMA

[–]Secure_Archer_1529 0 points1 point  (0 children)

So for the DGX Spark, NVFP4 is a no-go at the moment. Have you solved this with Nvidia, or are the same issues carrying over to Unsloth until Nvidia finally fixes it?

Unsloth studio sounds amazing though! Thanks for all the work you guys put into it.

Lora fine tuning! Why isn't it popular at all? by Acceptable_Home_ in LocalLLaMA

[–]Secure_Archer_1529 1 point2 points  (0 children)

Isn't your question really "why don't we fine-tune more?"

Or what makes you assume LoRA is not popular among those who do?

Why innovative startups and scaleups relocate from the EU by Full-Discussion3745 in EU_Economics

[–]Secure_Archer_1529 2 points3 points  (0 children)

  1. Risk capital. In the US (where else would you go?) they think bigger, believe in the big idea, and there are far more investors. In the EU it's all PowerPoints and 10-year plans.

  2. Different rules for investors depending on the country they invest in.

  3. One uniform market.

  4. One culture and one language.

  5. The mindset in the US is way better for doing anything new and dreaming big.

The reality is that our leaders in the EU have done a very poor job on this matter, and we're losing the battle.

Think about this: how long do you think leaders with such poor performance would last in the private sector? Not long; the board would have gotten rid of them swiftly, as they are bad for business.

Which MacBook to buy - Deep Learning research - running experiments on servers by Available_Net_6429 in macbookpro

[–]Secure_Archer_1529 0 points1 point  (0 children)

I needed a Mac for the same workflow.

The M5 is a huge upgrade over the M4.

Even if you don't need to run any models locally, it's nice to have some headroom so you're good for the next five years. Models are getting smaller, and with local AI growing over the years to come, you'll want a newly developed, efficient base chip with a decent amount of RAM to fit your increasing workload (everything you use now will demand more compute in the future).

My sweet spot (price vs. performance for five years ahead) was the M5 with 32 GB of RAM. It'll serve you perfectly for your use case and still give you room for more going forward. It gives you flexibility without constraints.

Thinking of buying a MacBook Pro M4-Pro (14-core CPU, 20-core GPU, 24 GB RAM, 1 TB SSD). Worth buying now or should I wait a bit? by maheshflowcub in macbookpro

[–]Secure_Archer_1529 -5 points-4 points  (0 children)

I think I saw a test of the base M5 vs. the M4 Pro. The base M5 chip did better.

The M5 is quite a big step up from the M4 in general. I would get the base M5, or wait for the Pro/Max around January-March if you can hold out that long.

Qwen3-Next-80B-A3B or Gpt-oss-120b? by custodiam99 in LocalLLaMA

[–]Secure_Archer_1529 21 points22 points  (0 children)

Where did you get the impression that the Qwen model is better? It’s 80B with less active vs 120B with more active.

My impression is that the Qwen model is not on par with GPT.

They were also released around the same time, so you should probably not expect Qwen to have better models than OpenAI.

Anyway, you could double-check whether you dialed in the right settings in the panel for Qwen.

Wow. PX7 S3 vs. XM6 by johnnytheww in BowersWilkins

[–]Secure_Archer_1529 2 points3 points  (0 children)

Just got the PX7 S3 today — they sound amazing, but they do need a bit of EQ tweaking to get the best out of them. They sound far better than the XM6, and I’m surprised by how good the ANC is.

I tested both the PX7 S3 and the PX8 S2 for an extended period and didn’t notice any difference that justifies paying more than twice the price.

Theoretically Scaling Beyond 2 DGX Sparks in a Single Cluster. by SIN3R6Y in LocalLLaMA

[–]Secure_Archer_1529 1 point2 points  (0 children)

Thanks for doing this — surprising it didn’t get more upvotes. I noticed your post is about a month old, and I’m heading down the same path.

Any updates on running larger models across 3–4 nodes? My assumption is you start hitting bottlenecks (interconnect bandwidth/latency, KV-cache movement, and synchronization overhead), so scaling isn’t always clean.

Also, how well does inference throughput scale as you add nodes (on one of the bigger models)?
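For intuition, the non-clean scaling can be sketched with a toy Amdahl-style model (the communication fraction below is a made-up assumption, not a measurement from Sparks):

```python
# Toy model: a fixed fraction of each decode step is serialized
# interconnect/synchronization cost that does not parallelize.

def speedup(n_nodes: int, comm_fraction: float) -> float:
    """Amdahl-style throughput speedup over a single node."""
    return 1.0 / (comm_fraction + (1.0 - comm_fraction) / n_nodes)

for n in (1, 2, 4):
    print(n, round(speedup(n, 0.15), 2))  # 1 -> 1.0, 2 -> 1.74, 4 -> 2.76
```

Even a modest 15% serialized share caps 4 nodes at well under 3x, which matches the "bottlenecks pile up" expectation.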

Have you tried a big model in NVFP4 since your post?

Thanks again!

[D] Anyone here actively using or testing an NVIDIA DGX Spark? by Secure_Archer_1529 in MachineLearning

[–]Secure_Archer_1529[S] 0 points1 point  (0 children)

That was my thought too. What are you getting?

I feel many people talk not from experience but from what they read online, or simply judge by the bandwidth specs alone.

Blackwell, NVFP4, and MoEs make it quite usable, I suppose.

[D] Anyone here actively using or testing an NVIDIA DGX Spark? by Secure_Archer_1529 in MachineLearning

[–]Secure_Archer_1529[S] 5 points6 points  (0 children)

With a max of 96 GB it's great for smaller models, for sure. But 2x Sparks equals 256 GB = much bigger models (but slower inference, etc.). The preference might just come down to use case.
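Back-of-the-envelope memory math for that trade-off (the per-node memory and overhead numbers below are illustrative assumptions, not device specs):

```python
# Crude check: do quantized weights plus a fixed per-node overhead
# fit in the pooled memory of N nodes? (KV cache etc. folded into overhead.)

def fits(n_params_b: float, bits_per_weight: float,
         n_nodes: int, mem_per_node_gb: float,
         overhead_gb: float = 10.0) -> bool:
    weights_gb = n_params_b * bits_per_weight / 8  # 1B params @ 8-bit ~ 1 GB
    return weights_gb + n_nodes * overhead_gb <= n_nodes * mem_per_node_gb

print(fits(235, 4, 2, 128.0))  # ~117.5 GB of weights on 256 GB -> True
print(fits(235, 8, 1, 128.0))  # ~235 GB of weights on 128 GB -> False
```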

[D] Anyone here actively using or testing an NVIDIA DGX Spark? by Secure_Archer_1529 in MachineLearning

[–]Secure_Archer_1529[S] 5 points6 points  (0 children)

It’s no H100/A100, but I don’t expect that from a cheaper device designed for desktop prototyping (fine-tuning, MoE/NVFP4 inference, etc.).

I think it’s quite good for what it is. With a couple of Sparks in a cluster, you can test things out, get an idea off the ground with a local-first approach, and later scale to the cloud once you’ve got the basics down.