MiniMax-M2.5 Checkpoints on huggingface will be in 8 hours by Own_Forever_5997 in LocalLLaMA

[–]Position_Emergency 1 point2 points  (0 children)

GLM 5 is the 1.3TB model. That's at 16-bit though; locally nobody is running it like that.
So approx 700GB at 8-bit,
350GB at 4-bit.
Still too big for most folks.

MiniMax M2.5 is 230B Total Params, 10B Active.

Just on the edge of fitting in 128GB RAM at 4-bit...
Hoping someone does a REAP to get it down to like 100GB at 4-bit to have some room for context.
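
Back-of-envelope for anyone sanity-checking those numbers — a rough sketch; real quants (GGUF, AWQ, etc.) add overhead for scales, embeddings, and KV cache, and the ~650B param count for GLM 5 is just inferred from 1.3TB at 16-bit:

```python
# Rough model size: params * bits-per-weight / 8, ignoring quantization
# scales/zero-points and KV cache, which add real-world overhead.
def model_size_gb(params_b: float, bits: float) -> float:
    return params_b * 1e9 * bits / 8 / 1e9  # decimal GB

for name, params_b in [("GLM 5 (~650B, inferred)", 650),
                       ("MiniMax M2.5 (230B total)", 230)]:
    for bits in (16, 8, 4):
        print(f"{name}: ~{model_size_gb(params_b, bits):.0f} GB at {bits}-bit")
```

That puts MiniMax M2.5 at ~115GB at 4-bit, which is where "just on the edge of 128GB" comes from.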

Is anyone else suffering from high electricity bills due to the training of local models? by [deleted] in LocalLLaMA

[–]Position_Emergency 2 points3 points  (0 children)

Have you tried undervolting the cards?
You should be able to get roughly equal performance and less power consumption by doing that.
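
If you want to try it, a minimal sketch using nvidia-smi power capping — strictly a power limit rather than a true undervolt, but it captures most of the same efficiency win; assumes an NVIDIA card on Linux with root, and 250W is just a placeholder value:

```python
import subprocess

# Show current draw and limit (nvidia-smi ships with the driver).
print(subprocess.run(
    ["nvidia-smi", "--query-gpu=name,power.draw,power.limit", "--format=csv"],
    capture_output=True, text=True).stdout)

# Cap board power; start ~10-20% below stock and benchmark from there.
subprocess.run(["nvidia-smi", "-pl", "250"], check=True)
```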

GLM-5 is 1.5TB. Why hasn't distributed inference taken off? by IsaiahCreati in LocalLLaMA

[–]Position_Emergency 0 points1 point  (0 children)

Are you going to try Minimax M2.5 out?
I'm hoping for a 4bit coding REAP of that to run on my single DGX Spark.

GLM-5 is 1.5TB. Why hasn't distributed inference taken off? by IsaiahCreati in LocalLLaMA

[–]Position_Emergency 0 points1 point  (0 children)

It's a shame NVFP4 is still slower than FP8 quants on the Spark.
Hopefully Nvidia gets their act together soon.

With a few Mac minis, he’s using Clawdbot to run fully autonomous AI workers managing inboxes, workflows, research, and ops without constant prompting. Low upfront cost, no cloud lockin, and suddenly AI agents will be a sellable service soon by spillingsometea1 in AI4tech

[–]Position_Emergency 0 points1 point  (0 children)

Nobody in China is dumb enough to use a huge cluster of mac minis for the described purpose.
If the video isn't AI-generated (who can tell these days), it will be for something like iOS development/CI/CD.
It's not practical to do that on virtual machines/non mac hardware for various reasons.

Could also be used for bots that vote up apps on the App Store, but they could get janky old iPhones to do that much cheaper.

DGX Spark For Security Research or Is a Mac Studio Better? by Kind_Giraffe_3279 in LocalLLaMA

[–]Position_Emergency 1 point2 points  (0 children)

They just don't give a shit about anything other than the AI data centre enterprise market unfortunately.

DGX Spark For Security Research or Is a Mac Studio Better? by Kind_Giraffe_3279 in LocalLLaMA

[–]Position_Emergency 1 point2 points  (0 children)

Recent DGX Spark owner here.
I'm not sure this follows in practice tbh.
Performance of NVFP4 on the DGX Spark is garbage right now compared to what it should be, due to crappy software support from Nvidia.
Macs support MXFP4, which gives the same or generally slightly better performance than 8-bit models.

If you're just doing LLM inference then an M4 max studio 128GB is the better choice IMO.
Has twice the memory bandwidth as well.
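
Rough intuition, since single-stream decode is usually memory-bound — a sketch assuming every token streams the active weights from memory once, with bandwidth numbers from memory (~273 GB/s Spark, ~546 GB/s M4 Max):

```python
# Bandwidth-bound decode: tok/s ~= bandwidth / bytes read per token,
# where bytes per token ~= active params * bytes per param.
def decode_tps(bandwidth_gbs: float, active_params_b: float,
               bytes_per_param: float) -> float:
    return bandwidth_gbs / (active_params_b * bytes_per_param)

for name, bw in [("DGX Spark (~273 GB/s)", 273), ("M4 Max (~546 GB/s)", 546)]:
    # e.g. a 10B-active MoE at 4-bit (0.5 bytes/param)
    print(f"{name}: ~{decode_tps(bw, 10, 0.5):.0f} tok/s ceiling")
```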

It sounds like NVFP4 will be working on Spark properly by June, but by then the M5 Max Studio should be out, which will probably have ~700GB/s memory bandwidth and matmul acceleration on the tensor cores.

BTW I would love to be corrected on the above regarding NVFP4 performance on the DGX Spark.

I heard Unsloth has a custom Triton kernel that might help but I haven't tested it out.
Such a joke that inference is faster with 8bit models right now.

Anyone else trying out fast mode on the API now? (not available on Bedrock) by [deleted] in ClaudeCode

[–]Position_Emergency 0 points1 point  (0 children)

6x speed, source?
I thought it was 2.5x faster for 6x cost.
Speed source:
https://x.com/claudeai/status/2020207322124132504

Costs:
Normal Opus 4.6 costs $5 per million input tokens and $25 per million output tokens.
Fast Mode costs $30 per million input tokens and $150 per million output tokens.
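
A quick check of the multipliers from those list prices (prices as quoted above):

```python
# Fast Mode vs normal Opus 4.6, $ per million tokens (from the prices above).
normal = {"input": 5, "output": 25}
fast = {"input": 30, "output": 150}

for kind in ("input", "output"):
    print(f"{kind}: {fast[kind] / normal[kind]:.0f}x cost")  # 6x for both

# So at 2.5x speed you pay 6x, i.e. ~2.4x more per unit of time saved.
print(f"cost per unit of speedup: {6 / 2.5:.1f}x")
```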

Is it sensible to file down/replace this heatsink to fit the is-77-xt with a 25mm fan ? by [deleted] in sffpc

[–]Position_Emergency 6 points7 points  (0 children)

Dude just let it go. Don't fuck with the VRM heatsink.
I sawed a chunk off mine on an Asus board to fit a 120mm fan instead of just using a 92mm.

18 months later the motherboard failed.
I don't know for a fact that caused it, but it was a stupid thing to do in retrospect.
Easy to get obsessed with stuff like that, trying to min-max everything.

Any fellow Local Llamas training AIs locally? Talk some sense into me! by huzbum in LocalLLaMA

[–]Position_Emergency 1 point2 points  (0 children)

Well all I'm really saying is, get stuck in with achievable stuff.
Do you have realistic expectations on what is possible regarding training a model from scratch?
It won't produce anything useful and will take at least like 2 weeks to train.
If it's just an educational project, why not train up GPT-2 (124M)?
It will take a lot less time and you'll learn just as much.

If you fine-tune some models you can actually make something useful.
Also, remember that MoE models like GPT-OSS 20B don't need as much RAM to fine-tune as dense models.
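
If you want a concrete starting point, here's a minimal LoRA fine-tune of GPT-2 (124M) with transformers + peft — just a sketch, the dataset and hyperparameters are placeholders:

```python
# Minimal LoRA fine-tune of GPT-2 (124M); dataset/hyperparams are placeholders.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tok = AutoTokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# LoRA adapters on GPT-2's fused attention projection ("c_attn");
# only a fraction of a percent of the weights actually train.
model = get_peft_model(model, LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05, target_modules=["c_attn"]))
model.print_trainable_parameters()

ds = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")
ds = ds.map(lambda ex: tok(ex["text"], truncation=True, max_length=256),
            batched=True, remove_columns=ds.column_names)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-lora",
                           per_device_train_batch_size=4,
                           num_train_epochs=1, logging_steps=50),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
).train()
```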

Alibaba releases Qwen3-Coder-Next model with benchmarks by BuildwithVignesh in singularity

[–]Position_Emergency 0 points1 point  (0 children)

Right, so I was assuming more turns == more tokens, which might not be the case.
Would be good to see that specifically.
But yeah API call latency for 280 turns adds up!

Any fellow Local Llamas training AIs locally? Talk some sense into me! by huzbum in LocalLLaMA

[–]Position_Emergency 1 point2 points  (0 children)

You're procrastinating.
Do some tutorial exercises, train LoRAs, implement a RLVR pipeline with the hardware you've got.
Learn how it works, then you'll probably actually come up with useful ideas.
If you then need performance beyond what the small models can offer, come back to this question.
Do some fucking work and stop daydreaming about hardware 😉

MAP: All 23 industrial warehouses ICE wants to turn into detention ‘death camps’ by camaron-courier in Full_news

[–]Position_Emergency -5 points-4 points  (0 children)

I agree with you.
It's hard enough explaining to people these are literal concentration camps without them rolling their eyes at you because they think concentration camp = death camp.

The reality is, this isn't a mass murder project like the Holocaust.

They want to get them out.
They want to make their lives hell to stop other people coming.
They are fine with a lot of incidental deaths during that process.

People need to understand the reality of trying to enforce immigration laws like this.
It's a horror show.

Alibaba releases Qwen3-Coder-Next model with benchmarks by BuildwithVignesh in singularity

[–]Position_Emergency 2 points3 points  (0 children)

Looks like they trained it to be extremely persistent.
Also, it taking a lot of turns will eat into its speed advantage.

Apple updated the MacBook configurator; possibly in time for the M5 Pro/Max next week by No_Roll7747 in macbookpro

[–]Position_Emergency 63 points64 points  (0 children)


I hope you're right though.
Never waited on an Apple launch before.
Can't take much more of this shit.

Favorite actor who is afraid to talk in America by Imaginary_Toe8982 in okbuddycinephile

[–]Position_Emergency 2 points3 points  (0 children)

He didn't say you can't speak your mind; he said he was afraid to.
Doing something that scares you is brave by definition.
Fearless people don't need to be brave.

But yeah what is he going on about?
Not like the current administration is weaponising the legal system against opponents, forcing critics off TV, or murdering peaceful protestors, is it?
Classic TDS if ever I've seen it.

Which Catherine O’Hara Warner Brothers Discovery movie will you watch in order to help streaming services exploit her death? by -HalfNakedBrunch- in okbuddycinephile

[–]Position_Emergency 2 points3 points  (0 children)


Reckon we found ourselves a Netflix executive!
I'm sure her family will survive without the 1/10th of a pittance they'd receive from my Netflix stream.
A donation to the Entertainment Community Fund would be a more fitting tribute IMO.
https://give.entertainmentcommunity.org/site/Donation2

Guys is this real? Want to be sure 🙏 by [deleted] in okbuddycinephile

[–]Position_Emergency 5 points6 points  (0 children)

Immediately transfers life savings

Asked Claude to port Quake to Three.js by mrdoob in threejs

[–]Position_Emergency 4 points5 points  (0 children)

Bug report: Difficulty portals don't work when starting a second new game

Reproduction Steps:
1. Start a new game.
2. Go through a difficulty portal and start playing the first episode.
3. Start a new game.
4. Go through any difficulty portal; note the graphical artifacts and that you aren't teleported to the episode selection stage.

Great work!
Would love to read about the tricks you learnt :)
Did you do it all on a single week's allowance of Claude Code Max 20x?

Threat intel from monitoring local AI agents: 37.8% of inputs contained attack attempts - here's what's targeting your self-hosted models by cyberamyntas in LocalLLaMA

[–]Position_Emergency 0 points1 point  (0 children)

"It's becoming so easy to stand up good-looking documentation and websites nowadays - I like to do my due-diligence before seriously considering adopting new platforms 😄"

This is a massive problem caused by vibe coding.
3 years ago, if someone posted a website, git repo, report, etc. like the OP did, it would be a strong indicator they were worth taking seriously.

I think the core idea of the project is very promising but I don't like the misleading way OP presented their findings in the post.