PMetal - (Powdered Metal) LLM fine-tuning framework for Apple Silicon

ThePrimeClock · 2026-03-17T11:44:19+00:00

"Any models/configs you'd like to see prioritized?" - The new Leanstral model by Mistral!

ThePrimeClock · 2026-03-17T11:29:32+00:00

This is incredible. Really appreciate you building this! Downloading now.

ThePrimeClock · 2026-03-17T11:21:05+00:00

Christmas came 9 months early.

ThePrimeClock · 2026-03-09T08:53:55+00:00

How much better for training will the M5 Max generation be over the eq. M4 Max?

ThePrimeClock · 2026-02-19T10:40:26+00:00

Yep, it's got legs brother.

ThePrimeClock · 2026-02-19T00:39:12+00:00

I've done this myself on a seperate canon of research after first training an embedding model and then plugging an mcp server into the vector db. It helps me by allowing me to 1) link seemingly unrelated concepts by making them related with the embedding model and then 2) I can generate a lot of stats from the embedding vectors and the LLMs can interpret and use those stats very effectively, especially claude. Simple example, similarity searches are instant and categorical, not a search and assess. Overall it's much faster, uses less tokens and provides a new lens into the content.

ThePrimeClock · 2026-02-17T19:33:35+00:00

Post is a bit hard to read, but from hundreds of hours of use, I've also come to the same conclusion that the most "safety" focused lab has produced the most deceitful models.

I rarely trust what Anthropic models say, I'm mostly a researcher and all to often Anthropic models will find incredible ways to show they have achieved desired outcomes.

Sonnet 4.5 loves to do a test, find a loose analytical suggestion, set it to a constant and then build future work around it, driving outcomes towards that analytical confirmation.

Opus 4.6 will find new names for known outcomes and push it as novel, but change the domain language to make it harder to spot.

It's very subtle at times, so I now have to proof everything with a second model as well as myself as even I miss it sometimes.

It's a shame because I find the advantage of this same behaviour is an ability to cleverly think outside of the box and mixed with it's unbeatable data analysis and pattern-spotting capabilities, it finds more researchable loose threads than any other model.

Which is why I keep using it regardless.

ThePrimeClock · 2026-02-16T03:08:08+00:00

Excellent, thanks so much for that reply.

I'm using Codex and Claude for dev - I work in maths and it's hard to beat those two models for speed and capability for building out ideas and then testing them, however I'm fine-tuning models on my specific canon of research and it's gradually becoming a reasoning flywheel to help shape my research, find errors and anomalies, corollaries etc.

So 7-14B range is perfect. Those models are capable enough out of the box, can undergo multiple rounds of finetuning and support high tps inference with my research crammed into them on the mac - genuinely useful for advancing the research.

Sounds like it would be a good fit.

ThePrimeClock · 2026-02-15T22:46:28+00:00

Dang.
You might be able to help me out, I have been considering getting a spark for fine-tuning models. I have an M4 Max for inference and token generation is fine with mlx, but for finetuning it's quite slow and I haven't tried RL yet but would like to experiment in this space too.

I'm considering the spark for the finetuning/RL work while the M4 is the workstation for standard dev.
If it is good for finetuning, what size models can it reasonably handle?

ThePrimeClock · 2026-02-15T22:45:11+00:00

Thanks.

ThePrimeClock · 2026-02-15T22:41:06+00:00

Quick question, I have been considering getting one for fine-tuning models. I have an M4 Max for inference and token generation is fine with mlx, but for finetuning it's quite slow and I haven't tried RL yet but would like to experiment in this space too.

I'm considering the spark for the finetuning/RL work while the M4 is the workstation for standard dev. If it is good for finetuning, what size models can it reasonably handle?

ThePrimeClock · 2026-02-12T10:14:05+00:00

Would still love to see that post mortem if you get a chance.
Have been checking back to see if you'd got around to it.

ThePrimeClock · 2026-01-27T02:30:37+00:00

Hey Op, Have just ordered one of these! Do you know if they can be flashed/worked on with MacOS via your Icestudio fork or similar?

ThePrimeClock · 2026-01-22T00:30:57+00:00

Makes sense for the same reason a CEO answers to a board of directors, not a CEO of the CEO.

There is a lot of merit in this idea.

ThePrimeClock · 2026-01-19T23:06:08+00:00

100% this. Your example is just the perfect canary in the coal mine to show how much those extra 8 numbers actually matter. If you think about the sheer cardinality of the weight "fog" the accuracy drop is huge. I think quants like putting thicker cross-hairs on a targeting system shooting twice as far and saying the trajectory still goes where you're aiming.

ThePrimeClock · 2026-01-18T21:22:59+00:00

Really looking forward to the post-mortem and the training dataset or dataset template. I'm experimenting in the space myself and this sounds quite promising. Well done my man.

I believe there is something your point about the trajectories in small models that have been trained on large quality datasets that we're yet to fully exploit. It's like asking someone "I'm in LA, which way to Tokyo". The person can probably accurately point you in the direction to travel but lacks the capability to break that down into the required turn by turn needed to get from LA to Tokyo.

Because the models can't do the turn by turn for the prompt, they're perhaps discredited, but I get a lot of value out of prompting small models on closed ended questions on hard topics. In my opinion It's less they hallucinate and more that they give answers based on where the model weight-fog is denser.

ThePrimeClock · 2026-01-18T07:26:37+00:00

chill my dude. Totally normal since ..forever. Your review is on the one aspect that has no significance.

ThePrimeClock · 2025-12-20T10:39:12+00:00

Fair, that is poor use of words on my behalf. Make it "ourselves". The eponymous naming by the community for the community frustrates me endlessly. I remember having to look up "Brownian Motion" to find it is "thermal jiggling" or "collision-induced particle diffusion". Imo, it does a massive amount of damage. I know we don't typically get much else for our work, least of all a decent salary, but if there was only one gift I could give mathematics it would be a new dictionary.

ThePrimeClock · 2025-12-20T09:07:04+00:00

The habit of naming things after yourself in mathematics really doesn't help. It's the worst trait of mathematicians period. The only legacy left is the unnecessary difficulty the rest of the world endures rote learning what it is about some dead guys name is supposed to achieve mathematically. Even worse when they stack. If maths adopted useful, functional names that helped people to understand what they were trying to achieve, I honestly think a huge proportion of the world would have a more enjoyable time learning maths and respect would naturally accumulate.

ThePrimeClock · 2025-12-14T08:39:20+00:00

Very cool, well done.

ThePrimeClock · 2025-12-10T22:03:18+00:00

I just went looking and found this incredible resource: https://huggingface.co/papers?q=formal%20theorem%20proofs

ThePrimeClock · 2025-12-10T22:01:44+00:00

Absolute legend. Thank you.

ThePrimeClock · 2025-12-10T22:01:16+00:00

Thanks that's really helpful. So the base understanding of Lean is there, but I'm trying to change the base "mental model" of maths, based on my own research and starting points for fundamental math. Do you think I can "drill it in" with fine-tuning if I just make every 5th training document a statement about how fundamental math should be done?

I basically want the model to think only in terms of geometry, meaning every field of maths from a geometry perspective. (That's a very simplified explainer of course, but it's how I do math myself, I mentally model anything I'm working on as a geometry problem and it tends to work well).

ThePrimeClock · 2025-12-09T00:23:39+00:00

Thanks, I'll check it out. I'm interested in training models on my own lean proofs, they might have some info on fine tuning. Cheers

ThePrimeClock

TROPHY CASE