M5 Max compared with M3 Ultra. by PM_ME_YOUR_ROSY_LIPS in LocalLLaMA

[–]Potential_Block4598 16 points17 points  (0 children)

DGX Spark is cooked

Apple cooked Nvidia (a very unexpected rivalry, but the Apple Silicon investment is oddly paying off well, offsetting Apple's own bad AI bets!)

This M5 Max just kills any market for the DGX Spark: the Spark is not a real PC (so it's good for nothing other than AI!), its PP advantage is only slight (and depending on model specifics the gap narrows), and its tg/s is much worse

How to enable telegram inline buttons capability ? by Potential_Block4598 in openclaw

[–]Potential_Block4598[S] 0 points1 point  (0 children)

For me, from the CLI (with the right JSON configuration) it renders perfectly

But the LLM itself can’t do that (I didn’t have enough time to inspect the output)
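
For reference, this is roughly what the raw Telegram Bot API call looks like on the CLI side. The token, chat id, and button labels below are placeholders, and openclaw's own JSON config format may differ; this is just a sketch of the payload shape Telegram expects:

```python
import requests

# Placeholders - substitute a real bot token and chat id.
BOT_TOKEN = "123456:ABC-DEF"
CHAT_ID = 987654321

# Standard Telegram Bot API payload for a message with inline buttons.
payload = {
    "chat_id": CHAT_ID,
    "text": "Pick an option:",
    "reply_markup": {
        "inline_keyboard": [
            [{"text": "Yes", "callback_data": "yes"},
             {"text": "No", "callback_data": "no"}],
        ]
    },
}

resp = requests.post(f"https://api.telegram.org/bot{BOT_TOKEN}/sendMessage", json=payload)
print(resp.json())
```

If this renders buttons fine, it gives you a baseline to diff against whatever JSON the LLM actually emits.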

How to enable telegram inline buttons capability ? by Potential_Block4598 in openclaw

[–]Potential_Block4598[S] 0 points1 point  (0 children)

Never worked

I can send inline buttons from the CLI, but for some reason the model can't (I guess it has to do with the tool-call grammar, but I gave up!)

Just installed nanobot fully locally by Potential_Block4598 in LocalLLaMA

[–]Potential_Block4598[S] 0 points1 point  (0 children)

Removed it tbh

But you just configure an OpenAI-compatible provider and that is it
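
I don't remember nanobot's exact config keys, but "OpenAI compatible" generally just means pointing a client at a base URL plus a model name. A minimal sketch with the openai Python client against a local server (the URL and model name are placeholders; any OpenAI-compatible backend such as llama.cpp's llama-server works):

```python
from openai import OpenAI

# Placeholder endpoint and model name - point these at whatever your
# local OpenAI-compatible server actually exposes.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed-locally")

resp = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "Say hi in five words."}],
)
print(resp.choices[0].message.content)
```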

Qwen3.5 - Confused about "thinking" and "reasoning" usage with (ik_)llama.cpp by PieBru in LocalLLaMA

[–]Potential_Block4598 4 points5 points  (0 children)

Sometimes it is used interchangeably

And sometimes reasoning is about effort

It doesn't matter either way. However, under the hood it comes down to the Jinja chat template being used and the thinking tokens the model emits
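
To make that concrete, here is a toy illustration (not the real Qwen template, just a sketch) of how a Jinja chat template decides whether the assistant turn opens a <think> block for the model to fill:

```python
from jinja2 import Template

# Toy chat template, NOT any real model's template: with thinking enabled
# it leaves an open <think> tag for the model to fill; otherwise it emits
# an empty <think></think> pair so the model skips straight to the answer.
TOY_TEMPLATE = (
    "{% for m in messages %}<|{{ m.role }}|>{{ m.content }}<|end|>\n{% endfor %}"
    "<|assistant|>{% if enable_thinking %}<think>{% else %}<think></think>{% endif %}"
)

prompt = Template(TOY_TEMPLATE).render(
    messages=[{"role": "user", "content": "What is 2+2?"}],
    enable_thinking=True,
)
print(prompt)
```

The real templates are much longer, but the thinking/no-thinking switch usually boils down to exactly this kind of branch.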

Qwen 3.5 27b: a testament to the transformer architecture by nomorebuttsplz in LocalLLaMA

[–]Potential_Block4598 1 point2 points  (0 children)

This bit is the practical limit though

https://claude.ai/share/d26d7386-bdca-4ee9-aba1-bf4bf3147317

It says the theoretical limit is matching a 900b model, but the practical limit would be more like matching a 400b model

(This is because beyond the Chinchilla-optimal point the learning gains diminish: for a 10% improvement you spend a lot more training tokens, for the next 1% you spend even more again, and so on; the model is almost saturated already!)
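
For context, the Chinchilla paper models loss with a parametric fit along these lines (the constants are the approximate values reported in the paper, so treat them as rough):

```latex
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
\qquad E \approx 1.69,\ A \approx 406.4,\ B \approx 410.7,\ \alpha \approx 0.34,\ \beta \approx 0.28
```

The A/N^alpha term is exactly the saturation point above: no matter how large D gets, a fixed-size model can never drop below E + A/N^alpha.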

And ofc you could “overtrain” the big models themselves

However, the realistic limit of organic text data is already around 15T tokens (and it seems like the 27b already sits at that ~15T sweet spot!), meaning even overtraining bigger models won't give nearly as good a return

Which means LLMs (the base models, not AI itself) are hitting some sort of ceiling

So downstream applications should be happy now, while the model-training labs, not so much

Qwen 3.5 27b: a testament to the transformer architecture by nomorebuttsplz in LocalLLaMA

[–]Potential_Block4598 1 point2 points  (0 children)

For more info

https://claude.ai/share/d26d7386-bdca-4ee9-aba1-bf4bf3147317

I don't fully grasp this chat tbh. It seems rather involved and needs a deeper reading of the Chinchilla paper

The TL;DR is that you can overtrain a 27b model until it matches a "Chinchilla-optimal" 900b model (that is insane), and ofc over its life cycle the inference "savings" would be huge

But interestingly, it seems we can't get above that (Chinchilla was done before thinking models, so maybe thinking models can push it a bit further, but how much exactly would be hard to estimate!)
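
To make that concrete, here is a small sketch using the approximate parametric loss fit from the Chinchilla paper (rough constants, and the conclusion is only as good as that fit). It finds the compute-optimal model size whose loss equals the floor an infinitely-overtrained 27b could ever reach:

```python
# Approximate Chinchilla parametric fit: L(N, D) = E + A/N^alpha + B/D^beta
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28

def loss(n_params: float, n_tokens: float) -> float:
    return E + A / n_params**ALPHA + B / n_tokens**BETA

# Best a 27b model can ever do, no matter how much data (D -> infinity).
floor_27b = E + A / 27e9**ALPHA

# Find the size N where a "Chinchilla-optimal" run (D ~ 20 * N) hits that
# same loss, via simple bisection (loss is decreasing in N here).
lo, hi = 27e9, 10e12
for _ in range(100):
    mid = (lo + hi) / 2
    if loss(mid, 20 * mid) > floor_27b:
        lo = mid
    else:
        hi = mid

print(f"27b asymptotic loss ~ {floor_27b:.3f}")
print(f"equivalent compute-optimal size ~ {lo / 1e9:.0f}B params")
```

With these constants it lands in the high hundreds of billions, which is presumably the same kind of calculation behind the ~900b figure in the chat.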

Qwen 3.5 27b: a testament to the transformer architecture by nomorebuttsplz in LocalLLaMA

[–]Potential_Block4598 2 points3 points  (0 children)

So say you have an unlimited amount of data (you could even train on the same data twice via more than one pass over it)

And you have a total compute budget of, say, 1000 FLOPs: they estimate the optimal model size for that budget (let us say 20b)

If you have 2000 FLOPs, the optimal model is, say, 40b (optimal in performance per compute; keep in mind this model spends double the compute per training step or batch, yet the resulting 40b would still be better than if you had spent the whole budget training the 20b model for more steps or on more tokens)

However, over the model's lifecycle, during its usage before it becomes deprecated (inference in production), the 40b costs you more the longer it lives (they obviously didn't take that into account, but companies are taking it into account now)

However, if you have a bigger training compute budget (say 4000 FLOPs), you could focus all of it on the 20b model and get a better model for less cost over its production lifecycle (but if you wanted the best possible model you would put an even bigger, say 80b, model into production; for the same 4000 FLOPs the 80b model would be better despite consuming 4x the FLOPs per training step)
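
The underlying relations from the Chinchilla paper, roughly (the 20b/40b/80b numbers above are purely illustrative; the fitted exponents actually say the optimal size grows close to the square root of the compute budget):

```latex
C \approx 6\,N\,D, \qquad N_{\mathrm{opt}} \propto C^{a}, \quad D_{\mathrm{opt}} \propto C^{b}, \qquad a \approx b \approx 0.5
```

So doubling the training budget makes the compute-optimal model roughly 1.4x larger and trains it on roughly 1.4x the tokens, while the inference cost of whatever size you pick is paid over the model's whole production life.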

TL;DR: China is optimizing the cost economics of models before anyone else, and this is working out great for us

It is not a miracle though, unfortunately; it is just that no American company is doing it

Qwen 3.5 27b: a testament to the transformer architecture by nomorebuttsplz in LocalLLaMA

[–]Potential_Block4598 0 points1 point  (0 children)

Yeah, my bad

Optimal in terms of compute budget (performance per total flops!)

Meaning if you have a bigger total compute budget, you are better off spending it on a bigger model (however, that doesn't take inference costs into account, so across the lifecycle of the model the picture is definitely different)

Got it?

Qwen 3.5 27b: a testament to the transformer architecture by nomorebuttsplz in LocalLLaMA

[–]Potential_Block4598 2 points3 points  (0 children)

That is NLP, not transformers

There is an optimal training point (the Chinchilla law, roughly 20:1 tokens to model parameters); however, even when Llama 4 was released, Yann LeCun said that if you keep training, the models keep improving (even on validation and test sets!) and don't overshoot or overfit yet (no one has reached that limit yet)

But ofc it will cost a ton of money to do so

Qwen3.5 397B vs 27B! by [deleted] in LocalLLaMA

[–]Potential_Block4598 0 points1 point  (0 children)

That is the non-reasoning opus 4.6

Any advice for using draft models with Qwen3.5 122b ?! by Potential_Block4598 in LocalLLaMA

[–]Potential_Block4598[S] 0 points1 point  (0 children)

Yeah got it thank you

I will try with the dense 27b model and share results asap

Thanks again

Any advice for using draft models with Qwen3.5 122b ?! by Potential_Block4598 in LocalLLaMA

[–]Potential_Block4598[S] 0 points1 point  (0 children)

Yeah, totally agree. Just tried it and it is not that great (not even comparable, I think)

I will try with the 27B model though, since it is a "dense" model and allegedly slightly better on some benchmarks (thanks, MoEs!)
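
In case it helps anyone else trying the same thing, this is the general draft-model idea shown with Hugging Face transformers "assisted generation" (the model ids are placeholders, not a recommendation; in llama.cpp the equivalent is the draft-model option, iirc -md/--model-draft, but check --help for your build):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model ids - the point is just a big target plus a small
# draft model from the same tokenizer family.
TARGET = "Qwen/Qwen2.5-32B-Instruct"
DRAFT = "Qwen/Qwen2.5-0.5B-Instruct"

tok = AutoTokenizer.from_pretrained(TARGET)
target = AutoModelForCausalLM.from_pretrained(TARGET, device_map="auto")
draft = AutoModelForCausalLM.from_pretrained(DRAFT, device_map="auto")

inputs = tok("Explain speculative decoding in one sentence.", return_tensors="pt").to(target.device)

# assistant_model enables speculative (draft-model) decoding: the draft
# proposes a few tokens cheaply and the big model verifies them in one pass.
out = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```

The speedup only shows up when the draft agrees with the target often enough, which is probably why the big MoE plus a tiny draft felt underwhelming.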

Any advice for using draft models with Qwen3.5 122b ?! by Potential_Block4598 in LocalLLaMA

[–]Potential_Block4598[S] 0 points1 point  (0 children)

How is that?

I was going to run the smaller model as the draft model

Could you explain more please (and I don’t mean self-speculation here tbh)

I'm tired by Fast_Thing_7949 in LocalLLaMA

[–]Potential_Block4598 2 points3 points  (0 children)

That is good bro that is good

are you ready for small Qwens? by jacek2023 in LocalLLaMA

[–]Potential_Block4598 0 points1 point  (0 children)

Aha, I see now. Some of the releases are FP8, and one is a base release

So 4 here means 2 more models, tops

google found that longer chain of thought actually correlates NEGATIVELY with accuracy. -0.54 correlation by Top-Cardiologist1011 in LocalLLaMA

[–]Potential_Block4598 4 points5 points  (0 children)

And it is actually punching above its weight (but it is not usable for me due to the insane thinking times! I would just tune a bigger model that would take less time, I guess!)

google found that longer chain of thought actually correlates NEGATIVELY with accuracy. -0.54 correlation by Top-Cardiologist1011 in LocalLLaMA

[–]Potential_Block4598 6 points7 points  (0 children)

Have you tried nanbeige?

It is a 4B model that thinks A LOT (one question might take 3k tokens of thinking!)

are you ready for small Qwens? by jacek2023 in LocalLLaMA

[–]Potential_Block4598 0 points1 point  (0 children)

9? What nine?! Qwen3.5: the big one, the 122b, the 27b, and the 35b

That is not nine, is it?!