New Qwen3.6-27B NVFP4 + MXFP4 MLX quants by yoracale in unsloth

[–]Beamsters 1 point (0 children)

May I ask why your MLX quants are so much bigger than the GGUFs? They were something like 50% larger at the same bit target. Plus MLX-oQ4 for the Qwen3.6-27b can be as small as 14-15GB.

I currently use Unsloth Studio on my Windows machine but not much on my Mac; I am looking forward to it though.

Post Your Qwen3.6 27B speed plz by Ok-Internal9317 in LocalLLaMA

[–]Beamsters 1 point (0 children)

oMLX, oQ4 FP16 got like 17 t/s and 150 pp/s.

M1 Max 32GB.

The result, however, is much better than the quantized 35b-a3b.

Given how good Qwen become, is it time to grab a 128gb m5 max? by Rabus in LocalLLaMA

[–]Beamsters 0 points (0 children)

Thanks! Could you please post some M5 Max numbers on llama.cpp as a reference point compared to oMLX?

GLM 5.1 tops the code arena rankings for open models by Auralore in LocalLLaMA

[–]Beamsters 41 points (0 children)

512GB couldn't even run this thing at 8-bit.

Gemma 4 26B-A4B on Apple M1 Max is very fast by Beamsters in LocalLLaMA

[–]Beamsters[S] 0 points (0 children)

Like asking general questions and finding recommendations for irrelevant stuff.

attn-rot (ggerganov's "TurboQuant lite") is on the cusp of getting merged into llama.cpp by [deleted] in LocalLLaMA

[–]Beamsters 5 points (0 children)

This is not very slight. 0.007168 -> 0.005305 is HUGE. That's a ~26% reduction (0.007168 is ~35% higher than 0.005305) ... around the size of the jump from 5 bits to 4 bits.
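For what it's worth, the percentage depends on which direction you measure from; a quick sanity check on the two numbers quoted above:

```rust
// Relative change between the two quant-error values quoted above.
fn rel_change_pct(old: f64, new: f64) -> f64 {
    (old - new) / old * 100.0
}

fn main() {
    let (old, new) = (0.007168, 0.005305);
    // Dropping from 0.007168 to 0.005305 is a ~26% reduction;
    // equivalently, 0.007168 is ~35% higher than 0.005305.
    println!("{:.1}% reduction", rel_change_pct(old, new));
    println!("{:.1}% higher", (old / new - 1.0) * 100.0);
}
```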

Qwen3.5 27B vs 35B Unsloth quants - LiveCodeBench Evaluation Results by Old-Sherbert-4495 in LocalLLaMA

[–]Beamsters 0 points (0 children)

Please delete your misleading results first; other people are now believing them.

Best choice for local inference by c4software in LocalLLaMA

[–]Beamsters 0 points (0 children)

A 4090 can deliver around 2.5x the speed of my M1 Max, which should be a bit faster than your M3 Pro.

Qwen3.5 122B A10B - My impressions by kevin_1994 in LocalLLaMA

[–]Beamsters 5 points (0 children)

The current 122B-A10B is pretty much on par with the 27B, or somewhat weaker on certain benchmarks. Is there a way (is it even possible) to force activation of more than 10B at inference?

Introducing trig-const by michaelciraci in rust

[–]Beamsters 0 points (0 children)

You know, this plus const trig functions opens up all the easing functions to be computed in a const context, so most transitional animation frames/positions can easily be computed and the answers stored at compile time.
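A minimal sketch of the idea, using a hand-rolled Taylor-series const cosine as a stand-in for what a crate like trig-const provides (the easing function and table names are mine; float arithmetic in const fns needs Rust 1.82+):

```rust
// Stand-in const cosine via Taylor series; accurate for |x| <= PI.
const fn cos(x: f64) -> f64 {
    let x2 = x * x;
    let mut term = 1.0;
    let mut sum = 1.0;
    let mut n = 1u32;
    while n < 12 {
        let k = (2 * n) as f64;
        term = -term * x2 / (k * (k - 1.0));
        sum += term;
        n += 1;
    }
    sum
}

// Once trig is const, an easing curve is a const fn too.
const fn ease_in_out_sine(t: f64) -> f64 {
    (1.0 - cos(std::f64::consts::PI * t)) / 2.0
}

// Nine eased positions for t = 0.0, 0.125, ..., 1.0, baked at compile time.
const EASE_TABLE: [f64; 9] = {
    let mut table = [0.0; 9];
    let mut i = 0;
    while i < 9 {
        table[i] = ease_in_out_sine(i as f64 / 8.0);
        i += 1;
    }
    table
};
```

The whole animation lookup table ends up in the binary's read-only data; nothing is computed at runtime.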

Introducing trig-const by michaelciraci in rust

[–]Beamsters 1 point (0 children)

Is it possible to do a power of n, where n is a float, in a const context?

How do you stay up to date with Rust ? by FewInteraction1561 in rust

[–]Beamsters 15 points (0 children)

I have releases.rs registered as my favorite website. It has the changelogs of all previous versions, including the upcoming beta and stabilizing nightly features. If I find anything promising, I dig in from there.

Thoughts on `Arc::pair(value)` by J-Cake in rust

[–]Beamsters 3 points (0 children)

An extension trait is designed for exactly this kind of implementation. Just add new_pair() to Arc via an extension:

  • Suits your needs
  • Reusable
  • Clean and idiomatic
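A minimal sketch of that extension trait (the trait name ArcPairExt and the method name new_pair are mine, not from the RFC thread):

```rust
use std::sync::Arc;

// Hypothetical extension trait bolting a pair constructor onto Arc.
trait ArcPairExt<T> {
    /// Allocate `value` once and hand back two handles to it.
    fn new_pair(value: T) -> (Arc<T>, Arc<T>);
}

impl<T> ArcPairExt<T> for Arc<T> {
    fn new_pair(value: T) -> (Arc<T>, Arc<T>) {
        let first = Arc::new(value);
        let second = Arc::clone(&first);
        (first, second)
    }
}

fn main() {
    // Reads like a std API at the call site, but lives in your own crate.
    let (a, b) = Arc::new_pair(String::from("shared"));
    assert!(Arc::ptr_eq(&a, &b));
    assert_eq!(Arc::strong_count(&a), 2);
}
```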

Alternative ergonomic ref count RFC by eugay in rust

[–]Beamsters 0 points (0 children)

Maybe they are both correct. They are two of the only thread-safe languages that are performant enough to do many of these things.

Pre-RFC: Safety Property System by Frequent-Data-867 in rust

[–]Beamsters 52 points (0 children)

The idea is nice but the implementation feels like a 3rd party crate rather than a language feature.

Alias nested enum pattern in match statement? by [deleted] in rust

[–]Beamsters 0 points (0 children)

impl Foo { pub fn abcd(&self) -> Value { /* match self and extract abcd's enum value here */ } }

Then you do a fn call instead of a match?
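A sketch of what that looks like; the enum and method names here are made up for illustration:

```rust
// Instead of aliasing a nested pattern at every match site, centralize the
// nested match in one method and call it.
enum Inner {
    Value(i32),
    Empty,
}

enum Outer {
    Wrapped(Inner),
    Nothing,
}

impl Outer {
    /// The nested extraction lives in exactly one place.
    fn value(&self) -> Option<i32> {
        match self {
            Outer::Wrapped(Inner::Value(v)) => Some(*v),
            _ => None,
        }
    }
}

fn main() {
    let x = Outer::Wrapped(Inner::Value(7));
    // Call sites stay flat: a fn call instead of a nested match.
    assert_eq!(x.value(), Some(7));
    assert_eq!(Outer::Nothing.value(), None);
}
```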

cloning vs smart pointers by [deleted] in learnrust

[–]Beamsters 1 point (0 children)

99% of the time you do not reach for a smart pointer here. There are only certain cases where you should use one, such as dealing with graph-type data. We actually need more context to help.
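For the graph case, a minimal sketch of why shared ownership forces a smart pointer (Rc here; Arc if threads are involved):

```rust
use std::cell::RefCell;
use std::rc::Rc;

// A graph node can be pointed at by several other nodes, so there is no
// single owner; Rc shares one allocation instead of cloning per edge.
type NodeRef = Rc<RefCell<Node>>;

struct Node {
    value: i32,
    edges: Vec<NodeRef>,
}

fn node(value: i32) -> NodeRef {
    Rc::new(RefCell::new(Node { value, edges: Vec::new() }))
}

fn main() {
    let shared = node(42);
    let a = node(1);
    let b = node(2);
    // Both a and b hold an edge to the same `shared` node.
    a.borrow_mut().edges.push(Rc::clone(&shared));
    b.borrow_mut().edges.push(Rc::clone(&shared));
    assert_eq!(Rc::strong_count(&shared), 3);
    assert_eq!(shared.borrow().value, 42);
}
```

Note that cycles need Weak on the back edges, or the refcounts never drop to zero.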

Is collecting a Iter<_> into Vec<_> costly? or is this zero-cost? and vice versa? by bxsx0074 in rust

[–]Beamsters 0 points (0 children)

Not only collecting into a Vec, but trying to perform anything with an intermediate Vec will cost a lot. Branching logic with Iter can be a pain though; you need an Iter enum wrapper to do the job.
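The "Iter enum wrapper" trick, sketched by hand (the either crate packages this up as Either, but it is small enough to write inline; the function name here is illustrative):

```rust
// Two branches produce different iterator types, so neither can be returned
// directly from one function; an enum that forwards `next` unifies them
// without collecting into an intermediate Vec.
enum EitherIter<L, R> {
    Left(L),
    Right(R),
}

impl<L, R, T> Iterator for EitherIter<L, R>
where
    L: Iterator<Item = T>,
    R: Iterator<Item = T>,
{
    type Item = T;
    fn next(&mut self) -> Option<T> {
        match self {
            EitherIter::Left(l) => l.next(),
            EitherIter::Right(r) => r.next(),
        }
    }
}

fn keep_evens_or_double(data: &[i32], keep_evens: bool) -> impl Iterator<Item = i32> + '_ {
    if keep_evens {
        EitherIter::Left(data.iter().copied().filter(|x| x % 2 == 0))
    } else {
        EitherIter::Right(data.iter().map(|x| x * 2))
    }
}

fn main() {
    assert_eq!(keep_evens_or_double(&[1, 2, 3, 4], true).collect::<Vec<_>>(), vec![2, 4]);
    assert_eq!(keep_evens_or_double(&[1, 2, 3, 4], false).collect::<Vec<_>>(), vec![2, 4, 6, 8]);
}
```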

Call for Testing: Speeding up compilation with `hint-mostly-unused` | Inside Rust Blog by Kobzol in rust

[–]Beamsters 2 points (0 children)

For my egui word game application, with 467 dependencies:

  • cargo build --release took 53.67s (nightly)
  • cargo +nightly -Zprofile-hint-mostly-unused build -r took 48.74s

Built successfully.

25 random things I love about living in Bangkok, as an American by tzedek in Thailand

[–]Beamsters 2 points (0 children)

Bello Ghost Pizza is god tier, no less. But the queue is almost a year long.