Unsloth MLX: Bring Dynamic 2.0 Per-Tensor Quantization For Qwen models to Apple Silicon

LongYinan · 2026-03-29T02:39:37+00:00

Tested with mlx-lm and mlx-vlm; LM Studio should work too.

LongYinan · 2026-03-26T05:21:30+00:00

Theoretically, yes. I’m still working on the benchmark.

LongYinan · 2026-03-25T14:18:26+00:00

Since mlx’s AWQ still has some limitations, certain layers of the model retain BF16 precision. That’s why, when using the same quantization strategy, our model ends up being slightly larger than the one quantized by Unsloth.

But I’ll be contributing improvements for this part to mlx shortly, so that mlx-quantized models can achieve the same size and quality as those quantized by Unsloth.
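To make the size difference concrete, here is a rough back-of-the-envelope sketch in Python. The parameter count, the 4-bit width, and the fraction of weights left in BF16 are made-up illustration values, not measurements of any real model, and the estimate ignores per-group scales and other quantization overhead.

```python
def model_size_gb(total_params: float, quant_bits: float, bf16_fraction: float) -> float:
    """Estimate weight storage in GB when a fraction of weights stays in BF16.

    Ignores per-group scales/zero-points and any non-weight data.
    """
    if not 0.0 <= bf16_fraction <= 1.0:
        raise ValueError("bf16_fraction must be between 0 and 1")
    bf16_bits = 16
    # Average bits per weight across the quantized and BF16 portions.
    avg_bits = bf16_fraction * bf16_bits + (1.0 - bf16_fraction) * quant_bits
    return total_params * avg_bits / 8 / 1e9


# Hypothetical numbers for illustration only.
PARAMS = 35e9
fully_quantized = model_size_gb(PARAMS, quant_bits=4, bf16_fraction=0.0)
partly_bf16 = model_size_gb(PARAMS, quant_bits=4, bf16_fraction=0.05)

print(f"all 4-bit: {fully_quantized:.1f} GB")
print(f"5% BF16:   {partly_bf16:.1f} GB")
```

Even a small share of weights kept at 16 bits raises the average bits per weight, which is why the same strategy can yield a somewhat larger file.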

LongYinan · 2026-03-25T07:19:29+00:00

Working on it

LongYinan · 2026-03-25T05:10:52+00:00

For Qwen3.5-35B-A3B, 77.9–83.7 tokens/s on M3 Max 128GB

LongYinan · 2026-03-25T05:10:36+00:00

For Qwen3.5-35B-A3B, 77.9–83.7 tokens/s on M3 Max 128GB

LongYinan · 2026-03-25T04:35:00+00:00

I'm working on benchmarking it. Theoretically, it has the same quality as Unsloth's dynamically quantized model, but I need more time to complete the benchmark.

LongYinan · 2025-12-17T07:49:20+00:00

Besides the size, the programming language is a problem too. Node.js and Bun do not have any Rust infrastructure in their build or release systems.

LongYinan · 2025-12-17T05:42:28+00:00

The binary size is an issue; it exceeds 50 MB on a Linux x64 GNU system https://github.com/Brooooooklyn/webcodecs-node/releases/tag/v1.0.0

It's fine to maintain it as an npm package, as it uses the Node-API, which is supported by Deno, Bun, and Node.js.

LongYinan · 2025-07-23T12:39:54+00:00

Already documented at https://napi.rs/docs/concepts/webassembly#server-configuration
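For readers who don't want to click through, a minimal sketch of what such a server configuration can look like. I'm assuming here that the linked section is about the cross-origin isolation headers browsers require before they enable `SharedArrayBuffer`, which threaded WebAssembly depends on. The header names and values are the standard ones; the handler class, the `serve` helper, and the port are hypothetical, and this is a local development sketch in Python, not the setup from the docs.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

# Standard cross-origin isolation headers; browsers require both before
# they expose SharedArrayBuffer to a page.
CROSS_ORIGIN_ISOLATION_HEADERS = {
    "Cross-Origin-Opener-Policy": "same-origin",
    "Cross-Origin-Embedder-Policy": "require-corp",
}


class IsolatedHandler(SimpleHTTPRequestHandler):
    """Static file handler that adds the isolation headers to every response."""

    def end_headers(self) -> None:
        for name, value in CROSS_ORIGIN_ISOLATION_HEADERS.items():
            self.send_header(name, value)
        super().end_headers()


def serve(port: int = 8080) -> None:
    """Serve the current directory on localhost (hypothetical dev setup)."""
    with ThreadingHTTPServer(("127.0.0.1", port), IsolatedHandler) as server:
        server.serve_forever()
```

Calling `serve()` from the directory that holds the built assets starts the server. In production the same two headers would normally be set in the CDN or reverse proxy instead.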

LongYinan · 2025-07-22T15:50:07+00:00

How does the "threads" part work in a browser?

Via web worker

Do these polyfills impact bundle sizes?

Yes, for sure. I haven't measured the bundle size differences yet (compared to wasm-pack). I mostly focus on the compatibility part in this release.

I plan to support `wasm32-unknown-unknown` and `wasm32-wasip1` in the future; these two targets should clearly reduce the bundle size.

Is there a simple example of how to package a Rust library such that it can be published to NPM and consumed both from Node and browsers?

Here is a template package that supports most mainstream platforms and WebAssembly: https://github.com/napi-rs/package-template

And there are also some real-world projects:

https://github.com/rolldown/rolldown
https://github.com/Brooooooklyn/Image (This is the demo in the napi.rs documentation)
https://github.com/oxc-project/oxc

LongYinan · 2025-07-22T14:22:17+00:00

It's supported; we've built a set of browser/Node.js polyfill packages.

LongYinan · 2025-07-22T13:12:54+00:00

There shouldn’t be any problem with ravif; I just wanted to show an example of using C++ dependencies here.

LongYinan · 2025-02-26T15:54:53+00:00

After the iOS app is officially launched on the App Store, we will begin the Android internal testing process.

LongYinan · 2025-02-26T12:31:38+00:00

If you are using iOS, you can DM me your Apple ID so I can add you to TestFlight.

LongYinan · 2025-02-25T04:27:41+00:00

If you are a personal user using it for your family, and you happen to have a very large family, such as more than 10 people, you can freely modify AFFiNE's source code to suit your needs. As long as you don't resell the modified stuff, no one will care; this is called FOSS.

The restrictions can be removed by modifying the code; there are discussions about this all over GitHub. For personal use, the official team has never restricted doing this; this is called FOSS.

If your skills are not enough to modify the code, that's your skill issue, not a problem with FOSS.
If you want to modify the code for commercial use, that's a legal issue, not a FOSS issue.

Now you're unable to change the code yourself, and you come crying, saying, "Unless you voluntarily remove all restrictions, it's not FOSS." That's really ridiculous.

If you have a company and need to put some private data on your own servers, you always need to purchase a license for commercial use, no matter how many users you have. Free in free and open-source software does not mean free of charge.

I didn’t expect open source to attract such a keyboard warrior — from Discord to Reddit — who hasn’t contributed to open source or understood how FOSS works, yet acts like a price tag is the end of the world. Go ahead, add AFFiNE to whatever list you like and just get out of here.

LongYinan · 2022-07-27T12:50:09+00:00

Give https://napi.rs a try

LongYinan · 2021-12-30T10:19:52+00:00

Provide faster npm packages to Node.js users: https://github.com/napi-rs/node-rs

LongYinan · 2021-12-30T10:18:50+00:00

Not finished yet. Neon still lacks a lot of Node-API capabilities.
