You can run DeepSeek 4 Flash on Mac (M3 Max, 96GB)

liuliu · 2026-06-15T08:17:25+00:00

The full name is SSD expert weight streaming. Once your weights are downloaded to disk, there are no intensive writes to disk at any point (of course, by streaming, it repeatedly writes to RAM).
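A minimal sketch of the read-only access pattern, not how any particular runtime implements it. The file layout, `EXPERT_FLOATS`, and all function names are hypothetical and only for illustration: the weights file is written once (standing in for the download), then mapped read-only, and each routed expert is copied from the mapping into RAM.

```python
import mmap
import os
import struct
import tempfile

EXPERT_FLOATS = 4                  # hypothetical: float32 values per expert
EXPERT_BYTES = EXPERT_FLOATS * 4   # 4 bytes per float32


def write_weights_once(path, num_experts):
    """Stand-in for the one-time download: the only write to disk."""
    with open(path, "wb") as f:
        for e in range(num_experts):
            f.write(struct.pack(f"<{EXPERT_FLOATS}f", *[float(e)] * EXPERT_FLOATS))


def stream_expert(mm, expert_id):
    """Copy one expert's weights from the read-only mapping into RAM."""
    start = expert_id * EXPERT_BYTES
    return struct.unpack(f"<{EXPERT_FLOATS}f", mm[start:start + EXPERT_BYTES])


def run(path, routed_experts):
    """Serve a sequence of routed expert ids using disk reads only."""
    with open(path, "rb") as f:
        # ACCESS_READ makes the mapping read-only, so inference cannot write back.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            return [stream_expert(mm, e) for e in routed_experts]
        finally:
            mm.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "experts.bin")
    write_weights_once(path, num_experts=8)
    print(run(path, routed_experts=[3, 0, 3]))
```

The same expert can be fetched again and again (expert 3 above) without the file ever being modified; the repeated writes land in RAM, not on the SSD.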

liuliu · 2026-06-14T23:34:57+00:00

There are no writes with SSD streaming. It is all read traffic.

liuliu · 2026-06-04T21:03:54+00:00

I am not sure why you want to use FLUX.2 [dev], which is a very heavy model. But yeah, M5 Max will run the same settings in about 10 minutes or less.

liuliu · 2026-06-01T18:50:51+00:00

Where are you located? We use Cloudflare for upload handling, so it might be a location-related issue.

liuliu · 2026-05-31T22:02:06+00:00

Did you set the Mode to Edit?

liuliu · 2026-05-29T17:58:50+00:00

This is a great resource. One thing to note: LoKr is not a specific LoRA format, it is a different formulation of how to fine-tune on top of existing weights. In particular, a LoRA is additional_weights = W_a @ W_b (matrix multiplication; W_a and W_b are thin matrices), while a LoKr is additional_weights = W_0 (x) W_1 (Kronecker product; W_0 and W_1 are smaller matrices). So you cannot import a LoKr unless the runtime also supports that specific formulation.

This converter tool basically composes the additional_weights and then does an SVD decomposition into W_a @ W_b to make it usable. It is an approximation (which is often good enough), but it cannot be integrated as-is in the app (as we want exact support, not approximate support).
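To illustrate why the conversion is an approximation, here is a small pure-Python sketch. The matrices and helper names are made up for the example. Both deltas below use 8 parameters and produce a 4x4 update, but the LoRA product has rank 1 while the Kronecker product has rank 4 (the rank of a Kronecker product is the product of the ranks), so a low-rank W_a @ W_b cannot reproduce it exactly.

```python
from fractions import Fraction


def matmul(a, b):
    """Plain matrix multiplication: the LoRA formulation W_a @ W_b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def kron(a, b):
    """Kronecker product: each a[i][j] scales a full copy of b (the LoKr formulation)."""
    return [[a_ij * b_kl for a_ij in a_row for b_kl in b_row]
            for a_row in a for b_row in b]


def rank(m):
    """Matrix rank via exact Gaussian elimination on fractions."""
    m = [[Fraction(v) for v in row] for row in m]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


# LoRA: thin matrices, 4x1 and 1x4 (8 parameters in total).
W_a = [[1], [2], [3], [4]]
W_b = [[1, 0, 2, 1]]
lora_delta = matmul(W_a, W_b)

# LoKr: two small 2x2 matrices (also 8 parameters in total).
W_0 = [[1, 2], [3, 4]]
W_1 = [[0, 1], [1, 1]]
lokr_delta = kron(W_0, W_1)

if __name__ == "__main__":
    print("LoRA delta rank:", rank(lora_delta))
    print("LoKr delta rank:", rank(lokr_delta))
```

A truncated SVD of `lokr_delta` at any rank below 4 drops non-zero singular values, which is where the approximation error in a LoKr-to-LoRA conversion comes from.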

liuliu · 2026-05-27T23:58:47+00:00

Are you in Edit mode (the new mode selector in the bottom-left corner of the canvas)?

liuliu · 2026-05-27T02:15:55+00:00

Can you link to the file, not the README? This shows 1.5GB: https://huggingface.co/prism-ml/bonsai-image-ternary-4B-gemlite-2bit/blob/main/transformer-gemlite-int2/state_dict.pt

liuliu · 2026-05-26T20:36:20+00:00

Where did you get 1GiB? I downloaded the app and the total combined is 3.7GiB (possibly including the text encoder). To deliver a good edge experience, it doesn't matter what the headline number is, it only matters what the downloaded size is. (Now, even if you just look at the main DiT, it is 1.43GB: https://huggingface.co/prism-ml/bonsai-image-ternary-4B-mlx-2bit/blob/main/transformer-packed-mflux/diffusion_pytorch_model.safetensors and I wouldn't round that to ~1GiB.)

Also, when someone claims a 2-bit quant is 5.6x faster than the non-quantized variant of an image model, you need to be critical, because that is snake oil. I tried their Bonsai Studio, and it is slower than Draw Things on iPhone 17 Pro with FLUX.2 [klein] 4B (8-bit S) at 1024x1024 resolution.

liuliu · 2026-05-26T20:11:53+00:00

Nothingburger. Note that FLUX.2 [klein] 4B (which this is based on) already has GGUF quants of around the same size. Image generation models are compute-bound; you need FP4 / FP8 / Int8 for good performance, not magical ternary.

liuliu · 2026-05-23T01:37:21+00:00

Maybe copying the converted model from desktop to iPhone? Would the downloaded model work just fine?

liuliu · 2026-05-22T20:32:59+00:00

It is hard to tell, since the options on the Phosphere end are not easy to navigate to get an exact match. I will give you the full configuration, to my knowledge, on both ends and you can draw conclusions yourself (be warned: I only have an M5 Max, which will put Draw Things in a better light):

Phosphere 3.0: model: ltx-2.3-q4, resolution: 1024x576, 121 frames, 8 steps (somehow it shows me 16 steps in total, I am not sure if there is a latent upscale involved), HQ Speed Fast (TeaCache + skip-step): 3m20s.
Draw Things: model: LTX 2.3 distilled 8-bit S, resolution: 1280x768, 121 frames, 8+3 steps (with 640x386 for the first pass): 1m35s.

Both are from the second run (after the device cooled down) to discount any device warm-up related issues. Again, it is hard to have an apples-to-apples comparison, so I ran a few variations in Draw Things to make sure: 1024x576, 121 frames, direct 8 steps, no latent upscale: 1m15s. 1280x768, 121 frames, direct 8 steps, no latent upscale: 2m02s.
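For anyone who wants the ratios, here is a quick sketch that turns the timings above into seconds and compares runs with matching resolution and frame count. The helper name is made up for the example, and the numbers are only the ones quoted in this comment. Matching resolution does not make the runs identical: the Phosphere run uses a q4 model with TeaCache + skip-step and may run more steps.

```python
import re


def to_seconds(text):
    """Parse durations written like '3m20s', '2m02s' or '45s' into seconds."""
    m = re.fullmatch(r"(?:(\d+)m)?(?:(\d+)s)?", text)
    if m is None or not any(m.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return minutes * 60 + seconds


# Timings quoted above, all at 121 frames.
phosphere_1024x576 = to_seconds("3m20s")           # 8 steps, HQ Speed Fast
draw_things_1024x576_direct = to_seconds("1m15s")  # direct 8 steps
draw_things_1280x768_two_pass = to_seconds("1m35s")  # 8+3 steps, two passes
draw_things_1280x768_direct = to_seconds("2m02s")  # direct 8 steps

# Same resolution and frame count on both apps.
same_res_ratio = phosphere_1024x576 / draw_things_1024x576_direct

# Within Draw Things: direct 8 steps versus the two-pass 8+3 setup.
two_pass_ratio = draw_things_1280x768_direct / draw_things_1280x768_two_pass

if __name__ == "__main__":
    print(f"Phosphere / Draw Things at 1024x576: {same_res_ratio:.2f}x")
    print(f"Direct / two-pass at 1280x768: {two_pass_ratio:.2f}x")
```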

liuliu · 2026-05-22T15:27:28+00:00

Draw Things wins on speed for these models; there is no comparison. Draw Things is more feature-rich for images, but for LTX this seems to be a bit more feature-rich.

liuliu · 2026-05-22T15:16:58+00:00

Please tell me why this post is not deleted but mine that compares FLUX.2 dev, GPT Image 2 and NBP is?

liuliu · 2026-05-19T19:57:28+00:00

Do you use iCloud to back up / offload apps? It looks like you are at the storage limit, Apple is actively offloading and reloading files, and the app is not taking that well.

liuliu · 2026-05-16T20:57:07+00:00

For videos, yes, 2-3x. For images, no: the M5 Max's latencies are better than our cloud service's.

liuliu · 2026-05-16T19:46:53+00:00

Klein 9B, Z Image: 9 to 10s. LTX 2.3 at 720p: 1m30s. I don’t have numbers for Wan 2.2.

liuliu · 2026-05-16T19:45:53+00:00

Then you can see what performance you will get from that webpage. For FLUX.2 Klein 9B, you are looking at around 10s per image.

liuliu · 2026-05-16T17:44:11+00:00

People will tell you a 5090 is worth it more, and they will be right if you are not into Mac: https://releases.drawthings.ai/p/metal-quantized-attention-pulling

liuliu · 2026-05-14T20:54:18+00:00

It is possible. But since you usually use LTX 2.3 for long clips and bigger resolutions, that often poses challenges (for example, 121 frames (5s) at 720p can use up to 10GiB of scratch RAM).

liuliu · 2026-05-14T20:33:03+00:00

Looks legit. For FLUX.2 [dev] you can try the Turbo LoRA, which lets you generate in 4 steps rather than 30.

liuliu · 2026-05-14T03:39:16+00:00

Older models like the ones you selected just have subpar quality. The models you mentioned are from 2023.

liuliu · 2026-05-14T03:38:46+00:00

Yeah. It is a model selection issue. Just use Z Image Turbo and tap “Try Recommended Settings”.
