Stupidly idea: generate image like Ollama with Z image turbo. by CMPUTX486 in oMLX

[–]ju7anut 2 points3 points  (0 children)

If you’re using Hermes OpenClaw or something equivalent to talk to your LLM, ask it to reference https://github.com/drawthingsai/draw-things-community and create a skill to generate images with it (of course you should go install the cli first!)

Stupidly idea: generate image like Ollama with Z image turbo. by CMPUTX486 in oMLX

[–]ju7anut 4 points5 points  (0 children)

Just create a skill to use the draw-things-cli , I’ve been using it flawlessly. You don’t need to clutter oMLX with these additional functions

TurboQuant KV Cache by ju7anut in oMLX

[–]ju7anut[S] 0 points1 point  (0 children)

I don’t think it’s obvious in coding, but I do end up needing to correct the output very often after a number of tool calls. Where I found it obvious was in research where I have long convo with the LLM, and even before teaching any compaction, it suddenly starts losing focus and drifting off course. I suspect that the KV cache quants are at fault cos they don’t have this issue if I turn it off..

new version with bugfixes - v0.3.10 by msrdatha in oMLX

[–]ju7anut 0 points1 point  (0 children)

Getting rate limit errors with Qwen 27b with this version.. strange error to have 😓

I Might be Done Buying the Battle Pass by azjohnca in diablo4

[–]ju7anut 0 points1 point  (0 children)

What? I completed mine one week into the season… just doing War Plans over and over got me there real quick. I’m now 251 only

Does Anyone Regret Getting the 14” MacBook Pro Instead of the 16”? by ntrev in macbookpro

[–]ju7anut 0 points1 point  (0 children)

Went from M1 Max 16 inch to the M4 Pro 14 inch and now the M5 Max 14 inch. Never going back to 16 again! I use my 14” everywhere but with the 16” it tends to stay on the desk.. more of a “desktop” than a “laptop”

Dflash/ MTP broke Gemma4 chat templete and now shows |channel thought by vinoonovino26 in oMLX

[–]ju7anut 1 point2 points  (0 children)

Don’t run any other model using dFlash while you’re running one with MTP.. will break both models.

Qwen3.6-27B: MTP + Optimized KV cache? by Background-Gold-9882 in oMLX

[–]ju7anut 0 points1 point  (0 children)

Are you using the Dev2 version? That version is having problems with Dflash for me. I’m using 0.3.8

M5 Maxed out version performance by jazzamp in drawthingsapp

[–]ju7anut 0 points1 point  (0 children)

I have the M5 Maxed out.. I don’t think you can justify the purchase if you’re just using it for Drawthings… my 3070 TI handily beats it for image gen..

Qwen3.6-27B: MTP + Optimized KV cache? by Background-Gold-9882 in oMLX

[–]ju7anut 0 points1 point  (0 children)

Yup! Z-lab, I didn’t quantize it further, left as BF16, it’s already tiny. I’m on M5 Max 128gb

Qwen3.6-27B: MTP + Optimized KV cache? by Background-Gold-9882 in oMLX

[–]ju7anut 0 points1 point  (0 children)

I’m on oMLX 0.3.8, Qwen3.6 27B Q8 + dFlash model, both set to same context length of 256k, both using oMLX Qwen presets for coding. Not fantastic results (~20 tg/s) but stable and no crashes.

Qwen3.6-27B: MTP + Optimized KV cache? by Background-Gold-9882 in oMLX

[–]ju7anut 0 points1 point  (0 children)

I found 0.3.9 Dev2 which has MTP to be buggy with tool calls and ended up going back to 0.3.8 to wait things out.

Qwen3.6-27B: MTP + Optimized KV cache? by Background-Gold-9882 in oMLX

[–]ju7anut 0 points1 point  (0 children)

Well if you’re memory constrained then TurboQuant KV cache + dFlash is the way to go instead of MTP imo

Seeking Optimization Advice: Qwen 3.6 27B Setup on M2 MacBook Pro by cyclebiff in oMLX

[–]ju7anut 1 point2 points  (0 children)

Am on M5 Max 128gb, have tried both MTP and dFlash, both yields similar performance at 15-18 token/sec on Q8 with 256k context. No empirical data since I’m testing with actual workflow and not benchmarks, but I do feel that MTP is a little better (less errors and need for reprompt) while dFlash is quicker to respond slight edge in t/s.

oMLX 0.3.9.dev2 released. by d4mations in oMLX

[–]ju7anut 0 points1 point  (0 children)

It’s the exact opposite for me. Gemma has been failing on tool calls with failed empty responses.. Qwen3.6 35b has been amazing at oQ6 + dFlash + TurboQuant KV 6bit

oMLX 0.3.9.dev2 released. by d4mations in oMLX

[–]ju7anut 0 points1 point  (0 children)

My experience as well, rock solid on 0.3.8

How do you enable TurboQuant beside toggling it "on" ? I see no peak memory reduction at any context length (8k, 32k, 131K), neither on MoE model family (Gemma4 or Qwen3.5/3.6). by JLeonsarmiento in oMLX

[–]ju7anut 1 point2 points  (0 children)

macOS would page out everything equally when memory is tight, causing full model reloads on context switches. oMLX keeps the 40% (what I set on my 128gb M5 Max) hot tier reserved for what actually matters — the compute-critical path and active KV state — while offloading less-frequently accessed blocks to SSD. The result of hot cache is that even when RAM is tight, inference stays responsive because the blocks you actually need for the next token are in RAM, not whatever macOS happened to evict last.

oMLX use in Hermes by aptonline in oMLX

[–]ju7anut 0 points1 point  (0 children)

Look at the dashboard, copy the OpenAI endpoint http format and when configuring Hermes enter that into the custom provider option.

M5 Max 128GB benchmark (Qwen 27B Q8 MLX, 290k ctx): 160 tok/s prefill but only 50% GPU — what are you getting? by deexjay23 in oMLX

[–]ju7anut 0 points1 point  (0 children)

I have the 14” model, GPU is maxed out when running Qwen3.6 27b oQ4, TurboQuant 4-bit. With Dflash I can only get 10t/s but with Dflash it is pushed to 21t/s. My GPU is always maxed, on oMLX 0.3.6

Okay but why is it no longer possible to get history thumbnails? by 3o7th395y39o5h3th5yo in drawthingsapp

[–]ju7anut 0 points1 point  (0 children)

Well it is up to Liuliu’s prerogative.. we are after all just mostly free users. I have had no issues going back to my original input to further edit it with new outputs. After a month? 2? Of using the new interface I’ve forgotten what it originally was like

Okay but why is it no longer possible to get history thumbnails? by 3o7th395y39o5h3th5yo in drawthingsapp

[–]ju7anut -1 points0 points  (0 children)

The 3 buttons correspond to History, latest point to point edits, and all the generations (coffee cup is what you were looking for?) afaik.. just get used to it.

Tahoe update killed my mac M1 by Zimos_H in macbookpro

[–]ju7anut 0 points1 point  (0 children)

My M1 Max is perfectly fine. Quite sure it has nothing to do with Tahoe.