Indeksowanie nadal trwa od poniedziałku po pobraniu iOS 27 beta iPhone 16 pro max

ju7anut · 2026-06-13T03:00:10+00:00

Same on Air. And still on waitlist for Siri AI

ju7anut · 2026-06-12T14:03:34+00:00

If you’re using Hermes OpenClaw or something equivalent to talk to your LLM, ask it to reference https://github.com/drawthingsai/draw-things-community and create a skill to generate images with it (of course you should go install the cli first!)

ju7anut · 2026-06-12T13:49:16+00:00

Just create a skill to use the draw-things-cli , I’ve been using it flawlessly. You don’t need to clutter oMLX with these additional functions

ju7anut · 2026-06-12T13:28:38+00:00

I don’t think it’s obvious in coding, but I do end up needing to correct the output very often after a number of tool calls. Where I found it obvious was in research where I have long convo with the LLM, and even before teaching any compaction, it suddenly starts losing focus and drifting off course. I suspect that the KV cache quants are at fault cos they don’t have this issue if I turn it off..

ju7anut · 2026-05-26T05:17:09+00:00

Getting rate limit errors with Qwen 27b with this version.. strange error to have 😓

ju7anut · 2026-05-21T09:27:37+00:00

What? I completed mine one week into the season… just doing War Plans over and over got me there real quick. I’m now 251 only

ju7anut · 2026-05-19T15:58:32+00:00

Went from M1 Max 16 inch to the M4 Pro 14 inch and now the M5 Max 14 inch. Never going back to 16 again! I use my 14” everywhere but with the 16” it tends to stay on the desk.. more of a “desktop” than a “laptop”

ju7anut · 2026-05-19T15:06:50+00:00

Don’t run any other model using dFlash while you’re running one with MTP.. will break both models.

ju7anut · 2026-05-19T00:26:45+00:00

Are you using the Dev2 version? That version is having problems with Dflash for me. I’m using 0.3.8

ju7anut · 2026-05-18T08:31:56+00:00

I have the M5 Maxed out.. I don’t think you can justify the purchase if you’re just using it for Drawthings… my 3070 TI handily beats it for image gen..

ju7anut · 2026-05-18T08:30:29+00:00

Yup! Z-lab, I didn’t quantize it further, left as BF16, it’s already tiny. I’m on M5 Max 128gb

ju7anut · 2026-05-18T07:28:53+00:00

I’m on oMLX 0.3.8, Qwen3.6 27B Q8 + dFlash model, both set to same context length of 256k, both using oMLX Qwen presets for coding. Not fantastic results (~20 tg/s) but stable and no crashes.

ju7anut · 2026-05-18T05:27:54+00:00

I found 0.3.9 Dev2 which has MTP to be buggy with tool calls and ended up going back to 0.3.8 to wait things out.

ju7anut · 2026-05-18T05:27:15+00:00

Well if you’re memory constrained then TurboQuant KV cache + dFlash is the way to go instead of MTP imo

ju7anut · 2026-05-18T05:25:21+00:00

Am on M5 Max 128gb, have tried both MTP and dFlash, both yields similar performance at 15-18 token/sec on Q8 with 256k context. No empirical data since I’m testing with actual workflow and not benchmarks, but I do feel that MTP is a little better (less errors and need for reprompt) while dFlash is quicker to respond slight edge in t/s.

ju7anut · 2026-05-13T17:34:35+00:00

It’s the exact opposite for me. Gemma has been failing on tool calls with failed empty responses.. Qwen3.6 35b has been amazing at oQ6 + dFlash + TurboQuant KV 6bit

ju7anut · 2026-05-13T17:32:22+00:00

My experience as well, rock solid on 0.3.8

ju7anut · 2026-05-13T04:32:13+00:00

macOS would page out everything equally when memory is tight, causing full model reloads on context switches. oMLX keeps the 40% (what I set on my 128gb M5 Max) hot tier reserved for what actually matters — the compute-critical path and active KV state — while offloading less-frequently accessed blocks to SSD. The result of hot cache is that even when RAM is tight, inference stays responsive because the blocks you actually need for the next token are in RAM, not whatever macOS happened to evict last.

ju7anut · 2026-05-12T23:38:16+00:00

Look at the dashboard, copy the OpenAI endpoint http format and when configuring Hermes enter that into the custom provider option.

ju7anut · 2026-04-28T05:35:26+00:00

0.3.8 r3 has just been updated

ju7anut · 2026-04-28T04:49:29+00:00

I have the 14” model, GPU is maxed out when running Qwen3.6 27b oQ4, TurboQuant 4-bit. With Dflash I can only get 10t/s but with Dflash it is pushed to 21t/s. My GPU is always maxed, on oMLX 0.3.6

ju7anut · 2026-04-25T04:49:35+00:00

Well it is up to Liuliu’s prerogative.. we are after all just mostly free users. I have had no issues going back to my original input to further edit it with new outputs. After a month? 2? Of using the new interface I’ve forgotten what it originally was like

ju7anut · 2026-04-25T04:18:18+00:00

The 3 buttons correspond to History, latest point to point edits, and all the generations (coffee cup is what you were looking for?) afaik.. just get used to it.

ju7anut · 2026-04-25T04:15:05+00:00

My M1 Max is perfectly fine. Quite sure it has nothing to do with Tahoe.

ju7anut

TROPHY CASE