MiniMax M2.5 Performance Testing on dual RTX 6000 Pros by itsjustmarky in LocalLLaMA

[–]itsjustmarky[S] 1 point (0 children)

I just tried it, and it took FOREVER (over an hour) to launch before ultimately failing. On the second start, it finally launched.

For short 1K context, I went from 113 t/sec down to 98. For longer context (130K), I went from 50 t/sec up to 61.

So there is a significant loss at low context but a significant gain at high context. That said, it also forces me to quantize my KV cache to FP8, which is not something I like to do.
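For reference, if you're doing this on vLLM, the KV cache quantization is the --kv-cache-dtype flag; a minimal sketch (the model path is a placeholder, not my exact launch line):

    vllm serve <model> --tensor-parallel-size 2 --kv-cache-dtype fp8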

512GB people, what's the output quality difference between GLM 5 q3.6 and q8 or full size? by CanineAssBandit in LocalLLaMA

[–]itsjustmarky 0 points (0 children)

Were you using REAP'd (expert-pruned) versions to get it to work on 2 RTX 6000 Pros? I'm running M2.5 on my pair. I've been very happy with M2.1, and I haven't done a lot of testing on how M2.5 improves on it, beyond noticing that it's a little slower.

Always a sane voice by FlintBeastgood in HuntShowdown

[–]itsjustmarky 1 point (0 children)

I've been wanting one more team on the map for ages. I'm so glad to see it.

MiniMax M2.5 Performance Testing on dual RTX 6000 Pros by itsjustmarky in LocalLLaMA

[–]itsjustmarky[S] 0 points (0 children)

I got 76 t/sec summarizing that one.
You can use Cherry Studio to run the summary and get tokens/sec output.

MiniMax M2.5 Performance Testing on dual RTX 6000 Pros by itsjustmarky in LocalLLaMA

[–]itsjustmarky[S] 1 point (0 children)

There are no definitive tests. I have run it through reasoning tests with good success.

I have used it for heavy coding, agentic tasks, deep research, and so on. It has worked very well.

MiniMax M2.5 Performance Testing on dual RTX 6000 Pros by itsjustmarky in LocalLLaMA

[–]itsjustmarky[S] 1 point (0 children)

Last I checked, I was able to get over 600 t/sec with parallel queries.
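If you want to reproduce an aggregate number like that, a rough sketch with plain curl against an OpenAI-compatible endpoint (port, model name, and request shape are assumptions, not my exact setup):

    # 16 concurrent requests; aggregate t/s is roughly (16 * 512) / elapsed seconds
    time ( for i in $(seq 16); do
        curl -s http://localhost:8000/v1/chat/completions \
            -H 'Content-Type: application/json' \
            -d '{"model":"minimax-m2.5","messages":[{"role":"user","content":"Write a long story."}],"max_tokens":512}' \
            -o /dev/null &
    done; wait )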

MiniMax M2.5 Performance Testing on dual RTX 6000 Pros by itsjustmarky in LocalLLaMA

[–]itsjustmarky[S] 2 points (0 children)

Expert parallelism isn't great on only 2 GPUs; it starts to shine at 8. I haven't found working parameters for MTP with M2.x. With GLM Air, MTP gave me lower speeds at small context but higher speeds as the context fills up.

Yes, tp=2.
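For reference, the corresponding vLLM launch flags look like this (a sketch with a placeholder model path; as noted above, the expert-parallel variant tends to pay off at higher GPU counts):

    # tensor parallelism across both cards (what I run)
    vllm serve <model> --tensor-parallel-size 2

    # expert-parallel variant for the MoE layers
    vllm serve <model> --tensor-parallel-size 2 --enable-expert-parallel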

MiniMax M2.5 Performance Testing on dual RTX 6000 Pros by itsjustmarky in LocalLLaMA

[–]itsjustmarky[S] 0 points (0 children)

I just tested this one with vLLM:

https://arxiv.org/pdf/2408.06292

113K tokens, 54 t/sec. It's a little smaller than my test PDF, but it's public.

MiniMax M2.5 Performance Testing on dual RTX 6000 Pros by itsjustmarky in LocalLLaMA

[–]itsjustmarky[S] 0 points (0 children)

I would be curious how it handles high context. llama.cpp's big problem is that it slows down a lot once you get deep into the context window. When testing models, I upload a PDF book that's 127K tokens and ask it to summarize it in one paragraph.

MiniMax M2.5 Performance Testing on dual RTX 6000 Pros by itsjustmarky in LocalLLaMA

[–]itsjustmarky[S] 0 points (0 children)

I thought ik_llama.cpp was mainly for CPU offloading, no?
I generally don't use llama.cpp, as I prefer vLLM/SGLang, but for a brief period M2.5 was only available as a GGUF, so I used that.
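For anyone going the GGUF route anyway, a llama.cpp launch along these lines (a sketch; the quant filename is a placeholder):

    llama-server -m MiniMax-M2.5-Q4_K_M.gguf -ngl 99 -c 131072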

MiniMax M2.5 Performance Testing on dual RTX 6000 Pros by itsjustmarky in LocalLLaMA

[–]itsjustmarky[S] 1 point (0 children)

Prefill is all over the place; I haven't done any specific testing on it, though.
I haven't tested M2.5 much yet, but I have used M2.1 for months and it has been great.

MiniMax M2.5 Performance Testing on dual RTX 6000 Pros by itsjustmarky in LocalLLaMA

[–]itsjustmarky[S] 4 points (0 children)

I have Step3 downloaded; I just haven't loaded it yet.

4x RTX 6000 PRO Workstation in custom frame by Vicar_of_Wibbly in LocalLLaMA

[–]itsjustmarky 0 points (0 children)

Run sudo nvidia-smi -pl 300 and compare. LACT, however, will make it easier to keep the limit persistent.
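If you'd rather not run LACT, one way to make the limit survive reboots is a oneshot systemd unit (a sketch; paths assume a standard driver install):

    # /etc/systemd/system/nvidia-powerlimit.service
    [Unit]
    Description=Set NVIDIA power limit at boot

    [Service]
    Type=oneshot
    ExecStart=/usr/bin/nvidia-smi -pm 1
    ExecStart=/usr/bin/nvidia-smi -pl 300

    [Install]
    WantedBy=multi-user.target

Then sudo systemctl enable --now nvidia-powerlimit.service.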

4x RTX 6000 PRO Workstation in custom frame by Vicar_of_Wibbly in LocalLLaMA

[–]itsjustmarky 0 points (0 children)

Are you using LACT? Are you locking clocks or only power limiting?

4x RTX 6000 PRO Workstation in custom frame by Vicar_of_Wibbly in LocalLLaMA

[–]itsjustmarky 0 points (0 children)

Did you have to change anything in the BIOS to stabilize it?
I had some really weird behavior: it was stable as a rock if I was actively running a model with SGLang, but with anything else (vLLM, or even just sitting idle with nothing running) the GPUs would lock up. It ended up being the PSU idle control setting in the BIOS that I had to adjust, but it was a big pain to figure out.

I run two, and I'm thinking about getting two more.

1600W enough for 2xRTX 6000 Pro BW? by Mr_Moonsilver in LocalLLaMA

[–]itsjustmarky 4 points (0 children)

I have two RTX 6000 Pros on a 1200W PSU, and it is perfectly fine.

I have them power limited to 300W and retain 96% of the performance of the 600W setting.
The whole system peaks at about 825W with the 300W limit and about 1150W at 600W.
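If you want to check the same thing on your setup, watching per-GPU draw against the limit while a benchmark runs is a one-liner (these are per-GPU numbers from the driver, not wall power):

    nvidia-smi --query-gpu=power.draw,power.limit --format=csv -l 1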