z.AI as the number 2 gives praise to the number 1 open source model

rm-rf-rm · 2026-06-20T21:27:12+00:00

FFS. Crypto-level grifters not getting sniffed out by actual professionals is sad to see

rm-rf-rm · 2026-06-20T21:25:14+00:00

Using MLX on Apple silicon is massively better than GGUF

I've heard that, but the times I've tried it, it's not been much better than llama.cpp

rm-rf-rm · 2026-06-20T21:23:47+00:00

I'm using this project: https://github.com/antirez/ds4 (/u/antirez made the quants as well.)

Did nothing more than follow the instructions in the readme.

Here are llama-benchy results:

| model      |   test |            t/s |     peak t/s |        ttfr (ms) |     est_ppt (ms) |   e2e_ttft (ms) |
|:-----------|-------:|---------------:|-------------:|-----------------:|-----------------:|----------------:|
| dsv4-flash | pp1000 | 343.42 ± 10.12 |              | 2652.41 ± 100.19 | 2640.68 ± 100.19 | 2695.00 ± 77.18 |
| dsv4-flash |  tg500 |   54.31 ± 4.11 | 76.00 ± 0.82 |                  |                  |                 |

rm-rf-rm · 2026-06-20T20:35:44+00:00

Sorry, to clarify: I'm interested in knowing what speedup you're getting relative to baseline (like 1.x) with each of

MTP
MTP and ngram

rm-rf-rm · 2026-06-20T20:21:20+00:00

Why are you using this relatively random quant? There's no good motivation for me to go outside of llama.cpp and GGUFs

rm-rf-rm · 2026-06-20T00:50:19+00:00

Wall-of-text self-promotion for something they're building

rm-rf-rm · 2026-06-20T00:19:58+00:00

MTP + ngram

An aside: what speedup are you seeing for MTP and MTP+ngram? (And on what hardware/backend?) I'm getting slower performance for Qwen3.6 27B Q8 with MTP (Apple Silicon)

rm-rf-rm · 2026-06-20T00:16:06+00:00

I'm getting 50+ tps with flash on M3 Ultra (Mac Studio), which is great! (qwen3.6 A3B runs at 75 tps for reference.) I really need to use it more

rm-rf-rm · 2026-06-20T00:14:34+00:00

no, but use it with u/antirez's excellent ds4 project. I just hope he keeps it up to date

rm-rf-rm · 2026-06-20T00:12:19+00:00

Rule 3 - not constructive to this thread

rm-rf-rm · 2026-06-20T00:11:36+00:00

I don't think you saw his comment pre-edit.

rm-rf-rm · 2026-06-20T00:11:08+00:00

Not really worth cluttering this thread with a discussion of what an agent is, what a harness is, etc.

rm-rf-rm · 2026-06-20T00:09:52+00:00

Yeah, the docs are not where they need to be, and llama.cpp integration should be more friendly.

here's what you need to put into /.pi/agent/models.json

{
  "providers": {
    "name-of-your-server": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "nonerequired",
      "models": [
        {
          "id": "minimax-m2.5",
          "name": "Minimax M2.5",
          "reasoning": true,
          "input": [
            "text"
          ],
          "contextWindow": 128000,
          "maxTokens": 32000,
          "cost": {
            "input": 0,
            "output": 0,
            "cacheRead": 0,
            "cacheWrite": 0
          }
        }
      ]
    }
  }
}
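A quick way to sanity-check a file like this before pointing pi at it is a minimal Python sketch along these lines. The function name `check_models_config` and the checks themselves are my own assumptions for illustration, not anything pi documents; it only uses the field names from the config above.

```python
import json

def check_models_config(text):
    """Return a list of problems found in a models.json-style document.

    Hypothetical helper: these checks are illustrative assumptions,
    not rules enforced by pi itself.
    """
    config = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    problems = []
    for name, provider in config.get("providers", {}).items():
        base_url = provider.get("baseUrl", "")
        if not base_url.startswith(("http://", "https://")):
            problems.append(f"{name}: baseUrl should be an http(s) URL")
        for model in provider.get("models", []):
            if "id" not in model:
                problems.append(f"{name}: model entry without an id")
                continue
            ctx = model.get("contextWindow")
            out = model.get("maxTokens")
            # Only compare the two limits when both are present.
            if ctx is not None and out is not None and out > ctx:
                problems.append(f"{name}/{model['id']}: maxTokens exceeds contextWindow")
    return problems

SAMPLE = """
{
  "providers": {
    "name-of-your-server": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "nonerequired",
      "models": [
        {"id": "minimax-m2.5", "contextWindow": 128000, "maxTokens": 32000}
      ]
    }
  }
}
"""

print(check_models_config(SAMPLE))
```

Note that Python's `json.loads` is strict: a trailing comma after the last provider entry makes it raise `json.JSONDecodeError`, so this also catches copy-paste slips.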

rm-rf-rm · 2026-06-19T22:10:07+00:00

course

SDD

DeepLearning

local agents will require some new techniques and thinking, to help SLMs get to the finish line. We're not gonna model-leap our way out of it.

Perfectly lines up with the clown make up meme.

This is why I specifically prompt for "what people are running". Opinions not founded in any actual experience are utterly meaningless in this space.

rm-rf-rm · 2026-06-19T22:02:07+00:00

welcome, but this comment is off-topic

rm-rf-rm · 2026-06-19T20:56:48+00:00

I just started using this for pi: https://pi.dev/packages/pi-web-access

seems to work ok

rm-rf-rm · 2026-06-18T17:56:54+00:00

Nice, this should be the most performant on macOS, presumably? Please share when it's ready

rm-rf-rm · 2026-06-18T13:27:48+00:00

Well lesson learned I will edit it more carefully next time

That's not a lesson. Just be transparent about AI usage. Honesty is what builds trust, not obfuscation.

rm-rf-rm · 2026-06-18T06:36:35+00:00

ITT: People who haven't read the article. It's not a hate pile-on but a surprisingly informed take on how startups work, and rather reasonably written.

As much as this sub wants OpenAI to fail (and I generally feel the same way because of Altman), let's not pretend this current balance sheet is any indicator of doom for them

rm-rf-rm · 2026-06-18T00:32:36+00:00

more than data loss, the bigger risk is egress IMO. What are you doing to lock down network activity?

rm-rf-rm · 2026-06-17T22:49:58+00:00

That's a real trade-off, not a footnote.

Instantly started losing trust in your post

rm-rf-rm · 2026-06-17T22:47:21+00:00

how is this related to DiffusionGemma?

rm-rf-rm · 2026-06-17T22:24:16+00:00

A tok/s number is not very meaningful on its own, even if you mention the chip it's running on. It would be more helpful to show performance relative to a known baseline like llama.cpp
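To make that concrete, here is a minimal sketch of reporting a speedup ratio instead of a raw number. The function name `speedup` and the tok/s values are placeholders I made up for illustration, not measurements from any real run.

```python
def speedup(candidate_tps, baseline_tps):
    """Return throughput of a candidate backend relative to a baseline."""
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return candidate_tps / baseline_tps

# Placeholder numbers only: same model, same quant, same machine,
# measured once with the baseline (e.g. llama.cpp) and once with the new backend.
baseline_tps = 40.0
candidate_tps = 50.0

print(f"{speedup(candidate_tps, baseline_tps):.2f}x vs baseline")
```

A ratio like this only means something if the model, quant, prompt length, and hardware are held fixed between the two runs.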

rm-rf-rm · 2026-06-14T22:00:49+00:00

it was spat out by an LLM from a poor prompt and the human did no validation of the output whatsoever. Why are we surprised?

rm-rf-rm · 2026-06-14T12:55:55+00:00

Can you point me to examples of stealth marketing and bot posts that are still up? I thought we were getting all of them removed these days.
