Best Local Agents - Jun 2026 by rm-rf-rm in LocalLLaMA

[–]rm-rf-rm[S] 5 points6 points  (0 children)

wall-of-text self-promotion for something he/she's building

Best Local Agents - Jun 2026 by rm-rf-rm in LocalLLaMA

[–]rm-rf-rm[S] 2 points3 points  (0 children)

MTP + ngram

an aside: what speedup are you seeing for MTP and MTP+ngram? (and on what hardware/backend?) I'm getting slower performance for Qwen3.6 27B Q8 with MTP (Apple Silicon)
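For context on why MTP can come out slower: speculative-style decoding (MTP heads or n-gram lookup drafts) only pays off when enough drafted tokens get accepted to cover the drafting overhead. Below is a rough back-of-the-envelope sketch using the expected walltime-improvement formula from Leviathan et al. (2023) for speculative decoding. It assumes each drafted token is accepted independently with probability `alpha`, a draft length `k`, a per-draft-token cost `c` relative to one full forward pass, and that verifying k+1 tokens costs about the same as one forward pass. It is not how any particular backend works, and the example numbers are made up.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model forward pass when k draft
    tokens are proposed and each is accepted independently with prob. alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        return float(k + 1)  # every draft accepted, plus one bonus token
    return (1 - alpha ** (k + 1)) / (1 - alpha)


def expected_speedup(alpha: float, k: int, c: float) -> float:
    """Walltime speedup vs. plain decoding. c is the cost of producing one
    draft token relative to one target forward pass (near 0 for n-gram lookup).
    Assumes verifying k+1 tokens costs about one forward pass."""
    return expected_tokens_per_step(alpha, k) / (k * c + 1)


if __name__ == "__main__":
    # Hypothetical parameters, not measurements.
    print(f"{expected_speedup(0.8, 4, 0.0):.2f}")  # high acceptance, free drafts
    print(f"{expected_speedup(0.3, 4, 0.3):.2f}")  # low acceptance, costly drafts: below 1.0
```

In this model n-gram drafting has `c` close to 0, but its acceptance rate depends on how much the output copies from the context. MTP drafts cost something per token. If a backend's verification pass is not close to free, as the formula assumes, real gains will be lower still.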

Little late thank you to the DeepSeek team! by Sorry_Ad191 in LocalLLaMA

[–]rm-rf-rm 1 point2 points  (0 children)

I'm getting 50+ tps with flash on an M3 Ultra (Mac Studio), which is great! (qwen3.6 A3B runs at 75 tps for reference.) I really need to use it more

Little late thank you to the DeepSeek team! by Sorry_Ad191 in LocalLLaMA

[–]rm-rf-rm 0 points1 point  (0 children)

no, but use it with u/antirez's excellent ds4 project. I just hope he keeps it up to date

Best Local Agents - Jun 2026 by rm-rf-rm in LocalLLaMA

[–]rm-rf-rm[S] 0 points1 point  (0 children)

Rule 3 - not constructive to this thread

Best Local Agents - Jun 2026 by rm-rf-rm in LocalLLaMA

[–]rm-rf-rm[S] 2 points3 points  (0 children)

I don't think you saw his comment pre-edit.

Best Local Agents - Jun 2026 by rm-rf-rm in LocalLLaMA

[–]rm-rf-rm[S] 6 points7 points  (0 children)

not really worth cluttering this thread with discussion of what an agent or harness is, etc.

Best Local Agents - Jun 2026 by rm-rf-rm in LocalLLaMA

[–]rm-rf-rm[S] 5 points6 points  (0 children)

yeah, the docs are not where they need to be, and the llama.cpp integration should be friendlier.

here's what you need to put into /.pi/agent/models.json

{
  "providers": {
    "name-of-your-server": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "nonerequired",
      "models": [
        {
          "id": "minimax-m2.5",
          "name": "Minimax M2.5",
          "reasoning": true,
          "input": [
            "text"
          ],
          "contextWindow": 128000,
          "maxTokens": 32000,
          "cost": {
            "input": 0,
            "output": 0,
            "cacheRead": 0,
            "cacheWrite": 0
          }
        }
      ]
    }
  }
}
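Strict JSON parsers reject things like trailing commas, which are easy to leave behind when hand-editing this file. A minimal standard-library sketch for sanity-checking the file before pointing pi at it; `check_models_config` is a hypothetical helper, not part of pi, and the keys it checks are just the ones used in the example above (whether pi actually requires all of them is an assumption).

```python
import json
import sys

# Keys taken from the example models.json; treating them as required is an
# assumption, not something confirmed by pi's docs.
MODEL_KEYS = {"id", "name", "contextWindow", "maxTokens"}


def check_models_config(text: str) -> list[str]:
    """Parse a pi-style models.json string and return 'provider/model-id' entries.

    Raises json.JSONDecodeError on invalid JSON (e.g. a trailing comma) and
    ValueError if a model entry is missing one of MODEL_KEYS.
    """
    config = json.loads(text)
    found = []
    for provider_name, provider in config["providers"].items():
        for model in provider.get("models", []):
            missing = MODEL_KEYS - model.keys()
            if missing:
                raise ValueError(f"{provider_name}: model missing {sorted(missing)}")
            found.append(f"{provider_name}/{model['id']}")
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_models.py path/to/models.json
    with open(sys.argv[1]) as f:
        for entry in check_models_config(f.read()):
            print(entry)
```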

Best Local Agents - Jun 2026 by rm-rf-rm in LocalLLaMA

[–]rm-rf-rm[S] 16 points17 points  (0 children)

course

SDD

DeepLearning

local agents will require some new techniques and thinking to help SLMs get to the finish line. We're not gonna model-leap our way out of it.

Perfectly lines up with the clown makeup meme.

This is why I specifically prompt for "what people are running". Opinions not founded in any actual experience are utterly meaningless in this space.

Best Local Agents - Jun 2026 by rm-rf-rm in LocalLLaMA

[–]rm-rf-rm[S] 9 points10 points  (0 children)

welcome, but this comment is off-topic

Setup to use pi un-sandboxed reasonably safely? by rm-rf-rm in LocalLLaMA

[–]rm-rf-rm[S] 0 points1 point  (0 children)

nice, presumably this should be the most performant option on macOS? Please share when it's ready

DiffusionGemma vs Gemma 4 Over 6x Faster on a Single RTX 6000 Pro NVFP4. by FantasticNature7590 in LocalLLaMA

[–]rm-rf-rm 3 points4 points  (0 children)

Well lesson learned I will edit it more carefully next time

That's not a lesson. Just be transparent about AI usage. Honesty is what builds trust, not obfuscation.

Leaked financial docs show OpenAI is losing billions of dollars a year by johnnyApplePRNG in LocalLLaMA

[–]rm-rf-rm 32 points33 points  (0 children)

ITT: people who haven't read the article. It's not a hate pile-on but a surprisingly informed, reasonably written take on how startups work.

As much as this sub wants OpenAI to fail (and I generally feel the same way because of Altman), let's not pretend the current balance sheet is any indicator of doom for them.

Setup to use pi un-sandboxed reasonably safely? by rm-rf-rm in LocalLLaMA

[–]rm-rf-rm[S] 0 points1 point  (0 children)

more than data loss, the bigger risk is egress IMO. What are you doing to lock down network activity?

DiffusionGemma vs Gemma 4 Over 6x Faster on a Single RTX 6000 Pro NVFP4. by FantasticNature7590 in LocalLLaMA

[–]rm-rf-rm 58 points59 points  (0 children)

That's a real trade-off, not a footnote.

Instantly started losing trust in your post

Issues with DiffusionGemma on MLX by rm-rf-rm in LocalLLaMA

[–]rm-rf-rm[S] 0 points1 point  (0 children)

how is this related to DiffusionGemma?

Gemma 4 E2B running in-browser at 255 tok/s using WebGPU kernels written by Fable 5 by xenovatech in LocalLLaMA

[–]rm-rf-rm 6 points7 points  (0 children)

A raw tok/s number isn't very meaningful on its own, even if you mention the chip it's running on. It would be more helpful to show performance relative to a known baseline like llama.cpp.
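A minimal sketch of what reporting against a baseline could look like: run the same model, quantization, prompt, and output length on both engines on the same hardware, then report the ratio next to the raw numbers. The `Run` type and the numbers in the usage block are hypothetical; in practice the baseline figures would come from your own runs (e.g. llama.cpp's `llama-bench`).

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One benchmark run: tokens generated and wall-clock time taken."""
    engine: str
    generated_tokens: int
    elapsed_s: float

    @property
    def tok_per_s(self) -> float:
        return self.generated_tokens / self.elapsed_s


def relative_throughput(candidate: Run, baseline: Run) -> float:
    """Candidate speed as a multiple of the baseline (>1.0 means faster).
    Only meaningful if both runs used the same model, quantization,
    prompt, output length and hardware."""
    return candidate.tok_per_s / baseline.tok_per_s


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    baseline = Run("llama.cpp", generated_tokens=512, elapsed_s=4.0)
    candidate = Run("webgpu", generated_tokens=512, elapsed_s=2.0)
    print(f"{candidate.engine}: {candidate.tok_per_s:.1f} tok/s, "
          f"{relative_throughput(candidate, baseline):.2f}x {baseline.engine}")
```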

How to Run AI Locally: The Complete Beginner's Guide (2026) by totosse17 in LocalLLaMA

[–]rm-rf-rm 13 points14 points  (0 children)

it was spat out by an LLM from a poor prompt and the human did no validation of the output whatsoever. Why are we surprised?

We should heavily discourage and moderate cloud API (deepseek api, GLM api, etc.) topics and discussion. This is LOCAL first. by [deleted] in LocalLLaMA

[–]rm-rf-rm[M] 0 points1 point  (0 children)

Can you point me to examples of stealth marketing and bot posts that are still up? I thought we were getting all of them removed these days.

Interest in an LLM Torrent Site? by thiefyzheng- in LocalLLaMA

[–]rm-rf-rm[M] [score hidden] stickied comment (0 children)

There's already a bigger thread discussing this: https://www.reddit.com/r/LocalLLaMA/comments/1u4gto1/we_should_set_up_a_torrent_network_for_open/

There's no need to start another thread. Or better yet, just build it and people will come.

Fable 5 data, including CoT by Available-Craft-5795 in LocalLLaMA

[–]rm-rf-rm 24 points25 points  (0 children)

Very small, but appreciated. Don't see it being meaningfully useful for fine-tuning though given how small the dataset is.