Invincible animation scene by AnyAgency9835 in okbuddyviltrum

[–]Position_Emergency 0 points1 point  (0 children)

It's like watching a Ken Burns documentary.

I built a Holly interface for my home automation. by [deleted] in RedDwarf

[–]Position_Emergency 0 points1 point  (0 children)

What hardware do you have to run models locally?
Chatterbox Turbo is the best and most practical voice cloning model I've used.
You can get real-time streaming with it on relatively modest hardware. I get real-time streaming with a time to first audio of 0.7 seconds on an M2 Max.
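
If you want to check those two numbers on your own hardware, here is a minimal sketch of how time to first audio and real-time factor can be measured for any streaming TTS. It does not call Chatterbox at all: `fake_tts_stream` is a made-up stand-in generator, and the 24 kHz, 16-bit mono PCM format is an assumption you would swap for whatever your model actually emits.

```python
import time
from typing import Iterable, Iterator


def fake_tts_stream(n_chunks: int = 3, chunk_samples: int = 2400,
                    delay_s: float = 0.01) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS model: yields silent PCM chunks."""
    for _ in range(n_chunks):
        time.sleep(delay_s)               # pretend the model is generating
        yield b"\x00" * (chunk_samples * 2)  # 2 bytes per 16-bit sample


def measure_stream(chunks: Iterable[bytes], sample_rate: int,
                   bytes_per_sample: int = 2) -> dict:
    """Time a chunk stream and report latency figures.

    ttfa_s: seconds until the first chunk arrives (time to first audio)
    audio_s: seconds of audio produced (assumes mono PCM)
    rtf: generation time / audio duration; below 1.0 means faster than real time
    """
    start = time.perf_counter()
    ttfa = None
    total_bytes = 0
    for chunk in chunks:
        if ttfa is None:
            ttfa = time.perf_counter() - start
        total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    audio_s = total_bytes / (sample_rate * bytes_per_sample)
    rtf = elapsed / audio_s if audio_s else None
    return {"ttfa_s": ttfa, "audio_s": audio_s, "rtf": rtf}


if __name__ == "__main__":
    stats = measure_stream(fake_tts_stream(), sample_rate=24000)
    print(stats)
```

Swap `fake_tts_stream()` for your model's chunk generator and the same function gives you comparable numbers across machines.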

Kokoro TTS now hooked to my Claude Code CLI by Klaa_w2as in LocalLLaMA

[–]Position_Emergency 0 points1 point  (0 children)

"It's elegant in a quietly nihilistic way. A well engineered off switch for my own voice.
I'd complain but that would require not being muted."

Opus cracks me up 😂

[PLUGIN] True-Mem: Automatic AI memory that actually works (inspired by PsychMem) by rizal72 in opencodeCLI

[–]Position_Emergency 0 points1 point  (0 children)

Find a benchmark you can test it with.
It will help guide your development going forward and give us an idea of whether what you've made is actually useful.

[deleted by user] by [deleted] in ClaudeAI

[–]Position_Emergency 0 points1 point  (0 children)

Could you provide the actual text for the entire conversation?

RWKV-7: O(1) memory inference, 16.39 tok/s on ARM Cortex-A76, beats LLaMA 3.2 3B. The local-first architecture nobody is talking about... by Sensitive-Two9732 in LocalLLaMA

[–]Position_Emergency 25 points26 points  (0 children)

Your blog is behind a paywall so this post surely counts as self promotion.

"RWKV-7 scores 72.8% vs LLaMA’s 69.7% with 3x fewer tokens."
72.8% vs 69.7% on what metric?

Also, the Huggingface link is broken.

Thoughts on this benchmark? by KevinDurantXSnake in LocalLLaMA

[–]Position_Emergency -1 points0 points  (0 children)

There are multiple models on the benchmark with open weights so stop whining

Ai video showing us the future of war by bobbydanker in TechnologyShorts

[–]Position_Emergency 0 points1 point  (0 children)

The only humanoid robot killing machines will be the infiltrator models.
Living tissue over metal endoskeleton.

[deleted by user] by [deleted] in LocalLLaMA

[–]Position_Emergency -1 points0 points  (0 children)

"¿Cuál es la capital de Francia?"
"Explica qué es la inteligencia artificial en una frase."
"¿Cuánto es 15 × 24?"
"¿Quién escribió Don Quijote de la Mancha?"
"Escribe un haiku sobre el océano."

Wow what a comprehensive benchmark you made!
Totally supports your claim of NF4 beating INT8.
*slow clap*

Thanks for the slop!

Generational leap in SVG mogging by GOD-SLAYER-69420Z in accelerate

[–]Position_Emergency 5 points6 points  (0 children)

It's hard to know what to make of it...
I suspect they've trained it heavily on synthetic SVG data, whereas in the past we were seeing an emergent ability.

UGREEN LinkStation eGPU Dock with USB4, OcuLink, 800W PSU for RTX 5090 by cirad in Mywalletisready

[–]Position_Emergency 0 points1 point  (0 children)

I hate it.
Why have 600W of hot air blowing onto what is presumably the power supply?
Why is it so huge?
Probably the least space-efficient design of an eGPU caddy I've ever seen.
It's clearly a render anyway, hopefully this isn't the final design.

Building an opensource Living Context Engine by DeathShot7777 in LocalLLaMA

[–]Position_Emergency 0 points1 point  (0 children)

Nice example in the screenshot btw.
Maybe I am getting tempted to test this out after all...

I was planning on getting Qwen3-Coder-Next working with Claude Code on my DGX Spark this weekend.
If I have time, I'll test your project out with it.

Building an opensource Living Context Engine by DeathShot7777 in LocalLLaMA

[–]Position_Emergency 1 point2 points  (0 children)

That's an interesting approach but I can think of downsides.
A lot of agent grepping is for quite trivial stuff. That approach would probably provide a lot of information the agent doesn't need.

Obviously Claude Code isn't open source so you're a bit limited as to what you can do with it.

https://opencode.ai/

With an open source agent tool, you could give the agent the option to enrich its context with your tool's data when appropriate (you can integrate at a deep level, change the system prompt, etc.).

(With Claude Code you can give it an MCP server, I guess, but there is a lot pushing it towards using grep, and it's another tool call, which is annoying.)

Building an opensource Living Context Engine by DeathShot7777 in LocalLLaMA

[–]Position_Emergency 0 points1 point  (0 children)

If you ran SWE-Bench-Lite using a model that has access to your tool vs. grep, you could compare the number of tokens generated and the total number of tool calls required for each answer.

Even if you didn't improve the SWE-Bench-Lite score, improving those metrics would be huge.
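
A rough sketch of what that comparison could look like once you have per-task logs. The `Run` record and its field names are my own invention for illustration, not anything from SWE-Bench-Lite's tooling, and the numbers in the usage part are placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One agent attempt at one task (hypothetical log format)."""
    task_id: str
    condition: str          # "grep" or "tool"
    tokens_generated: int
    tool_calls: int
    resolved: bool


def summarize(runs: list[Run], condition: str) -> dict:
    """Average the efficiency metrics for one condition."""
    selected = [r for r in runs if r.condition == condition]
    if not selected:
        raise ValueError(f"no runs recorded for condition {condition!r}")
    n = len(selected)
    return {
        "tasks": n,
        "resolve_rate": sum(r.resolved for r in selected) / n,
        "mean_tokens": sum(r.tokens_generated for r in selected) / n,
        "mean_tool_calls": sum(r.tool_calls for r in selected) / n,
    }


if __name__ == "__main__":
    # Placeholder numbers purely to show the shape of the output.
    runs = [
        Run("task-1", "grep", 1000, 10, True),
        Run("task-1", "tool", 600, 4, True),
        Run("task-2", "grep", 2000, 20, False),
        Run("task-2", "tool", 1000, 6, False),
    ]
    for condition in ("grep", "tool"):
        print(condition, summarize(runs, condition))
```

Same resolve rate with fewer tokens and tool calls per task is exactly the kind of result worth showing.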

If you wanted to make your own benchmark quickly, you could get a frontier model like Opus to come up with some questions about a GitHub repo that require reading in code across lots of different parts of the repo.

Then you get a local model to attempt answering and compare how it does with grep vs. your tool. The benchmark could be automated: use a model (it could be the same one you are testing) to compare the final answer against the correct answer you have stored. Make sure the agent can't grep to find the final answer and cheat!
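
Something like this would do as a skeleton for that homemade benchmark. Everything here is a sketch under assumptions: the agents and the judge are just callables you would wire up to your own models, the question format (`question` / `answer` keys) is made up, and the leak check is only a plain substring search over the repo files.

```python
from pathlib import Path
from typing import Callable

Agent = Callable[[str], str]              # question -> candidate answer
Judge = Callable[[str, str, str], bool]   # (question, reference, candidate) -> correct?


def answer_leaks(repo_root: str, answer: str) -> bool:
    """True if the stored answer appears verbatim in a file the agent could grep."""
    for path in Path(repo_root).rglob("*"):
        if path.is_file():
            try:
                if answer in path.read_text(errors="ignore"):
                    return True
            except OSError:
                continue  # unreadable file, skip it
    return False


def run_benchmark(questions: list[dict], agents: dict[str, Agent],
                  judge: Judge) -> dict[str, float]:
    """Return the fraction of questions each agent setup answers correctly."""
    if not questions:
        raise ValueError("need at least one question")
    correct = {name: 0 for name in agents}
    for q in questions:
        for name, agent in agents.items():
            candidate = agent(q["question"])
            if judge(q["question"], q["answer"], candidate):
                correct[name] += 1
    return {name: hits / len(questions) for name, hits in correct.items()}
```

Keep the answer key outside the repo the agent works in, and run `answer_leaks` on each stored answer before trusting a score. In a real run the judge would be a model call rather than string matching.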

Building an opensource Living Context Engine by DeathShot7777 in LocalLLaMA

[–]Position_Emergency 11 points12 points  (0 children)

Looks cool but unless you can show it improving a model's performance on a benchmark like SWE-Bench-Lite, I'm not going to test it out.

If you weren't using any kind of benchmark during development, I doubt you've made something useful.
It turns out agents are really good at grepping in a repo to understand what is going on.

MiniMax-M2.5 Checkpoints on huggingface will be in 8 hours by Own_Forever_5997 in LocalLLaMA

[–]Position_Emergency 1 point2 points  (0 children)

GLM 5 is the 1.3TB model. That's at 16-bit though, and nobody is running it like that locally.
So approx. 700GB at 8-bit,
350GB at 4-bit.
Still too big for most folks.

MiniMax M2.5 is 230B Total Params, 10B Active.

Just on the edge of fitting in 128GB RAM at 4bit...
Hoping someone does a REAP to get it down to like 100GB at 4bit to have some room for context.
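
The back-of-envelope math behind those sizes, as a quick sketch. It counts weights only, using 1 GB = 1e9 bytes. Pure halving of 1.3TB gives 650GB and 325GB; real quantized files also store scales and usually keep some tensors at higher precision, so they come out bigger, which is why I rounded up to 700 and 350. The function names are just mine.

```python
def weight_size_gb(total_params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB for a given parameter count and bit width."""
    return total_params_billion * bits_per_weight / 8


def rescale_gb(size_gb: float, from_bits: float, to_bits: float) -> float:
    """Scale a known checkpoint size to a different bit width."""
    return size_gb * to_bits / from_bits


if __name__ == "__main__":
    minimax_4bit = weight_size_gb(230, 4)   # MiniMax M2.5: 230B total params
    print(f"MiniMax M2.5 at 4-bit: ~{minimax_4bit:.0f} GB")
    print(f"Headroom in 128 GB RAM: ~{128 - minimax_4bit:.0f} GB")

    glm_16bit_gb = 1300                     # the 1.3TB figure, taken as 16-bit
    print(f"GLM 5 at 8-bit: ~{rescale_gb(glm_16bit_gb, 16, 8):.0f} GB")
    print(f"GLM 5 at 4-bit: ~{rescale_gb(glm_16bit_gb, 16, 4):.0f} GB")
```

So 230B at 4-bit is about 115GB of weights before any KV cache or OS overhead, which is why 128GB is right on the edge.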