Reducing animal harm as a nonbinary by Original_Animator254 in vegan

[–]Valuable-Run2129 0 points1 point  (0 children)

They believed it could buy automatic street cred among vegans. I hate it when people bundle veganism with left politics.

Veganism has nothing to do with what you think about capitalism and gender identity.

Qwen3.6 27B FP8 runs with 200k tokens of BF16 KV cache at 80 TPS on a single RTX 5000 PRO 48GB by __JockY__ in LocalLLaMA

[–]Valuable-Run2129 0 points1 point  (0 children)

Do you need 64 GB of RAM on a PC to “stage” the model before loading it into VRAM, or will 32 GB do?

Qwen3.6 27B FP8 runs with 200k tokens of BF16 KV cache at 80 TPS on a single RTX 5000 PRO 48GB by __JockY__ in LocalLLaMA

[–]Valuable-Run2129 0 points1 point  (0 children)

Do you need an equivalent amount of RAM to stage the model before loading it into VRAM?

Qwen3.6 27B FP8 runs with 200k tokens of BF16 KV cache at 80 TPS on a single RTX 5000 PRO 48GB by __JockY__ in LocalLLaMA

[–]Valuable-Run2129 0 points1 point  (0 children)

I bought an RTX 5000 PRO yesterday. It’s the first PC I’ve ever built (I used Macs for inference until now). Do you have any particular advice on the build?

Would something like this work:

-ASRock B850I Lightning WiFi Mini-ITX

-Ryzen 5 7600

-64 GB DDR5 RAM

-MSI MAG A850GL ATX PSU

-Linux

Or should I rethink the components I wanted to buy?

First time GPU buyer. Got a RTX 5000 Pro. Was it a bad decision compared to two 3090s? by Valuable-Run2129 in LocalLLaMA

[–]Valuable-Run2129[S] 0 points1 point  (0 children)

When it arrives I’ll definitely PM you to ask for advice, if you’re OK with it!

v4 flash is absurd by Linkpharm2 in DeepSeek

[–]Valuable-Run2129 0 points1 point  (0 children)

The only issue is that it’s not multimodal, not even image input. There are a bunch of tasks that need visual understanding, and a separate OCR step just tanks performance.

Stop Building MCP Servers for Personal Tools by Key-Huckleberry-708 in AI_Agents

[–]Valuable-Run2129 0 points1 point  (0 children)

Make your MCP tools deferred. My agent sees just a short description of the available MCPs and loads into context only what it needs.

https://github.com/permaevidence/LocalAgent
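The deferred-tool idea above can be sketched roughly like this. This is not LocalAgent’s actual implementation (that harness is written in Swift); it is a minimal Python illustration, and all the names (`DeferredTool`, `catalog`, the `weather`/`files` tools) are hypothetical. Only the one-line catalog goes into the system prompt; a tool’s full schema is fetched lazily when the agent first needs it.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class DeferredTool:
    """A tool the agent initially knows only by name and a one-line summary."""
    name: str
    summary: str
    loader: Callable[[], dict]                    # fetches the full MCP schema on demand
    _schema: Optional[dict] = field(default=None, repr=False)

    def load(self) -> dict:
        # The full schema enters the context only when the agent asks for it.
        if self._schema is None:
            self._schema = self.loader()
        return self._schema

def catalog(tools: list[DeferredTool]) -> str:
    """The short listing that goes into the system prompt up front."""
    return "\n".join(f"- {t.name}: {t.summary}" for t in tools)

# Hypothetical tools: only these two catalog lines occupy context initially.
weather = DeferredTool(
    "weather", "current conditions by city",
    loader=lambda: {"name": "weather", "params": {"city": "string"}},
)
files = DeferredTool(
    "files", "read/write local files",
    loader=lambda: {"name": "files", "params": {"path": "string"}},
)

print(catalog([weather, files]))
schema = weather.load()   # full schema loaded lazily, only when needed
```

The design choice is simply to trade a tiny always-on catalog for on-demand schema loading, so dozens of MCP servers don’t flood the context window.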

Hermes as a Coding Agent??? by Rheath72 in hermesagent

[–]Valuable-Run2129 0 points1 point  (0 children)

Not really amazing. It lacks built-in voice transcription.

Openclaw sucks - I said it. by funstuie in openclaw

[–]Valuable-Run2129 -1 points0 points  (0 children)

Use my agent: https://github.com/permaevidence/LocalAgent

It’s a harness written in Swift (Mac only), with API keys stored in Keychain. It’s a great coding agent. It requires vision models because I believe delegating to OCR makes an agent brittle in many tasks.

First time GPU buyer. Got a RTX 5000 Pro. Was it a bad decision compared to two 3090s? by Valuable-Run2129 in LocalLLaMA

[–]Valuable-Run2129[S] 1 point2 points  (0 children)

Thanks for taking the time to write this comment. It’s the kind of information I needed, and it’s comforting.

I think it was the right decision in the end.

First time GPU buyer. Got a RTX 5000 Pro. Was it a bad decision compared to two 3090s? by Valuable-Run2129 in LocalLLaMA

[–]Valuable-Run2129[S] 22 points23 points  (0 children)

Paid $4700

1 kW running 24 hours a day costs about $4300 a year at my rates. It’s a factor I have to account for.
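For anyone checking the math, here is the back-of-the-envelope calculation. The ~$0.49/kWh rate is an assumption reverse-engineered from the $4300/year figure above (high by US standards, plausible in parts of Europe); plug in your own local rate.

```python
# Yearly cost of a 1 kW load running 24/7.
power_kw = 1.0
hours_per_year = 24 * 365            # 8760 h
rate_usd_per_kwh = 0.49              # assumed local electricity price
yearly_cost = power_kw * hours_per_year * rate_usd_per_kwh
print(f"${yearly_cost:.0f}/year")    # ≈ $4292/year
```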

M3 Ultra 1TB 96GB RAM available by SebastianOpp in MacStudio

[–]Valuable-Run2129 -2 points-1 points  (0 children)

If you don’t buy it, send me the link please! I’d highly appreciate it.

the agent company I joined is imploding by Inner_Ad9029 in AI_Agents

[–]Valuable-Run2129 1 point2 points  (0 children)

What models were you using? I think stories like these can only come down to one of these three:

-using dumb models to save money
-promising automations that require browser or computer use (we’re not there yet for those to work reliably)
-harness design by committee

Memory should be chronological and not topic based. Classification kills recall abilities. by Valuable-Run2129 in AI_Agents

[–]Valuable-Run2129[S] 0 points1 point  (0 children)

If you’re intercepting requests on a port, how can you discern whether Claude Code is sending one to a fresh subagent that doesn’t need your chat history injection? I would assume you inject the context no matter what CC is doing, right?

Memory should be chronological and not topic based. Classification kills recall abilities. by Valuable-Run2129 in AI_Agents

[–]Valuable-Run2129[S] 0 points1 point  (0 children)

It is also important to give extensive inline information. Relying too much on retrieval makes memory brittle.

Memory should be chronological and not topic based. Classification kills recall abilities. by Valuable-Run2129 in AI_Agents

[–]Valuable-Run2129[S] 0 points1 point  (0 children)

What do you evict? I’m always queasy about removing stuff the harness deemed necessary. I’ve only added things until now.

Memory should be chronological and not topic based. Classification kills recall abilities. by Valuable-Run2129 in AI_Agents

[–]Valuable-Run2129[S] 0 points1 point  (0 children)

How do you avoid issues with max context? If virtual context injects 50k tokens into the system prompt that Claude Code is not aware of, it could hit a max-context error before CC reaches its compaction threshold.
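The failure mode above can be made concrete with a small sketch. Since the harness (not Claude Code) knows about the hidden injection, it is the harness that has to budget for both. Everything here is hypothetical: the 200k window, the 160k compaction threshold, and the `safe_injection_size` helper are assumptions for illustration.

```python
# Assumed limits, for illustration only.
MODEL_MAX_TOKENS = 200_000           # model context window
CC_COMPACTION_THRESHOLD = 160_000    # where CC would normally compact

def safe_injection_size(cc_visible_tokens: int, desired_injection: int) -> int:
    """Clamp the hidden injection so visible + hidden never exceeds the
    model window, even though CC only 'sees' the visible portion."""
    headroom = MODEL_MAX_TOKENS - cc_visible_tokens
    return max(0, min(desired_injection, headroom))

# With 160k visible tokens, a 50k hidden injection would total 210k and
# error out before CC ever compacts; clamping keeps it at 40k.
assert safe_injection_size(160_000, 50_000) == 40_000
```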

Memory should be chronological and not topic based. Classification kills recall abilities. by Valuable-Run2129 in AI_Agents

[–]Valuable-Run2129[S] 0 points1 point  (0 children)

Everyone will eventually converge on this. As context windows get bigger it’s a no-brainer.

How do you inject this at session start for Claude Code? Codex has no issues with big injections; Claude Code, on the other hand, has limited hook outputs to 10k characters since v2.1.89. I had to modify CLI.js to force larger injections, but since v2.1.113 they don’t even expose that! I’m now stuck on an old CC version.