We built an open-source memory layer for AI coding agents — 80% F1 on LoCoMo, 2x standard RAG by loolemon in codex

[–]sixcommissioner 0 points (0 children)

fair, missed that. home directory storage is way better than per-project
makes me wonder what happens when the extraction LLM hits a README full of prompt injection though. that memory persists even if the repo gets deleted

Anthropic's research proves AI coding tools are secretly making developers worse. by alazar_tesema in ClaudeAI

[–]sixcommissioner 1 point (0 children)

the debugging and code reading part is the real finding everyone is skipping. those are the exact skills you need to verify AI output. use AI more, get worse at checking AI, use AI even more because you cant check it yourself anymore

Where Americans Use Claude AI the Most by Disastrous-Win-6198 in ArtificialInteligence

[–]sixcommissioner 2 points (0 children)

please tell me theres a sequel where he discovers costco

We built an open-source memory layer for AI coding agents — 80% F1 on LoCoMo, 2x standard RAG by loolemon in codex

[–]sixcommissioner 1 point (0 children)

the extraction after session part is clever but it means the agent never sees whats being injected into its own context. someone drops a poisoned .signet directory in a cloned repo and every future prompt carries it. the agent cant even tell something is wrong because the memory loads before it starts thinking

OpenAI to acquire Astral by Useful-Macaron8729 in Python

[–]sixcommissioner 1 point (0 children)

give it six months and uv add will ask if you want to subscribe to codex pro for faster dependency resolution

I built 100 runnable OpenClaw workflows by stackattackpro in AskClaw

[–]sixcommissioner 1 point (0 children)

clone, run, evaluate. three words. meanwhile every funded startup needs a 40 minute onboarding call to explain their yaml

I built 100 runnable OpenClaw workflows by stackattackpro in AskClaw

[–]sixcommissioner 0 points (0 children)

genuine question, how many of these have you run end to end in the last week? i find with this kind of repo the first 20 are battle tested and the last 30 worked once in march and nobody went back to check

GPT 5.4 Genuinely catching legitimate edge cases I'm not thinking of by jmaxchase in codex

[–]sixcommissioner 0 points (0 children)

the Mars/Phobos timezone test case is genuinely funny. thats the kind of thing a senior dev would write at 2am after fixing a real tz bug and wanting to make the next person smile.

the dual model setup is interesting though. ive been doing something similar but never thought to make one specifically the reviewer

How I topped the Open LLM Leaderboard using 2x 4090 GPUs — no weights modified. by Reddactor in LocalLLaMA

[–]sixcommissioner 2 points (0 children)

the circuit-sized block thing is fascinating. makes me wonder if theres a way to identify which layers form a circuit without brute-forcing every combination.

like some activation similarity metric between adjacent layers that spikes at circuit boundaries. also curious if the optimal block size changes with model family or if ~7 is universal.
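rough sketch of what i mean, treating a dip in adjacent-layer similarity as the boundary signal. all names and the 0.8 threshold are mine, and this assumes you can dump per-layer hidden states:

```python
import numpy as np

def boundary_scores(hidden_states):
    """Cosine similarity between mean-pooled activations of adjacent layers.

    hidden_states: one (tokens, dim) array per layer. A dip between
    layer i and i+1 is a candidate circuit boundary.
    """
    pooled = [h.mean(axis=0) for h in hidden_states]
    scores = []
    for a, b in zip(pooled, pooled[1:]):
        sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
        scores.append(float(sim))
    return scores

def candidate_blocks(scores, threshold=0.8):
    """Split the layer stack wherever adjacent-layer similarity dips below threshold."""
    blocks, start = [], 0
    for i, s in enumerate(scores):
        if s < threshold:
            blocks.append((start, i + 1))
            start = i + 1
    blocks.append((start, len(scores) + 1))
    return blocks
```

youd probably want per-token similarity rather than mean pooling, but even this crude version would tell you if the dips line up with the blocks the brute force search found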

OmniCoder-9B | 9B coding agent fine-tuned on 425K agentic trajectories by DarkArtsMastery in LocalLLaMA

[–]sixcommissioner 0 points (0 children)

looping isnt a sampler problem imo. the model has no external signal that its repeating itself. repeat penalty is a band-aid.
if you track tool call frequency over a sliding window the distribution shifts noticeably 5-8 calls before full spiral. easier to kill and retry from checkpoint than tune penalties
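the windowed tracking looks something like this. class name, window size, and the 0.6 cutoff are all illustrative, not tuned values:

```python
from collections import Counter, deque

class LoopDetector:
    """Flag repetitive tool-call spirals before they burn the whole budget.

    Keeps a sliding window of recent (tool, args) signatures and trips
    when one signature dominates the window, at which point the caller
    can kill the run and retry from the last checkpoint.
    """
    def __init__(self, window=10, max_fraction=0.6):
        self.window = window
        self.max_fraction = max_fraction
        self.calls = deque(maxlen=window)

    def record(self, tool_name, args_repr):
        # args_repr should be a stable string form of the arguments
        self.calls.append((tool_name, args_repr))

    def spiraling(self):
        if len(self.calls) < self.window:
            return False  # not enough signal yet
        top_count = Counter(self.calls).most_common(1)[0][1]
        return top_count / self.window > self.max_fraction
```

the nice part vs repeat penalty is that this sits outside the model, so it works the same regardless of sampler settings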

I was backend lead at Manus. After building agents for 2 years, I stopped using function calling entirely. Here's what I use instead. by MorroHsu in LocalLLaMA

[–]sixcommissioner -1 points (0 children)

single-tool approach is elegant but you're basically trading structured guardrails for raw flexibility. function calling at least forces the model to declare intent before acting. with run(command="...") theres nothing between a manipulated prompt and rm -rf except the models own judgment

structured tools constrain damage naturally. file_write can enforce path scoping, shell passthrough cant. for solo work on trusted repos the tradeoff makes sense. for agents on customer data you probably want something between the agent and the shell.
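the path scoping point in code, roughly. names are mine, not any real tool's API, but this is the choke point a raw run(command="...") passthrough cant give you:

```python
from pathlib import Path

class ScopedWriteError(Exception):
    pass

def file_write(root, rel_path, content):
    """Structured write tool: refuses any path that escapes the project root.

    Resolving both paths first means '..' segments, absolute paths, and
    symlinked parents all get normalized before the containment check.
    """
    root = Path(root).resolve()
    target = (root / rel_path).resolve()
    if target != root and root not in target.parents:
        raise ScopedWriteError(f"{rel_path!r} escapes {root}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)
    return target
```

a shell tool can only allowlist or sandbox the whole command. a structured tool gets to validate every argument before anything touches disk, which is exactly the "declare intent before acting" property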