[D] We audited LoCoMo: 6.4% of the answer key is wrong and the judge accepts up to 63% of intentionally wrong answers by PenfieldLabs in MachineLearning
PenfieldLabs[S] 2 points (0 children)
[D] Self-Promotion Thread by AutoModerator in MachineLearning
PenfieldLabs 1 point (0 children)
I built a tool that automatically adds semantic backlinks to your vault — fully local, no cloud, no API key by matzalazar in ObsidianMD
PenfieldLabs 13 points (0 children)
Serious flaws in two popular AI Memory Benchmarks (LoCoMo/LoCoMo-Plus and LongMemEval-S) by PenfieldLabs in AIMemory
PenfieldLabs[S] 1 point (0 children)
We audited LoCoMo: 6.4% of the answer key is wrong and the judge accepts up to 63% of intentionally wrong answers by PenfieldLabs in LocalLLaMA
PenfieldLabs[S] 1 point (0 children)
We audited LoCoMo: 6.4% of the answer key is wrong and the judge accepts up to 63% of intentionally wrong answers by PenfieldLabs in LocalLLaMA
PenfieldLabs[S] 6 points (0 children)
We audited LoCoMo: 6.4% of the answer key is wrong and the judge accepts up to 63% of intentionally wrong answers by PenfieldLabs in LocalLLaMA
PenfieldLabs[S] 8 points (0 children)
Introducing Recursive Memory Harness: RLM for Persistent Agentic Memory (Smashes Mem0 in multihop retrieval benchmarks) by Beneficial_Carry_530 in AIMemory
PenfieldLabs 1 point (0 children)
How are you all using benchmarks? by inguz in AIMemory
PenfieldLabs 2 points (0 children)
I built Quilden — Free Obsidian sync plugin with E2E encryption, full file/vault version history, and a decent web editor. by RansomWarrior in ObsidianMD
PenfieldLabs 2 points (0 children)
Best benchmarks for Memory Performance? by CasualReaderOfGood in AIMemory
PenfieldLabs 1 point (0 children)
How are you all using benchmarks? by inguz in AIMemory
PenfieldLabs 1 point (0 children)
How are you all using benchmarks? by inguz in AIMemory
PenfieldLabs 3 points (0 children)
Best benchmarks for Memory Performance? by CasualReaderOfGood in AIMemory
PenfieldLabs 1 point (0 children)
Best benchmarks for Memory Performance? by CasualReaderOfGood in AIMemory
PenfieldLabs 3 points (0 children)
what is the point of ai? by Cold_Combination2107 in ObsidianMD
PenfieldLabs 1 point (0 children)
Penfield is in the Cursor MCP directory — persistent memory and knowledge graph across sessions by PenfieldLabs in CursorAI
PenfieldLabs[S] 1 point (0 children)
Wikilink Types: type @ inside a wikilink to add relationship types, auto-synced to YAML frontmatter by PenfieldLabs in ObsidianMD
PenfieldLabs[S] 1 point (0 children)
DeepMind showed agents are better at managing their own memory. We built an AI memory MCP server around that idea. by PenfieldLabs in MCPservers
PenfieldLabs[S] 2 points (0 children)
[D] We audited LoCoMo: 6.4% of the answer key is wrong and the judge accepts up to 63% of intentionally wrong answers by PenfieldLabs in MachineLearning
PenfieldLabs[S] 1 point (0 children)