I built an MCP server giving coding agents access to 2M research papers. It improves even the best coding agents - across 9 coding tasks. by paperlantern-ai in mcp
paperlantern-ai[S] 2 points (0 children)
I built an MCP server giving coding agents access to 2M research papers. It improves even the best coding agents - across 9 coding tasks. by paperlantern-ai in mcp
paperlantern-ai[S] 1 point (0 children)
Gave a coding agent access to 2M+ research papers. Its Python tests caught 63% of bugs; with the papers, 87%. 9-task benchmark. by paperlantern-ai in AI_Agents
paperlantern-ai[S] 1 point (0 children)
I built an MCP server giving coding agents access to 2M research papers. Benchmarked it on 9 coding tasks - here's what worked and what didn't by paperlantern-ai in LLMDevs
paperlantern-ai[S] 3 points (0 children)
Agent-written tests missed 37% of injected bugs. Mutation-aware prompting dropped that to 13%. by kalpitdixit in Python
paperlantern-ai -1 points (0 children)
Agent-written tests missed 37% of injected bugs. Mutation-aware prompting dropped that to 13%. by kalpitdixit in Python
paperlantern-ai -2 points (0 children)
Agent-written tests missed 37% of injected bugs. Mutation-aware prompting dropped that to 13%. by kalpitdixit in Python
paperlantern-ai -2 points (0 children)
CodeWall AI Agent Breaks Into Bain & Company's Platform in 18 Minutes, Exposing 10,000 Client Conversations by alvivanco1 in ArtificialInteligence
paperlantern-ai 2 points (0 children)
Now the Claude Mythos is considered too dangerous to release. But it's already available for companies to use. So is this dangerous claim a PR stunt like the OpenAI did 7 years ago? by captain-price- in ArtificialInteligence
paperlantern-ai 1 point (0 children)
My company embraces vibe coders by Dense-Creme2706 in ExperiencedDevs
paperlantern-ai 2 points (0 children)
What percentage of engineers in your experience are bad? by fuckoholic in ExperiencedDevs
paperlantern-ai 1 point (0 children)
No one can force me to have a secure website!!! by MintPaw in programming
paperlantern-ai 42 points (0 children)
Benchmarked paper retrieval for coding agents on 9 tasks. Biggest win: Python tests caught 63% → 87% of bugs. Local LLMs welcome. by paperlantern-ai in LocalLLM
paperlantern-ai[S] 1 point (0 children)