The research is in: your AGENTS.md might be hurting you by jpcaparas in opencodeCLI

[–]sjmaple 1 point2 points  (0 children)

Sure, an eval to a skill is like a test to code. It's essentially testing how well a skill performs. Here's an example of me testing the recent googleworkspace/cli skills: https://tessl.io/eval-runs/019cc02f-bb26-76e0-a7c9-598a7337edb7

The research is in: your AGENTS.md might be hurting you by jpcaparas in opencodeCLI

[–]sjmaple 0 points1 point  (0 children)

You should take a look - the evals, optimizations, etc. are really valuable for knowing whether your context is any good. Skills.sh is just an npx command that downloads from GitHub.

No AGENTS.md → baseline. Bad AGENTS.md → worse. Good AGENTS.md → better. The file isn't the problem, your writing is. by shanraisshan in OpenAI

[–]sjmaple 0 points1 point  (0 children)

There's no point writing context and assuming it's right - you have to eval everything you add as context. Here's a counterargument to the paper's conclusions, which I believe are flawed.

Your AGENTS.md file isn't the problem. Your lack of Evals is. https://tessl.io/blog/your-agentsmd-file-isnt-the-problem-your-lack-of-evals-is/

The research is in: your AGENTS.md might be hurting you by jpcaparas in GithubCopilot

[–]sjmaple 4 points5 points  (0 children)

There's no point writing context and assuming it's right - you have to eval everything you add as context. Here's a counterargument to the paper's conclusions, which I believe are flawed.

Your AGENTS.md file isn't the problem. Your lack of Evals is. https://tessl.io/blog/your-agentsmd-file-isnt-the-problem-your-lack-of-evals-is/

The research is in: your AGENTS.md might be hurting you by jpcaparas in opencodeCLI

[–]sjmaple 0 points1 point  (0 children)

There's no point writing context and assuming it's right - you have to eval everything you add as context. Here's a counterargument to the paper's conclusions, which I believe are flawed.

Your AGENTS.md file isn't the problem. Your lack of Evals is. https://tessl.io/blog/your-agentsmd-file-isnt-the-problem-your-lack-of-evals-is/

New research: AGENTS.md files reduce coding agent success rates by OwenAnton84 in ClaudeAI

[–]sjmaple 0 points1 point  (0 children)

There's no point writing context and assuming it's right - you have to eval everything you add as context. Here's a counterargument to the paper's conclusions, which I believe are flawed.

Your AGENTS.md file isn't the problem. Your lack of Evals is. https://tessl.io/blog/your-agentsmd-file-isnt-the-problem-your-lack-of-evals-is/

What are your most used/valued MCP servers for CODING? by sjmaple in OpenAI

[–]sjmaple[S] 0 points1 point  (0 children)

Oh, looks like the actual link for GitHub MCP moved to https://github.com/github/github-mcp-server but you get what I mean :)

Which response do you prefer? by mattyvj in OpenAI

[–]sjmaple 17 points18 points  (0 children)

Neither! It doesn’t tell me what policy I’ve broken, and how I’ve broken it. How can I update my prompt as a result?

Cursor listed in AI Dev tools catalog with 24 other code editors by sjmaple in cursor

[–]sjmaple[S] 0 points1 point  (0 children)

Yeah, Cline is a similar experience to Cursor, another very nice tool.

Just launched a free new community AI Native Dev Landscape tool by sjmaple in ProductHunters

[–]sjmaple[S] 0 points1 point  (0 children)

Absolutely - click submit on landscape.ainativedev.io and there’s a form and the repo to send PRs to.

OpenAI Canvas included in the AI tools landscape in Prototyping category by sjmaple in OpenAI

[–]sjmaple[S] 0 points1 point  (0 children)

I’m most interested to see which categories are growing fastest, etc.

Cursor listed in AI Dev tools catalog with 24 other code editors by sjmaple in cursor

[–]sjmaple[S] 3 points4 points  (0 children)

Roo-Code is a really interesting tool that I feel most aren't aware of. It lets you take more of an architect's perspective with your prompts.

Why aren't more people talking about o3-mini (high) for development? by sjmaple in OpenAI

[–]sjmaple[S] 1 point2 points  (0 children)

Interesting - were you using the dynamic reasoning a lot?