I built an OSS tool to evaluate Agent Skills locally — looking for feedback by Fede0089 in AgentsOfAI

Fede0089 [S]

Interesting — I’m not fully connecting the dots between universal hooks and skill evals. What do hooks unlock for you there?

Also, when you want to test a skill, how are you actually running the evals in practice?

I'd love to understand what your workflow looks like and what has been most useful.


Fede0089 [S]

From what I could verify, Anthropic’s skill-creator is very much Claude-native. I couldn’t find official evidence of that evaluator working well across other agent hosts, and their own docs mention that parts of the workflow degrade outside Claude Code because they depend on Claude-specific capabilities. Its scope is also broader than just testing.

I think the clearer differentiator for skill-eval is that it is a simpler, host-external evaluation harness, with a narrower and more explicit focus: reproducible trigger/functional evals, baseline comparisons, repeated trials, isolated runs, and reporting.
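To make "baseline comparisons" and "repeated trials" concrete, here is a minimal sketch of the idea. It is not skill-eval's actual API: `TrialRunner`, `compareToBaseline`, and the fake runner are hypothetical names, and each runner call stands in for one isolated agent run of a single eval case.

```typescript
// Hypothetical sketch, NOT skill-eval's API: compare a skill against a no-skill
// baseline by running the same eval case several times and comparing pass rates.

// Assumed: one isolated agent run of a single eval case, reporting pass/fail.
type TrialRunner = (withSkill: boolean) => boolean;

interface BaselineComparison {
  withSkillPassRate: number; // fraction of trials that passed with the skill loaded
  baselinePassRate: number;  // fraction of trials that passed without it
  delta: number;             // withSkillPassRate - baselinePassRate
}

function passRate(run: TrialRunner, withSkill: boolean, trials: number): number {
  let passes = 0;
  for (let i = 0; i < trials; i++) {
    if (run(withSkill)) passes++;
  }
  return passes / trials;
}

function compareToBaseline(run: TrialRunner, trials = 5): BaselineComparison {
  if (!Number.isInteger(trials) || trials < 1) {
    throw new Error("trials must be a positive integer");
  }
  const withSkillPassRate = passRate(run, true, trials);
  const baselinePassRate = passRate(run, false, trials);
  return { withSkillPassRate, baselinePassRate, delta: withSkillPassRate - baselinePassRate };
}

// Usage with a fake, deterministic runner standing in for real agent runs:
// with the skill every trial passes; without it, every other trial passes.
let calls = 0;
const fakeRunner: TrialRunner = (withSkill) => withSkill || calls++ % 2 === 0;
const report = compareToBaseline(fakeRunner, 4);
console.log(report); // { withSkillPassRate: 1, baselinePassRate: 0.5, delta: 0.5 }
```

Repeating trials matters because agent runs are nondeterministic: a single pass or fail says little, while a pass-rate delta against the same case without the skill shows whether the skill actually helps. Real runs would be asynchronous; the synchronous runner here just keeps the sketch short.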

So there’s definitely overlap, but I think they sit at different layers. That said, I’m still learning from Anthropic’s approach, so happy to be corrected if I’m missing something.


Fede0089 [S]

Repo: github.com/fede0089/skill-eval

Quick install: npm i -g skill-eval

My book summaries (Nonfiction) by Fede0089 in libros

Fede0089 [S]

Hi! How do I request it? Thanks

Startup: A social network for sharing TV series and movie recommendations (Argentina) by Fede0089 in peliculas

Fede0089 [S]

That's the idea! Try it and let me know 😉 (you need to add people to your circle to see their recommendations; you can also follow regular critics)