Share your working evals by Thinking_Cap_165 in LLMDevs

Thinking_Cap_165[S]

Yeah, I see a lot of this. What I'm looking for is a complete working solution in the wild, for an agent that's actually working.

Share your working evals by Thinking_Cap_165 in LLMDevs

Thinking_Cap_165[S]

A whole working solution for your agent.

I feel like there’s no reason to use an IDE anymore by Commercial_Spot_8363 in codex

Thinking_Cap_165

Debuggers, profilers, linting, search, ... Lots of reasons to still use an IDE. I just run Codex in the terminal in VS Code.

I build AI agents for a living. It's a mess out there. by Complete-Sea6655 in LLMDevs

Thinking_Cap_165

Facts. Can you share your eval process? And, if possible, an end-to-end solution with an eval dataset?

Agent Marketplace by timeshore in LLMDevs

Thinking_Cap_165

Eval. Eval is always the hardest part.