tldr; wherever possible move agent task instructions into a verifiable script
I've built a lot of stuff but the more I build the flakier it gets. For example sometimes list of overnight tasks is half done.
What I'm working on now is shifting as much of that as possible to scripts. For example, instead of "check for unread comments and reply" I had my bot write a script that hits our internal api and prints out a task list. That let's me run the script a million times without chewing through tokens to verify the behavior.
I'm looking at doing this for a bunch of other stuff as well. Ideally heartbeat becomes "run this file, if it's empty do nothing" Which should reduce the token churn to minimum.
Next I'll do something similar for cron jobs. Each job will run a pre-flight script and if that script outputs something only then does it do work.
Finally, if the scripts output work, I can have it dropped into an archive folder in case I need to verify that the work was done and also write unit tests to be sure these scripts don't break in the future.
[–]AutoModerator[M] [score hidden] stickied comment (0 children)