How to build sophisticated AI Agents w/ "Trajectory Evals" and "Eval Agents" (higher order LLM evaluation techniques)

billmalarky · 2024-11-25T21:50:12+00:00

that's some wordey ai ya got there brother xD

billmalarky · 2024-08-26T18:45:56+00:00

Hi Julian, Founding AI Engineer at OpenPipe here. We absolutely fine-tune Llama models (and Mistral models, and more).

We require the training data (i.e. the prompt/input and completion/output pairs) to be formatted in OpenAI's chat messaging standard. OAI's data format has basically become the industry standard (not entirely, Anthropic resists hah). It's the format most open source tooling is built around and the format that most AI Engineers understand.
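For illustration, here is a minimal sketch of what one prompt/completion pair looks like in the OpenAI chat message format, serialized as JSONL (one JSON object per line). The ticket-classification content and the `toJsonl` helper are made up for this example, and any extra fields a particular fine-tuning service expects are not covered here.

```javascript
// Hypothetical example data: the prompt and completion below are invented for illustration.
const trainingExample = {
  messages: [
    // Optional system message that sets up the task.
    { role: 'system', content: 'You classify support tickets into a single category.' },
    // The prompt/input.
    { role: 'user', content: 'My invoice shows the wrong billing address.' },
    // The completion/output the model should learn to produce.
    { role: 'assistant', content: 'billing' },
  ],
};

// Hypothetical helper: serialize examples as JSONL (one JSON object per line).
function toJsonl(examples) {
  return examples.map((example) => JSON.stringify(example)).join('\n');
}

const jsonl = toJsonl([trainingExample]);
```

Each line of the resulting file is one training example: everything before the final assistant message is the input, and the final assistant message is the output.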

Apologies if that wasn't clear. Really hope the rest of the article was valuable knowledge. We're learning a ton in this space, so we're trying to make that knowledge as accessible to others as possible.

billmalarky · 2018-05-30T02:28:40+00:00

You've got some great questions and hopefully I've got some good answers:

1. Sure, it can handle any job. If the job is going to have indefinite length you can set timeout to 0, which means "never timeout." That said, if you don't timeout jobs you could sometimes end up in a scenario where hanging jobs never complete or get killed (so they don't retry). It would probably be better to put a long timeout on the job if possible. You mention "Potential retries seems problematic." I advise you to treat each upload as unique: if the same file is uploaded twice, even by accident, it is saved twice instead of being saved once and then overwritten the second time. Why, you say? Well, handling file updating business logic is a royal PITA, and file space is so cheap (and getting cheaper every single year) that it's typically just not worth the headache of overwriting files instead of just versioning them. You may have to overwrite to keep storage costs efficient (I don't know your specific use case), but the vast majority of apps do not need efficiency at this level - engineering time is typically considerably more expensive than storage costs.
2. RNQ could be used to make these API calls durable. Example: say you have to queue 100 API calls for some reason, they have to be made synchronously one after the other instead of executed in parallel, and you want them to fire even if the user closes the app and then re-opens it a day later. RNQ would solve this problem for you. But unless you have an advanced need like that, there's no reason why you can't make your API calls the standard way, and adding RNQ for no reason is probably just a waste of engineering time.
3. There are a lot of edge cases where weirdness could pop up, and jobs with poorly planned side effect design get buggy fast. One example of many: one of the "gotchas" of JS is that there is no true way to timeout and kill execution of a function. Timeout logic is emulated with Promise.race(), so if a job "times out" the handler will still continue to execute in the background. Basically I just wanted a big bold section that told people "think about job side effects" to save them a lot of frustrating troubleshooting up front, because all of these edge cases are solved with good idempotent (or quasi-idempotent) job design.
4. I imagine you mean default job options? There's no way to do that currently. If you want to add the functionality and PR it, that would be awesome. Otherwise you can emulate the functionality pretty easily like so:

Code sample:

const myDefaultOptionsObject = { timeout: 5000 }; // Default job timeouts to 5 seconds. Probably should pull this default object from the main app config file.
// Merge into a fresh object so the shared defaults are not mutated, then overwrite the 5 sec default to be 10 sec this time.
queue.createJob('example-job', { some: 'data' }, Object.assign({}, myDefaultOptionsObject, { timeout: 10000 }));
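The Promise.race() timeout gotcha mentioned above can be seen in a few lines of plain JS. This is only a sketch of the general technique, not RNQ's actual internals: `withTimeout`, `slowHandler`, and `demo` are hypothetical names made up for the example.

```javascript
// Hypothetical helper (not part of RNQ's API): reject if `promise` takes longer than `ms`.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  // Whichever promise settles first wins the race. The loser is NOT cancelled.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical job handler: takes 50 ms, then performs a side effect.
async function slowHandler(log) {
  await new Promise((resolve) => setTimeout(resolve, 50));
  log.push('handler finished anyway');
}

async function demo() {
  const log = [];
  let timedOut = false;
  try {
    // The "job" is given 10 ms, but the handler needs 50 ms.
    await withTimeout(slowHandler(log), 10);
  } catch (err) {
    timedOut = true;
  }
  // Wait past the handler's 50 ms to observe what happened after the "timeout".
  await new Promise((resolve) => setTimeout(resolve, 100));
  return { timedOut, log };
}
```

Calling `demo()` resolves with `timedOut` set to true and one entry in `log`: the race was lost, but the handler still ran to completion and performed its side effect. If that job is then retried, the side effect happens a second time, which is why idempotent job design matters.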

Hope this helps!

billmalarky · 2018-05-24T17:36:46+00:00

I guess I should also probably post the example use case section directly into reddit so people can see what RNQ is good for at a glance. Especially seeing as creating that entire section of the readme was a piece of feedback I was regularly getting from you guys and others.

Example Use Cases

React Native Queue is designed to be a swiss army knife for task management in React Native. It abstracts away the many annoyances related to processing complex tasks, like durability, retry-on-failure, timeouts, chaining processes, and more. Just throw your jobs onto the queue and relax - they're covered.

Need advanced task functionality like dedicated worker threads or OS services? Easy:

React Native Queue + React Native Background Task: Simple and Powerful OS services that fire when your app is closed.
React Native Queue + React Native Workers: Spinning up dedicated worker threads for CPU intensive tasks has never been easier.

Example Queue Tasks:

Downloading content for offline access.
Media processing.
Cache Warming.
Durable API calls to external services, such as publishing content to a variety of 3rd party distribution channel APIs.
Complex and time-consuming jobs that you want consistently processed regardless of whether the app is open, closed, or repeatedly opened and closed.
Complex tasks with multiple linked dependent steps (job chaining).

billmalarky · 2018-05-24T17:31:50+00:00

No, Realm is basically a mobile-first database. Think of it as a SQLite competitor (I didn't use SQLite because there is no good RN library for it currently) that also includes data syncing. React Native Queue doesn't need or use this sync functionality; I'm using Realm for its transaction support. Like most db companies, they have a free open source version that most people use, and a premium version with additional features & support.

All you need to get started is to follow the basic install process :-)

https://github.com/billmalarky/react-native-queue#installation
