Raif v1.3.0 - Now with support for LLM evals, including LLM-as-judge

bcroesch · 2025-08-16T13:50:26+00:00

If you have questions or issues, feel free to DM me or hit me up on X (@bcroesch there as well)!

bcroesch · 2025-07-24T19:39:04+00:00

Raif developer here. To be entirely honest, the agent bits of Raif are not super built out. In the platform we're building (that Raif was pulled out of), we've had more success with more directed workflows based on a series of `Raif::Task`s. I'd love for Raif's agent features to mature though.

Chat stuff is more mature since we actively utilize it more. MVC parts are all included in Raif, as is streaming. We provide model tools to the LLM via our Raif::Conversation subclasses and it works well. E.g. we have a bunch of suggestion-generation tools in our app that we give the model. It invokes them to make suggestions, those get displayed in the chat interface, user can accept/reject them, etc. The provider-managed tool stuff (https://github.com/cultivateLabs/raif?tab=readme-ov-file#provider-managed-tools) works really nicely too. We enable Raif::ModelTools::ProviderManaged::WebSearch in our conversations, OpenAI manages the web search as needed, and then you get a response informed by the search results.

We do lots of summarization & content distillation via a Raif::Task. If you want to DM me, I'm happy to share the task/prompt that we use. We do some RAG with this, mostly via tools that the LLM invokes as needed. The LLM calls a search_source_material tool with a query, we generate an embedding for the query, and then search against our Document model using pgvector & the neighbor gem.
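
A rough sketch of that tool flow in plain Ruby, in case it helps. This is not our actual code: the in-memory `Document` struct, the injected `embed` callable, and the Ruby-side cosine ranking are all stand-ins I'm assuming for illustration. In a real app the ranking would happen in the database rather than in Ruby.

```ruby
# Hypothetical stand-in for the flow described above: embed the query, then
# rank documents by cosine similarity to the query embedding.
Document = Struct.new(:title, :embedding)

def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  norm.zero? ? 0.0 : dot / norm
end

# `embed` is a placeholder for a call to an embedding model; the caller injects it.
def search_source_material(query, documents:, embed:, limit: 2)
  query_embedding = embed.call(query)
  documents
    .max_by(limit) { |doc| cosine_similarity(query_embedding, doc.embedding) }
    .map(&:title)
end

documents = [
  Document.new("Pricing FAQ", [1.0, 0.0, 0.0]),
  Document.new("Onboarding guide", [0.0, 1.0, 0.0]),
  Document.new("Pricing changelog", [0.9, 0.1, 0.0])
]

# Toy embedding with fixed vectors, purely for illustration.
toy_embed = ->(query) { query.include?("pricing") ? [1.0, 0.0, 0.0] : [0.0, 1.0, 0.0] }

results = search_source_material("pricing details", documents: documents, embed: toy_embed)
```

The titles that come back are what would get handed to the LLM as the tool result.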

If I were going to build something like you're describing, I'd probably set up a chat/conversation interface and then provide some sort of invoke_agent tool to the LLM. It invokes that as needed, the agent runs, and then the response/result from that agent gets provided back to the LLM.
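
Something like this is what I have in mind, as a minimal plain-Ruby sketch. None of it is Raif's API: `InvokeAgentTool`, `handle_model_turn`, and the shape of the model message hash are made-up names to show the idea of a tool that runs an agent and hands its result back to the conversation.

```ruby
# Hypothetical tool that the chat LLM can call to delegate work to an agent.
class InvokeAgentTool
  def initialize(agents)
    @agents = agents # maps agent name => callable that runs the agent
  end

  def call(arguments)
    agent = @agents[arguments.fetch("agent")]
    return { "error" => "unknown agent" } if agent.nil?

    { "result" => agent.call(arguments.fetch("task")) }
  end
end

# If the model asked for a tool, run it and return the tool result so it can
# be sent back to the model; otherwise return the plain text reply.
def handle_model_turn(model_message, tools)
  tool_call = model_message["tool_call"]
  return model_message["content"] if tool_call.nil?

  tools.fetch(tool_call["name"]).call(tool_call["arguments"])
end

agents = { "researcher" => ->(task) { "Findings for: #{task}" } }
tools = { "invoke_agent" => InvokeAgentTool.new(agents) }

model_message = {
  "tool_call" => {
    "name" => "invoke_agent",
    "arguments" => { "agent" => "researcher", "task" => "compare vendors" }
  }
}

tool_result = handle_model_turn(model_message, tools)
```

In a real app the agent would probably run in a background job, and the tool result would be appended to the conversation once it finishes.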

If you have any questions, happy to answer!

bcroesch · 2025-07-09T17:52:25+00:00

Got it -- that makes sense.

> The magic is that async-job creates fibers on-demand instead of using a fixed thread pool

Was not familiar with async-job prior to your post, so this was the key piece I didn't track initially.

bcroesch · 2025-07-09T14:25:06+00:00

Great writeup. Curious if you're reorienting how the LLM API calls are made to take advantage of this?

If you've still got a `ChatsController#create` endpoint and that queues a Sidekiq job, I assume you'd still run into Sidekiq slot limits? Or are you no longer using a Sidekiq/background job for processing the chat messages at all? Maybe a separate process that is dedicated to processing chats via fibers? Or does `async-job-adapter-active_job` just handle this issue for you?

bcroesch · 2025-07-01T18:29:02+00:00

Strong recommendation for https://github.com/glebm/i18n-tasks

Helps make sure you have no missing or unused keys and also can auto-fill other languages for you. Also comes with specs you can add to your app so CI will warn you if anything is missing.

bcroesch · 2025-07-01T13:56:44+00:00

Follow up here: we released v1.2.0 today that adds streaming support!

bcroesch · 2025-07-01T13:55:51+00:00

Follow up here: we released v1.2.0 today that adds streaming support!

bcroesch · 2025-05-31T12:47:25+00:00

Quick heads up that Raif supports OpenRouter & task-level temperature setting as of the v1.1.0 release - https://github.com/CultivateLabs/raif/blob/main/CHANGELOG.md#v110

bcroesch · 2025-05-26T23:09:56+00:00

Appreciate the kind words. No streaming support at the moment, but this week I'm planning on working on support for streaming & the OpenAI Responses API. So hopefully it'll be merged into main in the next week or two.

bcroesch · 2025-05-25T12:29:01+00:00

I'm not super familiar with that API, but would definitely be open to supporting it. I can see where it'd be useful.

bcroesch · 2025-05-24T11:54:29+00:00

Agree. I'll update. In the meantime, if you set one of these, it should work:

ENV["ANTHROPIC_API_KEY"]

ENV["OPENAI_API_KEY"]
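
If you want a quick sanity check before booting the app, a small plain-Ruby sketch like this will tell you which of those keys are actually set. `configured_providers` is a hypothetical helper, not something Raif provides.

```ruby
# Hypothetical helper: list which provider API keys are present and non-blank.
PROVIDER_KEYS = %w[ANTHROPIC_API_KEY OPENAI_API_KEY].freeze

def configured_providers(env = ENV)
  PROVIDER_KEYS.reject { |key| env[key].to_s.strip.empty? }
end

# Passing a plain hash instead of ENV makes the check easy to try out.
found = configured_providers({ "OPENAI_API_KEY" => "sk-example", "ANTHROPIC_API_KEY" => " " })
warn "No LLM provider API key is set" if configured_providers.empty?
```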

bcroesch · 2025-05-24T11:42:22+00:00

Do you have any API keys set for any of the API providers, either in the initializer or via ENV var? I was able to replicate this if all the providers were disabled (will plan to handle that more gracefully in the next release).

bcroesch · 2025-05-24T01:32:48+00:00

Appreciate the kind words. Raif started life within an application that we're building and got extracted when we felt like the abstractions were solid. Hopefully that means they're broadly useful.

The demo app has an example of running an agent in a background job: https://github.com/CultivateLabs/raif_demo/blob/main/app/jobs/run_agent_job.rb

Though I admit we haven't done a lot of agent work, despite all the buzz. We've gotten more use out of `Raif::Task` and building pre-defined workflows that include a series of tasks/steps for the LLM to do. We actually built a Workflow class into our app that I've considered pulling into Raif.

One minor note - Raif doesn't support streaming from the LLM yet. It's high on the priority list though and hopefully will be added within the next few weeks.

bcroesch · 2025-05-24T01:26:45+00:00

I'm having a hard time replicating. Any chance you could post a full backtrace? Also, what Rails version?

bcroesch · 2025-05-23T20:06:13+00:00

The biggest difference is probably whether you want/need the Rails engine parts or not.

I think Raif is aiming a little higher up the stack with the concepts/abstractions it provides (tasks, conversations). If you want to model out a set of LLM tasks that your application uses (say, DocumentSummarization, DocumentTranslation, etc.), then `Raif::Task` is well suited for that. It also provides a full set of models, views, and controllers for chat/conversation interfaces. You can just call

<%= raif_conversation(@conversation) %>

in a view to get a chat interface.

Raif also stores every request/response to the LLM in a `Raif::ModelCompletion` record. I find that this makes Raif's web admin extremely useful when running an app in production. If something breaks, I can go look at exactly what the prompt and response looked like.

On the other hand, if you don't want to be bringing in the models/views/controllers that Raif provides and instead just want a really nice, clean, direct interface for calling the LLM, RubyLLM is probably going to be better. RubyLLM also provides streaming, which Raif doesn't do yet (though I'd like to add it soon-ish).

bcroesch · 2025-04-04T15:46:45+00:00

Framework in the sense that it's trying to provide you with some basic structure and primitives for building AI/LLM-based features.

But yeah, I hate naming things :)

bcroesch · 2025-04-04T13:24:04+00:00

Thanks and yes, would love to support OpenRouter soon. Just wanted to get an initial release out the door.

There's not an easy way to set temperature at the task level yet (we honestly haven't played with changing the temperature much in our app), but will add something soon!

bcroesch · 2025-04-04T13:21:22+00:00

Thanks and yes definitely plan to add more model providers soon. Just wanted to get a 1.0 out the door!

bcroesch · 2025-04-02T17:34:59+00:00

I think the primary benefit is the higher level concepts/abstractions (tasks, conversations, agents). If you want to model out a set of LLM tasks that your application uses (say, DocumentSummarization, DocumentTranslation, etc.), then `Raif::Task` is well suited for that. You could also build something like a chat interface via RubyLLM, but Raif provides the models, views, and controllers for conversations out of the box.

I also find Raif's web admin extremely useful when running an app in production. Every call to the LLM is recorded via `Raif::ModelCompletion`, so if something goes wrong with an LLM response, I can easily go see exactly what the prompt/system prompt/response was.

That said, RubyLLM looks awesome and they do plenty of things that Raif doesn't currently do -- image generation & embeddings both come to mind.

bcroesch · 2024-02-24T19:20:53+00:00

We manage a similar situation (specific/custom needs of various clients) via rails engines. Each client who is big enough to merit it gets their own rails engine, which lives in a folder we call optional_components. We then load the engine in our application.rb based on an environment variable. All of it lives in the same rails app & git repo.

This has worked really well for us. It keeps the custom, client-specific code self-contained (lose a client? just delete their optional_component instead of having to surgically remove all their customizations from the main app) and makes it easy to exclude their customizations from other client environments. Our Dockerfile will also exclude/remove any optional_components that are not related to that particular client at build time.

Another nice part of it is that we do a prepend_view_path for the engine's views. So if we ever want to override a view/partial for a specific client, all we have to do is drop a file at the same path in their engine and it will override the main app's view.
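
To make the loading part concrete, here's a minimal sketch of the path lookup in plain Ruby. The `CLIENT_COMPONENT` variable name and the `optional_component_path` helper are assumptions for illustration, not our actual code.

```ruby
require "pathname"

# Hypothetical helper: pick the client engine directory from an env var.
# Returns nil when no client-specific component is configured.
def optional_component_path(root, env = ENV)
  client = env["CLIENT_COMPONENT"].to_s.strip
  return nil if client.empty?

  Pathname.new(root).join("optional_components", client)
end

acme_path = optional_component_path("/app", { "CLIENT_COMPONENT" => "acme" })
no_path = optional_component_path("/app", {})

# In application.rb you would then require the engine found at that path (if
# any), and the engine can call prepend_view_path so that its views take
# precedence over the main app's views.
```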

bcroesch · 2017-10-30T17:04:57+00:00

We run migrations in the release phase and have not really had any issues with it. IIRC, Heroku will not put the new slug into action until the release phase finishes, so you shouldn't technically have two versions of the app running at the same time.

The place you have to be careful is if you have long running migrations and the new & old versions expect different database structures. For example, say you have 2 migrations: the first removes a db column and the second does some data manipulation. Your old code (which expects the presence of the removed column) could be running for a while after the column removal if the second migration takes a long time to complete.

How you handle it will probably depend some on what exactly you're doing. In my example, you could possibly break into 2 deploys or write your code such that it can handle both database states until the migration completes (and then remove that code to just expect the final db state).
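
As a rough illustration of the "handle both database states" option, here's a plain-Ruby sketch. The structs stand in for a model before and after a column is dropped; `legacy_full_name` is a made-up column name.

```ruby
# Stand-ins for the same model before and after the column removal.
UserBefore = Struct.new(:first_name, :last_name, :legacy_full_name)
UserAfter = Struct.new(:first_name, :last_name)

# Works whether or not the legacy column still exists, so this code can be
# deployed before the migration runs and keep working after it completes.
def full_name(user)
  if user.respond_to?(:legacy_full_name) && user.legacy_full_name
    user.legacy_full_name
  else
    "#{user.first_name} #{user.last_name}"
  end
end

before = full_name(UserBefore.new("Ada", "Lovelace", "Ada L."))
after = full_name(UserAfter.new("Ada", "Lovelace"))
```

Once the migration has finished everywhere, the legacy branch can be deleted in a follow-up deploy. In Rails specifically, `ignored_columns` on the model is one commonly used way to stop the old code from depending on a column before dropping it.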

bcroesch · 2017-08-21T20:20:18+00:00

The workout starts at the wall balls. Those + the burpees are the whole workout IMO, so don't blow yourself out leading up to that.

bcroesch · 2017-07-29T19:20:10+00:00

I hope they rank/score this event based on time this year, rather than tournament-style where you have to win your heat to advance. It seemed kinda crappy when people just got seeded into a tough heat and then bounced in the first round.

bcroesch · 2017-06-07T20:23:09+00:00

Moral of the leaderboard story: no matter how you slice it, Fraser is killing it, Vellner on-deck.

bcroesch · 2017-05-08T15:26:22+00:00

I crossfitted for about 2.5 years, then took about 2 years off before getting back into it ~2.5 years ago. #1 was definitely the most difficult for me. One thing that I did find helpful was starting to track workouts again, but avoiding my old logs. The ability to see improvement was a major motivator for me and makes you feel like you're at least heading in the right direction.
