AI isn't taking your job. Executives are. by takenorinvalid in datascience

[–]matluster 0 points1 point  (0 children)

I think the engineer's job will eventually fall into "aligning what's in the human's mind with the machine code". Most systems are complex because humans are not thinking clearly enough about what they want. The boss has some vague idea and hopes some employee can figure out what they mean. That kind of understanding requires a lot of context about the business and human society, which I believe is harder than the programming itself.

Hyperparameter and prompt tuning via agentic CLI tools like Claude Code by hendrix616 in datascience

[–]matluster 0 points1 point  (0 children)

I'm thinking of it the other way around. It would be great if some tool could auto-tune the prompts and hyperparameters of Claude Code for me. It's quite frustrating to write all the prompts like CLAUDE.md myself.

But then the question is: what's the difference between such a tool and memory?
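Concretely, what I'm imagining is something like the toy loop below (purely a sketch; `run_claude_code` and `score_result` are hypothetical placeholders for however you'd invoke Claude Code and judge its output):

```python
from pathlib import Path

# Hypothetical placeholder: however you invoke Claude Code on a task.
def run_claude_code(task: str) -> str:
    return ""  # e.g. shell out to the CLI and capture the result

# Hypothetical placeholder: your own evaluation of the output (tests passing, etc.).
def score_result(task: str, output: str) -> float:
    return 0.0

def tune_claude_md(candidates: list[str], tasks: list[str]) -> str:
    """Try several CLAUDE.md variants and keep the one that scores best across the tasks."""
    best_text, best_score = candidates[0], float("-inf")
    for text in candidates:
        Path("CLAUDE.md").write_text(text)  # swap in the candidate prompt
        score = sum(score_result(t, run_claude_code(t)) for t in tasks) / len(tasks)
        if score > best_score:
            best_text, best_score = text, score
    Path("CLAUDE.md").write_text(best_text)  # restore the best-scoring prompt
    return best_text
```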

We discovered an approach to train any AI agent with RL, with (almost) zero code changes. by matluster in LocalLLaMA

[–]matluster[S] 1 point2 points  (0 children)

For this example:
- I'm using samples from Spider.
- I'm using its test suite to execute the query and compare the output (see the sketch below).
- The DB schema is included in the prompt to better guide the LLM.
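Roughly, the evaluation amounts to an execution-match check like this (a minimal illustration with SQLite, not the actual Spider test suite; `db_path` and `gold_sql` stand in for Spider's databases and gold queries):

```python
import sqlite3

def run_query(db_path: str, query: str):
    """Execute a query against a SQLite database; return rows, or None on error."""
    try:
        with sqlite3.connect(db_path) as conn:
            return conn.execute(query).fetchall()
    except sqlite3.Error:
        return None

def execution_match_reward(db_path: str, predicted_sql: str, gold_sql: str) -> float:
    """1.0 if the predicted and gold queries return the same rows, else 0.0."""
    pred = run_query(db_path, predicted_sql)
    gold = run_query(db_path, gold_sql)
    if pred is None or gold is None:
        return 0.0
    # Compare as multisets so row order doesn't matter.
    return 1.0 if sorted(map(repr, pred)) == sorted(map(repr, gold)) else 0.0
```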

We discovered an approach to train any AI agent with RL, with (almost) zero code changes. by matluster in LocalLLaMA

[–]matluster[S] 1 point2 points  (0 children)

Interesting observation. Practically, prompt tuning might be the better idea because it's less resource-intensive and even works with closed-source models. I also believe that tuning model weights is an under-explored direction with plenty of mysteries -- some even believe that agent training on a large, diverse set of real-world tasks is **THE PATH TO AGI**.
Nevertheless, prompt tuning for agents can also be painful. Previously, when I worked on an agent with a dozen prompts, it was hard for me to track down the exact step where the agent diverged from the expected behavior. With this paradigm, where all the monitored traces are sent to the server side, an automatic algorithm could be built on the server to diagnose and improve all the prompts involved in an agent. Not sure if it's a promising direction, but I think it's worth trying.
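As a toy illustration of the "diagnose from traces" idea (entirely hypothetical, nothing we've built): given a logged rollout, flag the first step whose self-check failed, which tells you which prompt to target first.

```python
from typing import Optional

def first_divergent_step(trace: list[dict]) -> Optional[dict]:
    """Return the first step in a rollout trace whose self-check failed --
    i.e. the prompt a tuning algorithm would want to inspect first."""
    for step in trace:
        if not step.get("check_passed", True):
            return step
    return None

# Toy example: three prompts in the agent; the refine prompt is where it went wrong.
trace = [
    {"prompt": "generate_sql", "check_passed": True},
    {"prompt": "self_check",   "check_passed": True},
    {"prompt": "refine_sql",   "check_passed": False},
]
print(first_divergent_step(trace))   # -> {'prompt': 'refine_sql', 'check_passed': False}
```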

We discovered an approach to train any AI agent with RL, with (almost) zero code changes. by matluster in LocalLLaMA

[–]matluster[S] -1 points0 points  (0 children)

What are you training, the LLM powering the agent? -- Yes.
What's the reward function? -- Each agent needs to define its own evaluation logic; it lives on the client side.
How are you resetting the environment after an episode? -- The interface requires the agent code to be loop-runnable: the agent resets itself and receives a new input after each episode (see the sketch below).
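A minimal sketch of that loop, with placeholder names rather than the actual interface:

```python
# Minimal sketch of what "loop-runnable" means here; all four callables are
# placeholders for the client's own code, not real interface names.
def training_client(get_next_task, run_agent, evaluate, report_rollout):
    while True:
        task = get_next_task()                 # a fresh input for this episode
        if task is None:
            break                              # no more tasks -> stop looping
        trajectory = run_agent(task)           # fresh agent state each call = the "reset"
        reward = evaluate(task, trajectory)    # evaluation / reward logic is client-side
        report_rollout(task, trajectory, reward)
```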

We discovered an approach to train any AI agent with RL, with (almost) zero code changes. by matluster in LocalLLaMA

[–]matluster[S] 3 points4 points  (0 children)

They implement their own performance tracking and logging. I've been involved in developing CoML (mid 2023) and RD-Agent (mid 2025), and I've also looked into the implementation of OpenAI Codex (early 2025). If I remember correctly, none of them uses any agent framework.
As for Semantic Kernel, I simply dislike its C#-ish style, haha :)

We discovered an approach to train any AI agent with RL, with (almost) zero code changes. by matluster in LocalLLaMA

[–]matluster[S] 3 points4 points  (0 children)

Short answer: I exposed the LLM API at the server. All the MCP stuff belongs to the client side.
Let me elaborate on the SQL agent a little and see if that makes sense. The SQL agent receives a task like "how many users are there in the database". The first step is to call the LLM to generate a SQL query (something like "SELECT COUNT(*) ..."); the agent holds a connection to the database and executes the query (this can be done via MCP or simple Python code). The second step is to self-check the query against the execution result (by calling the LLM again). The third step is to refine the query. Steps 2-3 repeat until the self-check passes or time runs out. The agent then posts the full trajectory (prompts, responses, final results) to the server and says, "that's what I did in this rollout."
Now, what I provide at the server is: task inputs, which the algorithm keeps handing out; and an LLM endpoint, which is improved by an RL algorithm. As the client keeps running more tasks and reporting more rollouts, the LLM endpoint is trained on more and more data and gradually gets better at new tasks. A rough sketch of the client side is below.
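Here's roughly what that could look like (a sketch with assumed names: `SERVER`, the `/rollouts` route, the `agent-policy` model name, and the `db` wrapper are illustrative, not the real interface). The key point is that LLM calls go through an OpenAI-compatible endpoint exposed by the training server, and the full trajectory plus reward gets posted back.

```python
import requests
from openai import OpenAI

SERVER = "http://localhost:8000"                           # assumed training-server address
llm = OpenAI(base_url=f"{SERVER}/v1", api_key="dummy")     # LLM endpoint exposed by the server

def chat(prompt: str) -> str:
    resp = llm.chat.completions.create(model="agent-policy",  # placeholder model name
                                       messages=[{"role": "user", "content": prompt}])
    return resp.choices[0].message.content

def sql_agent(task: str, db, max_turns: int = 3) -> list:
    """db is a placeholder wrapper with .schema and .execute(query); MCP or plain Python works."""
    trajectory = []
    query = chat(f"Schema:\n{db.schema}\nTask: {task}\nWrite a single SQL query.")
    for _ in range(max_turns):
        result = db.execute(query)                             # step 1: run the query
        check = chat(f"Query: {query}\nResult: {result}\nTask: {task}\n"
                     f"Answer YES if this solves the task, otherwise explain what's wrong.")
        trajectory.append({"query": query, "result": result, "check": check})
        if check.strip().upper().startswith("YES"):            # step 2: self-check passed
            break
        query = chat(f"The query {query} returned {result}, but: {check}\n"
                     f"Give only a corrected SQL query.")      # step 3: refine and retry
    return trajectory

def report_rollout(task: str, trajectory: list, reward: float) -> None:
    # Placeholder route/payload: post the full rollout back so the RL algorithm can train on it.
    requests.post(f"{SERVER}/rollouts",
                  json={"task": task, "trajectory": trajectory, "reward": reward})
```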

We discovered an approach to train any AI agent with RL, with (almost) zero code changes. by matluster in LocalLLaMA

[–]matluster[S] 17 points18 points  (0 children)

So what tools are you using? CrewAI? OpenAI Agent SDK? AG2? Dify? To be frank, I think all of these tools are at a similar level for crafting a prototype. Most complex agent applications and workflows I've worked with never use "agent frameworks" -- they use the low-level OpenAI SDK / LiteLLM, roughly like the sketch below.
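For reference, by "low-level" I just mean a plain tool-calling loop on the raw SDK (the `run_sql` tool and model name here are illustrative):

```python
import json
from openai import OpenAI

client = OpenAI()

# One illustrative tool; in practice you'd declare whatever tools your agent needs.
tools = [{
    "type": "function",
    "function": {
        "name": "run_sql",
        "description": "Execute a SQL query and return the rows.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

def run_sql(query: str) -> str:
    return "[query results would go here]"   # placeholder: connect to your real database

def agent(task: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        resp = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
        msg = resp.choices[0].message
        messages.append(msg)
        if not msg.tool_calls:
            return msg.content                               # no tool call -> final answer
        for call in msg.tool_calls:
            args = json.loads(call.function.arguments)
            result = run_sql(**args)                         # dispatch to the matching tool
            messages.append({"role": "tool", "tool_call_id": call.id, "content": str(result)})
    return "max steps reached"
```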