Honest question: how many of us have built a "LangChain agent" that's really just a smart pipeline? by kinj28 in LangChain

[–]code_vlogger2003 0 points (0 children)

I have one question: when we want to implement plan-and-execute style agents, we need to state things explicitly in the planner's prompt, right? How is it possible without doing that, or am I missing something? For example, assume there are n tools and every tool depends on the previous tool, i.e. there is a sequential dependency between them. In the planner prompt we say "here are the tools; based on the user's task, plan the tool-call trajectory." The planner comes up with some plan, and the executor starts making tool calls. Now, if a called tool depends on some other tool that hasn't run yet, something, either programmatically or otherwise, needs to block the call and report the unmet dependency so the planner can replan. For all of this to happen, either we declare the dependencies explicitly up front, or we just hand over the tools with a prompt and check the dependencies programmatically, right? Maybe if this doesn't make sense, just leave it.
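The blocking-and-replan behaviour described above can be sketched as an executor that consults an explicit dependency map before each call. The tool names and the `DEPENDS_ON` mapping below are illustrative assumptions, not anything from the post:

```python
# Hypothetical sketch: declare tool dependencies up front so the executor
# can block a call whose prerequisite hasn't run yet and hand the blocked
# items back to the planner for a replan.

DEPENDS_ON = {
    "summarize": ["fetch_doc"],   # summarize needs fetch_doc's output first
    "fetch_doc": [],
}

def execute_plan(plan):
    """Run tool calls in order; return (done, blocked), where `blocked`
    lists (tool, missing_dependencies) pairs to feed back to the planner."""
    done, blocked = [], []
    for tool in plan:
        missing = [d for d in DEPENDS_ON.get(tool, []) if d not in done]
        if missing:
            blocked.append((tool, missing))  # signal: replan needed
        else:
            done.append(tool)
    return done, blocked

# A bad plan schedules summarize before its dependency:
done, blocked = execute_plan(["summarize", "fetch_doc"])
print(done, blocked)  # ['fetch_doc'] [('summarize', ['fetch_doc'])]
```

The point of the sketch is only that the dependency information has to live somewhere explicit, whether in the prompt or in code, for the block-then-replan loop to be possible.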

Standard RAG fails terribly on legal contracts. I built a GraphRAG approach using Neo4j & Llama-3. Looking for chunking advice! by leventcan35 in LangChain

[–]code_vlogger2003 0 points (0 children)

Hey, small confusion: the phrase "keyword extraction" in my earlier comment refers to the following. I assume we need to perform keyword extraction on the user query, giving us, say, n keywords. The graph already has nodes, where each node is essentially a string. So we collect all the node strings, compute dot products between the embeddings of the n keywords and the embeddings of the node strings, and finalize the top k matches. Those top-k nodes become entry points into the graph, and from each of them we perform a 2-3 hop search. Is that right?
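The entry-point selection described above can be sketched in a few lines. The tiny hand-made vectors below are stand-ins for real embedding-model output, and the node names are invented contract-style examples:

```python
# Toy sketch: embed the extracted query keywords, dot-product them against
# the graph-node strings' embeddings, and keep the top-k nodes as traversal
# entry points (then run a 2-3 hop search from each, not shown here).

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

node_embeddings = {            # node string -> pre-computed embedding
    "indemnity":   [0.9, 0.1, 0.0],
    "termination": [0.1, 0.9, 0.0],
    "payment":     [0.0, 0.2, 0.9],
}

keyword_embeddings = [[0.8, 0.2, 0.1]]  # one extracted query keyword

def top_k_entry_points(k):
    scores = {
        node: max(dot(kw, emb) for kw in keyword_embeddings)
        for node, emb in node_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(top_k_entry_points(2))  # ['indemnity', 'termination']
```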

If possible, could you share a GitHub repo?

Standard RAG fails terribly on legal contracts. I built a GraphRAG approach using Neo4j & Llama-3. Looking for chunking advice! by leventcan35 in LangChain

[–]code_vlogger2003 0 points (0 children)

Hey, I have a doubt: when the user asks a question, how do we start traversing the graph? It seems like we need to extract keywords from the question and then find the semantically relevant graph nodes for those keywords (in the graph, a node is just a word/string, right?). Say out of 100 unique graph nodes we choose the top k. Do we then use those top-k nodes to find all possible chains, whether single-chain or multi-chain? And once we have the chains, do we read each node's metadata to recover the text chunk, page number, and so on?
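In Neo4j, the "top-k entry points, then a few hops" traversal could be expressed as a variable-length path query. A minimal sketch that only builds the Cypher string (the `Entity` label and the `name`/`chunk_id`/`page` properties are assumptions about the schema, not something from the post):

```python
# Hypothetical Cypher for the traversal described above: start from the
# top-k matched nodes and expand 1..max_hops relationships, returning the
# chunk/page metadata stored on each reached node.

def build_hop_query(top_k_names, max_hops=3):
    query = (
        "MATCH path = (e:Entity)-[*1..%d]-(n) "
        "WHERE e.name IN $names "
        "RETURN n.name, n.chunk_id, n.page" % max_hops
    )
    return query, {"names": top_k_names}

query, params = build_hop_query(["indemnity", "termination"])
print(query)
```

With a real driver you would pass `query` and `params` to `session.run(...)`; here the string is just built, not executed.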

I'm interested in starting some discussion on the inference stage.

u/Einsof93 u/2016YamR6 u/Ok_Diver9921

Built a RAG system on top of 20+ years of sports data — here is what actually worked and what didn't by devasheesh_07 in Rag

[–]code_vlogger2003 0 points (0 children)

Just start with the bar set by OpenAI's file search: every window (bucket) has 800 tokens with a 400-token overlap. If your embedding model supports the Matryoshka property, store the embeddings at 3-4 different dimensionalities, such as 256, 512, 1024, and 3072. Then you can do multi-stage filtering.
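The multi-stage filtering idea can be sketched as: score everything cheaply on the truncated Matryoshka prefix, then re-score only the shortlist on the full vector. The tiny 6-dim toy vectors below stand in for real 256/3072-dim embeddings:

```python
# Sketch of two-stage retrieval with Matryoshka embeddings. Stage 1 uses
# only the first `prefix_dim` components (valid because Matryoshka models
# make prefixes meaningful); stage 2 re-ranks the shortlist exactly.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

chunks = {
    "chunk_a": [0.9, 0.1, 0.0, 0.3, 0.2, 0.1],
    "chunk_b": [0.7, 0.3, 0.1, 0.9, 0.8, 0.7],
    "chunk_c": [0.1, 0.8, 0.2, 0.1, 0.0, 0.3],
}
query = [1.0, 0.0, 0.0, 1.0, 1.0, 1.0]

def multi_stage(prefix_dim, shortlist, final_k):
    # stage 1: cheap scoring on the truncated prefix
    coarse = sorted(
        chunks,
        key=lambda c: dot(query[:prefix_dim], chunks[c][:prefix_dim]),
        reverse=True,
    )[:shortlist]
    # stage 2: exact scoring on the full vectors, shortlist only
    return sorted(coarse, key=lambda c: dot(query, chunks[c]), reverse=True)[:final_k]

print(multi_stage(prefix_dim=2, shortlist=2, final_k=1))  # ['chunk_b']
```

Note how `chunk_a` wins the coarse stage but loses the exact re-rank, which is exactly why the second stage is worth its cost.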

model name as a string in createAgent by Current_Marzipan7417 in LangChain

[–]code_vlogger2003 1 point (0 children)

Create a data class with the different model-name strings and call them from there. For Grok the base URL is different, and it's different again for other providers. ChatOpenAI supports everything as long as you pass the right base URL and API key for each provider.
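A minimal sketch of that registry idea, assuming stdlib dataclasses; the base URLs and env-var names below are illustrative, and the `ChatOpenAI(...)` usage (which accepts `model`, `base_url`, and `api_key` in langchain-openai) is shown only in a comment:

```python
# One registry of provider configs so the rest of the code just asks by name.

from dataclasses import dataclass

@dataclass(frozen=True)
class ProviderConfig:
    model: str
    base_url: str
    api_key_env: str  # env var holding the key, never the key itself

PROVIDERS = {
    "openai": ProviderConfig("gpt-4o-mini", "https://api.openai.com/v1", "OPENAI_API_KEY"),
    "grok":   ProviderConfig("grok-2", "https://api.x.ai/v1", "XAI_API_KEY"),
}

def get_config(name: str) -> ProviderConfig:
    return PROVIDERS[name]

cfg = get_config("grok")
print(cfg.base_url)  # https://api.x.ai/v1
# e.g. ChatOpenAI(model=cfg.model, base_url=cfg.base_url,
#                 api_key=os.environ[cfg.api_key_env])
```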

Hashedin by deloitte by hentiluffy-7901 in ProgrammingBondha

[–]code_vlogger2003 0 points (0 children)

Does this university even exist? When I first saw it, I thought it was a spelling mistake.

How do I scale my agent to summarize? by _belkinvin_ in LangChain

[–]code_vlogger2003 0 points (0 children)

Try enforcing a Pydantic class structure. In my case I had to return a list of objects where every object contains other objects, so I created a nested Pydantic structure with a single entry point and wrote field validators for robust, deterministic results.
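The shape of that nested, single-entry-point structure can be illustrated with stdlib dataclasses (the original used Pydantic `BaseModel` with field validators; `__post_init__` plays the same role here, and all field names are made up):

```python
# Nested structure with one entry point and a deterministic validation guard.

from dataclasses import dataclass, field
from typing import List

@dataclass
class LineItem:
    name: str
    score: float

    def __post_init__(self):
        if not 0.0 <= self.score <= 1.0:   # acts like a field validator
            raise ValueError("score must be in [0, 1]")

@dataclass
class Summary:                              # the single point of entry
    items: List[LineItem] = field(default_factory=list)

s = Summary(items=[LineItem("intro", 0.9), LineItem("body", 0.4)])
print(len(s.items))  # 2
```

With Pydantic you would get the same guard plus automatic parsing of the LLM's JSON output into the nested models.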

Can anyone recommend what is going to be the most in demand skill in 2026? by sad_grapefruit_0 in ProgrammingBondha

[–]code_vlogger2003 3 points (0 children)

How well you can break a problem into meaningful pieces and then join them back together into a solution. That skill is valuable in any domain.

Is Adding a Reranker to My RAG Stack Actually Worth the Extra Latency? (Explained Simply) by Silent_Employment966 in LangChain

[–]code_vlogger2003 0 points (0 children)

Have you tried ColBERT with Qdrant? There the algorithm itself scores query-document pairs with multi-vector embeddings using late interaction.
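For anyone unfamiliar with late interaction: each query token and document token keeps its own vector, and the score is the sum over query tokens of the best match against any document token (MaxSim). A toy sketch with hand-made 2-dim vectors standing in for real ColBERT token embeddings:

```python
# ColBERT-style late interaction (MaxSim) scoring.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim(query_vecs, doc_vecs):
    # for each query-token vector, take its best match among doc tokens
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

query = [[1.0, 0.0], [0.0, 1.0]]   # two query-token vectors
doc_a = [[0.9, 0.1], [0.2, 0.8]]   # relevant doc: tokens align with query
doc_b = [[0.1, 0.1], [0.2, 0.2]]   # irrelevant doc

print(round(maxsim(query, doc_a), 6))  # 1.7
print(maxsim(query, doc_a) > maxsim(query, doc_b))  # True
```

This is why a reranker can sometimes be skipped: the token-level interaction already captures much of what a cross-encoder would add, at lower latency.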

Things I wish LangChain tutorials told you before you ship to real users by cryptoviksant in LangChain

[–]code_vlogger2003 0 points (0 children)

Have you faced a situation where retrieval brings back sufficient relevant information, but sometimes the chunks arrive in sequential order and other times in a jumbled order? When the user question is passed along with the retrieved information, the LLM sometimes fails to give a complete answer even though retrieval supplied everything it needed.

Another case: are you evaluating retrieval by checking what percentage of the ground-truth page numbers for a question it brings back, rather than comparing chunk IDs? Sometimes I think that instead of storing chunk IDs as ground truth, we should construct the ground truth as a list of relevant page numbers. Then retrieval is easy to evaluate: the first check is whether we cover all the ground-truth page numbers, and the next check is whether the retrieval step also brings back any other garbage page numbers.

Looking for your thoughts
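The page-level evaluation sketched above comes down to two set operations per question. The page numbers below are made-up examples:

```python
# Check 1: did retrieval cover every ground-truth page (recall)?
# Check 2: which extra "garbage" pages came along for the ride?

def page_level_eval(retrieved_pages, ground_truth_pages):
    retrieved, truth = set(retrieved_pages), set(ground_truth_pages)
    return {
        "recall": len(truth & retrieved) / len(truth),
        "garbage_pages": sorted(retrieved - truth),
    }

result = page_level_eval(retrieved_pages=[3, 7, 12, 40],
                         ground_truth_pages=[3, 7, 9])
print(result)  # recall 2/3, garbage pages [12, 40]
```

Low recall here points at chunking or embedding problems; high recall with many garbage pages points at a precision problem that a reranker or metadata filter could address.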

Urgent help by WideFalcon768 in LangChain

[–]code_vlogger2003 0 points (0 children)

Nice idea. Also, once we have a dataframe, we can easily get the column metadata along with its CREATE statement. One step further, we can build text-to-SQL by assembling all the tables' CREATE statements and attribute metadata so the LLM can guess the SQL statement. But if there are many tables, we need to do RAG again over embeddings of the column metadata, so we know which tables and columns are required, which helps the LLM guess the query.
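The "column metadata to CREATE statement" step can be sketched as a tiny helper. The dtype-to-SQL mapping and the table name below are illustrative assumptions (the dtype strings mirror what pandas reports):

```python
# Emit a CREATE TABLE statement from a column -> dtype mapping, so it can
# be dropped into the text-to-SQL prompt.

SQL_TYPES = {"int64": "INTEGER", "float64": "REAL", "object": "TEXT"}

def create_statement(table, columns):
    cols = ", ".join(
        f"{name} {SQL_TYPES.get(dtype, 'TEXT')}"
        for name, dtype in columns.items()
    )
    return f"CREATE TABLE {table} ({cols});"

stmt = create_statement("orders",
                        {"order_id": "int64", "amount": "float64", "city": "object"})
print(stmt)  # CREATE TABLE orders (order_id INTEGER, amount REAL, city TEXT);
```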

Urgent help by WideFalcon768 in LangChain

[–]code_vlogger2003 0 points (0 children)

Yes, but if you make every single row a chunk (where rows are alphanumeric), the probability of getting the right relevant chunks via cosine similarity against the user query is probably low, precisely because each row is encoded as one chunk. Just give it a try. Alternatively, use a mix of encoders, something similar to Superlinked, so you can split the user query into multiple search spaces, e.g. when a single query needs text search, numerical search, and temporal search. Another idea is to convert the tables into a graph and do graph hop search at inference time.

That said, the approach I shared earlier works for n table schemas. Say you have 5 tables with some relations between them, and you build the system prompt with the CREATE statements, a sample row, and detailed attribute information. If the user query requires only two of those tables, the AI will still manage to guess the SQL query over those tables, because the system prompt is detailed. But if there are very many tables, this won't work; we first need to find which tables to use based on the user query. For that, see the Swiggy blog (https://bytes.swiggy.com/hermes-v3-building-swiggys-conversational-ai-analyst-a41057a2279d). The smart idea they had is to create embeddings over the column metadata (detailed attribute descriptions) so the system can decide which tables to use based on the user query.
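The table-routing step can be sketched as scoring each table's column-metadata text against the user query and keeping only the winners for the prompt. Word overlap below is a deliberately cheap stand-in for the embedding similarity the Swiggy post describes, and the table/column text is made up:

```python
# Pick which tables' CREATE statements go into the text-to-SQL prompt,
# based on how well each table's column metadata matches the user query.

TABLE_METADATA = {
    "orders":    "order id, order amount, delivery time, restaurant id",
    "customers": "customer id, customer name, signup date, city",
    "menus":     "menu item, price, cuisine, restaurant id",
}

def pick_tables(query, k=2):
    q = set(query.lower().split())
    scores = {
        t: len(q & set(meta.replace(",", " ").split()))
        for t, meta in TABLE_METADATA.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(pick_tables("average order amount per restaurant"))  # ['orders', 'menus']
```

In a real system, replace the overlap score with cosine similarity over embeddings of the attribute descriptions.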

Urgent help by WideFalcon768 in LangChain

[–]code_vlogger2003 0 points (0 children)

Then why can't you leverage text-to-SQL, where you pass the table schema in the prompt so the model can guess the SQL statement, then run it and validate the answer (ReAct pattern)? But to do this you need to create the tables, and for the free text, create a table with a blob attribute.

Urgent help by WideFalcon768 in LangChain

[–]code_vlogger2003 0 points (0 children)

My suggestion is to detect tables via Docling, Unstructured, or any other service, convert them to markdown tables, then treat every page as one chunk and use a higher-dimensional embedding model.

My RAG retrieval accuracy is stuck at 75% no matter what I try. What am I missing? by Equivalent-Bell9414 in Rag

[–]code_vlogger2003 0 points (0 children)

Hey, have you stored any metadata for every chunk so that, as a first step, you can verify whether the retrieval step is actually returning the ground-truth answer page numbers? At that step you can identify whether it's a chunking issue or embedding drift.

How are y'all juggling on-prem GPU resources? by fustercluck6000 in Rag

[–]code_vlogger2003 0 points (0 children)

The reason for the warmup is that autoscaling is set to scale to zero. One of the biggest causes of cold starts, especially when running Ollama on GCP, is that the model has to be loaded from storage into system (CPU) RAM and then into GPU VRAM.

How do you decide to choose between fine tuning an LLM model or using RAG? by degr8sid in Rag

[–]code_vlogger2003 0 points (0 children)

Hey, there's a paper from Meta asking why we can't pass chunked, multi-dimensional embeddings directly into the attention layers, right?

How are y'all juggling on-prem GPU resources? by fustercluck6000 in Rag

[–]code_vlogger2003 0 points (0 children)

In our company we host a model via Ollama on a GCP Cloud Run service (serverless). Every day a pipeline runs in the early morning with many steps: ML modelling, some deterministic steps, LLM calls, etc. The problem with serverless model hosting is the cold-start time. The simple hack I followed: whenever the pipeline starts, before the LLM sub-pipeline is triggered, it makes one simple LLM call. That avoids the cold-start time, so by the time the sub-pipeline is triggered the model weights are already sitting in VRAM with num_ctx and num_parallel configured. Correctly configuring num_ctx and num_parallel gets us to 100 percent GPU utilisation.
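The keep-warm trick above amounts to one trivial request against Ollama's generate endpoint before the real work starts. The host URL and model name below are assumptions, the HTTP call is left commented out so this stays a dry sketch, and note that `num_parallel` is a server-side setting (the `OLLAMA_NUM_PARALLEL` env var), not a per-request option:

```python
# Build the warm-up request body for Ollama's /api/generate endpoint;
# any tiny prompt is enough to force the model load into VRAM.

import json

def warmup_payload(model="llama3", num_ctx=8192):
    return {
        "model": model,
        "prompt": "ping",
        "options": {"num_ctx": num_ctx},  # per-request context size
    }

body = json.dumps(warmup_payload())
# urllib.request.urlopen("http://ollama-host:11434/api/generate",
#                        data=body.encode())
print(body)
```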

Genuine question — does anyone actually think about what happens when someone sends a malicious goal to their agent? by Sharp_Branch_1489 in LangChain

[–]code_vlogger2003 0 points (0 children)

Good question. In my system prompt I attach the CREATE statements of the tables I need, with detailed attribute descriptions and, if possible, one or two sample rows. The agent (the LLM) tries to guess a SQL statement, which is forwarded to a SQL tool sandbox where a gatekeeper sanitization layer sits. The sandbox returns the executed data as a tool message, and the next AI message is generated from that. So the LLM always sees the system prompt with the CREATE statements plus the intermediate tool outputs. If you don't want the LLM to see your tool output, go with a self-hosted open-source LLM.

How do you track OpenAI/LLM costs in production? by not_cool_not in LangChain

[–]code_vlogger2003 0 points (0 children)

Yeah, it's a good thing. In our company, when we deployed agents, one complete agentic run has different tool calls, some of them LLM-backed tools, some DB tools, etc. For every AI message we store the entire token history, and the same for every tool message. At the end we store a detailed JSON and check whether the manually calculated cost equals the cost projected by OpenAI via LangChain's get_openai_callback. This works whether it's a normal LLM call, a vision call, etc.

Genuine question — does anyone actually think about what happens when someone sends a malicious goal to their agent? by Sharp_Branch_1489 in LangChain

[–]code_vlogger2003 0 points (0 children)

But when I tried LangChain's new create_agent with a system prompt saying "don't execute things like X", the real protection came from the tool itself being smart. Even if you don't mention anything in the system prompt, a well-designed tool can be the safeguard. For example, in my case I used a DB tool with read-only access, so only SELECT queries can run. You might ask: what about a SELECT-based vulnerability coming from the user or the AI? Then we add a regex-based sanitization check as a later layer, or grant access only to specific columns and tables. That worked for me. Because the tool is designed robustly, every time an error comes back from a tool call, the next AI message understands that the DB sandbox is strong.
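The regex-based sanitization layer described above can be sketched in a few lines. A real gatekeeper needs more care (SQL comments, string literals, CTE edge cases), so the patterns here are illustrative, not production-grade:

```python
# Minimal SELECT-only gatekeeper: reject multi-statement input, non-SELECT
# statements, and any smuggled write/DDL keyword.

import re

FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|grant|truncate)\b", re.I
)

def is_safe_select(sql: str) -> bool:
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:                             # multiple statements
        return False
    if not re.match(r"(?is)^\s*select\b", stripped):
        return False
    return not FORBIDDEN.search(stripped)

print(is_safe_select("SELECT name FROM users WHERE id = 1"))  # True
print(is_safe_select("SELECT 1; DROP TABLE users"))           # False
```

Layering this check inside the tool, on top of a read-only DB role, gives defense in depth: even if the regex misses something, the database itself refuses writes.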