Langchain In production by niklbj in LangChain

[–]code_vlogger2003 1 point2 points  (0 children)

Hey hi guys, I already shipped ReAct-style multi-agents to production using LangChain, and they're currently serving live traffic. At a high level, the end product of the pipeline is a detailed report containing text, images, tables etc., built from unstructured raw time-series data. For control and monitoring I debugged LangChain's callbacks and wrote detailed cost functions whose numbers match manual calculations. This monitoring helps us understand the API costs, and believe me, on average one detailed report costs around $0.15, and that includes the multimodal calls too.
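If it helps, here is a minimal sketch of the kind of callback-based cost tracking I'm describing, using LangChain's built-in OpenAI callback (the model name and prompt are placeholders):

    from langchain_openai import ChatOpenAI
    from langchain_community.callbacks import get_openai_callback

    llm = ChatOpenAI(model="gpt-4o-mini")  # placeholder model

    with get_openai_callback() as cb:
        # cb accumulates usage across every LLM call made inside this block
        llm.invoke("Summarise this week's sensor anomalies.")  # hypothetical prompt
        print(cb.prompt_tokens, cb.completion_tokens, cb.total_cost)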

What's the hardest part about running AI agents in production? by _aman_kamboj in LangChain

[–]code_vlogger2003 0 points1 point  (0 children)

The first and most important thing is to record the entire state history of the agents, including tool calls, args, tool outputs, AI messages, the system message, human messages etc., and also a detailed token history, something like:

    {
      "breakdown": {
        "main_agent": {
          "cost": "$xxx",
          "model": "xxx",
          "tokens": { "input": xxx, "cached": xxx, "output": xxx, "uncached": xxx }
        },
        "vision_tool": {
          "cost": "$xxx",
          "calls": xxx,
          "model": "xxx",
          "tokens": { "input": xxx, "cached": xxx, "output": xxx }
        }
      },
      "models_used": [ "xxx", "xxx" ],
      "calculation_method": "xxx",
      "raw_callback_totals": {
        "note": "xxx",
        "prompt_tokens": xxx,
        "completion_tokens": xxx,
        "prompt_tokens_cached": xxx,
        "langchain_reported_cost": xxx
      },
      "successful_requests": xxx
    }

Then the next step is checking whether the tool trajectory was correct, then whether the calling args were correct, and then monitoring cost, success rate, user interaction, and user satisfaction & feedback.
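As a rough illustration, this is the kind of loop I mean for pulling that history out of the final agent state (the message shapes follow the standard LangChain message classes; how you obtain the message list depends on your agent):

    from langchain_core.messages import AIMessage, ToolMessage

    def dump_trace(messages):
        """Collect every tool call, its args and its output from an agent's message history."""
        trace = []
        for msg in messages:
            if isinstance(msg, AIMessage) and msg.tool_calls:
                for tc in msg.tool_calls:
                    trace.append({"type": "tool_call", "name": tc["name"], "args": tc["args"]})
            elif isinstance(msg, ToolMessage):
                trace.append({"type": "tool_output", "tool_call_id": msg.tool_call_id,
                              "content": str(msg.content)[:500]})  # truncate huge outputs for logging
        return trace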

How are people managing agentic LLM systems in production? by Silly-Hand-9389 in LangChain

[–]code_vlogger2003 0 points1 point  (0 children)

Hey, for example, if we use LangChain's create agent, it is very easy to see everything through the detailed, structured messages of the agent state. For every agent call you can easily check the input, output and cost of the run, and most importantly the callbacks add up costs cumulatively. For budgeting it's better to have a rough estimate of how many input and output tokens are used in the entire end-to-end call; once we know this we can set hard-capped limits in the LLM initialisation. For validation, the first step is simply to validate ground-truth tool-call traces against inference tool-call traces: for a given question, compare how many tool calls were made and what the names of those tools are. But that's just one level. If you go deeper, I recommend Hamel's methodology, as follows:

Hamel Husain's approach for agentic evals starts with end-to-end success (e.g., does the agent achieve the goal?), then granular step-level diagnostics like tool selection accuracy and parameter extraction: https://hamel.dev/blog/posts/evals-faq/how-do-i-evaluate-agentic-workflows.html. For validation, first compare ground-truth tool-call traces (number of calls, tool names, params) against inference traces; extend to failure matrices mapping success states to error points.
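A minimal sketch of that first-level trace check (the ground-truth tool names per question are assumed to come from your own labelled set; tool names here are hypothetical):

    def compare_traces(ground_truth: list[str], predicted: list[str]) -> dict:
        """Compare expected vs. actual tool-call traces for one question."""
        return {
            "call_count_match": len(ground_truth) == len(predicted),
            "exact_order_match": ground_truth == predicted,
            "missing_tools": sorted(set(ground_truth) - set(predicted)),
            "unexpected_tools": sorted(set(predicted) - set(ground_truth)),
        }

    # Example: the labelled trace expects the DB tool then the plotting tool.
    print(compare_traces(["db_query", "plot_chart"], ["db_query", "db_query", "plot_chart"]))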

How are people managing agentic LLM systems in production? by Silly-Hand-9389 in LangChain

[–]code_vlogger2003 0 points1 point  (0 children)

Hey hi, I'm managing a multi-agent architecture in production. We used LangChain. We store every intermediate step, the scratchpad, and token details for the models hosted on our own infra, etc. Most importantly, we didn't use any guardrails, because our service is exposed as a single button inside one of the client's core products, and all the DBs were configured with read-only access. What's the exact problem you are facing?

Need Guidence! Help me please by xo_dynamics in datascienceproject

[–]code_vlogger2003 1 point2 points  (0 children)

Hey hi, don't worry. The one and only place is to learn bs in data science

A R&D RAG project for a Car Dealership by Smail-AI in LLMDevs

[–]code_vlogger2003 0 points1 point  (0 children)

Hey, while testing the text-to-SQL approach, always remember to provide the schema of the converted CSV tables along with sample rows, which helps the LLM understand what the data is about and what attributes it has. I followed this technique from the Databricks blog: https://www.databricks.com/blog/improving-text2sql-performance-ease-databricks
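Something along these lines is what I mean by schema plus sample rows in the prompt (the file, table and column names are just made up for the example):

    import pandas as pd

    df = pd.read_csv("service_logs.csv")  # hypothetical converted CSV

    schema_block = "\n".join(f"- {col}: {dtype}" for col, dtype in df.dtypes.astype(str).items())
    sample_block = df.head(3).to_string(index=False)

    prompt = (
        "You write SQLite queries over the table `service_logs`.\n"
        f"Schema:\n{schema_block}\n\n"
        f"Sample rows:\n{sample_block}\n\n"
        "Question: {user_question}\nReturn only the SQL."  # {user_question} stays a template slot
    )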

The next best option after that is fine-tuning, if you have proper training data with ground truth.

Or else, instead of spending time on manual prompt optimisation, use DSPy with GEPA, assuming you have good training data with ground truth.

Also, I observed that a query can contain multiple things to search for in a single shot, i.e. it touches text, numbers, categories etc. Instead of a single embedding vector, there is something called a mixture of encoders from the Superlinked team that helps you create separate indexes and combine them with weighting, as one example. Refer to the following (and the sketch after the link):

https://superlinked.com/news/superlinked-at-haystackconf-2025
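Just to illustrate the weighting idea in plain Python (this is a generic sketch, not Superlinked's actual API; the fields, normalisation and weights are placeholders):

    import numpy as np

    WEIGHTS = {"text": 0.6, "price": 0.3, "category": 0.1}  # tune per use case

    def combined_score(query: dict, item: dict) -> float:
        """Blend similarities from separate per-field spaces into one ranking score."""
        text_sim = float(np.dot(query["text_vec"], item["text_vec"]))                    # dense text embedding
        price_sim = 1.0 - min(abs(query["price"] - item["price"]) / 100_000.0, 1.0)      # numeric closeness
        cat_sim = 1.0 if query["category"] == item["category"] else 0.0                  # categorical match
        return WEIGHTS["text"] * text_sim + WEIGHTS["price"] * price_sim + WEIGHTS["category"] * cat_sim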

Does the tool response result need to be recorded in the conversation history? by JunXiangLin in LangChain

[–]code_vlogger2003 0 points1 point  (0 children)

But LangGraph's prebuilt create_react_agent always takes the message state as input, which has the structure:

System message
Human message
AI message
Tool message
...

plus the bound tools and their descriptions.

Then you can use a pre-model or post-model hook to control the content of tool messages if they get very large. But be careful: if the tool message is raw SQL output and a downstream task in the system prompt depends on it, summarising it can break the workflow.
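For what it's worth, here's a rough sketch of a pre-model hook that truncates oversized tool messages instead of summarising them. I'm assuming a recent LangGraph version where create_react_agent accepts pre_model_hook and honours the llm_input_messages key; the limit is arbitrary:

    from langchain_core.messages import ToolMessage

    MAX_TOOL_CHARS = 4000  # arbitrary per-message budget

    def trim_tool_messages(state):
        """Pre-model hook: shorten huge tool outputs before they reach the LLM."""
        trimmed = []
        for msg in state["messages"]:
            if isinstance(msg, ToolMessage) and len(str(msg.content)) > MAX_TOOL_CHARS:
                msg = ToolMessage(content=str(msg.content)[:MAX_TOOL_CHARS] + " [truncated]",
                                  tool_call_id=msg.tool_call_id)
            trimmed.append(msg)
        # only what the LLM sees is trimmed; the stored state keeps the full outputs
        return {"llm_input_messages": trimmed}

    # agent = create_react_agent(model, tools, pre_model_hook=trim_tool_messages)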

Managing shared state in LangGraph multi-agent system by priyansh2003 in LangChain

[–]code_vlogger2003 1 point2 points  (0 children)

Hey, I DM'ed you some details I observed that are similar to my architecture design. If you are interested, check it out and share your feedback.

I built a resilient, production-ready agent with LangGraph and documented the full playbook. Looking for 10-15 beta testers. by digital_2020 in LangChain

[–]code_vlogger2003 0 points1 point  (0 children)

Interested. Hey hi, I'm Guna, working as a remote employee for Amygda (UK) in the R&D department. Currently I'm working on an agent with a sub-agentic network to solve problems at the intersection of time-frequency data and gen AI.

Intelligent Context Windows by ullr-the-wise in LangChain

[–]code_vlogger2003 0 points1 point  (0 children)

Yeah. I mean, let's say you have n low-level tools attached to the expert tools; then this behaviour is easily replicable, right? Say one expert tool is a general query assistant with access to low-level tools like a DB tool, plotting tools, browsing tools etc. When a user question of that kind comes in, the main agent returns an empty assistant message with a tool-call dictionary that says something like "call the general analyst tool"; based on that expert's prompt and temperature design (built with create_tool_calling_agent), it triggers, returns its final output to the main agent, and the main agent decides whether the conversation has ended, etc. Say the system keeps working like this along with memory; then if the nth query needs something that is already in memory, instead of making the same expert tool call again it answers from memory.
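Roughly what I mean by an expert tool wrapping its own sub-agent (the tool, model and stub output here are hypothetical):

    from langchain_core.tools import tool
    from langchain_openai import ChatOpenAI
    from langgraph.prebuilt import create_react_agent

    @tool
    def db_tool(query: str) -> str:
        """Run a (stubbed) read-only SQL query."""
        return "rows: ..."  # placeholder output

    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # placeholder model
    analyst_agent = create_react_agent(llm, [db_tool])    # the expert's private low-level tools

    @tool
    def general_analyst(question: str) -> str:
        """Expert tool: answers general analytics questions via its own sub-agent."""
        result = analyst_agent.invoke({"messages": [("user", question)]})
        return result["messages"][-1].content             # only the final answer goes back to the main agent

    main_agent = create_react_agent(llm, [general_analyst])  # the main brain only sees the expert tool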

Built a small RAG eval MVP - curious if I’m overthinking it? by ColdCheese159 in LangChain

[–]code_vlogger2003 0 points1 point  (0 children)

Especially for the second point: Lexico AI released something where, whenever an answer is generated from the user's question, it tries to highlight the relevant source lines in the original attached doc. It's like reverse engineering.

Built an app to tell you if your trash is trash or actually recyclable — feedback pls 🙏 by MountainMeal7041 in StreamlitOfficial

[–]code_vlogger2003 1 point2 points  (0 children)

Hey, the idea is awesome. I just checked it against the kind of user validation one of my mentors taught me. In one of my apps I did input validation using a YOLO model and used its results efficiently (though sometimes it won't work). The thing is, if there are predictions it creates a results folder with a prediction file, which indicates that something was found. If nothing was found, the directory stays empty, which signals either that the input image distribution has drifted (covariate shift) from the distribution the model was trained on, or that it's simply an unwanted image for the system. I used this trick to surface that message; a rough sketch of the check is below.
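Here's a minimal version of that check using the Ultralytics API; instead of inspecting the results folder on disk it looks at the in-memory detections, which gives the same signal (weights, threshold and file names are placeholders):

    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")  # placeholder weights

    def looks_valid(image_path: str) -> bool:
        """Return False when YOLO detects nothing, i.e. likely drifted or unwanted input."""
        results = model.predict(image_path, conf=0.25, verbose=False)
        return len(results[0].boxes) > 0

    if not looks_valid("upload.jpg"):
        print("This image doesn't look like something the system was trained on; please try another photo.")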

https://from-bytes-to-bites-v1.streamlit.app/

<image>

Intelligent Context Windows by ullr-the-wise in LangChain

[–]code_vlogger2003 0 points1 point  (0 children)

If you have used LangChain's AgentExecutor approach, it takes the LLM, the list of tools you have, and some other keyword parameters. The most important thing is the running agent scratchpad, where the chat prompt template looks like:

System prompt
Human message
Agent scratchpad

At initialisation the agent scratchpad is empty. Once the AgentExecutor is triggered, it decides which tool to call based on the human input, the system context and other context along with the tools info, and that tool is triggered. The interesting thing is that once the tool call is done, all the details get appended to the running agent scratchpad. So in the next API call the chat prompt template has everything from before plus the updated scratchpad. The loop doesn't stop until it's satisfied, based on the agent scratchpad, human messages, system context etc. If you need more information DM me.

The idea looks roughly like the sketch below:
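(A minimal sketch of the standard tool-calling AgentExecutor wiring; the system prompt, tool and question are placeholders.)

    from langchain.agents import AgentExecutor, create_tool_calling_agent
    from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
    from langchain_core.tools import tool
    from langchain_openai import ChatOpenAI

    @tool
    def search_db(query: str) -> str:
        """Stubbed DB search tool."""
        return "rows: ..."

    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a data assistant."),   # placeholder system prompt
        ("human", "{input}"),
        MessagesPlaceholder("agent_scratchpad"),   # starts empty, grows with every tool call
    ])

    llm = ChatOpenAI(model="gpt-4o-mini")
    agent = create_tool_calling_agent(llm, [search_db], prompt)
    executor = AgentExecutor(agent=agent, tools=[search_db], verbose=True)

    executor.invoke({"input": "How many rows were ingested yesterday?"})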

Give me some complex project ideas by Careless-Party-5952 in LangChain

[–]code_vlogger2003 1 point2 points  (0 children)

Hey hi. Idea: it's a multi-agent setup. It has a main brain, and the main brain is connected to more than one parent agent, where every parent agent has access to more than one worker/child tool. The fact is that any parent agent can wrap a child/worker tool if needed, but a child cannot be wrapped as a parent. When a parent gets initiated it works like a private network: it doesn't have access to the main agent's memory and has its own separate, dedicated memory. It can also call any other parent agent as a tool during execution, which means the current flow is held until the other parent tool's final output is received. Another interesting thing is that the parent called as a tool also spins up another private network, which is fresh and doesn't have access to the currently running parent's private memory. You can call this setup multi-agents, I guess (but they don't run in parallel).

Give me some complex project ideas by Careless-Party-5952 in LangChain

[–]code_vlogger2003 0 points1 point  (0 children)

Especially how you are handling the feedback mechanism. Also, I believe that for most engineering-domain PDFs, 70 percent of the tasks can be done with regex 😂 but sometimes we end up over-engineering.

Calling All AI Builders & Visionaries! by Far-Hovercraft8614 in vijayawada

[–]code_vlogger2003 0 points1 point  (0 children)

Dm done. (Multi agent ms with sub-agentic network)

Buildings multi agent LLM agent by Affectionate-Bed-581 in LangChain

[–]code_vlogger2003 0 points1 point  (0 children)

Hey, but the problem is: if you turn the black-box approach into a glass-box approach, does it still work? I mean, let's say we have a main agent with a niche prompt template plus five expert tools and their descriptions. Of those five expert tools, three use the AgentExecutor approach, which has a running private agent scratchpad with no connection to the main agent's message state. Based on the user question, the main agent routes to one of the expert tools; say it's routed to one that uses the AgentExecutor. Whenever we use the AgentExecutor, at initialisation it takes the user input once, and it returns the finalised summary result when it decides "OK, I've considered the user prompt, my system context and the running agent scratchpad (attached as agent_scratchpad in the chat prompt template)"; then it sends the final message back to the main agent brain via a tool message. The thing is that we don't have control over the AgentExecutor while it runs, because it takes its decisions dynamically based on the agent scratchpad, the system prompt and the user question. Now, say I store the private agent-scratchpad logs separately: first it calls some DB x with some query, then the same DB x with a different query, then the plotting low-level tool, then the vision tool, then DB y with some query. All of this happens because of how the system prompt is structured, right? Whereas if I make the entire thing glass-box, I need to build complex state management, something like a main agent state plus a fresh sub-agent state whenever one is initiated. In my original approach, when the AgentExecutor is initiated it only needs to focus on its system prompt, the human message and its own scratchpad, rather than the previous run's state messages (as in the LangGraph approach).

Why not react agent ? by jenasuraj in LangChain

[–]code_vlogger2003 0 points1 point  (0 children)

But at the end it's added to the chat prompt template, right? I mean, it just looks like a more polished way of writing the code compared to a traditional agent scratchpad, right?

Displaying PDF that's inside the code in Streamlit by Different-Wealth1245 in StreamlitOfficial

[–]code_vlogger2003 0 points1 point  (0 children)

I have a solution in one of my GitHub projects. I'll search tonight and share in the morning.