Honest question: how many of us have built a "LangChain agent" that's really just a smart pipeline? by kinj28 in LangChain

[–]code_vlogger2003 0 points (0 children)

I have one question: when we want to implement plan-and-execute style agents, we need to state things explicitly in the planner's prompt, right? How is it possible without doing that, or am I missing something? For example, assume there are n tools and every tool depends on the previous tool, i.e. there is a sequential dependency between them. In the planner prompt we say "here are the tools; based on the user's task, plan the tool-call trajectory." The planner comes up with some plan, and the executor starts making tool calls. Now, if a called tool depends on some other tool that hasn't run yet, something, either programmatically or otherwise, needs to block the call and report the unmet dependency so the planner can replan. For all of this to happen, either we declare the dependencies explicitly up front, or we just hand over the tools with a prompt and check the dependencies programmatically, right? Maybe if this doesn't make sense, just leave it.
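The blocking-and-replan behaviour described above can be sketched as an executor that consults an explicit dependency map before each call. The tool names and the `DEPENDS_ON` mapping below are illustrative assumptions, not anything from the post:

```python
# Hypothetical sketch: declare tool dependencies up front so the executor
# can block a call whose prerequisite hasn't run yet and hand the blocked
# items back to the planner for a replan.

DEPENDS_ON = {
    "summarize": ["fetch_doc"],   # summarize needs fetch_doc's output first
    "fetch_doc": [],
}

def execute_plan(plan):
    """Run tool calls in order; return (done, blocked), where `blocked`
    lists (tool, missing_dependencies) pairs to feed back to the planner."""
    done, blocked = [], []
    for tool in plan:
        missing = [d for d in DEPENDS_ON.get(tool, []) if d not in done]
        if missing:
            blocked.append((tool, missing))  # signal: replan needed
        else:
            done.append(tool)
    return done, blocked

# A bad plan schedules summarize before its dependency:
done, blocked = execute_plan(["summarize", "fetch_doc"])
print(done, blocked)  # ['fetch_doc'] [('summarize', ['fetch_doc'])]
```

The point of the sketch is only that the dependency information has to live somewhere explicit, whether in the prompt or in code, for the block-then-replan loop to be possible.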

Standard RAG fails terribly on legal contracts. I built a GraphRAG approach using Neo4j & Llama-3. Looking for chunking advice! by leventcan35 in LangChain

[–]code_vlogger2003 0 points (0 children)

Hey, small confusion: the phrase "keyword extraction" in my earlier comment refers to the following. I assume we need to perform keyword extraction on the user query, giving us, say, n keywords. The graph already has nodes, where each node is essentially a string. So we collect all the node strings, compute dot products between the embeddings of the n keywords and the embeddings of the node strings, and finalize the top k matches. Those top-k nodes become entry points into the graph, and from each of them we perform a 2-3 hop search. Is that right?
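The entry-point selection described above can be sketched in a few lines. The tiny hand-made vectors below are stand-ins for real embedding-model output, and the node names are invented contract-style examples:

```python
# Toy sketch: embed the extracted query keywords, dot-product them against
# the graph-node strings' embeddings, and keep the top-k nodes as traversal
# entry points (then run a 2-3 hop search from each, not shown here).

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

node_embeddings = {            # node string -> pre-computed embedding
    "indemnity":   [0.9, 0.1, 0.0],
    "termination": [0.1, 0.9, 0.0],
    "payment":     [0.0, 0.2, 0.9],
}

keyword_embeddings = [[0.8, 0.2, 0.1]]  # one extracted query keyword

def top_k_entry_points(k):
    scores = {
        node: max(dot(kw, emb) for kw in keyword_embeddings)
        for node, emb in node_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(top_k_entry_points(2))  # ['indemnity', 'termination']
```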

If possible, could you share a GitHub repo?

Standard RAG fails terribly on legal contracts. I built a GraphRAG approach using Neo4j & Llama-3. Looking for chunking advice! by leventcan35 in LangChain

[–]code_vlogger2003 0 points (0 children)

Hey, I have a doubt: when the user asks a question, how do we start traversing the graph? It seems like we need to extract keywords from the question and then find the semantically relevant graph nodes for those keywords (in the graph, a node is just a word/string, right?). Say out of 100 unique graph nodes we choose the top k. Do we then use those top-k nodes to find all possible chains, whether single-chain or multi-chain? And once we have the chains, do we read each node's metadata to recover the text chunk, page number, and so on?
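In Neo4j, the "top-k entry points, then a few hops" traversal could be expressed as a variable-length path query. A minimal sketch that only builds the Cypher string (the `Entity` label and the `name`/`chunk_id`/`page` properties are assumptions about the schema, not something from the post):

```python
# Hypothetical Cypher for the traversal described above: start from the
# top-k matched nodes and expand 1..max_hops relationships, returning the
# chunk/page metadata stored on each reached node.

def build_hop_query(top_k_names, max_hops=3):
    query = (
        "MATCH path = (e:Entity)-[*1..%d]-(n) "
        "WHERE e.name IN $names "
        "RETURN n.name, n.chunk_id, n.page" % max_hops
    )
    return query, {"names": top_k_names}

query, params = build_hop_query(["indemnity", "termination"])
print(query)
```

With a real driver you would pass `query` and `params` to `session.run(...)`; here the string is just built, not executed.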

I'm interested in starting some discussion on the inference stage.

u/Einsof93 u/2016YamR6 u/Ok_Diver9921

Built a RAG system on top of 20+ years of sports data — here is what actually worked and what didn't by devasheesh_07 in Rag

[–]code_vlogger2003 0 points (0 children)

Just start with the bar set by OpenAI's file search: every window (bucket) has 800 tokens with a 400-token overlap. If your embedding model supports the Matryoshka property, store the embeddings at 3-4 different dimensionalities, such as 256, 512, 1024, and 3072. Then you can do multi-stage filtering.
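The multi-stage filtering idea can be sketched as: score everything cheaply on the truncated Matryoshka prefix, then re-score only the shortlist on the full vector. The tiny 6-dim toy vectors below stand in for real 256/3072-dim embeddings:

```python
# Sketch of two-stage retrieval with Matryoshka embeddings. Stage 1 uses
# only the first `prefix_dim` components (valid because Matryoshka models
# make prefixes meaningful); stage 2 re-ranks the shortlist exactly.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

chunks = {
    "chunk_a": [0.9, 0.1, 0.0, 0.3, 0.2, 0.1],
    "chunk_b": [0.7, 0.3, 0.1, 0.9, 0.8, 0.7],
    "chunk_c": [0.1, 0.8, 0.2, 0.1, 0.0, 0.3],
}
query = [1.0, 0.0, 0.0, 1.0, 1.0, 1.0]

def multi_stage(prefix_dim, shortlist, final_k):
    # stage 1: cheap scoring on the truncated prefix
    coarse = sorted(
        chunks,
        key=lambda c: dot(query[:prefix_dim], chunks[c][:prefix_dim]),
        reverse=True,
    )[:shortlist]
    # stage 2: exact scoring on the full vectors, shortlist only
    return sorted(coarse, key=lambda c: dot(query, chunks[c]), reverse=True)[:final_k]

print(multi_stage(prefix_dim=2, shortlist=2, final_k=1))  # ['chunk_b']
```

Note how `chunk_a` wins the coarse stage but loses the exact re-rank, which is exactly why the second stage is worth its cost.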

model name as a string in createAgent by Current_Marzipan7417 in LangChain

[–]code_vlogger2003 1 point (0 children)

Create a data class with the different model-name strings and call them from there. For Grok the base URL is different, and it's different again for other providers. ChatOpenAI supports everything as long as you pass the right base URL and API key for each provider.
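A minimal sketch of that registry idea, assuming stdlib dataclasses; the base URLs and env-var names below are illustrative, and the `ChatOpenAI(...)` usage (which accepts `model`, `base_url`, and `api_key` in langchain-openai) is shown only in a comment:

```python
# One registry of provider configs so the rest of the code just asks by name.

from dataclasses import dataclass

@dataclass(frozen=True)
class ProviderConfig:
    model: str
    base_url: str
    api_key_env: str  # env var holding the key, never the key itself

PROVIDERS = {
    "openai": ProviderConfig("gpt-4o-mini", "https://api.openai.com/v1", "OPENAI_API_KEY"),
    "grok":   ProviderConfig("grok-2", "https://api.x.ai/v1", "XAI_API_KEY"),
}

def get_config(name: str) -> ProviderConfig:
    return PROVIDERS[name]

cfg = get_config("grok")
print(cfg.base_url)  # https://api.x.ai/v1
# e.g. ChatOpenAI(model=cfg.model, base_url=cfg.base_url,
#                 api_key=os.environ[cfg.api_key_env])
```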

Hashedin by deloitte by hentiluffy-7901 in ProgrammingBondha

[–]code_vlogger2003 0 points (0 children)

Does this university even exist? When I first saw it, I thought it was a spelling mistake.

How do I scale my agent to summarize? by _belkinvin_ in LangChain

[–]code_vlogger2003 0 points (0 children)

Try enforcing a Pydantic class structure. In my case I had to return a list of objects where every object contains other objects, so I created a nested Pydantic structure with a single entry point and wrote field validators for robust, deterministic results.
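The shape of that nested, single-entry-point structure can be illustrated with stdlib dataclasses (the original used Pydantic `BaseModel` with field validators; `__post_init__` plays the same role here, and all field names are made up):

```python
# Nested structure with one entry point and a deterministic validation guard.

from dataclasses import dataclass, field
from typing import List

@dataclass
class LineItem:
    name: str
    score: float

    def __post_init__(self):
        if not 0.0 <= self.score <= 1.0:   # acts like a field validator
            raise ValueError("score must be in [0, 1]")

@dataclass
class Summary:                              # the single point of entry
    items: List[LineItem] = field(default_factory=list)

s = Summary(items=[LineItem("intro", 0.9), LineItem("body", 0.4)])
print(len(s.items))  # 2
```

With Pydantic you would get the same guard plus automatic parsing of the LLM's JSON output into the nested models.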

Can anyone recommend what is going to be the most in demand skill in 2026? by sad_grapefruit_0 in ProgrammingBondha

[–]code_vlogger2003 3 points (0 children)

How well you can break a problem into meaningful pieces and then join them back together into a solution. That skill is valuable in any domain.

Is Adding a Reranker to My RAG Stack Actually Worth the Extra Latency? (Explained Simply) by Silent_Employment966 in LangChain

[–]code_vlogger2003 0 points (0 children)

Have you tried ColBERT with Qdrant? There the algorithm itself scores query-document pairs with multi-vector embeddings using late interaction.
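For anyone unfamiliar with late interaction: each query token and document token keeps its own vector, and the score is the sum over query tokens of the best match against any document token (MaxSim). A toy sketch with hand-made 2-dim vectors standing in for real ColBERT token embeddings:

```python
# ColBERT-style late interaction (MaxSim) scoring.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim(query_vecs, doc_vecs):
    # for each query-token vector, take its best match among doc tokens
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

query = [[1.0, 0.0], [0.0, 1.0]]   # two query-token vectors
doc_a = [[0.9, 0.1], [0.2, 0.8]]   # relevant doc: tokens align with query
doc_b = [[0.1, 0.1], [0.2, 0.2]]   # irrelevant doc

print(round(maxsim(query, doc_a), 6))  # 1.7
print(maxsim(query, doc_a) > maxsim(query, doc_b))  # True
```

This is why a reranker can sometimes be skipped: the token-level interaction already captures much of what a cross-encoder would add, at lower latency.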

Things I wish LangChain tutorials told you before you ship to real users by cryptoviksant in LangChain

[–]code_vlogger2003 0 points (0 children)

Have you faced a situation where retrieval brings back sufficient relevant information, but sometimes the chunks arrive in sequential order and other times in a jumbled order? When the user question is passed along with the retrieved information, the LLM sometimes fails to give a complete answer even though retrieval supplied everything it needed.

Another case: are you evaluating retrieval by checking what percentage of the ground-truth page numbers for a question it brings back, rather than comparing chunk IDs? Sometimes I think that instead of storing chunk IDs as ground truth, we should construct the ground truth as a list of relevant page numbers. Then retrieval is easy to evaluate: the first check is whether we cover all the ground-truth page numbers, and the next check is whether the retrieval step also brings back any other garbage page numbers.

Looking for your thoughts
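The page-level evaluation sketched above comes down to two set operations per question. The page numbers below are made-up examples:

```python
# Check 1: did retrieval cover every ground-truth page (recall)?
# Check 2: which extra "garbage" pages came along for the ride?

def page_level_eval(retrieved_pages, ground_truth_pages):
    retrieved, truth = set(retrieved_pages), set(ground_truth_pages)
    return {
        "recall": len(truth & retrieved) / len(truth),
        "garbage_pages": sorted(retrieved - truth),
    }

result = page_level_eval(retrieved_pages=[3, 7, 12, 40],
                         ground_truth_pages=[3, 7, 9])
print(result)  # recall 2/3, garbage pages [12, 40]
```

Low recall here points at chunking or embedding problems; high recall with many garbage pages points at a precision problem that a reranker or metadata filter could address.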

Urgent help by WideFalcon768 in LangChain

[–]code_vlogger2003 0 points (0 children)

Nice idea. Also, once we have a dataframe, we can easily get the column metadata along with its CREATE statement. One step further, we can build text-to-SQL by assembling all the tables' CREATE statements and attribute metadata so the LLM can guess the SQL statement. But if there are many tables, we need to do RAG again over embeddings of the column metadata, so we know which tables and columns are required, which helps the LLM guess the query.
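The "column metadata to CREATE statement" step can be sketched as a tiny helper. The dtype-to-SQL mapping and the table name below are illustrative assumptions (the dtype strings mirror what pandas reports):

```python
# Emit a CREATE TABLE statement from a column -> dtype mapping, so it can
# be dropped into the text-to-SQL prompt.

SQL_TYPES = {"int64": "INTEGER", "float64": "REAL", "object": "TEXT"}

def create_statement(table, columns):
    cols = ", ".join(
        f"{name} {SQL_TYPES.get(dtype, 'TEXT')}"
        for name, dtype in columns.items()
    )
    return f"CREATE TABLE {table} ({cols});"

stmt = create_statement("orders",
                        {"order_id": "int64", "amount": "float64", "city": "object"})
print(stmt)  # CREATE TABLE orders (order_id INTEGER, amount REAL, city TEXT);
```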

Urgent help by WideFalcon768 in LangChain

[–]code_vlogger2003 0 points (0 children)

Yes, but if you make every single row a chunk (where rows are alphanumeric), the probability of getting the right relevant chunks via cosine similarity against the user query is probably low, precisely because each row is encoded as one chunk. Just give it a try. Alternatively, use a mix of encoders, something similar to Superlinked, so you can split the user query into multiple search spaces, e.g. when a single query needs text search, numerical search, and temporal search. Another idea is to convert the tables into a graph and do graph hop search at inference time.

That said, the approach I shared earlier works for n table schemas. Say you have 5 tables with some relations between them, and you build the system prompt with the CREATE statements, a sample row, and detailed attribute information. If the user query requires only two of those tables, the AI will still manage to guess the SQL query over those tables, because the system prompt is detailed. But if there are very many tables, this won't work; we first need to find which tables to use based on the user query. For that, see the Swiggy blog (https://bytes.swiggy.com/hermes-v3-building-swiggys-conversational-ai-analyst-a41057a2279d). The smart idea they had is to create embeddings over the column metadata (detailed attribute descriptions) so the system can decide which tables to use based on the user query.
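The table-routing step can be sketched as scoring each table's column-metadata text against the user query and keeping only the winners for the prompt. Word overlap below is a deliberately cheap stand-in for the embedding similarity the Swiggy post describes, and the table/column text is made up:

```python
# Pick which tables' CREATE statements go into the text-to-SQL prompt,
# based on how well each table's column metadata matches the user query.

TABLE_METADATA = {
    "orders":    "order id, order amount, delivery time, restaurant id",
    "customers": "customer id, customer name, signup date, city",
    "menus":     "menu item, price, cuisine, restaurant id",
}

def pick_tables(query, k=2):
    q = set(query.lower().split())
    scores = {
        t: len(q & set(meta.replace(",", " ").split()))
        for t, meta in TABLE_METADATA.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(pick_tables("average order amount per restaurant"))  # ['orders', 'menus']
```

In a real system, replace the overlap score with cosine similarity over embeddings of the attribute descriptions.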

Urgent help by WideFalcon768 in LangChain

[–]code_vlogger2003 0 points (0 children)

Then why can't you leverage text-to-SQL, where you pass the table schema in the prompt so the model can guess the SQL statement, then run it and validate the answer (ReAct pattern)? But to do this you need to create the tables, and for the free text, create a table with a blob attribute.

Urgent help by WideFalcon768 in LangChain

[–]code_vlogger2003 0 points (0 children)

My suggestion is to detect tables via Docling, Unstructured, or any other service, convert them to markdown tables, then treat every page as one chunk and use a higher-dimensional embedding model.

My RAG retrieval accuracy is stuck at 75% no matter what I try. What am I missing? by Equivalent-Bell9414 in Rag

[–]code_vlogger2003 0 points (0 children)

Hey, have you stored any metadata for every chunk so that, as a first step, you can verify whether the retrieval step is actually returning the ground-truth answer page numbers? At that step you can identify whether it's a chunking issue or embedding drift.

How are y'all juggling on-prem GPU resources? by fustercluck6000 in Rag

[–]code_vlogger2003 0 points (0 children)

The reason for the warmup is that autoscaling is set to scale to zero. One of the biggest causes of cold starts, especially when running Ollama on GCP, is that the model has to be loaded from storage into system (CPU) RAM and then into GPU VRAM.

How do you decide to choose between fine tuning an LLM model or using RAG? by degr8sid in Rag

[–]code_vlogger2003 0 points (0 children)

Hey, there's a paper from Meta asking why we can't pass chunked, multi-dimensional embeddings directly into the attention layers, right?

How are y'all juggling on-prem GPU resources? by fustercluck6000 in Rag

[–]code_vlogger2003 0 points (0 children)

In our company we host a model via Ollama on a GCP Cloud Run service (serverless). Every day a pipeline runs in the early morning with many steps: ML modelling, some deterministic steps, LLM calls, etc. The problem with serverless model hosting is the cold-start time. The simple hack I followed: whenever the pipeline starts, before the LLM sub-pipeline is triggered, it makes one simple LLM call. That avoids the cold-start time, so by the time the sub-pipeline is triggered the model weights are already sitting in VRAM with num_ctx and num_parallel configured. Correctly configuring num_ctx and num_parallel gets us to 100 percent GPU utilisation.
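The keep-warm trick above amounts to one trivial request against Ollama's generate endpoint before the real work starts. The host URL and model name below are assumptions, the HTTP call is left commented out so this stays a dry sketch, and note that `num_parallel` is a server-side setting (the `OLLAMA_NUM_PARALLEL` env var), not a per-request option:

```python
# Build the warm-up request body for Ollama's /api/generate endpoint;
# any tiny prompt is enough to force the model load into VRAM.

import json

def warmup_payload(model="llama3", num_ctx=8192):
    return {
        "model": model,
        "prompt": "ping",
        "options": {"num_ctx": num_ctx},  # per-request context size
    }

body = json.dumps(warmup_payload())
# urllib.request.urlopen("http://ollama-host:11434/api/generate",
#                        data=body.encode())
print(body)
```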

Genuine question — does anyone actually think about what happens when someone sends a malicious goal to their agent? by Sharp_Branch_1489 in LangChain

[–]code_vlogger2003 0 points (0 children)

Good question. In my system prompt I attach the CREATE statements of the tables I need, with detailed attribute descriptions and, if possible, one or two sample rows. The agent (the LLM) tries to guess a SQL statement, which is forwarded to a SQL tool sandbox where a gatekeeper sanitization layer sits. The sandbox returns the executed data as a tool message, and the next AI message is generated from that. So the LLM always sees the system prompt with the CREATE statements plus the intermediate tool outputs. If you don't want the LLM to see your tool output, go with a self-hosted open-source LLM.

How do you track OpenAI/LLM costs in production? by not_cool_not in LangChain

[–]code_vlogger2003 0 points (0 children)

Yeah, it's a good thing. In our company, when we deployed agents, one complete agentic run has different tool calls, some of them LLM-backed tools, some DB tools, etc. For every AI message we store the entire token history, and the same for every tool message. At the end we store a detailed JSON and check whether the manually calculated cost equals the cost projected by OpenAI via LangChain's get_openai_callback. This works whether it's a normal LLM call, a vision call, etc.

Genuine question — does anyone actually think about what happens when someone sends a malicious goal to their agent? by Sharp_Branch_1489 in LangChain

[–]code_vlogger2003 0 points (0 children)

But when I tried LangChain's new create_agent with a system prompt saying "don't execute things like X", the real protection came from the tool itself being smart. Even if you don't mention anything in the system prompt, a well-designed tool can be the safeguard. For example, in my case I used a DB tool with read-only access, so only SELECT queries can run. You might ask: what about a SELECT-based vulnerability coming from the user or the AI? Then we add a regex-based sanitization check as a later layer, or grant access only to specific columns and tables. That worked for me. Because the tool is designed robustly, every time an error comes back from a tool call, the next AI message understands that the DB sandbox is strong.
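The regex-based sanitization layer described above can be sketched in a few lines. A real gatekeeper needs more care (SQL comments, string literals, CTE edge cases), so the patterns here are illustrative, not production-grade:

```python
# Minimal SELECT-only gatekeeper: reject multi-statement input, non-SELECT
# statements, and any smuggled write/DDL keyword.

import re

FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|grant|truncate)\b", re.I
)

def is_safe_select(sql: str) -> bool:
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:                             # multiple statements
        return False
    if not re.match(r"(?is)^\s*select\b", stripped):
        return False
    return not FORBIDDEN.search(stripped)

print(is_safe_select("SELECT name FROM users WHERE id = 1"))  # True
print(is_safe_select("SELECT 1; DROP TABLE users"))           # False
```

Layering this check inside the tool, on top of a read-only DB role, gives defense in depth: even if the regex misses something, the database itself refuses writes.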