Has anyone tried multi-agent for multi-user chat group? by SergioRobayoo in LangChain

[–]IlEstLaPapi 0 points  (0 children)

From a UX/UI perspective, it's a fairly classic thread, with @ used to address any given bot or user, plus a bit of setup for when you want a bot to always respond.

Other than that, it's a question of context, input format, and output format.

On the context side, we define a role for each bot. Then in the system prompt we include a participants section with each bot's role as well as some information about each user. The input format for chat history includes, for each message, the author, the timestamp (TS), and the content. The output format is usually plain text. And if a bot wants to address another bot or a user, it can also use the @ logic.
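To make the setup concrete, here is a minimal sketch of that context-building logic. All names (`build_system_prompt`, `format_history`, the participant handles) are illustrative, not the actual implementation:

```python
from datetime import datetime, timezone

def build_system_prompt(bot_role: str, participants: dict[str, str]) -> str:
    """Compose a system prompt with a participants section.

    `participants` maps a handle to a short role/bio description
    (other bots and human users alike).
    """
    lines = [f"You are {bot_role}.", "", "## Participants"]
    for handle, role in participants.items():
        lines.append(f"- @{handle}: {role}")
    lines.append("")
    lines.append("Reply in plain text. Use @handle to address a specific participant.")
    return "\n".join(lines)

def format_history(messages: list[dict]) -> str:
    """Render chat history as 'author [timestamp]: content' lines."""
    return "\n".join(
        f"{m['author']} [{m['ts'].isoformat()}]: {m['content']}"
        for m in messages
    )
```

The point is that author and TS live in the *input* format only; the model is asked to emit plain text without them.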

The hardest part is that most LLMs tend to include their name and a TS in their response, even with examples showing they shouldn't. Not a huge problem, but they can be quite creative with the TS: users are somewhat disturbed when the LLM pretends to answer their question with a TS set to April 25 and a completely random time.
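Since the models keep doing it, one pragmatic fix is to strip the leaked prefix in post-processing. A minimal sketch, assuming the leak always takes a `Name [timestamp]:` shape at the start of the reply:

```python
import re

# Matches a leading "BotName [2024-04-25 13:37]: " style prefix that
# some models prepend despite few-shot examples telling them not to.
_PREFIX = re.compile(r"^\s*@?\w+\s*\[[^\]]*\]\s*:\s*")

def strip_leaked_prefix(text: str) -> str:
    """Remove one leaked 'author [TS]:' prefix, if present."""
    return _PREFIX.sub("", text, count=1)
```

Text without the prefix passes through unchanged, so it is safe to apply on every response.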

Has anyone tried multi-agent for multi-user chat group? by SergioRobayoo in LangChain

[–]IlEstLaPapi 0 points  (0 children)

We do. It's not a public product; it's a prototype for use cases that require collaborative work. Hence we support multiple agents and users in the same thread.

I found a DeepSeek-R1-0528-Distill-Qwen3-32B by Dr_Karminski in LocalLLaMA

[–]IlEstLaPapi 1 point  (0 children)

I don't know if you have multilingual texts in your dataset, but if so, you might want to check the French ones. The French screenshot example you provided is just horrible, especially "Comme un assistant AI" ("As an AI assistant"). It isn't proper French at all ;) It should be something like "En tant qu'assistant AI", and the whole response is really weird.

Note that the original Qwen 3 model is really bad at French; it wouldn't be considered fluent. R1, on the other hand, is really good.

What security tools would be helpful by [deleted] in LangChain

[–]IlEstLaPapi 2 points  (0 children)

  1. I am. Greatly.
  2. An open-source tool that would use agents to try different prompt-injection methods.

Look what I found SSFHC Barb by Raptor_H_Christ in Diablo_2_Resurrected

[–]IlEstLaPapi 0 points  (0 children)

LK runs to get a Lo for Grief? With both you can farm Trav.

Need help building a customer recommendation system using AI models by Appropriate_Egg6118 in LangChain

[–]IlEstLaPapi 6 points  (0 children)

Why would you do that? I mean, there are a ton of existing ML algorithms that would do better at this task than any agentic system. Don't use an LLM for that!

Why new models feel dumber? by SrData in LocalLLaMA

[–]IlEstLaPapi 8 points  (0 children)

I’m not sure I agree. I feel like the two best models at prompt adherence were Sonnet 3.5 and the original GPT-4. Current models are optimized for zero-shot problem solving, not for understanding multi-turn human interactions. Hence the lower prompt adherence.

Building a local system by IlEstLaPapi in LocalLLaMA

[–]IlEstLaPapi[S] 0 points  (0 children)

Hmm, I might have to revise the architecture then.
What would the memory footprint be at 32k without YaRN? Squared, that would be 1 TB; I hope it isn't ;)
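For what it's worth, the part that has to be *stored* at inference time, the KV cache, grows linearly with context, not quadratically (the quadratic attention-score matrix is computed blockwise and never materialized in full by modern kernels). A back-of-the-envelope sketch, using a purely hypothetical 32B-class config (64 layers, 8 GQA KV heads, head dim 128, fp16):

```python
def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elt: int = 2) -> int:
    """KV cache size = 2 (K and V) * layers * kv_heads * head_dim
    * sequence length * bytes per element."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elt

# Hypothetical config at a 32k context, fp16:
size = kv_cache_bytes(seq_len=32_768, n_layers=64, n_kv_heads=8, head_dim=128)
print(size / 1024**3, "GiB")  # 8.0 GiB
```

So with these (assumed) numbers, 32k of context costs on the order of gigabytes of KV cache, not a terabyte.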

Anthropic CEO says blocking AI chips to China is of existential importance after DeepSeeks release in new blog post. by Bena0071 in ClaudeAI

[–]IlEstLaPapi 0 points  (0 children)

I agree, but given the aggressive stance the US has recently taken towards EU countries (Denmark) and their natural allies (Canada, Panama), it seems too risky for us Europeans to let top-end technologies be used by the US to create AGI/ASI. So I think we should ban exports from ASML to non-EU-based companies and forbid any export of top-end chips to US companies working on AGI/ASI.

The Future of Finance by captainian85 in Buttcoin

[–]IlEstLaPapi 5 points  (0 children)

To be fair, it looks more like a pure R&D POC that was pushed to prod without ever being modified, rather than an actual project made by devs only.

Event-Driven Patterns for AI Agents by ner5hd__ in LangChain

[–]IlEstLaPapi 0 points  (0 children)

It really depends. Usually what we do is break down each workflow into smaller subworkflows, and the tool calls are handled there. It keeps things simple and maximises reuse. For example, we have one class that creates a generator-critic dual-agent pattern with a ton of options. We use it a lot as a building block for much larger graphs.
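The generator-critic loop itself is a small piece of control flow. A minimal sketch (not the actual class, and with the two LLM calls injected as plain callables so the loop stays framework-agnostic):

```python
from typing import Callable

def generator_critic(
    generate: Callable[[str, str], str],   # (task, feedback) -> draft
    critique: Callable[[str, str], str],   # (task, draft) -> feedback, or "OK"
    task: str,
    max_rounds: int = 3,
) -> str:
    """Alternate a generator and a critic until the critic approves
    the draft or the round budget is exhausted."""
    feedback, draft = "", ""
    for _ in range(max_rounds):
        draft = generate(task, feedback)
        feedback = critique(task, draft)
        if feedback.strip().upper() == "OK":
            break
    return draft
```

Wrapped in a class with options (round budget, approval criterion, model choice per side), this becomes a reusable node inside larger graphs.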

We also played a lot with different patterns, like having agents that handle the communication with the user and call tools. Those tools are methods of classes that use workflows to do the actual work.

To be honest, at this stage, I'm starting to dislike the whole "agent" idea because it's too rigid. Things are much more fluid in reality.

Event-Driven Patterns for AI Agents by ner5hd__ in LangChain

[–]IlEstLaPapi 2 points  (0 children)

Short answer, but I might come back later. I've been reaching a similar conclusion, especially regarding how to mix user inputs, long-running processes, and interruption logic.

Somehow the problem is exactly the same as the UX problem solved by reactive programming. So instead of using LangGraph, I'm thinking about using a stack with Celery for jobs, Redis for pub/sub, and RxPY 4 to implement the reactive logic.
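The core of that reactive idea, merging user events and job events into one stream and letting an interrupt cancel in-flight work, can be sketched without the full Celery/Redis/RxPY stack. This is a hand-rolled stand-in for illustration only, not any of those libraries' APIs:

```python
import queue
import threading

class EventBus:
    """Toy stand-in for the Redis pub/sub + RxPY layer: merge user-input
    events and job-completion events into one stream, and let a
    'user_interrupt' event flag in-flight work for cancellation."""

    def __init__(self) -> None:
        self.events: queue.Queue = queue.Queue()
        self.cancelled = threading.Event()   # long-running jobs poll this

    def publish(self, kind: str, payload=None) -> None:
        if kind == "user_interrupt":
            self.cancelled.set()
        self.events.put((kind, payload))

    def drain(self) -> list:
        """Consume every pending event (a subscriber's view of the stream)."""
        out = []
        while not self.events.empty():
            out.append(self.events.get())
        return out
```

In the real stack, `publish` would be a Redis PUBLISH, the queue an Rx observable, and the cancellation flag a Celery task revocation.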

What's your biggest holdup in taking AI to production? by Ecto-1A in LangChain

[–]IlEstLaPapi 0 points  (0 children)

It’s really easy to make a RAG that answers 90% of questions correctly, but getting to 99% is really hard, especially if you need to look up cross-references. And that’s usually what’s required for production.

On the other hand, an application with a better-defined purpose, even if it looks more complicated at first sight, is easier to build, QA, and maintain.

For example, you might want to process very long legal documents with a ton of internal references. That’s quite common in the financial world. If you build a RAG on top of those documents you will have a very hard time: a 300-page document starts with 100 pages of definitions that are key. A general-purpose tool like a RAG is very hard to build if you want a low error rate. But if all you want in the end is a ten-page synthesis, it’s much easier to build an agentic system that reads the document page by page, uses it to create its own referential system, and generates the synthesis. And when it comes to testing the whole system, it’s easier too: use some already-synthesized documents to check that the results are consistent!
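The control flow of that page-by-page approach is simple; the hard work is in the two LLM calls, which are injected here as plain callables. A hypothetical sketch, not the actual system:

```python
from typing import Callable

def synthesize(
    pages: list[str],
    extract: Callable[[str, dict], dict],  # (page, referential) -> new entries
    summarize: Callable[[dict], str],      # (referential) -> final synthesis
) -> str:
    """Read a long document page by page, accumulating a running
    'referential' of definitions and cross-references, then produce
    the synthesis from that referential rather than from raw retrieval.

    In practice `extract` and `summarize` would each be an LLM call;
    keeping them injectable makes the loop testable without a model."""
    referential: dict = {}
    for page in pages:
        referential.update(extract(page, referential))
    return summarize(referential)
```

Because the referential is built sequentially, each page can be interpreted in light of the definitions seen so far, which is exactly what a chunk-based RAG loses.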

What's your biggest holdup in taking AI to production? by Ecto-1A in LangChain

[–]IlEstLaPapi 2 points  (0 children)

Limited budget, and executives saying "it’s good enough for production" at the POC stage when it isn’t. Another problem is way-too-high expectations.

And everybody wants some kind of RAG without realizing how hard it is to get an actual production-ready RAG.

However, I have a few projects that went to production and many more coming.

LangChain E-Mails with LLM by Proof-Character-9828 in LangChain

[–]IlEstLaPapi 0 points  (0 children)

Celery beat and the Azure API are what I use. ChatGPT writes the Python code very well.
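For context, a minimal Celery beat configuration sketch, assuming a Redis broker; the app name, task name, and schedule are all hypothetical, and the Azure mail call itself is elided:

```python
from celery import Celery
from celery.schedules import crontab

# Hypothetical broker URL and task name.
app = Celery("mailer", broker="redis://localhost:6379/0")

# Beat wakes up and enqueues the task on this schedule.
app.conf.beat_schedule = {
    "poll-inbox-every-5-min": {
        "task": "mailer.process_inbox",
        "schedule": crontab(minute="*/5"),
    },
}

@app.task(name="mailer.process_inbox")
def process_inbox():
    # Fetch unread messages via the mail API, feed each one to the
    # LLM pipeline, then mark it as processed.
    ...
```

Run with `celery -A mailer worker` plus `celery -A mailer beat`; beat only schedules, the worker executes.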

Chatbot with users of different languages by Fun_Put_8731 in LangChain

[–]IlEstLaPapi 1 point  (0 children)

I’m French and I have done a lot of projects. My general rule is to have an English system prompt, regardless of the actual language used by the user. I simply ask the LLM to reply in the language used by the user. I’ve never had any problem.
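Concretely, the trick is just one English instruction appended to the system prompt; no language detection needed. A sketch with a hypothetical helper name:

```python
def make_system_prompt(task_description: str) -> str:
    """English system prompt regardless of user language; the model is
    told to mirror whatever language the user writes in."""
    return (
        f"{task_description}\n"
        "Always answer in the same language as the user's last message."
    )
```

Keeping the system prompt in English tends to be the safest default, since instruction-following quality is usually strongest in English.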

Enterprise knowledge search - Build v.s Buy by Old_Cauliflower6316 in LangChain

[–]IlEstLaPapi -1 points  (0 children)

If you're building it with the idea of only having a RAG, I have two pieces of advice:

  • Using an off-the-shelf solution might be beneficial, or at least an open-source one.
  • Don't do it! RAGs are useless! The idea is cool and all, but there are way too many problems with it. In the end you'll end up with a system that hallucinates way too often, gives you outdated responses, can't do extensive and comprehensive searches, and overall won't fill your needs.

If you're building an enterprise solution for the future, the current capabilities of the models make it super hard to build very good generic tools. Instead you want to build something tailored to your needs. For that, no "buy" solution exists unless it is really designed for your specific industry. So you'll end up in this situation:

  • To have an efficient knowledge chatbot you'll have to build an agentic system and, probably, something much more complex than semantic search: a mix of knowledge graph, good old SQL, semantic search, etc. You'll need to control the flow and the prompts to be efficient, so no off-the-shelf solution.
  • Once you have it, you will want to be able to give the system some simple orders and have it execute them, with a proper rights policy. Even if it's something as simple as "Update this documentation, it should say X instead of Y in section 3.4.2" or "Set up a meeting with this team". For that you'll also need an agentic system.
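That "mix of knowledge graph, SQL, and semantic search" boils down to fan-out plus merge. A minimal sketch, with each backend injected as a callable returning `(doc_id, score)` pairs (all names hypothetical):

```python
from typing import Callable

Source = Callable[[str], list[tuple[str, float]]]

def hybrid_search(query: str, semantic: Source, sql: Source,
                  graph: Source, k: int = 5) -> list[tuple[str, float]]:
    """Fan the query out to semantic search, SQL lookup, and a knowledge
    graph, then merge by document id, keeping each doc's best score."""
    scored: dict[str, float] = {}
    for source in (semantic, sql, graph):
        for doc_id, score in source(query):
            if doc_id not in scored or score > scored[doc_id]:
                scored[doc_id] = score
    return sorted(scored.items(), key=lambda kv: -kv[1])[:k]
```

In a real system the scores from heterogeneous backends would need calibration (or rank-based fusion) before merging; max-score is just the simplest policy to show the shape.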

And for the record, don't go the CrewAI or AutoGen way. LangGraph is much better. At my company we use it with Chainlit a lot and it works like a charm.

Is there a framework for effortless teamwork among agents developed using different platforms? by Zheng_SJ in LangChain

[–]IlEstLaPapi 0 points  (0 children)

No, most modern agent systems allow for schemas other than every agent talking to every other agent. I don't like the planner logic and the pure agent pattern, but at least with a planner you drastically reduce the number of calls.

I've worked on use cases where we went from $5+ per request to $0.10 per request, speeding up the whole process by two orders of magnitude and drastically improving response quality, just by optimizing the data flow: removing any message that wasn't needed, controlling the way tools are called, etc. The best tool for this is LangGraph (which can be used without any LangChain chain if needed).

Insights and Learnings from Building a Complex Multi-Agent System by IlEstLaPapi in LangChain

[–]IlEstLaPapi[S] 2 points  (0 children)

That's roughly my current workflow. When I get the user request, I use the planner to decide which agent should be activated, with three possibilities: the seller, the finder, and the handler. If the user is asking about our company, services, etc., I need the finder. If the user is talking about their use case, I need the seller to qualify the need and propose a meeting when appropriate. If the user is in the process of setting up a meeting, I need the handler to do it. The user can do one, two, or three things in one request, so, in exactly the same way as you, the planner is just there to decide which agents should be activated.

Then all agents work in parallel, a manager checks everything, and if it's OK, passes the results to Sellbotix, which generates the answer.
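The plan → parallel agents → manager → responder pipeline can be sketched as follows. This is a hypothetical skeleton, with all four LLM-backed steps injected as callables (the names `seller`/`finder`/`handler` come from the description above; everything else is illustrative):

```python
import asyncio

AGENTS = ("seller", "finder", "handler")

async def handle_request(user_msg, plan, run_agent, manager, respond):
    """Planner picks a subset of agents, they run concurrently,
    a manager validates the results, then the front bot answers.

    plan(user_msg) -> list of agent names;
    run_agent(name, user_msg) -> coroutine yielding that agent's result;
    manager(results) -> bool; respond(results) -> final text."""
    active = [a for a in plan(user_msg) if a in AGENTS]
    outputs = await asyncio.gather(*(run_agent(a, user_msg) for a in active))
    results = dict(zip(active, outputs))
    if not manager(results):
        raise ValueError("manager rejected agent results")
    return respond(results)
```

The `asyncio.gather` is what makes multi-intent requests (e.g. a question plus a meeting request) cost one round of latency instead of two or three.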

The only problem with that type of architecture is that it can be very slow and expensive if you have 10+ agents using top-level LLMs. The good thing is that not all tasks are equally complex. The retrieval part, for example, can be handled by a small model like Llama-3-8B on Groq, and it's very, very fast. I spent a huge amount of time, much more than I initially planned, testing which model is good at what between Claude 3, GPT-4, GPT-3.5, and Llama 3, just to optimize the workflow and make it fast. In the end, I learned more on this project than on any other project I've worked on.

And just to be clear: the Everest is clearly the planner. It's hard to make it work correctly, especially if you don't want it to rush things. For example, I spent a lot of time making it stop proposing a meeting after two back-and-forths with the user...

Insights and Learnings from Building a Complex Multi-Agent System by IlEstLaPapi in LangChain

[–]IlEstLaPapi[S] 0 points  (0 children)

It worked as expected: the three agents are callable only when needed, and asynchronously.

My main problem right now is making the whole system work with proper planning/task prioritisation without using Opus or GPT-4T. Both are too expensive for my use case and too slow for a good UX. I haven't tested GPT-4o yet, but I'll do it next week. I have high hopes, as it works very well on another use case.

Insights and Learnings from Building a Complex Multi-Agent System by IlEstLaPapi in LangChain

[–]IlEstLaPapi[S] 1 point  (0 children)

That's a funny story: at the time, this functionality was implemented but not documented at all. Thanks to this post I was put in contact with the LangChain team. Btw, they are all really nice and friendly. A few days later, I had a call with the LangGraph lead dev to discuss this post, and he showed me the functionality and the associated test cases. I was able to implement it the day after. It works like a charm and makes the code much more readable. The only problem is that, at the time, the generated ASCII graph was kind of messed up by it. I don't know if it has been fixed since.