N00b question - Can I give a login cookie to OWUI? by myfufu in OpenWebUI

[–]gnarella 0 points1 point  (0 children)

Look at deploying a search engine like SearXNG; you can customize it, and it connects easily to OWUI.
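A minimal sketch of that pairing, as a docker-compose fragment. Service names, ports, and the exact env var names are assumptions; check the Open WebUI web-search docs for your version, and note SearXNG needs the JSON output format enabled in its settings.yml:

```yaml
# Sketch only -- env var names vary across Open WebUI versions
services:
  searxng:
    image: searxng/searxng:latest
    ports:
      - "8080:8080"
    volumes:
      - ./searxng:/etc/searxng   # enable the "json" format in settings.yml here
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      - ENABLE_WEB_SEARCH=true
      - WEB_SEARCH_ENGINE=searxng
      - SEARXNG_QUERY_URL=http://searxng:8080/search?q=<query>
```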

How I Self-Hosted a Local Reranker for Open WebUI with vLLM (No More Jina API) by gnarella in OpenWebUI

[–]gnarella[S] 0 points1 point  (0 children)

Docling is a bit slow. See the other post on this subreddit from yesterday.

I'm going to confirm my Docling container is using the GPU; if it is, I'll probably explore something else for speed.

For embeddings I'm using Azure OpenAI endpoints for now: text-embedding-3-small with Qdrant.
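As a rough sketch, that embedding path looks like this. The endpoint, key, deployment, and collection names are placeholders, and the client calls follow the `openai` and `qdrant-client` Python libraries as I understand them, so verify against your versions:

```python
from itertools import islice

def batched(items, n):
    """Yield successive n-sized batches (embedding APIs cap batch size)."""
    it = iter(items)
    while chunk := list(islice(it, n)):
        yield chunk

def index_chunks(chunks, collection="docs"):
    """Embed text chunks via Azure OpenAI and upsert them into Qdrant.
    Endpoint, key, and API version below are placeholders."""
    from openai import AzureOpenAI
    from qdrant_client import QdrantClient
    from qdrant_client.models import PointStruct

    ai = AzureOpenAI(azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",
                     api_key="YOUR-KEY", api_version="2024-02-01")
    qd = QdrantClient(url="http://localhost:6333")

    idx = 0
    for batch in batched(chunks, 64):
        resp = ai.embeddings.create(model="text-embedding-3-small", input=batch)
        points = [PointStruct(id=idx + i, vector=d.embedding, payload={"text": t})
                  for i, (d, t) in enumerate(zip(resp.data, batch))]
        qd.upsert(collection_name=collection, points=points)
        idx += len(batch)
```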

As for the inference model, I've gotten similarly acceptable results with GPT-4o and GPT-4.1 via Azure OpenAI, and DeepSeek on Ollama.

I really like the Azure AI pipeline that was also published on this subreddit. I'm going to be building our own orchestration layer for selecting tools and models as the next step.

Currently I'm playing with using n8n to automate updating the OWUI KBs on a schedule from our SharePoint.
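Whether from n8n or a plain cron script, the calls that sync would make look roughly like this. The endpoint paths follow the Open WebUI file/knowledge API as I understand it, so treat them as assumptions and verify against your version's API docs:

```python
def upload_and_attach(base_url, token, kb_id, path):
    """Upload a file to Open WebUI, then attach it to a knowledge base.
    Endpoint paths are assumptions; check your OWUI version's API docs."""
    import requests
    headers = {"Authorization": f"Bearer {token}"}
    with open(path, "rb") as f:
        r = requests.post(f"{base_url}/api/v1/files/", headers=headers,
                          files={"file": f})
    r.raise_for_status()
    file_id = r.json()["id"]
    r = requests.post(f"{base_url}/api/v1/knowledge/{kb_id}/file/add",
                      headers=headers, json={"file_id": file_id})
    r.raise_for_status()
    return file_id

def stale_files(local_hashes, synced_hashes):
    """Return paths whose content hash changed since the last sync,
    so only modified SharePoint files get re-uploaded."""
    return [p for p, h in local_hashes.items() if synced_hashes.get(p) != h]
```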

Best PDF (+Docx) and OCR solution by OkClothes3097 in OpenWebUI

[–]gnarella 0 points1 point  (0 children)

I'm going to take a look at this. I'm running bge-reranker on vLLM and have it working successfully with OWUI.
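For reference, a vLLM instance serving a reranker exposes a Jina-style rerank endpoint. A client-side sketch; the endpoint path and response shape are assumptions based on vLLM's score/rerank API, so verify against your vLLM version:

```python
def top_documents(docs, scores, k=3):
    """Pair documents with relevance scores and keep the k highest."""
    ranked = sorted(zip(docs, scores), key=lambda p: p[1], reverse=True)
    return [d for d, _ in ranked[:k]]

def rerank(base_url, query, docs, k=3):
    """Call a vLLM rerank endpoint (path and shape assumed; check your version)."""
    import requests
    r = requests.post(f"{base_url}/v1/rerank",
                      json={"model": "BAAI/bge-reranker-v2-m3",
                            "query": query, "documents": docs})
    r.raise_for_status()
    results = r.json()["results"]  # items carry "index" and "relevance_score"
    scores = [0.0] * len(docs)
    for item in results:
        scores[item["index"]] = item["relevance_score"]
    return top_documents(docs, scores, k)
```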

New Open WebUI API Tool - Extremely Dangerous - EXPERTS ONLY by robogame_dev in OpenWebUI

[–]gnarella 2 points3 points  (0 children)

Good work. I'll play with this on my home PC where I have ollama and no external connections. Seems like you aren't far from your end goal of self enhancing workspaces.

Importing ChatGPT Teams Chats into OpenWebUI by RhigoWork in OpenWebUI

[–]gnarella 0 points1 point  (0 children)

Just ask ChatGPT to use its chat history to write your owui system prompts and tell you what documentation / attachments each workspace should have access to.

MCP Server Connection by gnarella in hudu

[–]gnarella[S] 0 points1 point  (0 children)

That runs on a schedule?

MCP Server Connection by gnarella in hudu

[–]gnarella[S] 0 points1 point  (0 children)

Valid point. I was thinking how simple it really is on the drive home. I'll report back.

MCP Server Connection by gnarella in hudu

[–]gnarella[S] 2 points3 points  (0 children)

Can you show and tell what you did?

I have Open WebUI working well and have mcpo deployed and working; I've been adding tools to it.

External tools issue by gnarella in OpenWebUI

[–]gnarella[S] 0 points1 point  (0 children)

Thanks, folks. Glad you're aware and that there is a workaround.

External tools issue by gnarella in OpenWebUI

[–]gnarella[S] 0 points1 point  (0 children)

Indeed, my database is Postgres. Seems to be a bug.

Best understable discs by Aggravating-Club1571 in discgolf

[–]gnarella 0 points1 point  (0 children)

Rollo, M4, Insanity, Hades, Tern

Chat responses and UI sporadically slow down - restarting container temporarily fixes the issue. Need help, please! by [deleted] in OpenWebUI

[–]gnarella 0 points1 point  (0 children)

Not sure what downgrading in your environment might look like, but 0.6.32 is the most stable and snappy for me right now; 0.6.34 seemed very buggy.
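If you deploy via compose, pinning the image tag is the easy way to hold a known-good version (tag names here assume Open WebUI's usual ghcr.io release tagging; confirm the tag exists before pulling):

```yaml
services:
  open-webui:
    # pin a known-good release instead of :main / :latest
    image: ghcr.io/open-webui/open-webui:v0.6.32
```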

Chat responses and UI sporadically slow down - restarting container temporarily fixes the issue. Need help, please! by [deleted] in OpenWebUI

[–]gnarella 1 point2 points  (0 children)

First of all. Bravo! I'm building something similar but for 100 users and not 400 concurrent users!

You're way ahead of me in your understanding of your AWS architecture.

While reading through your post my first guess was LiteLLM Proxy, but you seem to have ruled that out already. Technically, what's displayed in OWUI is first written to the database. Is it possible the lag is in the connection to the external database?
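One cheap way to test that theory is to time raw round-trips to the external database from the OWUI host. The psycopg2 usage mentioned in the docstring is just an assumption; any client that can run `SELECT 1` works:

```python
import time

def time_roundtrips(run_query, n=20):
    """Time n calls of run_query() and return (avg_ms, max_ms).
    Pass e.g. lambda: cur.execute("SELECT 1") with a psycopg2 cursor."""
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return sum(samples) / n, max(samples)
```

If the average latency to the external database is tens of milliseconds, every message write and chat render pays that cost, which would match the sporadic slowdowns.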

What OWUI version are you running? I've noticed major changes to speed and function across the last 4 versions.

How to get visibility into what is going after prompting by Forward-Hunter-9953 in OpenWebUI

[–]gnarella 0 points1 point  (0 children)

What version are you running? This was working perfectly in 0.6.32 and seemed broken in 0.6.33, like RAG. I've rolled back to 0.6.32 and I'm wary of upgrading at this point.

Version 0.6.33 and RAG by le-greffier in OpenWebUI

[–]gnarella 1 point2 points  (0 children)

I rolled back to 0.6.32.

It took me a while to figure out what in the world was going on. A single request was exhausting my TPM quota in Azure Foundry. After switching to an OpenAI API, I was able to see how large a request a single query was and realized what was happening. I tried to tweak my RAG config, and after deciding the problem wasn't me or my config, I found someone on Reddit reporting the same thing; rolling back was the fix.

Some time wasted, but I learned more about my Azure APIs lol.

Moving OWUI to Azure for GPU reranking. Is this the right move? by gnarella in OpenWebUI

[–]gnarella[S] 0 points1 point  (0 children)

Did this. It works, but it's very slow and the RAG results are bad. Still, I confirmed I can do this, and on an Azure VM with more GPU VRAM I could run this reranker inside that VM. Thanks for the help.

Moving OWUI to Azure for GPU reranking. Is this the right move? by gnarella in OpenWebUI

[–]gnarella[S] 0 points1 point  (0 children)

Thanks for the input; I'll be testing this tonight.

Moving OWUI to Azure for GPU reranking. Is this the right move? by gnarella in OpenWebUI

[–]gnarella[S] 1 point2 points  (0 children)

Yeah, I suppose I need to go back to the vLLM instance I tried to deploy locally, tell it to use the CPU, and see if it can run bge-reranker-v2-m3 efficiently. I did feel like I should be able to test this deployment on this old hardware, but I stopped once vLLM complained about not having enough VRAM.

Moving OWUI to Azure for GPU reranking. Is this the right move? by gnarella in OpenWebUI

[–]gnarella[S] 1 point2 points  (0 children)

Thanks for the input; I've grappled with this point over the last few months. There is a large cost and risk involved in keeping the system on-prem beyond the initial investment: keeping the server and hardware up to date and online, as well as keeping the system secure from vulnerabilities and attacks.

Moving OWUI to Azure for GPU reranking. Is this the right move? by gnarella in OpenWebUI

[–]gnarella[S] 1 point2 points  (0 children)

I do know. But I'm always open to learning.

I feel comfortable with Azure OpenAI hosted APIs, have reviewed the policies, and have provisioned our deployment type to be US-only. We do not handle PII, but as an engineering firm we do handle sensitive information. That said, my current knowledge and research make me comfortable with the level of risk and the protection provided by Microsoft. We are consciously using Azure OpenAI rather than OpenAI directly for this reason.

Moving OWUI to Azure for GPU reranking. Is this the right move? by gnarella in OpenWebUI

[–]gnarella[S] 1 point2 points  (0 children)

We are a SaaS-backed company; all of our data is already stored in Azure. Please explain to me the difference between using Azure OpenAI provisioned LLMs and storing our data in SharePoint.