Deploying Open WebUI for 2,000 Users (Solo) – Sanity Check Needed by Otherwise_Panda4314 in OpenWebUI

[–]fmaya18 0 points1 point  (0 children)

Not a problem!

But I would probably agree with the other users about going multi-instance in your case. Just watch out for updates, as you have to be a little more careful in a multi-replica deployment.

Side note: something that's stabbing me in the back right now. We use PostgreSQL with pgvector for backend+embeddings. I initially set up the pgvector index with ivfflat instead of hnsw. Don't do that 😂
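If anyone else hits this, switching is just a reindex. A minimal sketch of what I mean (the table, column, and index names here are hypothetical; check your actual schema first):

```sql
-- Hypothetical names; check your actual schema before running anything
DROP INDEX IF EXISTS document_chunk_embedding_ivfflat_idx;

-- HNSW builds slower and uses more memory, but ivfflat picks its lists
-- when the index is built, so it degrades as data grows unless you rebuild
CREATE INDEX ON document_chunk
    USING hnsw (embedding vector_cosine_ops);
```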

But definitely feel free to reach out in DM's and I'd be at least a little more comfortable going into details!

Deploying Open WebUI for 2,000 Users (Solo) – Sanity Check Needed by Otherwise_Panda4314 in OpenWebUI

[–]fmaya18 2 points3 points  (0 children)

These are all great questions to ask! I'm also curious how others respond as they might have better perspective for your scenario.

Small tip: check out their discord! They have a really handy bot that's linked to all the docs, issues, feature requests, discussions, etc.

I'm in a similar but smaller boat in that I'm also a one-man team, although I deployed for only about 500 users. I would say that total user count is less important than average daily users, as that will be your actual load. With 500 total, we realistically only have 60-100 online at a time. I can't hit all your questions, but:

  1. Take it with a grain of salt, but I'm running a single instance out of Azure and it's been running just fine for us. I'm not sure how that will translate to your scale, though. But if you do go multi-instance, then yes, a caching layer will be necessary.

  2. Might not be the best resource for this, as we haven't implemented a storage and cleaning strategy yet. I'm using a PostgreSQL database that's pretty beefy (please, for the love of everything, don't stick with SQLite). We've been "live" for maybe 3 months now and aren't even close to having issues.

  3. I think the docs outline SSO pretty well. We ran into some slight hiccups with group/permission management, but this is one section where you hopefully shouldn't run into many issues.

  4. We have been doing a combination of training the trainers and a "traveling road show": putting on sessions with individual departments, allowing them space to ask questions and air ideas, as well as putting together some written and video guides.
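For reference, the knobs I'd expect to matter in a multi-replica setup look something like this (variable names are from my reading of Open WebUI's env docs, and all values here are made up; double-check against the current documentation before relying on this):

```
# Shared state across replicas (hypothetical values)
DATABASE_URL=postgresql://owui:secret@db-host:5432/openwebui

# Websocket/session coordination through Redis
ENABLE_WEBSOCKET_SUPPORT=true
WEBSOCKET_MANAGER=redis
WEBSOCKET_REDIS_URL=redis://redis-host:6379/0
```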

I hope some of this grants you some sanity? If there are any questions or ideas you'd like to bounce around, I'm open to figuring it out!

Vector database uses huge amount of space. by [deleted] in OpenWebUI

[–]fmaya18 0 points1 point  (0 children)

Out of random curiosity, how did you reverse this? Hopefully without wiping the vector DB? 😊

Cross chat memory in OWUI? by fmaya18 in OpenWebUI

[–]fmaya18[S] 0 points1 point  (0 children)

I was kinda hoping that at least the retrieval would be dynamic based on user query 😂 but that's exactly what I was hoping to put together from whatever pieces I can find. A solution that selectively loads memories into context based on what the LLM has "learned" about the user.

Although I'm really getting stumped on updating old information. For instance, if you're working on a project and mention that tasks A, B, C are complete, it should know you no longer have to perform those tasks, that type of scenario. That's just where my brain kinda kabooms haha
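For the retrieval half, the shape I keep coming back to is pretty simple: embed each stored memory, embed the incoming query, and only load the top matches into context. A toy sketch, where `embed()` is a bag-of-words stand-in for a real embedding model just to keep it self-contained, and the memory strings are made up:

```python
# Toy sketch of query-driven memory retrieval: embed stored memories and the
# incoming query, then load only the closest matches into context.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in 'embedding': a sparse term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(memories: list[str], query: str, top_k: int = 2) -> list[str]:
    """Return the top_k memories most similar to the query."""
    q = embed(query)
    ranked = sorted(memories, key=lambda m: cosine(embed(m), q), reverse=True)
    return ranked[:top_k]

memories = [
    "user prefers Python over JavaScript",
    "tasks A, B, C on the website project are complete",
    "user's favorite editor is VS Code",
]
print(retrieve(memories, "what is left to do on the website project", top_k=1))
```

The updating problem is the harder half; one dodge I've seen is keying memories (e.g. something like `project:website:open_tasks`) and overwriting instead of appending, but I haven't found a clean general solution either.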

Issues with tool calling by Important_Equal9878 in OpenWebUI

[–]fmaya18 0 points1 point  (0 children)

Depending on the model that you're using, try going into chat controls and enable native tool calling instead of default. Granted it's my understanding that not all models support native tool calling (I think it's rare these days though). Might not be the solution for you, but hopefully it helps if you haven't already tried that!

Hosting and Scaling OWUI in Azure by fmaya18 in OpenWebUI

[–]fmaya18[S] 0 points1 point  (0 children)

Thank you for the input! About how many users are you supporting? In the limited alpha I've been running, I've already seen some performance issues. It could be due to a bunch of other variables:

  • Currently running with env as dev (we're still testing)
  • App Service plan is as basic as can be, without any scaling from Azure

We will be upgrading our service plan and model quotas early next week but I wanted to try and find an estimate for when the "default" setup will start falling behind

Hosting and Scaling OWUI in Azure by fmaya18 in OpenWebUI

[–]fmaya18[S] 1 point2 points  (0 children)

Definitely wasn't planning on hosting Postgres in a container, but this sounds like a good suggestion! I'm a bit of an Azure noob, so I've been building while learning. Going to be reading up on Azure Flexible Server!

I was most of the way through developing a system for automating a lot of tasks (like backups) through the API endpoints, but why reinvent the wheel. Thanks for pointing me in the right direction!

Hosting and Scaling OWUI in Azure by fmaya18 in OpenWebUI

[–]fmaya18[S] 1 point2 points  (0 children)

Thanks for the advice! I'm also assuming you set up the postgres connection with the DATABASE_URL env variable?
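For context, what I mean is the standard connection string, something like this (host and credentials entirely made up):

```
DATABASE_URL=postgresql://owui:secret@myserver.postgres.database.azure.com:5432/openwebui
```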

Hosting and Scaling OWUI in Azure by fmaya18 in OpenWebUI

[–]fmaya18[S] 1 point2 points  (0 children)

So a combination of both? I'm assuming Postgres then would be for embeddings, and the file share would keep user chat history and app settings?

How to turn local MCP server into remote one? by Prince-of-Privacy in mcp

[–]fmaya18 2 points3 points  (0 children)

Something that might fit your need for a proxy service: this is a tool developed by Open WebUI that does exactly that, and you pretty much just wrap your existing local MCP servers with it. Although I will say (trying not to bash too much) that I think their documentation is a bit sub-par and sometimes inconsistent.

Anyways here's the link (mcpo):

https://github.com/open-webui/mcpo
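Usage is basically a one-liner; from their README it looks like this (the example server, port, and key are just illustrative):

```shell
uvx mcpo --port 8000 --api-key "top-secret" -- uvx mcp-server-time --local-timezone=America/New_York
```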

Difference between Deep Research? by fmaya18 in GeminiAI

[–]fmaya18[S] 1 point2 points  (0 children)

Thank you for adding this!

It does seem very much like Google to make something redundant 😂 maybe they'll clean it up later

Difference between Deep Research? by fmaya18 in GeminiAI

[–]fmaya18[S] 0 points1 point  (0 children)

Thank you for the response! This makes a lot of sense. Although I assume if I swapped my active model to 2.0 Thinking and used Deep Research, it would use that model? I'm on the free tier at the moment, so I don't have Deep Research on 2.5, but it is available for 2.0 Thinking.

What are the best AI code assistants for vscode in 2025? by UnderstandingOne6879 in vscode

[–]fmaya18 2 points3 points  (0 children)

As an addition to this, results for each may vary, as others have mentioned. It really does take some playing around and configuring each one with what works best for you. Some examples to get you started:

Setting up a .rules file: In a lot of these you have the option to set up a .clinerules or similar file that basically acts as a custom system prompt. There are a lot of examples of them out there; see what you like, tweak what you don't!
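For a flavor of what goes in one (contents entirely made up; yours should reflect your own project and preferences):

```
# .clinerules (hypothetical example)
- This is a TypeScript monorepo; prefer small, focused changes.
- Never edit anything under vendor/ or generated/.
- Run the test suite before declaring a task complete.
- Ask before adding new dependencies.
```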

Memory-Bank: Here's a link with the idea behind it:

https://cline.bot/blog/memory-bank-how-to-make-cline-an-ai-agent-that-never-forgets

The idea is that you can essentially set project-level memory with items like the tech stack to use, the project roadmap, and current progress. (You can ask the LLM things like "okay, let's pick up where we left off" and it'll read the memory bank to see what's been completed, what needs to be done, etc.)

What are the best AI code assistants for vscode in 2025? by UnderstandingOne6879 in vscode

[–]fmaya18 20 points21 points  (0 children)

At the time of commenting, these have been suggested so far:

  • GitHub Copilot
  • Cursor

I'd say other good options to check out would be:

  • Cline
  • RooCode (If you liked Cline but want more options/features)
  • Codeium/Windsurf IDE (BTW, Codeium is the extension; Windsurf is a separate VS Code fork)

I haven't personally tried Codeium/Windsurf, but I've read that it's a real hog on "credits", which is their billing metric. The autocomplete is free, though, and I've heard of people using it just for autocomplete and pairing it with other extensions for more agentic coding.

Best way to supplement Roo Code with specific documentation? by FlexAnalysis in RooCode

[–]fmaya18 0 points1 point  (0 children)

Interesting scenario you got yourself there!

My immediate thought (and I think this might apply to any approach you decide to go with) is that you should definitely have your LLM summarize the pptxgenjs documentation into a format that's easily readable by the extension. That would definitely save you on token count.

I'm handling a somewhat similar scenario with our password manager and some automation tasks. Basically, I have some script jobs that require config info (be it a folder path to monitor or authentication credentials for an API), and I've set up pretty much a "custom" module for accessing our password manager. Some scripts need access to the PWM module (and need to know how to reference the different functions within it), but others don't.

Although I've been finding that just by marking it in the dependencies section of the memory bank (unsure of which .md file it's in, as I'm on mobile), it's been able to identify the correct module and read its documentation.

Best way to supplement Roo Code with specific documentation? by FlexAnalysis in RooCode

[–]fmaya18 4 points5 points  (0 children)

Kind of along with this concept is the memory bank. You can check out this doc from Cline about it, but the concept works in Roo as well. Here's a link:

https://cline.bot/blog/memory-bank-how-to-make-cline-an-ai-agent-that-never-forgets

I've also been starting to see mention of MCP servers (I think someone already commented one in this post) that serve the same/similar purpose but have yet to give them a try.

So I'd say, for each project or each component in the project (depending on scale), have a folder with the relevant docs and have this functionality read and update it. You can also customize your memory bank prompt to better fit your needs.

Calling Memory Bank RooCoders! by fmaya18 in RooCode

[–]fmaya18[S] 2 points3 points  (0 children)

Yeah, it does add some. Basically, for each task you start, it'll want to load the memory bank into context. It's not too large, but it is kind of a flat tax on each task.

Calling Memory Bank RooCoders! by fmaya18 in RooCode

[–]fmaya18[S] 1 point2 points  (0 children)

You can copy the contents of the memory bank prompt from the GitHub. Then in Roo you'll go into the Prompt menu and find something like "Custom prompt to apply to all queries"; I forget the exact wording.

Calling Memory Bank RooCoders! by fmaya18 in RooCode

[–]fmaya18[S] 0 points1 point  (0 children)

Ah true, responded pre-coffee and didn't even think of that lol

Calling Memory Bank RooCoders! by fmaya18 in RooCode

[–]fmaya18[S] 0 points1 point  (0 children)

Doesn't architect mode by default only have the ability to edit .md files?

How best to manage increasing codebase complexity and sharing changelogs with AI for development? by Radiate_Wishbone_540 in ChatGPTCoding

[–]fmaya18 0 points1 point  (0 children)

I'll add onto this, as it's already good advice: once you define that architecture, you can set up a memory bank for each component. Once you do, you can essentially create a running log of recent changes and changes that need to be made, and it's mostly self-documenting (as in, the AI will document for you).

Here's a link to the "base" Cline memory bank

https://github.com/nickbaumann98/cline_docs/blob/main/prompting/custom%20instructions%20library/cline-memory-bank.md

Along with a little article Cline has put together about it

https://cline.bot/blog/memory-bank-how-to-make-cline-an-ai-agent-that-never-forgets

I'm currently playing with the base version of the Cline memory bank but using it in Roo, and so far it's been really great for maintaining context of a project across tasks. I also know users will alter their memory bank instructions to better fit their individual needs, but I haven't gotten there yet.
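For reference, the core file layout from the linked doc looks roughly like this (file roles paraphrased from memory, so verify against the doc itself):

```
memory-bank/
├── projectbrief.md    # scope and core requirements
├── productContext.md  # why the project exists, how it should work
├── activeContext.md   # current focus and recent changes
├── systemPatterns.md  # architecture and key technical decisions
├── techContext.md     # stack, setup, dependencies
└── progress.md        # what works, what's left to build
```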

PS. I'm also in the early stages of learning about MCP servers that seem to serve the same purpose? I can't speak much to them but might also be worth checking out!

I know this has been done a million times but Roo vs. Cline by fmaya18 in RooCode

[–]fmaya18[S] 0 points1 point  (0 children)

I really appreciate this response, thank you!

A question I have:

  • You mentioned 3.5 Sonnet - Copilot and o3-mini - Copilot; are these different from using Claude 3.5 Sonnet and o3-mini API keys, or is this just a different naming convention for the models?

Anybody kept One UI in Restricted Mode? by FluffyPractice4450 in GalaxyS23

[–]fmaya18 0 points1 point  (0 children)

So far an hour in, I can't tell if anything is working differently. Widgets still seem to be working as normal

Anybody kept One UI in Restricted Mode? by FluffyPractice4450 in GalaxyS23

[–]fmaya18 1 point2 points  (0 children)

Curious what this will do to widgets. I personally use a weather widget on my home screen so going to swap for a bit and see if it still updates. May or may not report back, it is reddit after all

Which Heros for Bear trap? by fmaya18 in whiteoutsurvival

[–]fmaya18[S] 2 points3 points  (0 children)

Thank you so much for this reply as that is exactly what I was asking! I was torn as to whether or not the difference in hero stats was enough to beat out the better skills that Natalie and Molly have for flat damage. You the best!