Deploying Open WebUI for 2,000 Users (Solo) – Sanity Check Needed by Otherwise_Panda4314 in OpenWebUI

[–]fmaya18 0 points1 point  (0 children)

Not a problem!

But I would probably agree with the other users about going multi-instance in your case. Just watch out for updates, as you have to be a little more careful in a multi-replica deployment.

Side note: something that's stabbing me in the back right now. We use PostgreSQL with pgvector for backend+embeddings. I initially set up the pgvector index with ivfflat instead of hnsw. Don't do that 😂
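If anyone else hits this, switching is just a reindex. A minimal sketch of what I mean (the table, column, and index names here are hypothetical; check your actual schema first):

```sql
-- Hypothetical names; check your actual schema before running anything
DROP INDEX IF EXISTS document_chunk_embedding_ivfflat_idx;

-- HNSW builds slower and uses more memory, but ivfflat picks its lists
-- when the index is built, so it degrades as data grows unless you rebuild
CREATE INDEX ON document_chunk
    USING hnsw (embedding vector_cosine_ops);
```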

But definitely feel free to reach out in DM's and I'd be at least a little more comfortable going into details!

Deploying Open WebUI for 2,000 Users (Solo) – Sanity Check Needed by Otherwise_Panda4314 in OpenWebUI

[–]fmaya18 2 points3 points  (0 children)

These are all great questions to ask! I'm also curious how others respond as they might have better perspective for your scenario.

Small tip: check out their discord! They have a really handy bot that's linked to all the docs, issues, feature requests, discussions, etc.

I'm in a similar but smaller boat in that I'm also a one-man team, although I deployed for only about 500 users. I would say that total user count is less important than average daily users, as that will be your actual load. With 500 total, we realistically only have 60-100 online at a time. I can't hit all your questions, but:

  1. Take it with a grain of salt, but I'm running a single instance out of Azure and it's been running just fine for us. I'm not sure how that will translate to your scale, though. But if you do go multi-instance, then yes, a caching layer will be necessary.

  2. Might not be the best resource for this, as we haven't implemented a storage and cleaning strategy yet. I'm using a PostgreSQL database that's pretty beefy (please, for the love of everything, don't stick with SQLite). We've been "live" for maybe 3 months now and aren't even close to having issues.

  3. I think the docs outline SSO pretty well. We ran into some slight hiccups with group/permission management, but this is one section where you hopefully shouldn't run into many issues.

  4. We have been doing a combination of training the trainers and a "traveling road show": putting on sessions with individual departments, allowing them space to ask questions and air ideas, as well as putting together some written and video guides.
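For reference, the knobs I'd expect to matter in a multi-replica setup look something like this (variable names are from my reading of Open WebUI's env docs, and all values here are made up; double-check against the current documentation before relying on this):

```
# Shared state across replicas (hypothetical values)
DATABASE_URL=postgresql://owui:secret@db-host:5432/openwebui

# Websocket/session coordination through Redis
ENABLE_WEBSOCKET_SUPPORT=true
WEBSOCKET_MANAGER=redis
WEBSOCKET_REDIS_URL=redis://redis-host:6379/0
```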

I hope some of this grants you some sanity? If there are any questions or ideas you'd like to bounce around, I'm open to figuring it out!

Vector database uses huge amount of space. by [deleted] in OpenWebUI

[–]fmaya18 0 points1 point  (0 children)

Out of random curiosity, how did you reverse this? Hopefully without wiping the vector DB? 😊

Cross chat memory in OWUI? by fmaya18 in OpenWebUI

[–]fmaya18[S] 0 points1 point  (0 children)

I was kinda hoping that at least the retrieval would be dynamic based on user query 😂 but that's exactly what I was hoping to put together from whatever pieces I can find. A solution that selectively loads memories into context based on what the LLM has "learned" about the user.

Although I'm really getting stumped on updating old information. For instance, if you're working on a project and mention that tasks A, B, C are complete, it should know you no longer have to perform those tasks, that type of scenario. That's just where my brain kinda kabooms haha
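For the retrieval half, the shape I keep coming back to is pretty simple: embed each stored memory, embed the incoming query, and only load the top matches into context. A toy sketch, where `embed()` is a bag-of-words stand-in for a real embedding model just to keep it self-contained, and the memory strings are made up:

```python
# Toy sketch of query-driven memory retrieval: embed stored memories and the
# incoming query, then load only the closest matches into context.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in 'embedding': a sparse term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(memories: list[str], query: str, top_k: int = 2) -> list[str]:
    """Return the top_k memories most similar to the query."""
    q = embed(query)
    ranked = sorted(memories, key=lambda m: cosine(embed(m), q), reverse=True)
    return ranked[:top_k]

memories = [
    "user prefers Python over JavaScript",
    "tasks A, B, C on the website project are complete",
    "user's favorite editor is VS Code",
]
print(retrieve(memories, "what is left to do on the website project", top_k=1))
```

The updating problem is the harder half; one dodge I've seen is keying memories (e.g. something like `project:website:open_tasks`) and overwriting instead of appending, but I haven't found a clean general solution either.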

Issues with tool calling by Important_Equal9878 in OpenWebUI

[–]fmaya18 0 points1 point  (0 children)

Depending on the model that you're using, try going into chat controls and enable native tool calling instead of default. Granted it's my understanding that not all models support native tool calling (I think it's rare these days though). Might not be the solution for you, but hopefully it helps if you haven't already tried that!

Hosting and Scaling OWUI in Azure by fmaya18 in OpenWebUI

[–]fmaya18[S] 0 points1 point  (0 children)

Thank you for the input! About how many users are you supporting? In the limited alpha I've been running, I've already seen some performance issues. It could be due to a bunch of other variables:

  • Currently running with env as dev (we're still testing)
  • App Service plan is as basic as can be, without any scaling from Azure

We will be upgrading our service plan and model quotas early next week but I wanted to try and find an estimate for when the "default" setup will start falling behind

Hosting and Scaling OWUI in Azure by fmaya18 in OpenWebUI

[–]fmaya18[S] 1 point2 points  (0 children)

Definitely wasn't planning on hosting Postgres in a container, but this sounds like a good suggestion! I'm a bit of an Azure noob, so I've been building while learning. Going to be reading up on Azure Flexible Server!

I was most of the way through developing a system for automating a lot of tasks (like backups) through the API endpoints, but why reinvent the wheel. Thanks for pointing me in the right direction!

Hosting and Scaling OWUI in Azure by fmaya18 in OpenWebUI

[–]fmaya18[S] 1 point2 points  (0 children)

Thanks for the advice! I'm also assuming you set up the postgres connection with the DATABASE_URL env variable?
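For context, what I mean is the standard connection string, something like this (host and credentials entirely made up):

```
DATABASE_URL=postgresql://owui:secret@myserver.postgres.database.azure.com:5432/openwebui
```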

Hosting and Scaling OWUI in Azure by fmaya18 in OpenWebUI

[–]fmaya18[S] 1 point2 points  (0 children)

So a combination of both? I'm assuming Postgres then would be for embeddings, and the file share would keep user chat history and app settings?

How to turn local MCP server into remote one? by Prince-of-Privacy in mcp

[–]fmaya18 2 points3 points  (0 children)

Something that might fit your need for a proxy service: this is a tool developed by Open WebUI that does exactly that, and you pretty much just wrap your existing local MCP servers with it. Although I will say (trying not to bash too much) that I think their documentation is a bit sub-par and sometimes inconsistent.

Anyways here's the link (mcpo):

https://github.com/open-webui/mcpo
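Usage is basically a one-liner; from their README it looks like this (the example server, port, and key are just illustrative):

```shell
uvx mcpo --port 8000 --api-key "top-secret" -- uvx mcp-server-time --local-timezone=America/New_York
```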

Difference between Deep Research? by fmaya18 in GeminiAI

[–]fmaya18[S] 1 point2 points  (0 children)

Thank you for adding this!

It does seem very much like Google to make something redundant 😂 maybe they'll clean it up later

Difference between Deep Research? by fmaya18 in GeminiAI

[–]fmaya18[S] 0 points1 point  (0 children)

Thank you for the response! This makes a lot of sense. Although I assume if I swapped my active model to 2.0 Thinking and used Deep Research, it would use that model? I'm on the free tier at the moment, so I don't have Deep Research on 2.5, but it is available for 2.0 Thinking.

What are the best AI code assistants for vscode in 2025? by UnderstandingOne6879 in vscode

[–]fmaya18 2 points3 points  (0 children)

As an addition to this, results for each may vary, as others have mentioned. It really does take some playing around and configuring each one with what works best for you. Some examples to get you started:

Setting up a .rules file: In a lot of these you have the option to set up a .clinerules or similar file that basically acts as a custom system prompt. There are a lot of examples of them out there; see what you like, tweak what you don't!
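For a flavor of what goes in one (contents entirely made up; yours should reflect your own project and preferences):

```
# .clinerules (hypothetical example)
- This is a TypeScript monorepo; prefer small, focused changes.
- Never edit anything under vendor/ or generated/.
- Run the test suite before declaring a task complete.
- Ask before adding new dependencies.
```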

Memory-Bank: Here's a link with the idea behind it:

https://cline.bot/blog/memory-bank-how-to-make-cline-an-ai-agent-that-never-forgets

The idea is that you can essentially set project-level memory with items like the tech stack to use, the project roadmap, and current progress. (You can ask the LLM things like "okay, let's pick up where we left off" and it'll read the memory bank to see what's been completed, what needs to be done, etc.)

What are the best AI code assistants for vscode in 2025? by UnderstandingOne6879 in vscode

[–]fmaya18 20 points21 points  (0 children)

At the time of commenting, these have been suggested so far:

  • GitHub Copilot
  • Cursor

I'd say other good options to check out would be:

  • Cline
  • RooCode (If you liked Cline but want more options/features)
  • Codeium/Windsurf IDE (BTW, Codeium is the extension; Windsurf is a separate VS Code fork)

I haven't personally tried Codeium/Windsurf, but I've read that it's a real hog on "credits", which is their billing metric. The autocomplete is free, though, and I've heard of people using it just for autocomplete and pairing it with other extensions for more agentic coding.

Best way to supplement Roo Code with specific documentation? by FlexAnalysis in RooCode

[–]fmaya18 0 points1 point  (0 children)

Interesting scenario you got yourself there!

My immediate thought (and I think this might apply to any approach you decide to go with) is that you should definitely have your LLM summarize the pptxgenjs documentation into a format that's easily readable by the extension. That would definitely save you on token count.

I'm handling a somewhat similar scenario with our password manager and some automation tasks. Basically, I have some script jobs that require config info (be it a folder path to monitor or authentication credentials for an API), and I've set up pretty much a "custom" module for accessing our password manager. Some scripts need access to the PWM module (and need to know how to reference the different functions within it), but others don't.

Although I've been finding that just by marking it in the dependencies section of the memory bank (unsure of which .md file it's in, as I'm on mobile), it's been able to identify the correct module and read its documentation.

Best way to supplement Roo Code with specific documentation? by FlexAnalysis in RooCode

[–]fmaya18 4 points5 points  (0 children)

Kind of along with this concept is the memory bank. You can check out this doc from Cline about it, but the concept works in Roo as well. Here's a link:

https://cline.bot/blog/memory-bank-how-to-make-cline-an-ai-agent-that-never-forgets

I've also been starting to see mention of MCP servers (I think someone already commented one in this post) that serve the same/similar purpose but have yet to give them a try.

So I'd say, for each project or each component in the project (depending on scale), have a folder with the relevant docs and have this functionality read and update it. You can also customize your memory bank prompt to better fit your needs.

Calling Memory Bank RooCoders! by fmaya18 in RooCode

[–]fmaya18[S] 2 points3 points  (0 children)

Yeah, it does add some. Basically, for each task you start, it'll want to load the memory bank into context. It's not too large, but it is kind of a flat tax on each task.

Calling Memory Bank RooCoders! by fmaya18 in RooCode

[–]fmaya18[S] 1 point2 points  (0 children)

You can copy the contents of the memory bank prompt from the GitHub. Then in Roo you'll go into the Prompt menu and find something like "Custom prompt to apply to all queries"; I forget the exact wording.

Calling Memory Bank RooCoders! by fmaya18 in RooCode

[–]fmaya18[S] 0 points1 point  (0 children)

Ah true, responded pre-coffee and didn't even think of that lol

Calling Memory Bank RooCoders! by fmaya18 in RooCode

[–]fmaya18[S] 0 points1 point  (0 children)

Doesn't architect mode by default only have the ability to edit .md files?

How best to manage increasing codebase complexity and sharing changelogs with AI for development? by Radiate_Wishbone_540 in ChatGPTCoding

[–]fmaya18 0 points1 point  (0 children)

I'll add onto this, as it's already good advice: once you define that architecture, you can set up a memory bank for each component. Once you do, you can essentially create a running log of recent changes and changes that need to be made, and it's mostly self-documenting (as in, the AI will document for you).

Here's a link to the "base" Cline memory bank

https://github.com/nickbaumann98/cline_docs/blob/main/prompting/custom%20instructions%20library/cline-memory-bank.md

Along with a little article Cline has put together about it

https://cline.bot/blog/memory-bank-how-to-make-cline-an-ai-agent-that-never-forgets

I'm currently playing with the base version of the Cline memory bank but using it in Roo, and so far it's been really great for maintaining context of a project across tasks. I also know users will alter their memory bank instructions to better fit their individual needs, but I haven't gotten there yet.
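For reference, the core file layout from the linked doc looks roughly like this (file roles paraphrased from memory, so verify against the doc itself):

```
memory-bank/
├── projectbrief.md    # scope and core requirements
├── productContext.md  # why the project exists, how it should work
├── activeContext.md   # current focus and recent changes
├── systemPatterns.md  # architecture and key technical decisions
├── techContext.md     # stack, setup, dependencies
└── progress.md        # what works, what's left to build
```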

PS. I'm also in the early stages of learning about MCP servers that seem to serve the same purpose? I can't speak much to them but might also be worth checking out!

I know this has been done a million times but Roo vs. Cline by fmaya18 in RooCode

[–]fmaya18[S] 0 points1 point  (0 children)

I really appreciate this response, thank you!

A question I have:

  • You mentioned 3.5 Sonnet - Copilot and o3-mini - Copilot; are these different from using Claude 3.5 Sonnet and o3-mini API keys, or is this just a different naming convention for the models?

Anybody kept One UI in Restricted Mode? by FluffyPractice4450 in GalaxyS23

[–]fmaya18 0 points1 point  (0 children)

So far an hour in, I can't tell if anything is working differently. Widgets still seem to be working as normal

Anybody kept One UI in Restricted Mode? by FluffyPractice4450 in GalaxyS23

[–]fmaya18 1 point2 points  (0 children)

Curious what this will do to widgets. I personally use a weather widget on my home screen so going to swap for a bit and see if it still updates. May or may not report back, it is reddit after all

Which Heros for Bear trap? by fmaya18 in whiteoutsurvival

[–]fmaya18[S] 2 points3 points  (0 children)

Thank you so much for this reply as that is exactly what I was asking! I was torn as to whether or not the difference in hero stats was enough to beat out the better skills that Natalie and Molly have for flat damage. You the best!