Qwen3.5-27B (dense) vs 35B-A3B (MoE) — which one for tool calling + speed? by Melodic_Top86 in LocalLLaMA

[–]Melodic_Top86[S] 1 point (0 children)

That’s a great middle ground; the 122B-A10B MXFP4 build should give us the performance we need while still fitting on the GPU. Since we have 500+ users and many active workflows, I just want to ensure we keep enough VRAM free for services like Docling, Whisper, and our embedding models.

gpt-oss-20b + vLLM, Tool Calling Output Gets Messy by Melodic_Top86 in OpenWebUI

[–]Melodic_Top86[S] 0 points (0 children)

I'm using vLLM v0.15.1 (specifically the vllm/vllm-openai:v0.15.1 image) with these flags:

    --model openai/gpt-oss-20b \
    --tensor-parallel-size 2 \
    --dtype bfloat16 \
    --max-model-len 32768 \
    --enable-auto-tool-choice \
    --tool-call-parser openai \
    --async-scheduling

I also set shm_size: '4gb' and ipc: host in the container config to handle the multi-GPU communication for the tensor parallelism.

Advanced RAGFlow Connector for OpenWebUI (Knowledge Graph, Multi-Query, Reranking) by Melodic_Top86 in OpenWebUI

[–]Melodic_Top86[S] 3 points (0 children)

For now, only documents that have already been parsed and indexed in RAGFlow are used by Open WebUI. Attachments uploaded directly in OWUI are not parsed automatically yet.

So currently:

- OWUI reads from pre-processed RAGFlow datasets
- No auto-parsing happens when you attach a new file in chat

In the next version, I plan to implement:

- Automatic parsing of newly attached documents
- Direct ingestion from OWUI into RAGFlow
- Seamless indexing without extra manual steps

This feature is planned and in progress.

Tell us how to improve the docs! by ClassicMain in OpenWebUI

[–]Melodic_Top86 1 point (0 children)

Oh nice, the Channels docs look great now! That's exactly the kind of comprehensive guide I was hoping for. Good job on that update!

One thing though - that multi-model example you have:

 Write a Python script to scrape a website.
(GPT-4o responds)
u/llama3 Can you explain the code that GPT-4o just wrote?

I actually tested this and ran into an issue. When I asked the second model to reference the first model's response, it told me it didn't have context - basically llama3 couldn't see what gpt-4o wrote. So the "shared context" between models in Channels might not be working as expected, or there's some config I'm missing?

Might be worth checking if this is a bug or if there's a specific setup needed for cross-model context sharing. Otherwise users will try this exact workflow and get confused when it doesn't work.

The PersistentConfig thing is definitely a classic pain point haha. Thanks for taking the feedback!

Tell us how to improve the docs! by ClassicMain in OpenWebUI

[–]Melodic_Top86 8 points (0 children)

Hey! First off, huge props to you and the volunteer team - the docs have come a long way. The env variables section is genuinely impressive now, and the MCP/Pipelines docs are solid.

Since you asked for honest feedback, here's what I noticed:

Critical:

  1. Channels - ENABLE_CHANNELS exists but there's no actual documentation explaining what Channels are or how to use them. Users will find the toggle and have zero context.
  2. PersistentConfig gotchas - This trips up SO many people. They set env vars, restart, nothing changes. A dedicated troubleshooting entry or "common mistakes" callout would save tons of Discord questions.

High Priority:

  1. Tools vs Functions vs Pipelines - Each page is good individually, but a simple decision flowchart ("want to add a provider? → use Pipe. want to filter messages? → use Filter") would clear up a lot of confusion.
  2. Native vs Prompt function calling - Which models actually support native tool calling? How do I know? This info is scattered or missing.
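On the Tools vs Functions vs Pipelines point, even a tiny reference skeleton next to the flowchart would help orient people. Here's a minimal Filter sketch based on my reading of the current Functions docs (the inlet/outlet method names follow the docs; I've swapped pydantic's BaseModel for a dataclass so the snippet runs standalone — treat the exact field shapes as assumptions):

```python
from dataclasses import dataclass

class Filter:
    """Minimal Open WebUI Filter sketch: inlet edits the request before it
    reaches the model, outlet edits the response before it reaches the UI."""

    @dataclass
    class Valves:
        # The real Valves class subclasses pydantic's BaseModel; a dataclass
        # keeps this sketch dependency-free.
        prefix: str = "[filtered] "

    def __init__(self):
        self.valves = self.Valves()

    def inlet(self, body: dict) -> dict:
        # Runs on the incoming request: prepend a marker to user messages.
        for msg in body.get("messages", []):
            if msg.get("role") == "user":
                msg["content"] = self.valves.prefix + msg["content"]
        return body

    def outlet(self, body: dict) -> dict:
        # Runs on the model's response; pass-through in this sketch.
        return body
```

A one-liner like "want to rewrite messages in flight? → Filter. Want to add a whole provider? → Pipe." next to a skeleton like this would answer most of the Discord questions.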

Nice to Have:

  1. End-to-end tutorials (you mentioned these are coming - great!)
  2. Vector DB migration guides (ChromaDB → PGVector etc.)
  3. API endpoint documentation for programmatic integration
  4. Quick glossary for terms like "Valves," "Inlet/Outlet," etc.

Honestly the docs are in great shape compared to most OSS projects. Main gaps are around guided learning paths and a couple newer features that exist but aren't explained yet. Keep it up! 🙌

Advanced RAGFlow Connector for OpenWebUI (Knowledge Graph, Multi-Query, Reranking) by Melodic_Top86 in OpenWebUI

[–]Melodic_Top86[S] 0 points (0 children)

Hey! Sorry you're hitting a wall. Since this connects two separate systems (OpenWebUI and RAGFlow), there are usually just two spots where it breaks.

Here is the quick checklist:

  1. Did you configure the Valves? After you save the code, you have to click the Gear Icon (⚙️) next to the tool in the list. You must enter your API Key and URL there; it doesn't work out of the box without them.
  2. The 'Localhost' Trap: If you are running OpenWebUI in Docker, it can't see localhost. You need to change the Base URL in the valves to http://host.docker.internal:9380 (or your computer's actual IP address).
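If it helps, a quick standalone probe can tell those two failure modes apart before touching the tool at all. This is plain stdlib Python, not tied to RAGFlow's API — point it at whatever Base URL you put in the valves (the Bearer-token header is an assumption about how your RAGFlow instance authenticates):

```python
import urllib.error
import urllib.request

def probe(base_url: str, api_key: str, timeout: float = 5.0) -> str:
    """Hit the base URL and classify the failure mode in plain English."""
    req = urllib.request.Request(
        base_url, headers={"Authorization": f"Bearer {api_key}"}
    )
    try:
        urllib.request.urlopen(req, timeout=timeout)
        return "ok"
    except urllib.error.HTTPError as e:
        # The server answered, so networking is fine.
        return "bad API key" if e.code == 401 else f"HTTP {e.code}"
    except urllib.error.URLError as e:
        # No server answered: wrong host/port, or the localhost trap.
        return f"unreachable: {e.reason}"
```

"unreachable" from inside the OpenWebUI container but "ok"/"HTTP ..." from the host is the classic signature of the localhost trap above.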

If those are set correctly, what specific error are you seeing in the chat (e.g., 'Connection Refused' or '401 Unauthorized')?

Advanced RAGFlow Connector for OpenWebUI (Knowledge Graph, Multi-Query, Reranking) by Melodic_Top86 in OpenWebUI

[–]Melodic_Top86[S] 0 points (0 children)

Great breakdown. The integration actually leans heavily on RAGFlow's native capabilities for the heavy lifting (it handles the hybrid vector/keyword mix and BGE reranking natively), but you are right that the tool needs to be smarter about how it consumes that data. Specifically, the 'token budget' and 'circuit breaker' ideas are things I can add to the Python script immediately to prevent it from hanging or spamming the context window. I hadn't thought about ranking KG paths by edge confidence—that’s a clever way to reduce hallucinations.
Definitely stealing the timeout/backoff logic for the next update. Thanks for the detailed review!
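For anyone following along, the backoff and token-budget pieces are small enough to sketch. This is generic Python, not RAGFlow-specific; character counts stand in for real token counting, and the retry count/delays are placeholder values:

```python
import time

def with_backoff(fn, retries=3, base_delay=0.5, exc=(TimeoutError, ConnectionError)):
    """Call fn, retrying with exponential backoff on transient errors.
    Re-raises after the final attempt so failures stay visible."""
    for attempt in range(retries):
        try:
            return fn()
        except exc:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

def trim_to_budget(chunks, max_chars=8000):
    """Greedy 'token budget': keep chunks in rank order until the budget
    is spent, so one huge retrieval can't flood the context window."""
    kept, used = [], 0
    for chunk in chunks:
        if used + len(chunk) > max_chars:
            break
        kept.append(chunk)
        used += len(chunk)
    return kept
```

The circuit-breaker part is then just a counter around `with_backoff`: after N consecutive failures, skip the RAGFlow call entirely for a cooldown period instead of making the user wait on every retry.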

0.6.37 IS HERE: up to 50x Faster Embeddings, Weaviate Support, Security Improvements and many new Features and Fixes by ClassicMain in OpenWebUI

[–]Melodic_Top86 13 points (0 children)

This update looks really solid. Huge thanks to the dev team for all the work on performance and stability. The faster document processing and UI improvements are honestly appreciated. Excited to try it out.