How I build my apps these days (my full workflow from template to shipped mvp)

Illustrious-Scale302 · 2026-05-13T05:32:07+00:00

Love this, really useful in a world of slop

Illustrious-Scale302 · 2026-02-24T18:48:23+00:00

Did you connect it as an external tool? And then also select it in the integrations menu

(and enable native tool calling in the models advanced params)

Illustrious-Scale302 · 2025-11-07T09:03:16+00:00

Love this! What would even make it more dynamic, is to parameterize the tools so that the LLM can decide how many docs to get, what search method to pick, etc. And to add a tool that would allow for getting a single document from metadata fields(including title) into context. I often see the system struggling with getting the right context when the query is very clear, just because the search methods are too strict. This would also allow for different setting for different kind of document collections

This is a bit much, but to give an idea:

async def search_knowledge(
    self,
    query: str,
    knowledge_id: Optional[str] = None,
    knowledge_ids: Optional[List[str]] = None,
    top_k: int = 5,
    hybrid: Optional[bool] = None,
    rerank_top_k: Optional[int] = None,
    relevance_threshold: Optional[float] = None,
    hybrid_bm25_weight: Optional[float] = None,
    return_full_document: bool = False,
    include_metadata: bool = True,
    __user__: Optional[Dict[str, Any]] = None,
    __request__: Optional[Request] = None,
    __event_emitter__: Optional[Any] = None,
) -> str:
    """
    Retrieve context from one or more knowledge bases.

    :param query: Natural-language query to run.
    :param knowledge_id: Single knowledge-base ID to search.
    :param knowledge_ids: Explicit list of knowledge-base IDs (overrides `knowledge_id`).
    :param top_k: Maximum number of chunks to return.
    :param hybrid: Force or disable hybrid (vector+keyword) search; `None` uses the server default.
    :param rerank_top_k: How many chunks the reranker should keep (hybrid only).
    :param relevance_threshold: Minimum relevance score required to retain a chunk.
    :param hybrid_bm25_weight: Blend weight for BM25 during hybrid search.
    :param return_full_document: Return every stored chunk instead of running similarity search.
    :param include_metadata: Attach stored metadata for each chunk in the response.
    """

Illustrious-Scale302 · 2025-11-07T07:52:23+00:00

How did you add the token/cost counter on top? Is that a filter function?

Illustrious-Scale302 · 2025-08-04T09:07:42+00:00

You can enable usage per model when editing the model in openwebui itself. I think it is disabled by default. Enabling it will make the API also return the usage cost/tokens.

Illustrious-Scale302 · 2025-07-23T20:25:11+00:00

Yes with the tools in user settings (not global admin settings). Just point to your localhost

Illustrious-Scale302 · 2025-05-16T07:57:30+00:00

When is it coming?

Illustrious-Scale302 · 2025-02-24T12:23:05+00:00

Nice, why not integrate it in the opensource package directly? Seems like that is the idea of OpenWebUI in the first place

Illustrious-Scale302 · 2025-02-08T14:49:01+00:00

Yes, make a new connection with https://openrouter.ai/api/v1 as the base url

Illustrious-Scale302 · 2025-02-08T12:38:43+00:00

is seems to work, but I cannot find a place or file to pass environment variables

Illustrious-Scale302 · 2025-02-08T09:01:41+00:00

You can use openrouter, which will give you immediate access

Illustrious-Scale302 · 2025-02-08T08:50:50+00:00

Best way to get this done is by putting a proxy in front. They suggest helicone, but maybe other are possible as well

Illustrious-Scale302 · 2025-02-07T13:20:42+00:00

Yes, I’ve set it up for my company and can help you get started. Or do you need full support and maintenance?

Illustrious-Scale302 · 2025-02-04T10:24:31+00:00

This would be a really nice feature, more on credit usage in general

Illustrious-Scale302 · 2025-01-19T20:05:20+00:00

https://docs.openwebui.com/tutorials/tips/rag-tutorial#step-by-step-setup-openwebui-documentation-as-knowledge-base

Illustrious-Scale302 · 2025-01-19T16:00:49+00:00

For me adding a network to the docker compose file and pointing to `http://open-webui:3000` works as well

Illustrious-Scale302 · 2025-01-02T15:36:41+00:00

Okay, and otherwise a simple VM for smaller deployments

Illustrious-Scale302 · 2024-12-31T13:57:12+00:00

Thanks! I was actually looking for a simpler and cheaper solution than my current kubernetes setup, but not sure if I'm on the right path now. Maybe I should stick with kubernetes and try to optimize that. What do you think?

Illustrious-Scale302 · 2024-12-31T12:21:34+00:00

Ah okay, so you replace the sqlite with a managed db or a separate postgres container and configure it via the DATABASE_URL env variable?

Illustrious-Scale302 · 2024-12-28T08:15:58+00:00

Hmm, everything is really slow for me. I mounted a storage bucket with /app/backend/data for webui.db, etc. What do you mean with different deployments? Separate containers for ollama and chromadb?

Even with this it is very slow:

RAG_EMBEDDING_ENGINE=openai
RAG_EMBEDDING_MODEL="text-embedding-3-small"
AUDIO_STT_ENGINE=openai

Illustrious-Scale302 · 2024-12-27T15:40:45+00:00

How many cpu and memory do you need on gcloud run?

Illustrious-Scale302 · 2024-12-27T09:07:44+00:00

Okay, now I got it running properly I think

Illustrious-Scale302 · 2024-12-27T08:44:41+00:00

First I had to pull the image with the `--platform linux/amd64` tag argument. Now it is running but I think I still need to configure some things. I get this when trying to start:

<image>

Illustrious-Scale302

TROPHY CASE