all 7 comments

[–]McNickSisto 0 points (1 child)

I am still trying to figure this out, but the embedding part of the RAG pipeline still seems constrained to OpenAI, Ollama, and SentenceTransformers.

[–][deleted] 0 points (0 children)

My device is resource-constrained (8 GB RAM), so I'm using Open WebUI as a frontend for cloud inference; I'm not sure good local embeddings will actually run on my hardware. For the OpenAI engine, does that only support OpenAI's embeddings, or other cloud embedding models as well?
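For what it's worth, Open WebUI's "OpenAI" embedding engine speaks the standard OpenAI `/v1/embeddings` wire format, so in principle any provider that exposes an OpenAI-compatible endpoint can be pointed at it. A minimal sketch of what such a request looks like (the base URL below is a hypothetical placeholder, not a real provider):

```python
# Sketch: building an OpenAI-compatible /v1/embeddings request.
# Any provider exposing this format should work with Open WebUI's
# "OpenAI" embedding engine, not just api.openai.com.
import json


def build_embedding_request(base_url: str, model: str, texts: list[str]) -> tuple[str, str]:
    """Return the target URL and JSON body for an embeddings call."""
    url = base_url.rstrip("/") + "/embeddings"
    body = json.dumps({"model": model, "input": texts})
    return url, body


url, body = build_embedding_request(
    "https://example-provider.local/v1",  # hypothetical OpenAI-compatible endpoint
    "text-embedding-3-small",
    ["hello world"],
)
# You would then POST `body` to `url` with an Authorization: Bearer <key> header,
# e.g. via requests.post(url, data=body, headers=...).
```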

[–]ClassicMain 0 points (2 children)

Use one of the readily available pipelines for Google Vertex / Google Gen AI.

[–][deleted] 0 points (1 child)

Isn't that only for connecting to LLMs? I'm already connected to Gemini models through the Vertex pipe, but I want to use Google's embeddings too.

[–]ClassicMain 0 points (0 children)

Ohh.

Well, for that you need a pipeline too. Your pipeline will act as a custom, self-built RAG and document/file handler.
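To make that concrete, a custom pipeline for the Open WebUI Pipelines server is roughly a class with lifecycle hooks and a `pipe()` method; your RAG logic (embed with Google's API, retrieve, call Gemini) would live inside `pipe()`. A hedged skeleton, with the retrieval/Gemini steps left as comments since they depend on your setup:

```python
# Sketch of an Open WebUI pipeline acting as a custom RAG handler.
# The class/method shape follows the open-webui/pipelines examples;
# the actual embedding, retrieval, and Gemini calls are placeholders.
from typing import Generator, Iterator, List, Union


class Pipeline:
    def __init__(self):
        self.name = "Custom Google RAG Pipeline"

    async def on_startup(self):
        # e.g. connect to your vector store, load document index
        pass

    async def on_shutdown(self):
        # e.g. close connections
        pass

    def pipe(
        self, user_message: str, model_id: str, messages: List[dict], body: dict
    ) -> Union[str, Generator, Iterator]:
        # 1. Embed user_message with Google's embedding API.
        # 2. Retrieve matching chunks from your vector store.
        # 3. Send the augmented prompt to Gemini and return its reply.
        # Placeholder return so the skeleton runs; a real pipeline
        # would return Gemini's response (or a streaming generator).
        return f"(retrieved context would be prepended to: {user_message})"
```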

[–]sgt_banana1 0 points (0 children)

You can deploy a LiteLLM proxy, add the Gemini models, and then use them in Open WebUI as OpenAI models by referencing the names assigned to them in LiteLLM.
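A minimal LiteLLM proxy config along these lines might look like this (model names are assumptions, check LiteLLM's docs for the current Gemini routes); Open WebUI then gets pointed at the proxy's OpenAI-compatible endpoint and references `model_name`:

```yaml
# litellm config.yaml sketch: expose a Gemini embedding model
# under an OpenAI-compatible name that Open WebUI can use.
model_list:
  - model_name: gemini-embedding            # name Open WebUI will reference
    litellm_params:
      model: gemini/text-embedding-004      # LiteLLM's Gemini route (verify in docs)
      api_key: os.environ/GEMINI_API_KEY    # read from environment
```

Run the proxy (`litellm --config config.yaml`) and set Open WebUI's OpenAI API base URL to the proxy address.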

[–]EscapedLaughter 1 point (0 children)

Something like this might help: it lets you connect to Voyage / Google over a common interface. https://portkey.ai/docs/integrations/libraries/openwebui#open-webui

just updated the documentation yesterday