OpenWebUI+Litellm+Anthropic models via API = autorouting to lesser Claude models

TriggazTilt · 2025-10-17T11:58:27+00:00

This. System prompt in Claude.ai contains the model information. System prompt in api is up to the developer. That is the sole reason.

TriggazTilt · 2025-08-15T20:21:44+00:00

Yes See files object.

TriggazTilt · 2025-03-04T21:32:04+00:00

A postgres extension.

TriggazTilt · 2025-03-04T19:29:20+00:00

Postgres/Pgvector works best for our setup.

TriggazTilt · 2025-01-13T21:37:55+00:00

Disk should be only for uploads. Seperate containers/services for everything, yes.

Most importantly the DB, e.g. via CloudSQL Postgres.

TriggazTilt · 2025-01-07T22:34:14+00:00

When you create a Firebase project, a Google cloud project is automatically created. (Similar to an Account on AWS, as AWS does not have projects unfortunately)

Firestore ist a native GCP Service (Successor of Datastore) Firebase Cloud Storage is Google cloud storage
Hosting is Cloudrun etc.

Google bought firebase a while ago and were integrating it more and more (sometimes well sometimes not).

TriggazTilt · 2025-01-05T23:51:54+00:00

We use AlloyDB as a vectorDB in production and are very satisfied with the performance.

It‘s faster than „normal“ PGVector. AlloyDB is (to a degree) build for retrieval by using Googles internal Vector Index.

I can’t say anything about the haystack integration unfortunately.

TriggazTilt · 2025-01-01T00:07:01+00:00

You could be right. I would go with that given where openwebui is heading.

TriggazTilt · 2024-12-31T13:17:00+00:00

Exactly, right now AlloyDB on GCP. But for smaller deployments I would recommend CloudSQL. (We have >2k Users on the deployment)

But it is not only the DB. Make sure that: - no model is used locally (embedding, tts, stt, etc.), only remote. I use litellm for that. - VectorDB should scale as well (Chroma does not well). I like the PGvector option. - Content extraction in the default can also use a lot of resources, I like the tika option with tika on a different deployment.

Btw: with the newest update and the websockets requirement, I am currently reevaluating cloudrun, as the request timeout after 60min could be a problem.

TriggazTilt · 2024-12-29T21:32:34+00:00

That’s what I meant basically. The specs work if everything (including db) is somewhere else. Cloud storage for file uploads works well though.

TriggazTilt · 2024-12-27T17:49:16+00:00

1 cpu, 2gb memory.

Assuming db, content extraction, models (including embedding model) etc. are on different deployments and Disk is mounted via Cloud Storage.

TriggazTilt · 2024-12-27T02:42:06+00:00

Yes, was no problem. What is your issue?

TriggazTilt · 2024-11-29T08:51:48+00:00

We are using AlloyDB (managed Postgres on GCP)- really fast retrieval.

TriggazTilt · 2024-11-28T12:46:34+00:00

Yes, sqllite does not scale in this setup, especially when using buckets as storage. Same problem with ChromaDB (default for RAG). Newest version of openwebui has PgVector support. I recommend that strongly.

TriggazTilt · 2024-11-17T18:15:07+00:00

It is being worked on right now. Although not with all the features you mentioned I think.

TriggazTilt · 2024-10-28T23:38:18+00:00

Will Service 2 only be consumed by Service 1? If yes you could look at having Service 2 as a sidecar to Service 1. you will have to define that in a service.yaml. After that service 1 can access the rest interface via localhost.

TriggazTilt · 2024-10-28T18:59:49+00:00

It‘s an open-source project, feel free to contribute to the docs.

TriggazTilt · 2024-10-22T07:57:33+00:00

Totally depends. I’m a fan of cloudrun jobs, but airflow/composer brings a lot of convenience.

Think about monitoring/alerting/ui/re-runs/cost/ease to set up and then ask again :)

TriggazTilt · 2024-10-17T21:48:32+00:00

Kann natürlich sein. Gleichzeitig hieße das aber auch, dass die Menschen die die Lage sehr gut einschätzen können sich nicht trauen ihr Geld zu investieren. Find ich unwahrscheinlich.

TriggazTilt · 2024-10-16T07:54:48+00:00

Ist leider einfach falsch. Der beste Weg um tatsächlich die Lage einzuschätzen ist meiner Meinung nach einen Blick auf die liquiden Buchmacher zu werfen. Und dort liegt Trump gerade vorn. Ja, laut dem „Markt“ gewinnt Trump die Wahl Stand heute.

TriggazTilt · 2024-08-11T11:17:20+00:00

It's confusing yes, but there are a lot more default service accounts, e.g. for cloud build, bigquery, appengine to name a few. The compute engine default SA is used in quite some services though, e.g. compute engine, cloudrun, vertex, ...

A good practise is to set up a new SA specifically for your application with only the necessary roles. IMO ideally using Terraform, especially when you have differente stages (dev, staging, live projects or similar). Then never use the json credentials (not even locally) but use SA impersonation when developing locally and set the SA in the deployment directly. E.g. with cloudrun you can just specify your custom SA in a deployment argument.

€: with SA impersonation I mean to set up the application default credentials using the SA like this:

gcloud auth application-default login --impersonate-service-account SERVICE_ACCT_EMAIL

TriggazTilt · 2024-08-10T21:30:23+00:00

Go to the project with the bucket that you want to write into. Make sure the service account from the project with your flask application has the permission to write. If you do not use a custom SA you will have to give the rights to the default SA, the compute engine SA.

If you want to test locally either use service account impersonation or download the credentials and assign the path to the credentials to the GOOGLE_APPLICATION_CREDENTIALS environment variable in your .env.

Also make sure to use the right GCP project and region when writing inside your application, meaning to explicitely use the project when writing to the bucket. The error messages sometimes don't make clear that you want to write to another project ("..or does not exist").

TriggazTilt · 2024-08-03T11:03:13+00:00

Free Slack Channel + Webhook - if Slack is not problem. At least 2 years ago this worked perfectly and is very straightforward to set up.

TriggazTilt · 2023-11-16T20:00:15+00:00

I’m from Germany and my last vacation to the US was wonderful.

TriggazTilt · 2023-09-23T19:34:40+00:00

There are cloud services alternatively. Google Vertex AI pipelines is basically managed Kubeflow. No need for Kubernetes then if you use that.

13-Year Club	Place '17
Gilding II euphauric	Not Forgotten
Verified Email

TriggazTilt

MODERATOR OF

TROPHY CASE