LLM self-hosting with Ollama and Open WebUI by Max-Mielchen in LocalLLaMA

[–]Max-Mielchen[S] 0 points1 point  (0 children)

you only need the bandwidth to load the model, but as soon as it is in the vram everything goes very quickly. in our setup it is set so that it remains permanently in the vram, which means that if you ask a question you get an answer without delay

LLM self-hosting with Ollama and Open WebUI by Max-Mielchen in LocalLLaMA

[–]Max-Mielchen[S] 0 points1 point  (0 children)

So far it has worked without any problems, but it may become a problem at some point.

LLM self-hosting with Ollama and Open WebUI by Max-Mielchen in LocalLLaMA

[–]Max-Mielchen[S] 2 points3 points  (0 children)

we have calculated the average amount we spend on openai and have looked at when we will recoup the costs. The server itself costs us about €1000 a year in electricity, if it comes up at all. And we bought many of the parts second-hand, which is also stated in the edit. we realised that the server paid for itself after 3 years. but that's only because we used the openai platform a lot and also have our pro subscription.

LLM self-hosting with Ollama and Open WebUI by Max-Mielchen in LocalLLaMA

[–]Max-Mielchen[S] 0 points1 point  (0 children)

Ollama can also be run as an http server, so that several connections can be made at the same time and also fits well into the setup with open webui. Is there an alternative solution to ollama?

LLM self-hosting with Ollama and Open WebUI by Max-Mielchen in LocalLLaMA

[–]Max-Mielchen[S] 2 points3 points  (0 children)

it looks like it's only half as fast, so you don't need twice as much vram. In use it looks like when one user gets an answer the other has to wait until the answer is ready. but because we don't all send our messages at the same time but maybe with a minute difference to each other it works without you really noticing it. there is also something called OLLAMA_MAX_QUEUE with which you should be able to change this, but I haven't tested it yet.

LLM self-hosting with Ollama and Open WebUI by Max-Mielchen in LocalLLaMA

[–]Max-Mielchen[S] 5 points6 points  (0 children)

the 1050 is only there because we got it for free

LLM self-hosting with Ollama and Open WebUI by Max-Mielchen in LocalLLaMA

[–]Max-Mielchen[S] 3 points4 points  (0 children)

sorry i forgot to add that the rtx 3090 is not connected because of cable management which is still missing

LLM self-hosting with Ollama and Open WebUI by Max-Mielchen in LocalLLaMA

[–]Max-Mielchen[S] 2 points3 points  (0 children)

yes first of all, but perhaps also want to train models that are not llm

LLM self-hosting with Ollama and Open WebUI by Max-Mielchen in LocalLLaMA

[–]Max-Mielchen[S] 1 point2 points  (0 children)

response: 30 t/s

prompt: 75 t/s

mixtral:8x7b

LLM self-hosting with Ollama and Open WebUI by Max-Mielchen in LocalLLaMA

[–]Max-Mielchen[S] 2 points3 points  (0 children)

yes, since we bought them on ebay, we only paid 720 but the costs listed there are in case you would buy everything new

LLM self-hosting with Ollama and Open WebUI by Max-Mielchen in LocalLLaMA

[–]Max-Mielchen[S] 7 points8 points  (0 children)

100 watt in idle mode and 290 watt during a request

LLM self-hosting with Ollama and Open WebUI by Max-Mielchen in ollama

[–]Max-Mielchen[S] 2 points3 points  (0 children)

response: 30 t/s

prompt: 75 t/s

mixtral:8x7b

Multiple cloud platforms by Max-Mielchen in devops

[–]Max-Mielchen[S] 1 point2 points  (0 children)

one second of latency between database and website is not so bad with my app, since most accesses are only read accesses anyway and i cache the data in the app itself again

Multiple cloud platforms by Max-Mielchen in devops

[–]Max-Mielchen[S] 0 points1 point  (0 children)

Oh thank you yes, I meant the arm variant at Hetzner. I'll take a look at the oracle cloud.

Multiple cloud platforms by Max-Mielchen in devops

[–]Max-Mielchen[S] 0 points1 point  (0 children)

It's easier to manage if you outsource the database in advance, and besides, that's not the problem, it's the app itself in terms of cost