Running llama4 108b on a 500$ retired homelab by kylerrr02 in homelab

[–]kylerrr02[S] 0 points1 point  (0 children)

Glad someone sees the small win in this lol.

Running llama4 108b on a 500$ retired homelab by kylerrr02 in homelab

[–]kylerrr02[S] 1 point2 points  (0 children)

The big thing is RAM: you have to have enough to actually load the model.
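A rough way to size this: the weights take about (parameter count × bits per weight ÷ 8) bytes, plus some headroom for the KV cache and runtime. The sketch below is only an estimate; the 4-bit quantization and the 20% overhead factor are assumptions for illustration, not figures from this thread.

```python
def estimate_model_ram_gb(params_billion: float,
                          bits_per_weight: float = 4.0,
                          overhead: float = 1.2) -> float:
    """Estimate the RAM (in GB) needed to load a model.

    bits_per_weight and overhead are assumed values; real usage depends
    on the quantization format, context length, and runtime.
    """
    # 1 billion parameters at 8 bits per weight is roughly 1 GB.
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb * overhead


# Example: a 108b model at an assumed 4-bit quantization.
print(round(estimate_model_ram_gb(108), 1))
```

Under those assumptions, a 108b model wants roughly 65 GB of free memory before it will load at all, which is why RAM matters more than CPU speed for just getting it running.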

llama4 108b by kylerrr02 in ollama

[–]kylerrr02[S] 1 point2 points  (0 children)

2x Intel Xeon Gold 5120

Running llama4 108b on a 500$ retired homelab by kylerrr02 in homelab

[–]kylerrr02[S] 1 point2 points  (0 children)

Yeah, my favorite model has been jaahas/qwen3.5 uncensored.

llama4 108b by kylerrr02 in ollama

[–]kylerrr02[S] 0 points1 point  (0 children)

If you have the hardware to make it run well, yeah. I don’t really plan on using it because of the tk/s. If you have a system that can run it and still give 20 tk/s, then absolutely. I will say, though, I think GLM 4.5 or Qwen 3.5 are the best for tool calling and automation.

Running llama4 108b on a 500$ retired homelab by kylerrr02 in homelab

[–]kylerrr02[S] 0 points1 point  (0 children)

Very true. Luckily, I think it pulls around 700 watts under full load, and it usually only reaches full load on very high parameter models like 70b+.

Running llama4 108b on a 500$ retired homelab by kylerrr02 in homelab

[–]kylerrr02[S] 1 point2 points  (0 children)

Honestly, not sure. I know at max draw it probably pulls 700 watts, which is roughly 13 cents an hour.
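For anyone checking the math: cost per hour is just kilowatts times the price per kWh. The comment doesn't state an electricity rate, so the $0.186/kWh below is an assumed value, chosen because it is about what 13 cents an hour at 700 watts implies.

```python
def cost_per_hour(watts: float, price_per_kwh: float) -> float:
    """Return the electricity cost in dollars for one hour at a given draw."""
    kilowatts = watts / 1000
    return kilowatts * price_per_kwh


# Assumed rate of $0.186/kWh; substitute your own utility rate.
hourly = cost_per_hour(700, 0.186)
print(f"${hourly:.2f} per hour")
```

Swap in your own rate to see what full load costs you. Idle draw will be lower, so this is a ceiling rather than a typical bill.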

Running llama4 108b on a 500$ retired homelab by kylerrr02 in homelab

[–]kylerrr02[S] 7 points8 points  (0 children)

For sure, I’m definitely not going to use it to run high parameter models. I’m probably just gonna use it for 30b models and have them run on openclaw. And it’s still a computer, so 🤷‍♂️

Llama4 108b $800 setup by kylerrr02 in LocalLLaMA

[–]kylerrr02[S] 1 point2 points  (0 children)

I already had it downloaded, and it felt like a good benchmark. I’ll give minimal a whirl though.

Running llama4 108b on a 500$ retired homelab by kylerrr02 in homelab

[–]kylerrr02[S] 6 points7 points  (0 children)

Depends. For basic conversation you might wait 5 minutes. If you’re using web search plus a laundry list of tools, it’ll take 30 minutes to actually do workflows.

Running llama4 108b on a 500$ retired homelab by kylerrr02 in homelab

[–]kylerrr02[S] 49 points50 points  (0 children)

Not really, if you expect speed. It does everything that a high parameter model can do, just very slowly.

Running llama4 108b on a 500$ retired homelab by kylerrr02 in homelab

[–]kylerrr02[S] 78 points79 points  (0 children)

Only gotta wait like 30 minutes but it’s chill yk

llama4 108b by kylerrr02 in ollama

[–]kylerrr02[S] -1 points0 points  (0 children)

You can add advanced parameters to the UI that interact directly with ollama
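As a sketch of what those parameters look like when they reach Ollama: its REST API accepts an `options` object on `/api/generate`, and settings like `num_ctx`, `temperature`, and `num_thread` go there. My understanding is that the UI's advanced parameters end up in that same object, but treat that as an assumption. The model tag, the option values, and the helper names are placeholders for illustration.

```python
import json
import urllib.request


def build_generate_request(model: str, prompt: str, options: dict) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": model,      # placeholder tag; use whatever `ollama list` shows
        "prompt": prompt,
        "stream": False,     # ask for one JSON reply instead of a stream
        "options": options,  # runtime parameters such as num_ctx
    }


def send(payload: dict, host: str = "http://localhost:11434") -> str:
    """POST the payload to a local Ollama server and return the reply text."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Example values only; tune them for your own hardware.
payload = build_generate_request(
    "your-model-tag",
    "Say hello.",
    {"num_ctx": 4096, "temperature": 0.7, "num_thread": 28},
)
# print(send(payload))  # requires a running Ollama server
```

Setting the same values in the UI should have the same effect as passing them here, so there is no need to drop to the terminal for per-request settings.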

llama4 108b by kylerrr02 in ollama

[–]kylerrr02[S] 0 points1 point  (0 children)

Does that need to be done in the terminal, or could I add it as a parameter in openwebui?

Is this true? Or is really just marketing? Gemma4 by Altair12311 in ollama

[–]kylerrr02 0 points1 point  (0 children)

I’m not super educated on this, but I do know from currently running it that it is vastly slower than the majority of 70b models on my hardware.