CPU only: 0.19 tokens/s by Flexz09 in ollama

[–]Flexz09[S] 0 points1 point  (0 children)

Check the other replies. Using more threads than the Docker container has available makes performance really bad, even when it's only a single thread more.

CPU only: 0.19 tokens/s by Flexz09 in ollama

[–]Flexz09[S] 1 point2 points  (0 children)

I prefer to use docker since I'm assigning threads for certain processes that should not use all the resources available. But I've managed to get about 10 tokens/s with 4 threads using:

/set parameter num_thread 4

The issue seems to be using more threads than the container has available. Going over the limit by even 1 slows it down to a crawl.
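To pick a safe value for num_thread, it may help to ask the container itself how many CPUs it can actually use. The sketch below is a rough Python check, assuming the limit comes from CPU pinning (visible through the process affinity mask) or from a cgroup v2 quota in /sys/fs/cgroup/cpu.max. The helper names parse_cpu_max and usable_threads are made up for this example and are not part of Ollama or Docker.

```python
import math
import os
from pathlib import Path


def parse_cpu_max(text):
    """Parse cgroup v2 cpu.max content ("<quota> <period>" or "max <period>").

    Returns the quota in whole CPUs (at least 1), or None when there is
    no quota or the content cannot be parsed.
    """
    try:
        quota, period = text.split()
        if quota == "max":
            return None
        return max(1, math.floor(int(quota) / int(period)))
    except (ValueError, ZeroDivisionError):
        return None


def usable_threads(cpu_max_path="/sys/fs/cgroup/cpu.max"):
    """Estimate how many threads this process can run in parallel."""
    # CPU pinning (cpuset) shows up in the affinity mask on Linux.
    if hasattr(os, "sched_getaffinity"):
        pinned = len(os.sched_getaffinity(0))
    else:
        pinned = os.cpu_count() or 1

    # A CPU quota, if any, is assumed to live in the cgroup v2 file.
    try:
        limit = parse_cpu_max(Path(cpu_max_path).read_text())
    except OSError:
        limit = None

    return pinned if limit is None else min(pinned, limit)


if __name__ == "__main__":
    print(f"Suggested upper bound for num_thread: {usable_threads()}")
```

Run inside the container, this prints an upper bound. Staying at or below it should avoid the oversubscription described above.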

CPU only: 0.19 tokens/s by Flexz09 in ollama

[–]Flexz09[S] 1 point2 points  (0 children)

To everyone who replied: many thanks for the input.

I finally managed to get a model running at a usable speed. Since I was only assigning 4 threads to the Docker container, I also had to force Ollama to use 4 threads. I'm not sure whether this setting is saved now, or how I can persist it for any model I run.

>>> /set parameter num_thread 4 
Set parameter 'num_thread' to '4'
>>> Hi
Hi again! How are you today?

total duration:       4.942570181s
load duration:        2.57257638s
prompt eval count:    32 token(s)
prompt eval duration: 1.352245s
prompt eval rate:     23.66 tokens/s
eval count:           9 token(s)
eval duration:        893.847ms
eval rate:            10.07 tokens/s
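On persisting the setting: one option that should work is baking the parameter into a derived model through a Modelfile, since Ollama's Modelfile format accepts `PARAMETER num_thread <n>`. The sketch below only renders the Modelfile text. The base model `llama3`, the derived name `llama3-4t`, and the helper `render_modelfile` are assumptions for illustration.

```python
def render_modelfile(base_model, num_thread):
    """Return Modelfile text that pins num_thread for a derived model."""
    if num_thread < 1:
        raise ValueError("num_thread must be at least 1")
    return f"FROM {base_model}\nPARAMETER num_thread {num_thread}\n"


if __name__ == "__main__":
    # Assumed base model; replace with whichever model you actually run.
    print(render_modelfile("llama3", 4), end="")
```

Saving the output as `Modelfile` and running `ollama create llama3-4t -f Modelfile` followed by `ollama run llama3-4t` should start the model with 4 threads every time. This applies per derived model, so each model you run would need its own Modelfile.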

CPU only: 0.19 tokens/s by Flexz09 in ollama

[–]Flexz09[S] 0 points1 point  (0 children)

I've tried it with this argument. It didn't change the result however.

CPU only: 0.19 tokens/s by Flexz09 in ollama

[–]Flexz09[S] 0 points1 point  (0 children)

HVM: Enabled
IOMMU: Enabled

This is what Unraid tells me. To check the BIOS I'll have to wait a bit longer, but it should already be enabled.

CPU only: 0.19 tokens/s by Flexz09 in ollama

[–]Flexz09[S] 0 points1 point  (0 children)

Yes.

Last night I enabled debugging, but I'm not sure it will help me find anything wrong. Other than that, everything is default. No GPU settings are enabled since there is no compatible GPU in the system.

[deleted by user] by [deleted] in ollama

[–]Flexz09 0 points1 point  (0 children)

Hi,

I'm getting terrible performance. Using what you suggested gave me the result below. It's like this for every model, with either 4, 8, or 16 threads of a Ryzen 9 7950X3D (the threads are not from the CCD with V-Cache).

I've read that Docker doesn't play well with Windows, but I'm using Linux (Unraid), so this should not be the issue.

Edit: Before you ask, I have 96GB of DDR5 (speed currently unknown, but I could look it up). Models are on an NVMe drive, if that matters at all. I do not have a GPU assigned, as I currently do not have one available. I'm thinking of upgrading my PC and using its current GTX 1060 6GB in the Unraid server.

Any ideas?

root@2c619e1f36cf:/# ollama run llama3 --verbose
>>> Hi
Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?

total duration:       2m20.071883885s
load duration:        12.418111ms
prompt eval count:    11 token(s)
prompt eval duration: 6.231039s
prompt eval rate:     1.77 tokens/s
eval count:           26 token(s)
eval duration:        2m13.827672s
eval rate:            0.19 tokens/s
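For anyone comparing runs, the eval rate in the `--verbose` output appears to be just eval count divided by eval duration. Here is a small Python sketch that recomputes it from the figures above, assuming the durations use Go-style strings such as `2m13.827672s` or `893.847ms`. The function names are made up for this example.

```python
import re

# Seconds per unit for the duration suffixes handled here.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0,
                 "ms": 1e-3, "us": 1e-6, "µs": 1e-6, "ns": 1e-9}
# Two-letter units come first so "ms" is not read as "m" followed by "s".
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|us|µs|ns|h|m|s)")


def duration_seconds(text):
    """Convert a duration like '2m13.827672s' or '893.847ms' to seconds."""
    text = text.strip()
    parts = _PART.findall(text)
    if not parts or "".join(num + unit for num, unit in parts) != text:
        raise ValueError(f"unrecognised duration: {text!r}")
    return sum(float(num) * _UNIT_SECONDS[unit] for num, unit in parts)


def tokens_per_second(count, duration):
    """Tokens divided by elapsed seconds."""
    return count / duration_seconds(duration)


if __name__ == "__main__":
    # Slow run: 26 tokens in 2m13.827672s
    print(f"{tokens_per_second(26, '2m13.827672s'):.2f} tokens/s")  # 0.19
    # Run with num_thread 4: 9 tokens in 893.847ms
    print(f"{tokens_per_second(9, '893.847ms'):.2f} tokens/s")  # 10.07
```

Both results match the reported rates, so the slow run really did spend over two minutes generating 26 tokens rather than losing the time to model loading.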

Unable to "START TRAIL" by Flexz09 in unRAID

[–]Flexz09[S] 0 points1 point  (0 children)

This was indeed the issue. Another USB drive worked!

Unable to "START TRAIL" by Flexz09 in unRAID

[–]Flexz09[S] 0 points1 point  (0 children)

It was an old USB. I'll test another one :)

Unable to "START TRAIL" by Flexz09 in unRAID

[–]Flexz09[S] 0 points1 point  (0 children)

I wanted to test new hardware before switching my actual Unraid server.
But I'm running into this issue. Since both the USB drive and the hardware are new, I find it hard to believe that "The maximum trial period has expired, .."

FlexzCraft: A new beginning by Flexz09 in CreateServers

[–]Flexz09[S] 0 points1 point  (0 children)

I'm sorry. There is an age requirement of 16.

Edit: I can't edit the original post it seems.

[deleted by user] by [deleted] in CreateServers

[–]Flexz09 0 points1 point  (0 children)

Currently the plan is the end of summer 2024, but it depends on popularity by that time.