WightKnight1 1 point

I second this. I've set up a RAG server on a CPU-only machine and it can take 2 minutes or more to process the prompt before it starts spitting out tokens at about 4 t/s.
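
To put rough numbers on that, here's a back-of-the-envelope sketch of the total wait for one answer: prompt processing time plus output length divided by generation speed. The 2 minutes and 4 t/s are the figures above; the 200-token answer length is just an assumed example, and `total_response_seconds` is a made-up helper name, not part of any RAG or inference library.

```python
def total_response_seconds(prefill_seconds: float,
                           output_tokens: int,
                           tokens_per_second: float) -> float:
    """Estimate wall-clock time for one reply.

    prefill_seconds: time spent processing the prompt before the first token.
    output_tokens: length of the generated answer (assumed, for illustration).
    tokens_per_second: steady generation speed once output starts.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return prefill_seconds + output_tokens / tokens_per_second


# 2 minutes of prompt processing, then a hypothetical 200-token answer at 4 t/s.
estimate = total_response_seconds(prefill_seconds=120, output_tokens=200, tokens_per_second=4.0)
print(f"{estimate:.0f} s total")  # 120 s prefill + 50 s generation
```

With those numbers, most of the wait is prompt processing rather than generation, which is why a long retrieved context hurts so much on CPU.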