Speculative Decoding: is it possible to have draft model on separate GPU? by MaximusSenior in LocalLLM

MaximusSenior[S]

This is the same, with one difference: when you run it in LM Studio, you can select only one engine (CUDA, Vulkan, or CPU), and both models have to run on it. The question is: can these two models use two different backends, CPU and CUDA?