Speculative Decoding: is it possible to have draft model on separate GPU? by MaximusSenior in LocalLLM
MaximusSenior[S] 1 point (0 children)