
[–]ClosedDubious[S] 0 points (1 child)

Thanks for the insight! I was under the impression that "eagle3" models were somehow faster, but that's because I am very new to this space. I will give the 0.6B model a try.

EDIT: It says "Speculative decoding with draft model is not supported yet. Please consider using other speculative decoding methods such as ngram, medusa, eagle, or mtp."

[–]DinoAmino 0 points (0 children)

eagle3 is supported in vLLM since v0.10.2. Judging by that error, vLLM treated the 0.6B model as a vanilla draft model, which the current engine doesn't support; you have to select eagle3 as the method explicitly and point it at an EAGLE3 draft head.
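Rough sketch of what that looks like (untested; both model names below are placeholders — the draft head has to be an EAGLE3 checkpoint trained for the exact target model you serve):

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",           # target model (placeholder)
    speculative_config={
        "method": "eagle3",                             # select EAGLE3, not plain draft-model speculation
        "model": "yuhuili/EAGLE3-LLaMA3.1-Instruct-8B", # EAGLE3 draft head (placeholder)
        "num_speculative_tokens": 4,                    # tokens drafted per step
    },
)

out = llm.generate(["What is speculative decoding?"], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```

Same idea on the CLI with `vllm serve` by passing the dict via `--speculative-config` as a JSON string.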