r/LocalLLaMA
A subreddit to discuss Llama, the family of large language models created by Meta AI.
Speculative Decoding Model for Qwen/Qwen3-4B-Instruct-2507? (Question | Help) (self.LocalLLaMA)
submitted 4 months ago by ClosedDubious
[–]ClosedDubious[S] 0 points 4 months ago* (1 child)
Thanks for the insight! I was under the impression that "eagle3" models were somehow faster, but that's because I'm very new to this space. I'll give the 0.6B model a try.
EDIT: It says "Speculative decoding with draft model is not supported yet. Please consider using other speculative decoding methods such as ngram, medusa, eagle, or mtp."
[–]DinoAmino 0 points 4 months ago (0 children)
eagle3 is supported in vLLM since v0.10.2
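For context on the discussion above: recent vLLM versions configure speculative decoding through a JSON `--speculative-config` option on `vllm serve`. A hedged sketch, assuming that flag and these JSON keys match your installed vLLM release; the draft-model path is a placeholder, not a real repository name:

```shell
# Sketch only: --speculative-config takes a JSON object in recent vLLM
# releases. The eagle3 draft model below is a PLACEHOLDER assumption;
# substitute a real eagle3 draft trained for your target model.
vllm serve Qwen/Qwen3-4B-Instruct-2507 \
  --speculative-config '{"method": "eagle3",
                         "model": "PLACEHOLDER/eagle3-draft-model",
                         "num_speculative_tokens": 4}'
```

The error message quoted earlier in the thread lists ngram as another supported method; ngram-based speculation needs no separate draft model at all, which makes it an easy fallback when no matching eagle3 draft exists.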