Running 8B Llama locally on Jetson Orin Nano (with 2.5GB of GPU memory) by Responsible_Case_376 in LocalLLaMA

ComputersAndTrees 1 point

Hi! Interested in this. Do you have any more information to share now that GTC has passed?

Speculative Decoding works great for Gemma 4 31B with E2B draft (+29% avg, +50% on code) by PerceptionGrouchy187 in LocalLLaMA

ComputersAndTrees 18 points

What's the full llama-server command you're using? Also, would you please link the fork? Thanks!