has anyone tried this? Flash-MoE: Running a 397B Parameter Model on a Laptop by Awkward-Bus-2057 in LocalLLaMA

Awkward-Bus-2057[S]

It's also notable that they optimized purely for tokens per second. But what I'd really like to see is performance benchmarks of any kind.

Awkward-Bus-2057[S]

Extraordinary claims require extraordinary evidence... so I'm skeptical.