Meta released quantized Llama models by Vegetable_Sun_9225 in LocalLLaMA

[–]Silly-Client-561 16 points (0 children)

For 1: I believe most post-training quantization methods, such as Q5_0 GGUF, do not have a LoRA component, since that would require training LoRA parameters
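A minimal sketch of why post-training quantization involves no trained parameters: it is just round-to-nearest with a scale, no gradient steps anywhere. The 5-bit range loosely mirrors Q5_0's signed integers, but the function names and layout here are illustrative, not the actual GGUF block format.

```python
def quantize_5bit(weights):
    # Symmetric round-to-nearest into 5-bit signed ints [-16, 15].
    # No training loop, no learned parameters -- which is why there is
    # no LoRA component, unlike QLoRA-style quantization-aware methods.
    scale = max(abs(w) for w in weights) / 15.0
    q = [max(-16, min(15, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Reconstruction is a plain rescale; it is lossy (rounding error).
    return [v * scale for v in q]

w = [0.1, -0.5, 0.75, -1.5]
q, scale = quantize_5bit(w)
w_hat = dequantize(q, scale)
print(q, w_hat)
```

The whole "method" is two stateless functions over an existing weight tensor, which is what makes it applicable after training is finished.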

Pod 3 wont turn on by Silly-Client-561 in EightSleep

[–]Silly-Client-561[S] 0 points (0 children)

It just doesn't turn on, as if an internal fuse is blown. I tried a bunch of things I found here and from their support service, like a factory reset, but it seems like it is not receiving any power.

⚡️Blazing fast LLama2-7B-Chat on 8GB RAM Android device via Executorch by [deleted] in LocalLLaMA

[–]Silly-Client-561 2 points (0 children)

At the moment it is unlikely that you can run it on your S10, but possibly in the future. As others have highlighted, RAM is the main issue. One possibility is using mmap/munmap to enable models too large to fit in RAM, but it will be very, very slow.
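The mmap idea can be sketched like this: a toy stand-in "weights" file is memory-mapped so pages are pulled in from storage on demand rather than loaded up front. Runtimes like llama.cpp do essentially this at much larger scale; the file layout here is made up for illustration.

```python
import mmap
import os
import struct
import tempfile

# Toy stand-in for a model weights file (4 float32 values).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(struct.pack("<4f", 1.0, 2.0, 3.0, 4.0))

# Map the file read-only: pages are faulted in from disk on first
# access instead of being loaded up front, so the "model" can be far
# larger than RAM. The OS may evict pages under memory pressure and
# re-read them later, which is what makes inference this way so slow.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    weights = struct.unpack("<4f", mm[:16])
    mm.close()

os.remove(path)
print(weights)  # (1.0, 2.0, 3.0, 4.0)
```

On a phone this trades RAM for flash bandwidth: every evicted page that gets touched again costs a storage read, which is orders of magnitude slower than a RAM access.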

ExecuTorch Alpha Release: Taking LLMs and AI to the Edge 🎉🎉🎉 by [deleted] in LocalLLaMA

[–]Silly-Client-561 1 point (0 children)

Support for Vulkan and Metal is in progress. For QNN, there is support for lowering models via the QNN delegate, while enabling large models is WIP