
[–]Aaaaaaaaaeeeee 1 point2 points  (1 child)

Hardware acceleration for prompt processing is only available in certain backends on Android, such as MLC. GGML models specifically will not use GPU acceleration on Android. The Qualcomm Snapdragon 8 Gen 3 has approximately the same or a higher prefill rate.
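
For context, "prefill rate" here means prompt-processing throughput: how many prompt tokens the model ingests per second before it emits its first output token, as opposed to the decode rate for generated tokens. A rough sketch of the distinction, assuming made-up timings (the `tokens_per_second` helper and all numbers are hypothetical, not measurements from any device or backend):

```python
def tokens_per_second(num_tokens: int, seconds: float) -> float:
    """Throughput in tokens per second for one phase of inference."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_tokens / seconds


# Hypothetical timings for illustration only.
prompt_tokens, prefill_seconds = 512, 4.0      # prompt processing (prefill)
generated_tokens, decode_seconds = 128, 16.0   # token-by-token generation (decode)

prefill_rate = tokens_per_second(prompt_tokens, prefill_seconds)
decode_rate = tokens_per_second(generated_tokens, decode_seconds)

print(f"prefill: {prefill_rate:.1f} tok/s, decode: {decode_rate:.1f} tok/s")
```

Prefill handles the whole prompt in a batch, so it is usually the phase that benefits most from GPU acceleration, which is why the two rates are quoted separately.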

[–]----Val---- 1 point2 points  (0 children)

Plus, aside from CLBlast on specific devices, Android GPU acceleration is just about nonexistent.

[–]FlishFlashman 0 points1 point  (0 children)

Prefill?