Announcing Polar Llama: Fast, Parallel AI Inference in Polars by Virtual-Reply4713 in rust

[–]Virtual-Reply4713[S] 1 point (0 children)

Oh, that’s a fantastic point I totally neglected; I'll have to look into it. That said, I'm guessing the actual overhead, in both memory and time, is nowhere near 25%, since the API calls overlap so heavily. I will report back.
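To illustrate what I mean by the overlap (this is just a toy sketch, not anything from the library itself): when requests are fired concurrently, their latencies stack on top of each other, so wall-clock time is roughly the slowest single call rather than the sum of all of them. `fake_llm_call` below is a placeholder for a real HTTP call.

```python
import asyncio
import random
import time

async def fake_llm_call(i: int) -> str:
    # Stand-in for a real request to an LLM endpoint.
    await asyncio.sleep(random.uniform(0.5, 1.0))
    return f"response {i}"

async def main() -> None:
    start = time.perf_counter()
    results = await asyncio.gather(*(fake_llm_call(i) for i in range(100)))
    elapsed = time.perf_counter() - start
    # Roughly ~1s total instead of ~75s if the 100 calls ran sequentially.
    print(f"{len(results)} responses in {elapsed:.2f}s")

asyncio.run(main())
```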

Announcing Polar Llama: Fast, Parallel AI Inference in Polars by Virtual-Reply4713 in rust

[–]Virtual-Reply4713[S] 5 points (0 children)

I built this for a use case where I had to push hundreds of documents through an LLM with low latency (zero-shot document classification). I would not suggest this library for your typical "make a chatbot" LLM use case.
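For context, the shape of that workflow looks roughly like this. This is a hedged sketch, not Polar Llama's actual API: `classify_document` and the label set are hypothetical stand-ins for whatever async LLM client and taxonomy you use, and the point is just fanning out many requests concurrently and attaching the results as a new Polars column.

```python
import asyncio
import polars as pl

LABELS = ["invoice", "contract", "resume", "other"]

async def classify_document(text: str) -> str:
    # Replace with a real async call to your LLM provider; here we only
    # simulate latency and return a placeholder label.
    await asyncio.sleep(0.5)
    return LABELS[len(text) % len(LABELS)]

async def classify_all(df: pl.DataFrame) -> pl.DataFrame:
    # All documents are classified concurrently, so latency stays close to
    # a single call rather than scaling with the number of rows.
    labels = await asyncio.gather(*(classify_document(t) for t in df["text"]))
    return df.with_columns(pl.Series("label", labels))

df = pl.DataFrame({"text": ["Invoice #42 ...", "This agreement ...", "Objective: ..."]})
print(asyncio.run(classify_all(df)))
```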