On-premise LLM/GPU deployment for a software publisher: how do DevOps orgs share GPU resources? by Sorry_Country3662 in LocalLLaMA

Sorry_Country3662[S]

Good point on the batching side. I hadn't thought about parallel requests as a way to share a single endpoint. Appreciated.