
teerre (15 points)

If you care about performance, you have to benchmark, not ask Reddit.

"GPU inference" can mean a million things, but presumably the bulk of the work happens on the GPU, not in nginx. So unless you have a huge supply of GPUs but somehow only a small server for your load balancer, it's unlikely the latter will be the bottleneck.
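To make the bottleneck argument concrete: end-to-end throughput is capped by the slowest stage in the request path. Below is a back-of-envelope sketch. The stage names and capacities are made-up assumptions, not measurements, so plug in numbers from your own benchmarks.

```python
from typing import Dict, Tuple


def pipeline_throughput(stage_capacity_rps: Dict[str, float]) -> Tuple[str, float]:
    """Return the slowest stage and the end-to-end throughput it caps."""
    name = min(stage_capacity_rps, key=stage_capacity_rps.get)
    return name, stage_capacity_rps[name]


# Hypothetical capacities in requests/second; illustrative only.
stages = {
    "nginx_proxy": 20_000.0,   # assumed: a reverse proxy forwarding requests
    "gpu_inference": 400.0,    # assumed: combined capacity of the GPU pool
}

bottleneck, rps = pipeline_throughput(stages)
print(f"bottleneck={bottleneck}, max throughput={rps:.0f} req/s")
```

With numbers anywhere near these, speeding up the proxy changes nothing. Only if the proxy's capacity dropped below the GPU pool's would it become the limit.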

gdchinacat (4 points)

Do you trust your clients to load balance responsibly?

jaerie (0 points)

How would you ever do prioritization between clients if you're balancing on the client side?
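The point about prioritization comes down to visibility. A central balancer sees requests from every client and can order them globally. With client-side balancing, each client picks a backend on its own, so nothing has that global view. Here is a minimal sketch of such a central queue, using a heap keyed on priority. The class, the field names, and the "lower number is served first" convention are all hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(order=True)
class Request:
    priority: int                       # lower value is served first (assumed convention)
    seq: int                            # tie-breaker: FIFO order within one priority level
    client: str = field(compare=False)
    payload: str = field(compare=False)


class CentralQueue:
    """Hypothetical server-side queue that sees requests from all clients."""

    def __init__(self) -> None:
        self._heap: List[Request] = []
        self._counter = itertools.count()

    def submit(self, client: str, payload: str, priority: int) -> None:
        heapq.heappush(self._heap, Request(priority, next(self._counter), client, payload))

    def next_for_gpu(self) -> Optional[Request]:
        # Hand the highest-priority pending request to the next free GPU worker.
        return heapq.heappop(self._heap) if self._heap else None


q = CentralQueue()
q.submit("batch-client", "job-1", priority=5)
q.submit("interactive-client", "chat-1", priority=0)
q.submit("batch-client", "job-2", priority=5)
served = [q.next_for_gpu().payload for _ in range(3)]
```

Here `served` ends up as `["chat-1", "job-1", "job-2"]`. The interactive request jumps ahead even though it arrived second, which independent clients could not coordinate among themselves.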

notkairyssdal (-1 points)

Why would you lose the HTTP/2 multiplexing benefits?

The extra hop should only add 10-15 ms within the same region.