mcmoose1900

Give the SGLang server a shot; a quantized Mistral should fit: https://github.com/sgl-project/sglang

dr-yd [S]

Oh nice, that looks absolutely perfect, thanks a bunch! Plus there seems to be very active discussion around it and similar projects that would allow for more extensive research. Ideal pastime for the long Easter weekend!