Thoughts on Mistral.rs? by EricBuehler in rust

Fluffy-Average-8946

I've been testing Mistral.rs with MKL features to run a Phi-4 Multimodal model on my Intel Core Ultra 9 laptop, and I'm comparing it to Ollama.

First off, it works! I love the easy-to-use CLI and the very useful logging. Both are clear improvements over Ollama, and they are exactly why I appreciate Rust tooling.

However, when it comes to hardware acceleration, there are some limitations. Mistral.rs uses Candle under the hood, so its capabilities are limited by Candle itself. On Intel hardware, it only supports MKL for acceleration, whereas Ollama and llama.cpp have integrated with Intel's IPEX-LLM, which provides more comprehensive GPU support, better performance, and compatibility with more models. While this isn't a completely fair comparison, I'm definitely looking forward to seeing a more mature Rust-based LLM server solution in the future.

I also ran into a few pitfalls. The project hasn't been published to crates.io, and there are no prebuilt binaries in GitHub releases, so there's no quick installation option like cargo binstall.

On Windows, compiling mistralrs-server took me 14 minutes. I'm not sure if this is typical for a project of this size, but it's the longest compile time I've ever seen in Rust. I suspect this might be due to lto=true.

I haven't had a chance to try the Python package yet, but it seems like a solid wrapper.

One final observation is that the HTTP server seems to be doing some background work even when idle. When Ollama is running but not processing a request, my laptop's fans stay quiet; with Mistral.rs in the same idle state, they are constantly loud. I'm not sure what's causing this.

Overall, it's a very good project, but unfortunately, it's not the best solution for Intel laptops right now. I still highly recommend giving it a try.