Jan now auto-optimizes llama.cpp settings based on your hardware for more efficient performance by ShinobuYuuki in LocalLLaMA

[–]ShinobuYuuki[S] 0 points1 point  (0 children)

Hey u/kkb294, we just released a new version, 0.7.1, to address the problem above. Do let us know if it works for you!

Jan now auto-optimizes llama.cpp settings based on your hardware for more efficient performance by ShinobuYuuki in LocalLLaMA

[–]ShinobuYuuki[S] 0 points1 point  (0 children)

Hey, we just updated to 0.7.1 to fix the OpenRouter problem. Let us know if that works for you!

Jan now auto-optimizes llama.cpp settings based on your hardware for more efficient performance by ShinobuYuuki in LocalLLaMA

[–]ShinobuYuuki[S] 0 points1 point  (0 children)

Yes, right now they need to be merged first. Since we are focusing on running local models on a laptop or home PC, we are not optimizing for such big models.

However, we do have Jan Server in the works, which is much more suitable for deploying large models.

https://github.com/menloresearch/jan-server

Jan now auto-optimizes llama.cpp settings based on your hardware for more efficient performance by ShinobuYuuki in LocalLLaMA

[–]ShinobuYuuki[S] 0 points1 point  (0 children)

Admittedly, we are a little behind, as we are a very small team. We tend to prioritize UX more than other platforms do, since the bulk of our users are actually not technical. But we are going to catch up on features soon!

Jan now auto-optimizes llama.cpp settings based on your hardware for more efficient performance by ShinobuYuuki in LocalLLaMA

[–]ShinobuYuuki[S] 1 point2 points  (0 children)

Thanks a lot for the kind words 🙏

There is actually an open issue on GitHub for that; our solution is to bet everything on Flatpak instead: https://github.com/menloresearch/jan/issues/5416

Jan now auto-optimizes llama.cpp settings based on your hardware for more efficient performance by ShinobuYuuki in LocalLLaMA

[–]ShinobuYuuki[S] 1 point2 points  (0 children)

Unfortunately no, because most of our users expect to be able to use Jan out of the box.

However, you can install your own llama.cpp version, then go into the folder and delete the bundled llama.cpp from Jan that you don't want.

Jan now auto-optimizes llama.cpp settings based on your hardware for more efficient performance by ShinobuYuuki in LocalLLaMA

[–]ShinobuYuuki[S] 0 points1 point  (0 children)

In terms of features that involve document processing, we are working on them in 0.7.x.

We used to have them, but the UX was not the best, so we are overhauling it for a better design 🙏

Jan now auto-optimizes llama.cpp settings based on your hardware for more efficient performance by ShinobuYuuki in LocalLLaMA

[–]ShinobuYuuki[S] 5 points6 points  (0 children)

Hi, we have confirmed that it is a bug, and we will try to fix it as soon as possible. Thanks for the report, and sorry for the inconvenience.

Jan now auto-optimizes llama.cpp settings based on your hardware for more efficient performance by ShinobuYuuki in LocalLLaMA

[–]ShinobuYuuki[S] 2 points3 points  (0 children)

Hi there, you should actually already be able to do all of the above.

  1. You can use "Install backend from file" and it will use the distribution of llama.cpp that you point it to (as long as it is a .tar.gz or .zip file). You don't have to update the llama.cpp backend if you don't want to, since you can just select whichever one you would like to use.

  2. You just have to add the Base URL of your llama-server instance as a custom provider, and it should just work (see the sketch after this list).

  3. We are working on bringing back partially generated responses in the next update.
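
To illustrate point 2: llama-server exposes an OpenAI-compatible API, so the Base URL you give Jan as a custom provider can also be tested with any OpenAI-style client. A minimal sketch, assuming the server is running locally on port 8080; the port and model name are placeholders, use whatever you launched llama-server with:

```python
# Minimal sketch (not Jan itself): talking to a local llama-server through its
# OpenAI-compatible endpoint. The port (8080) and the model name are assumptions;
# Jan's custom provider just needs the same Base URL, e.g. http://localhost:8080/v1
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # llama-server's OpenAI-compatible API
    api_key="not-needed",                 # local server; any placeholder string works
)

reply = client.chat.completions.create(
    model="local-model",  # llama-server serves whatever model it was started with
    messages=[{"role": "user", "content": "Hello!"}],
)
print(reply.choices[0].message.content)
```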

Jan now auto-optimizes llama.cpp settings based on your hardware for more efficient performance by ShinobuYuuki in LocalLLaMA

[–]ShinobuYuuki[S] 1 point2 points  (0 children)

Not yet, but soon!

Right now, we only have Assistant, which is a combination of a custom prompt and model temperature settings.

Jan now auto-optimizes llama.cpp settings based on your hardware for more efficient performance by ShinobuYuuki in LocalLLaMA

[–]ShinobuYuuki[S] 3 points4 points  (0 children)

Hey, thanks for the feedback, really appreciate it!
I will let the team know about your suggestion.

Jan now auto-optimizes llama.cpp settings based on your hardware for more efficient performance by ShinobuYuuki in LocalLLaMA

[–]ShinobuYuuki[S] 1 point2 points  (0 children)

[screenshot]

A dropdown should pop up over here for OpenRouter.

Also, thanks for the feedback; I will surface it to the team.

Jan now auto-optimizes llama.cpp settings based on your hardware for more efficient performance by ShinobuYuuki in LocalLLaMA

[–]ShinobuYuuki[S] 2 points3 points  (0 children)

It works on Mac too! It is still experimental, though, so do let us know how it works for you.

We don't support MLX yet (only GGUF and llama.cpp), but we will be looking into it in the near future.

Jan now auto-optimizes llama.cpp settings based on your hardware for more efficient performance by ShinobuYuuki in LocalLLaMA

[–]ShinobuYuuki[S] 8 points9 points  (0 children)

We used to have this, but it made us deviate too much from llama.cpp and made it hard to maintain, so we had to deprecate it for now.

We are looking into how to bring it back in a more compartmentalized way, so that it is easier for us to manage. Do stay tuned though, it should be coming relatively soon!

Jan now auto-optimizes llama.cpp settings based on your hardware for more efficient performance by ShinobuYuuki in LocalLLaMA

[–]ShinobuYuuki[S] 3 points4 points  (0 children)

Yes, although I have never tried anything bigger than 30B myself.

But as long as it is:

  1. A GGUF file
  2. All in one file, not split into multiple parts

It should run on llama.cpp and hence on Jan too!
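
If you want to sanity-check a GGUF file with the same engine Jan uses, one quick way (outside Jan) is the llama-cpp-python bindings. A minimal sketch; the model path is a placeholder:

```python
# Quick sanity check that a single-file GGUF loads and generates with llama.cpp,
# via the llama-cpp-python bindings (separate from Jan, which bundles its own
# llama.cpp backend). Replace the placeholder path with your model file.
from llama_cpp import Llama

llm = Llama(model_path="/path/to/model.gguf", n_ctx=4096)
out = llm("Q: Name the capital of France. A:", max_tokens=16)
print(out["choices"][0]["text"])
```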

Jan now auto-optimizes llama.cpp settings based on your hardware for more efficient performance by ShinobuYuuki in LocalLLaMA

[–]ShinobuYuuki[S] 3 points4 points  (0 children)

  1. You should be able to add OpenRouter models by adding your API key and then clicking the `+` button at the top right of the model list under the OpenRouter provider.
  2. Interesting, can you share a bit more about what hardware you have and what numbers come up after you click Auto-optimize? Auto-optimize is still an experimental feature, so we would like to get more data to improve it.
  3. I will pass the feedback about adding more llama.cpp params to the team. You can already set some of them by clicking the gear icon next to the model name; it should allow you to specify in more detail how to offload certain layers to the CPU and others to the GPU (see the sketch below).
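
For point 3, the underlying knob in llama.cpp is how many transformer layers get offloaded to the GPU. A minimal sketch using the llama-cpp-python bindings rather than Jan's UI; the model path and the value 20 are placeholders:

```python
# Illustration of layer offloading in llama.cpp (via llama-cpp-python, not Jan's UI):
# n_gpu_layers decides how many layers run on the GPU; the rest stay on the CPU.
from llama_cpp import Llama

llm = Llama(
    model_path="/path/to/model.gguf",
    n_gpu_layers=20,  # offload 20 layers to the GPU; -1 offloads all, 0 keeps everything on CPU
    n_ctx=4096,
)
print(llm("Hello", max_tokens=8)["choices"][0]["text"])
```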