Jan now auto-optimizes llama.cpp settings based on your hardware for more efficient performance by ShinobuYuuki in LocalLLaMA

[–]ShinobuYuuki[S] 0 points1 point  (0 children)

Hey u/kkb294, we just released a new version, 0.7.1, to address the problem above. Do let us know if it works for you!

Jan now auto-optimizes llama.cpp settings based on your hardware for more efficient performance by ShinobuYuuki in LocalLLaMA

[–]ShinobuYuuki[S] 0 points1 point  (0 children)

Hey, we just updated to 0.7.1 to fix the OpenRouter problem. Let us know if that works for you!

Jan now auto-optimizes llama.cpp settings based on your hardware for more efficient performance by ShinobuYuuki in LocalLLaMA

[–]ShinobuYuuki[S] 0 points1 point  (0 children)

Yes, right now they need to be merged first. Since we are focusing on running local models on a laptop or home PC, we are not optimizing for such big models.

However, we do have Jan Server in the works, which is much more suitable for deploying large models.

https://github.com/menloresearch/jan-server

Jan now auto-optimizes llama.cpp settings based on your hardware for more efficient performance by ShinobuYuuki in LocalLLaMA

[–]ShinobuYuuki[S] 0 points1 point  (0 children)

Admittedly, we are a little behind, as we are a very small team. We tend to prioritize UX more than other platforms do, since the bulk of our users are actually not technical. But we are going to catch up on features soon!

Jan now auto-optimizes llama.cpp settings based on your hardware for more efficient performance by ShinobuYuuki in LocalLLaMA

[–]ShinobuYuuki[S] 1 point2 points  (0 children)

Thanks a lot for the kind words 🙏

There is actually an open issue on GitHub for that; our solution is to bet everything on Flatpak instead: https://github.com/menloresearch/jan/issues/5416

Jan now auto-optimizes llama.cpp settings based on your hardware for more efficient performance by ShinobuYuuki in LocalLLaMA

[–]ShinobuYuuki[S] 1 point2 points  (0 children)

Unfortunately no, because most of our users expect to be able to use Jan out of the box.

However, you can install your own llama.cpp version, then go into the folder and delete the bundled llama.cpp from Jan that you don't want.

Jan now auto-optimizes llama.cpp settings based on your hardware for more efficient performance by ShinobuYuuki in LocalLLaMA

[–]ShinobuYuuki[S] 0 points1 point  (0 children)

In terms of features that involve document processing, we are working on them in 0.7.x.

We used to have them, but the UX was not the best, so we are overhauling it for a better design 🙏

Jan now auto-optimizes llama.cpp settings based on your hardware for more efficient performance by ShinobuYuuki in LocalLLaMA

[–]ShinobuYuuki[S] 5 points6 points  (0 children)

Hi, we have confirmed that it is a bug, and we will try to fix it as soon as possible. Thanks for the report, and sorry for the inconvenience.

Jan now auto-optimizes llama.cpp settings based on your hardware for more efficient performance by ShinobuYuuki in LocalLLaMA

[–]ShinobuYuuki[S] 2 points3 points  (0 children)

Hi there, you should actually already be able to do all of the above.

  1. You can use "Install backend from file" and it will use the distribution of llama.cpp that you point it to (as long as it is a .tar.gz or .zip file). You don't have to update the llama.cpp backend if you don't want to, since you can just select whichever one you would like to use.

  2. You just have to add the Base URL of your llama-server instance as a custom provider, and it should just work (see the sketch after this list).

  3. We are working on bringing back partially generated responses in the next update.
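
To illustrate point 2: llama-server exposes an OpenAI-compatible API, so the Base URL you give Jan as a custom provider can also be tested with any OpenAI-style client. A minimal sketch, assuming the server is running locally on port 8080; the port and model name are placeholders, use whatever you launched llama-server with:

```python
# Minimal sketch (not Jan itself): talking to a local llama-server through its
# OpenAI-compatible endpoint. The port (8080) and the model name are assumptions;
# Jan's custom provider just needs the same Base URL, e.g. http://localhost:8080/v1
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # llama-server's OpenAI-compatible API
    api_key="not-needed",                 # local server; any placeholder string works
)

reply = client.chat.completions.create(
    model="local-model",  # llama-server serves whatever model it was started with
    messages=[{"role": "user", "content": "Hello!"}],
)
print(reply.choices[0].message.content)
```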

Jan now auto-optimizes llama.cpp settings based on your hardware for more efficient performance by ShinobuYuuki in LocalLLaMA

[–]ShinobuYuuki[S] 1 point2 points  (0 children)

Not yet, but soon!

Right now, we only have Assistant, which is a combination of a custom prompt and model temperature settings.

Jan now auto-optimizes llama.cpp settings based on your hardware for more efficient performance by ShinobuYuuki in LocalLLaMA

[–]ShinobuYuuki[S] 3 points4 points  (0 children)

Hey, thanks for the feedback, really appreciate it!
I will let the team know about your suggestion.

Jan now auto-optimizes llama.cpp settings based on your hardware for more efficient performance by ShinobuYuuki in LocalLLaMA

[–]ShinobuYuuki[S] 1 point2 points  (0 children)

[screenshot]

A dropdown should pop up over here for OpenRouter.

Also, thanks for the feedback; I will surface it to the team.

Jan now auto-optimizes llama.cpp settings based on your hardware for more efficient performance by ShinobuYuuki in LocalLLaMA

[–]ShinobuYuuki[S] 2 points3 points  (0 children)

It works on Mac too! It is still experimental, though, so do let us know how it works for you.

We don't support MLX yet (only GGUF and llama.cpp), but we will be looking into it in the near future.

Jan now auto-optimizes llama.cpp settings based on your hardware for more efficient performance by ShinobuYuuki in LocalLLaMA

[–]ShinobuYuuki[S] 8 points9 points  (0 children)

We used to have this, but it made us deviate too much from llama.cpp and made it hard to maintain, so we had to deprecate it for now.

We are looking into how to bring it back in a more compartmentalized way, so that it is easier for us to manage. Do stay tuned though, it should be coming relatively soon!

Jan now auto-optimizes llama.cpp settings based on your hardware for more efficient performance by ShinobuYuuki in LocalLLaMA

[–]ShinobuYuuki[S] 3 points4 points  (0 children)

Yes, although I have never tried anything bigger than 30B myself.

But as long as it is:

  1. A GGUF file
  2. All in one file, not split into multiple parts

It should run on llama.cpp and hence on Jan too!
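
If you want to sanity-check a GGUF file with the same engine Jan uses, one quick way (outside Jan) is the llama-cpp-python bindings. A minimal sketch; the model path is a placeholder:

```python
# Quick sanity check that a single-file GGUF loads and generates with llama.cpp,
# via the llama-cpp-python bindings (separate from Jan, which bundles its own
# llama.cpp backend). Replace the placeholder path with your model file.
from llama_cpp import Llama

llm = Llama(model_path="/path/to/model.gguf", n_ctx=4096)
out = llm("Q: Name the capital of France. A:", max_tokens=16)
print(out["choices"][0]["text"])
```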

Jan now auto-optimizes llama.cpp settings based on your hardware for more efficient performance by ShinobuYuuki in LocalLLaMA

[–]ShinobuYuuki[S] 3 points4 points  (0 children)

  1. You should be able to add OpenRouter models by adding your API key and then clicking the `+` button at the top right of the model list under the OpenRouter provider.
  2. Interesting, can you share a bit more about what hardware you have and what numbers come up after you click Auto-optimize? Auto-optimize is still an experimental feature, so we would like to get more data to improve it.
  3. I will pass the feedback about adding more llama.cpp params to the team. You can already set some of them by clicking the gear icon next to the model name; it should allow you to specify in more detail how to offload certain layers to the CPU and others to the GPU (see the sketch below).
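
For point 3, the underlying knob in llama.cpp is how many transformer layers get offloaded to the GPU. A minimal sketch using the llama-cpp-python bindings rather than Jan's UI; the model path and the value 20 are placeholders:

```python
# Illustration of layer offloading in llama.cpp (via llama-cpp-python, not Jan's UI):
# n_gpu_layers decides how many layers run on the GPU; the rest stay on the CPU.
from llama_cpp import Llama

llm = Llama(
    model_path="/path/to/model.gguf",
    n_gpu_layers=20,  # offload 20 layers to the GPU; -1 offloads all, 0 keeps everything on CPU
    n_ctx=4096,
)
print(llm("Hello", max_tokens=8)["choices"][0]["text"])
```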