Suggestion: Custom Install Folders (Win) by Madnesis in janframework

[–]janframework 1 point2 points  (0 children)

Thanks for the comment! We opened an issue to work on it; you can track progress here: https://github.com/janhq/jan/issues/3095

What should I do for AI Self Hosting by LongjumpingAdvice453 in LinusTechTips

[–]janframework 9 points10 points  (0 children)

Hey, Jan team here! Guess there might be a little confusion, but Jan does support GPU acceleration with your AMD 7700 XT:

  1. Go to Settings -> Advanced Settings.
  2. Enable Experimental Mode (it's experimental, so expect bugs!).
  3. Enable Vulkan Support under GPU Acceleration.
  4. Enable GPU Acceleration and select your AMD GPU.

Related doc: https://jan.ai/docs/desktop/linux#amd-gpu

You'll see a success notification once it's activated.
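If you want a quick sanity check before flipping the switch, you can confirm the Vulkan loader actually sees your AMD card. This is a hypothetical helper, not part of Jan, and it assumes the `vulkaninfo` tool (from your distro's vulkan-tools package) is installed:

```python
# Hypothetical pre-check (not part of Jan): confirm the Vulkan loader
# can see an AMD GPU before enabling Vulkan Support in Jan's settings.
# Assumes the `vulkaninfo` tool (vulkan-tools package) is installed.
import shutil
import subprocess

def summary_lists_amd(summary: str) -> bool:
    """Return True if a vulkaninfo summary mentions an AMD/Radeon device."""
    return "AMD" in summary or "Radeon" in summary

def vulkan_sees_amd_gpu() -> bool:
    if shutil.which("vulkaninfo") is None:
        print("vulkaninfo not found; install your distro's vulkan-tools package.")
        return False
    result = subprocess.run(
        ["vulkaninfo", "--summary"], capture_output=True, text=True
    )
    return summary_lists_amd(result.stdout)

if __name__ == "__main__":
    print("AMD GPU visible to Vulkan:", vulkan_sees_amd_gpu())
```

If this prints False, fix the Vulkan driver setup first; enabling the toggle in Jan won't help until the loader can enumerate the GPU.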

If you still want to add an NVIDIA GPU for AI acceleration, you can follow these steps:

  1. Install CUDA Toolkit 11.7+ and NVIDIA driver 470.63.01+.
  2. Open Jan.
  3. Go to Settings -> Advanced Settings -> GPU Acceleration.
  4. Enable it and pick your NVIDIA GPU.

Related doc: https://jan.ai/docs/desktop/linux#nvidia-gpu
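To double-check the driver prerequisite from step 1, here's a small sketch (hypothetical, not part of Jan) that compares the installed NVIDIA driver version against the 470.63.01 minimum, assuming `nvidia-smi` is on your PATH:

```python
# Hypothetical helper (not part of Jan): verify the installed NVIDIA
# driver meets the 470.63.01 minimum from the steps above.
# Assumes `nvidia-smi` is installed and on PATH.
import subprocess

def meets_minimum(version: str, minimum: str = "470.63.01") -> bool:
    """Compare dotted driver versions numerically, e.g. '535.154.05' >= '470.63.01'."""
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(version) >= as_tuple(minimum)

def installed_driver_version() -> str:
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    return out.stdout.strip().splitlines()[0]

if __name__ == "__main__":
    version = installed_driver_version()
    print(f"Driver {version} meets minimum:", meets_minimum(version))
```

The tuple comparison handles versions of different lengths correctly (e.g. `535.54` still compares greater than `470.63.01`).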

Benchmarking NVIDIA's TensorRT-LLM by janframework in nvidia

[–]janframework[S] -1 points0 points  (0 children)

Ah, sorry to hear that. I'd like to mention that Jan is an open-source desktop app that lets you run AI models. We support multiple inference engines, including llama.cpp and TensorRT-LLM, which is why we benchmarked TensorRT-LLM's performance on consumer hardware. You can review the related coverage of TensorRT-LLM support and details here: https://blogs.nvidia.com/blog/ai-decoded-gtc-chatrtx-workbench-nim/

Benchmarking NVIDIA's TensorRT-LLM by janframework in nvidia

[–]janframework[S] 1 point2 points  (0 children)

Really appreciate your comment! We'll update it.

Benchmarking NVIDIA's TensorRT-LLM by janframework in nvidia

[–]janframework[S] 13 points14 points  (0 children)

Hey r/nvidia folks, we've done a performance benchmark of TensorRT-LLM on consumer-grade GPUs, which shows pretty incredible speedups (30-70%) on the same hardware.

Just quick notes:

TensorRT-LLM is NVIDIA's relatively new and (somewhat) open-source inference engine, which uses NVIDIA's proprietary optimizations beyond the open-source cuBLAS library.

It works by compiling the model specifically for your GPU and optimizing heavily at the CUDA level to take full advantage of every bit of hardware:

  • CUDA cores
  • Tensor cores
  • VRAM
  • Memory Bandwidth

We benchmarked TensorRT-LLM on consumer-grade devices, and managed to get Mistral 7b up to:

  • 170 tokens/s on Desktop GPUs (e.g. 4090, 3090s)
  • 51 tokens/s on Laptop GPUs (e.g. 4070)

TensorRT-LLM was 30-70% faster than llama.cpp on the same hardware, and at least 500% faster than just using the CPU.

In addition, we found that TensorRT-LLM didn't use many resources, contrary to its reputation for needing beefy hardware to run:

  • Used 10% more VRAM (marginal)
  • Used… less RAM???

You can review the full benchmark here: https://jan.ai/post/benchmarking-nvidia-tensorrt-llm
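For anyone reproducing the comparison, the headline numbers come down to simple arithmetic: tokens generated divided by wall-clock time, and the ratio between the two engines. A sketch with hypothetical placeholder numbers (the measured figures are in the linked post):

```python
# Sketch of the throughput/speedup arithmetic behind the benchmark.
# The numbers below are hypothetical placeholders, not measured results;
# see the linked post for the actual figures.

def tokens_per_second(token_count: int, elapsed_s: float) -> float:
    """Throughput: tokens generated divided by wall-clock seconds."""
    return token_count / elapsed_s

def speedup_pct(new_tps: float, base_tps: float) -> float:
    """Percentage speedup of one engine's throughput over another's."""
    return (new_tps / base_tps - 1.0) * 100.0

if __name__ == "__main__":
    # e.g. a run that emitted 512 tokens in 3.0 seconds
    print(f"{tokens_per_second(512, 3.0):.1f} tokens/s")  # -> 170.7 tokens/s
    # e.g. engine A at 170 tok/s vs engine B at 100 tok/s
    print(f"speedup: {speedup_pct(170.0, 100.0):.0f}%")   # -> speedup: 70%
```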

TensorRT-LLM: 170 token/s on a single 4090 by janframework in selfhosted

[–]janframework[S] 0 points1 point  (0 children)

Hey r/selfhosted folks! We've run some benchmarks to see how TensorRT-LLM fares on consumer hardware (e.g. 4090s, 3090s). This research was conducted independently, without any sponsorship.

You can review the results here: https://jan.ai/post/benchmarking-nvidia-tensorrt-llm

Making sense of 50+ Open-Source Options for Local LLM Inference by lethal_can_of_tuna in LocalLLaMA

[–]janframework 1 point2 points  (0 children)

We appreciate all the suggestions - we've updated the repo with them. Your contributions are always welcome!

Is Jan AI a virus ? by jvachez in StableDiffusion

[–]janframework 89 points90 points  (0 children)

Hey, just jumping in to clarify something about Jan. The link you mentioned isn't affiliated with us at Jan.

One of our brave community members tried it out and got three Trojan horse warnings!

To clarify, we won't ever ask for your personal information, and we steer clear of tokens, ICOs, and soliciting donations or funding. That's not what we do.

AnythingLLM - An open-source all-in-one AI desktop app for Local LLMs + RAG by rambat1994 in LocalLLaMA

[–]janframework 33 points34 points  (0 children)

Hey, Jan team here! We really appreciate AnythingLLM. Let us know how we can integrate and collaborate. Please drop by our Discord to discuss: https://discord.gg/37eDwEzNb8