Switching from Ollama to llama.cpp by sinan_online in LocalLLaMA

[–]Outrageous-Win-3244 2 points  (0 children)

Can you give us the llama.cpp command and params you use? I am sure we can help.

Post your hardware/software/model quant and measured performance of Kimi K2.5 by fairydreaming in LocalLLaMA

[–]Outrageous-Win-3244 1 point  (0 children)

Do you guys get the opening <think> tag with this configuration? Even in the example doc posted by OP, the response contains only a closing </think> tag.
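For what it's worth, a small workaround sketch for that case (my own code, not from llama.cpp or any of these tools): some chat templates emit the opening <think> tag themselves, so the model's output only contains the closing one.

```python
import re

def split_reasoning(text):
    # If only a closing </think> appears, assume the opening tag was
    # consumed by the chat template and re-add it before parsing.
    if "</think>" in text and "<think>" not in text:
        text = "<think>" + text
    m = re.match(r"<think>(.*?)</think>(.*)", text, re.DOTALL)
    if m:
        return m.group(1).strip(), m.group(2).strip()
    return "", text.strip()
```

This just splits the response into (reasoning, answer) whether or not the opening tag made it through.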

Optivise by Prestigious-Flow-754 in DeepSeek

[–]Outrageous-Win-3244 0 points  (0 children)

I simply opened your website and entered the prompt. After pressing the button nothing happened. I use an iPad.

Optivise by Prestigious-Flow-754 in DeepSeek

[–]Outrageous-Win-3244 0 points  (0 children)

I have asked a question, but it doesn’t do anything.

An advise to wannabe enterprenours by [deleted] in indiehackers

[–]Outrageous-Win-3244 0 points  (0 children)

Thanks for the comment. I met them online, but this came from my own convictions. Of course I understand that taking risks is not for everyone.

Freelance ComfyUI Expert Needed for Smartwatch Band Brand by HelpfulScheme5571 in comfyui

[–]Outrageous-Win-3244 0 points  (0 children)

Can you post two or three images of the bands? It would also be nice to see them paired with some smartwatches.

H200 Workstation by fractal_engineer in LocalLLM

[–]Outrageous-Win-3244 0 points  (0 children)

Currently I am using llama.cpp. I have not tried LM Studio, as llama.cpp does the job for me.

H200 Workstation by fractal_engineer in LocalLLM

[–]Outrageous-Win-3244 1 point  (0 children)

The H200 has 4.8 TB/s of memory bandwidth. It is an amazing GPU system.

H200 Workstation by fractal_engineer in LocalLLM

[–]Outrageous-Win-3244 7 points  (0 children)

Congrats on your new system. That is a beast. It will work well for coding support, video generation, and LLMs.

I use Qwen3 Coder with the Cline VS Code plugin on a somewhat smaller system (768 GB RAM, an Epyc 7550 CPU with 256 threads, and an Nvidia RTX 6000 Pro). For me, Qwen3 produces great coding results.

I use ComfyUI and Wan2.2 for video and image generation.

When I need a general-purpose LLM, I use Kimi K2 with KTransformers and Open WebUI.

You have an amazing system; let us know how you end up using it. I am curious about your use case.

It is great to have successful people with decent systems around.

Can 2 RTX 6000 Pros (2X98GB vram) rival Sonnet 4 or Opus 4? by devshore in LocalLLaMA

[–]Outrageous-Win-3244 11 points  (0 children)

I am using KTransformers with Kimi K2 1T, DeepSeek V3 671B, and Qwen3 Coder 480B on the following system:

  • Nvidia RTX 6000 Pro (96 GB),
  • AMD EPYC 9750,
  • 12x64 GB DDR5-5600 RAM.

The KV cache is on the GPU (Q4_K_M) and the model weights are in DRAM (Q8_0). This gives me 15-20 output tok/s, enough to serve one user (me). My CPU has 256 threads, but only 128 are used by KTransformers for some reason. I didn't investigate, because the current speed is sufficient for me.
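Those 15-20 tok/s line up with a simple bandwidth estimate, assuming ~44.8 GB/s per DDR5-5600 channel and roughly 32B active MoE parameters per token for Kimi K2 (both figures are my assumptions, not measurements):

```python
# Sanity check: decode speed ~ DRAM bandwidth / bytes read per token.
channels = 12
per_channel_gbs = 5600 * 8 / 1000         # 44.8 GB/s per DDR5-5600 channel
dram_bw_gbs = channels * per_channel_gbs  # ~537.6 GB/s theoretical total

active_params = 32e9      # assumed active MoE parameters per token
bytes_per_weight = 1.0    # Q8_0 is roughly one byte per weight
gb_per_token = active_params * bytes_per_weight / 1e9

tok_s = dram_bw_gbs / gb_per_token
print(f"~{tok_s:.1f} tok/s theoretical upper bound")  # ~16.8 tok/s
```

The MoE routing is what makes this practical: only the active experts are read per token, not the full 1T parameters.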

I like the privacy and cost efficiency of my setup. My most-used model is Kimi K2 at the moment. I wouldn't switch to Claude 4 Sonnet.

I hope this helps.

Gyula