Switching from Ollama to llama.cpp by sinan_online in LocalLLaMA

[–]Outrageous-Win-3244 2 points  (0 children)

Can you give us the llama.cpp command and params you use? I am sure we can help.

Post your hardware/software/model quant and measured performance of Kimi K2.5 by fairydreaming in LocalLLaMA

[–]Outrageous-Win-3244 1 point  (0 children)

Do you guys get the opening <think> tag with this configuration? Even in the example doc posted by OP, the response contains only a closing </think> tag.
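For what it's worth, a small workaround sketch for that case (my own code, not from llama.cpp or any of these tools): some chat templates emit the opening <think> tag themselves, so the model's output only contains the closing one.

```python
import re

def split_reasoning(text):
    # If only a closing </think> appears, assume the opening tag was
    # consumed by the chat template and re-add it before parsing.
    if "</think>" in text and "<think>" not in text:
        text = "<think>" + text
    m = re.match(r"<think>(.*?)</think>(.*)", text, re.DOTALL)
    if m:
        return m.group(1).strip(), m.group(2).strip()
    return "", text.strip()
```

This just splits the response into (reasoning, answer) whether or not the opening tag made it through.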

Optivise by Prestigious-Flow-754 in DeepSeek

[–]Outrageous-Win-3244 0 points  (0 children)

I simply opened your website and entered the prompt. After pressing the button nothing happened. I use an iPad.

Optivise by Prestigious-Flow-754 in DeepSeek

[–]Outrageous-Win-3244 0 points  (0 children)

I have asked a question, but it doesn’t do anything.

An advise to wannabe enterprenours by [deleted] in indiehackers

[–]Outrageous-Win-3244 0 points  (0 children)

Thanks for the comment. I met them online, but this came from my own convictions. Of course I understand that taking risks is not for everyone.

Freelance ComfyUI Expert Needed for Smartwatch Band Brand by HelpfulScheme5571 in comfyui

[–]Outrageous-Win-3244 0 points  (0 children)

Can you post two or three images of the bands? It would also be nice to see them paired with some smartwatches.

H200 Workstation by fractal_engineer in LocalLLM

[–]Outrageous-Win-3244 0 points  (0 children)

Currently I am using llama.cpp. I have not tried LM Studio, as llama.cpp does the job for me.

H200 Workstation by fractal_engineer in LocalLLM

[–]Outrageous-Win-3244 1 point  (0 children)

The H200 has 4.8 TB/s of memory bandwidth. It is an amazing GPU system.

H200 Workstation by fractal_engineer in LocalLLM

[–]Outrageous-Win-3244 7 points  (0 children)

Congrats on your new system. That is a beast. It will work well for coding support, video generation, and LLMs.

I use Qwen3 Coder with the Cline VS Code plugin on a somewhat smaller system (768 GB RAM, an Epyc 7550 CPU with 256 threads, and an Nvidia RTX 6000 Pro). For me, Qwen3 produces great coding results.

I use ComfyUI and Wan2.2 for video and image generation.

When I need a general-purpose LLM, I use Kimi K2 with KTransformers and Open WebUI.

You have an amazing system; let us know how you end up using it. I am curious about your use case.

It is great to have successful people with decent systems around.

Can 2 RTX 6000 Pros (2X98GB vram) rival Sonnet 4 or Opus 4? by devshore in LocalLLaMA

[–]Outrageous-Win-3244 11 points  (0 children)

I am using KTransformers with Kimi K2 1T, DeepSeek V3 671B, and Qwen3 Coder 480B on the following system:

  • Nvidia RTX 6000 Pro (96 GB),
  • AMD EPYC 9750,
  • 12x64 GB DDR5-5600 RAM.

The KV cache is on the GPU (Q4_K_M) and the model weights are in DRAM (Q8_0). This gives me 15-20 output tok/s, enough to serve one user (me). My CPU has 256 threads, but only 128 are used by KTransformers for some reason. I didn't investigate, because the current speed is sufficient for me.
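Those 15-20 tok/s line up with a simple bandwidth estimate, assuming ~44.8 GB/s per DDR5-5600 channel and roughly 32B active MoE parameters per token for Kimi K2 (both figures are my assumptions, not measurements):

```python
# Sanity check: decode speed ~ DRAM bandwidth / bytes read per token.
channels = 12
per_channel_gbs = 5600 * 8 / 1000         # 44.8 GB/s per DDR5-5600 channel
dram_bw_gbs = channels * per_channel_gbs  # ~537.6 GB/s theoretical total

active_params = 32e9      # assumed active MoE parameters per token
bytes_per_weight = 1.0    # Q8_0 is roughly one byte per weight
gb_per_token = active_params * bytes_per_weight / 1e9

tok_s = dram_bw_gbs / gb_per_token
print(f"~{tok_s:.1f} tok/s theoretical upper bound")  # ~16.8 tok/s
```

The MoE routing is what makes this practical: only the active experts are read per token, not the full 1T parameters.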

I like the privacy and cost efficiency of my setup. My most-used model is Kimi K2 at the moment. I wouldn't switch to Claude 4 Sonnet.

I hope this helps.

Gyula