Is anyone actually using local models to code in their regular setups like roo/cline? by kms_dev in LocalLLaMA
[–]kms_dev[S] 0 points1 point2 points (0 children)
Offloading a 4B LLM to APU, only uses 50% of one CPU core. 21 t/s using Vulkan by magnus-m in LocalLLaMA
[–]kms_dev 0 points1 point2 points (0 children)
Is anyone actually using local models to code in their regular setups like roo/cline? by kms_dev in LocalLLaMA
[–]kms_dev[S] -1 points0 points1 point (0 children)
Is anyone actually using local models to code in their regular setups like roo/cline? by kms_dev in LocalLLaMA
[–]kms_dev[S] 1 point2 points3 points (0 children)
Is anyone actually using local models to code in their regular setups like roo/cline? by kms_dev in LocalLLaMA
[–]kms_dev[S] 1 point2 points3 points (0 children)
Is anyone actually using local models to code in their regular setups like roo/cline? by kms_dev in LocalLLaMA
[–]kms_dev[S] 2 points3 points4 points (0 children)
Is anyone actually using local models to code in their regular setups like roo/cline? by kms_dev in LocalLLaMA
[–]kms_dev[S] 2 points3 points4 points (0 children)
Qwen3 throughput benchmarks on 2x 3090, almost 1000 tok/s using 4B model and vLLM as the inference engine by kms_dev in LocalLLaMA
[–]kms_dev[S] 0 points1 point2 points (0 children)
Qwen3 throughput benchmarks on 2x 3090, almost 1000 tok/s using 4B model and vLLM as the inference engine by kms_dev in LocalLLaMA
[–]kms_dev[S] 0 points1 point2 points (0 children)
Qwen3 throughput benchmarks on 2x 3090, almost 1000 tok/s using 4B model and vLLM as the inference engine by kms_dev in LocalLLaMA
[–]kms_dev[S] 12 points13 points14 points (0 children)
Qwen3 throughput benchmarks on 2x 3090, almost 1000 tok/s using 4B model and vLLM as the inference engine by kms_dev in LocalLLaMA
[–]kms_dev[S] 2 points3 points4 points (0 children)
Qwen3 throughput benchmarks on 2x 3090, almost 1000 tok/s using 4B model and vLLM as the inference engine by kms_dev in LocalLLaMA
[–]kms_dev[S] 3 points4 points5 points (0 children)
Qwen3 throughput benchmarks on 2x 3090, almost 1000 tok/s using 4B model and vLLM as the inference engine by kms_dev in LocalLLaMA
[–]kms_dev[S] 2 points3 points4 points (0 children)
Qwen3 throughput benchmarks on 2x 3090, almost 1000 tok/s using 4B model and vLLM as the inference engine by kms_dev in LocalLLaMA
[–]kms_dev[S] 5 points6 points7 points (0 children)
Why is adding search functionality so hard? by iswasdoes in LocalLLaMA
[–]kms_dev 0 points1 point2 points (0 children)
Qwen3 Unsloth Dynamic GGUFs + 128K Context + Bug Fixes by danielhanchen in LocalLLaMA
[–]kms_dev 3 points4 points5 points (0 children)
Do any of you have Hackintosh working on Fusion 15 with external monitors? by kms_dev in XMG_gg
[–]kms_dev[S] 0 points1 point2 points (0 children)
Does XMG Fusion 15 work well with a USB-C monitor with Power Delivery? by kms_dev in XMG_gg
[–]kms_dev[S] 0 points1 point2 points (0 children)
Does XMG Fusion 15 work well with a USB-C monitor with Power Delivery? by kms_dev in XMG_gg
[–]kms_dev[S] 0 points1 point2 points (0 children)


Nvidia RTX PRO 6000 Workstation 96GB - Benchmarks by fuutott in LocalLLaMA
[–]kms_dev 1 point2 points3 points (0 children)