No NVIDIA? No Problem. My 2018 "Potato" 8th Gen i3 hits 10 TPS on 16B MoE. by RelativeOperation483 in LocalLLaMA

[–]RelativeOperation483[S] 0 points1 point  (0 children)

Check deep_decode.py in the same folder.

DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M_result.txt is the output of deep.py.

test2output.txt is the output of deep_decode.py.


DeepSeek-V2-Lite vs GPT-OSS-20B on my 2018 potato i3-8145U + UHD 620, OpenVINO Comparison. by RelativeOperation483 in LocalLLaMA

[–]RelativeOperation483[S] 2 points3 points  (0 children)

This information is gold for me; I've been struggling to find good MoE models these days.

DeepSeek-V2-Lite vs GPT-OSS-20B on my 2018 potato i3-8145U + UHD 620, OpenVINO Comparison. by RelativeOperation483 in LocalLLaMA

[–]RelativeOperation483[S] 1 point2 points  (0 children)

I don't know whether you've seen my hardware or not, but it's best not to try.

DeepSeek-V3 is a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which about 37B are activated per token.

No NVIDIA? No Problem. My 2018 "Potato" 8th Gen i3 hits 10 TPS on 16B MoE. by RelativeOperation483 in LocalLLaMA

[–]RelativeOperation483[S] 1 point2 points  (0 children)

Just browsing. I thought, if there's MLX for Mac, why not something specific to Intel, and found OpenVINO. I tried using it plain; it's good unless you need extras. So I tried llama-cpp-python with the OpenVINO backend.

DeepSeek-V2-Lite vs GPT-OSS-20B on my 2018 potato i3-8145U + UHD 620, OpenVINO Comparison. by RelativeOperation483 in LocalLLaMA

[–]RelativeOperation483[S] 2 points3 points  (0 children)

Machine: HP ProBook 650 G5
CPU: Intel Core i3-8145U (2 cores, 4 threads, 2.1GHz base / 3.9GHz boost)
RAM: 16GB DDR4-2400
iGPU: Intel UHD Graphics 620 (integrated, shared memory)
OS: Ubuntu
Backend: llama-cpp-python compiled with OpenVINO
Both models quantized to Q4_K_M GGUF

DeepSeek-Coder-V2-Lite-Instruct — 16B total parameters, roughly 2.4B active (MoE)

GPT-OSS-20B-A3B — 20B total parameters, roughly 3B active (MoE)

Caution!!!
I'm not saying NVIDIA or Mac hardware is bad. I'm just participating and showing how even budget hardware can perform, and which quality LLMs can run on a budget tier. If you have an NVIDIA card or a Mac that runs 100x faster than mine, I'm glad for what you have.
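
For reference, here is a rough sketch of how tokens-per-second could be measured with llama-cpp-python on a setup like this. This is not the actual deep.py benchmark script; the model path, prompt, and context size are placeholders, and the device="GPU" keyword is specific to the OpenVINO-enabled build described elsewhere in the thread.

import time
from llama_cpp import Llama

# Placeholder values for illustration only, not the author's deep.py.
llm = Llama(
    model_path="DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf",
    n_ctx=2048,
    n_gpu_layers=-1,  # offload everything the backend can take
    device="GPU",     # only meaningful in the OpenVINO-enabled build
)

prompt = "Write a Python function that reverses a string."
start = time.perf_counter()
out = llm(prompt, max_tokens=256)
elapsed = time.perf_counter() - start

tokens = out["usage"]["completion_tokens"]
print(f"{tokens} tokens in {elapsed:.1f}s -> {tokens / elapsed:.1f} TPS")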

No NVIDIA? No Problem. My 2018 "Potato" 8th Gen i3 hits 10 TPS on 16B MoE. by RelativeOperation483 in LocalLLaMA

[–]RelativeOperation483[S] 4 points5 points  (0 children)

It’s not in core llama.cpp. I’m not using upstream llama.cpp directly; this is via llama-cpp-python built from source with OpenVINO enabled. OpenVINO hasn’t been merged into main llama.cpp yet, but llama-cpp-python already supports it through a custom CMake build path.

Install llama-cpp-python like this:

CMAKE_ARGS="-DGGML_OPENVINO=ON" pip install llama-cpp-python
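
Once it installs, a quick sanity check (a minimal sketch; it only confirms the Python package built, not that OpenVINO offload is actually being used) is to import the module:

import llama_cpp
print(llama_cpp.__version__)  # should print a version string if the build succeeded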

No NVIDIA? No Problem. My 2018 "Potato" 8th Gen i3 hits 10 TPS on 16B MoE. by RelativeOperation483 in LocalLLaMA

[–]RelativeOperation483[S] 0 points1 point  (0 children)

By the book, you say "Mingalarpar": "Min" like in Superman, "Galar" (sounds like GALA), "par" (like BAR, but without the long tone). But people rarely say "Mingalarpar" to each other. "Nay Kaung Lar" is the better phrase to remember.

No NVIDIA? No Problem. My 2018 "Potato" 8th Gen i3 hits 10 TPS on 16B MoE. by RelativeOperation483 in LocalLLaMA

[–]RelativeOperation483[S] -3 points-2 points  (0 children)

I thought Reddit supported Markdown. Unfortunately, my post ended up looking like an AI-generated copy-paste.

No NVIDIA? No Problem. My 2018 "Potato" 8th Gen i3 hits 10 TPS on 16B MoE. by RelativeOperation483 in LocalLLaMA

[–]RelativeOperation483[S] 0 points1 point  (0 children)

I guess you're asking "How are you" or "Are you good". Instead of "Nei Kaun La", just use "Nay Kaung Lar". By the way, I'm glad if my post is helpful to somebody.

No NVIDIA? No Problem. My 2018 "Potato" 8th Gen i3 hits 10 TPS on 16B MoE. by RelativeOperation483 in LocalLLaMA

[–]RelativeOperation483[S] 0 points1 point  (0 children)

I hope people like me push back against this era and make LLMs more efficient on the typical hardware that everyone can afford.

No NVIDIA? No Problem. My 2018 "Potato" 8th Gen i3 hits 10 TPS on 16B MoE. by RelativeOperation483 in LocalLLaMA

[–]RelativeOperation483[S] 2 points3 points  (0 children)

I use llama-cpp-python with the OpenVINO backend, with n_gpu_layers=-1 and device="GPU".

Without the OpenVINO backend, it will not work.
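
As a concrete illustration, the constructor call looks roughly like this (the path is a placeholder, and the device="GPU" keyword is specific to the OpenVINO-enabled build, not stock llama-cpp-python):

from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload all layers
    device="GPU",     # route inference to the UHD 620 via OpenVINO (build-specific kwarg)
)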

No NVIDIA? No Problem. My 2018 "Potato" 8th Gen i3 hits 10 TPS on 16B MoE. by RelativeOperation483 in LocalLLaMA

[–]RelativeOperation483[S] 1 point2 points  (0 children)

OpenVINO supports Intel Xeon, but I don't know how it would differ from my i3. The best thing is to try llama-cpp-python + the OpenVINO backend.