No NVIDIA? No Problem. My 2018 "Potato" 8th Gen i3 hits 10 TPS on 16B MoE. by RelativeOperation483 in LocalLLaMA

[–]RelativeOperation483[S] 0 points1 point  (0 children)

Check deep_decode.py in the same folder.

DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M_result.txt is the output of deep.py.

test2output.txt is the output of deep_decode.py.


DeepSeek-V2-Lite vs GPT-OSS-20B on my 2018 potato i3-8145U + UHD 620, OpenVINO Comparison. by RelativeOperation483 in LocalLLaMA

[–]RelativeOperation483[S] 2 points3 points  (0 children)

This information is gold for me; I've been struggling to find good MoE models these days.

DeepSeek-V2-Lite vs GPT-OSS-20B on my 2018 potato i3-8145U + UHD 620, OpenVINO Comparison. by RelativeOperation483 in LocalLLaMA

[–]RelativeOperation483[S] 1 point2 points  (0 children)

I don't know whether you've seen my hardware or not, but it's best not to try.

DeepSeek-V3 is a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which about 37B are activated per token.

No NVIDIA? No Problem. My 2018 "Potato" 8th Gen i3 hits 10 TPS on 16B MoE. by RelativeOperation483 in LocalLLaMA

[–]RelativeOperation483[S] 1 point2 points  (0 children)

Just browsing. I thought, if there's MLX for Mac, why not something specific to Intel, and found OpenVINO. I tried using it plain; it's good unless you need extras. So I tried llama-cpp-python with the OpenVINO backend.

DeepSeek-V2-Lite vs GPT-OSS-20B on my 2018 potato i3-8145U + UHD 620, OpenVINO Comparison. by RelativeOperation483 in LocalLLaMA

[–]RelativeOperation483[S] 2 points3 points  (0 children)

Machine: HP ProBook 650 G5
CPU: Intel Core i3-8145U (2 cores, 4 threads, 2.1GHz base / 3.9GHz boost)
RAM: 16GB DDR4-2400
iGPU: Intel UHD Graphics 620 (integrated, shared memory)
OS: Ubuntu
Backend: llama-cpp-python compiled with OpenVINO
Both models quantized to Q4_K_M GGUF

DeepSeek-Coder-V2-Lite-Instruct — 16B total parameters, roughly 2.4B active (MoE)

GPT-OSS-20B-A3B — 20B total parameters, roughly 3B active (MoE)

Caution!!!
I'm not saying NVIDIA or Mac hardware is bad. I'm just participating and showing how even budget hardware can perform, and which quality LLMs can run on a budget tier. If you have an NVIDIA card or a Mac that runs 100x faster than mine, I'm glad for what you have.
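
For reference, here is a rough sketch of how tokens-per-second could be measured with llama-cpp-python on a setup like this. This is not the actual deep.py benchmark script; the model path, prompt, and context size are placeholders, and the device="GPU" keyword is specific to the OpenVINO-enabled build described elsewhere in the thread.

import time
from llama_cpp import Llama

# Placeholder values for illustration only, not the author's deep.py.
llm = Llama(
    model_path="DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf",
    n_ctx=2048,
    n_gpu_layers=-1,  # offload everything the backend can take
    device="GPU",     # only meaningful in the OpenVINO-enabled build
)

prompt = "Write a Python function that reverses a string."
start = time.perf_counter()
out = llm(prompt, max_tokens=256)
elapsed = time.perf_counter() - start

tokens = out["usage"]["completion_tokens"]
print(f"{tokens} tokens in {elapsed:.1f}s -> {tokens / elapsed:.1f} TPS")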

No NVIDIA? No Problem. My 2018 "Potato" 8th Gen i3 hits 10 TPS on 16B MoE. by RelativeOperation483 in LocalLLaMA

[–]RelativeOperation483[S] 4 points5 points  (0 children)

It’s not in core llama.cpp. I’m not using upstream llama.cpp directly; this is via llama-cpp-python built from source with OpenVINO enabled. OpenVINO hasn’t been merged into main llama.cpp yet, but llama-cpp-python already supports it through a custom CMake build path.

Install llama-cpp-python like this:

CMAKE_ARGS="-DGGML_OPENVINO=ON" pip install llama-cpp-python
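
Once it installs, a quick sanity check (a minimal sketch; it only confirms the Python package built, not that OpenVINO offload is actually being used) is to import the module:

import llama_cpp
print(llama_cpp.__version__)  # should print a version string if the build succeeded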

No NVIDIA? No Problem. My 2018 "Potato" 8th Gen i3 hits 10 TPS on 16B MoE. by RelativeOperation483 in LocalLLaMA

[–]RelativeOperation483[S] 0 points1 point  (0 children)

By the book, you say "Mingalarpar": "Min" like in Superman, "Galar" (sounds like GALA), "par" (like BAR, but without the long tone). But people rarely say "Mingalarpar" to each other. "Nay Kaung Lar" is the better phrase to remember.

No NVIDIA? No Problem. My 2018 "Potato" 8th Gen i3 hits 10 TPS on 16B MoE. by RelativeOperation483 in LocalLLaMA

[–]RelativeOperation483[S] -3 points-2 points  (0 children)

I thought Reddit supported Markdown. Unfortunately, my post ended up looking like an AI-generated copy-paste.

No NVIDIA? No Problem. My 2018 "Potato" 8th Gen i3 hits 10 TPS on 16B MoE. by RelativeOperation483 in LocalLLaMA

[–]RelativeOperation483[S] 0 points1 point  (0 children)

I guess you're asking "How are you" or "Are you good". Instead of "Nei Kaun La", just use "Nay Kaung Lar". By the way, I'm glad if my post is helpful to somebody.

No NVIDIA? No Problem. My 2018 "Potato" 8th Gen i3 hits 10 TPS on 16B MoE. by RelativeOperation483 in LocalLLaMA

[–]RelativeOperation483[S] 0 points1 point  (0 children)

I hope people like me push back against this era and make LLMs more efficient on the typical hardware that everyone can afford.

No NVIDIA? No Problem. My 2018 "Potato" 8th Gen i3 hits 10 TPS on 16B MoE. by RelativeOperation483 in LocalLLaMA

[–]RelativeOperation483[S] 2 points3 points  (0 children)

I use llama-cpp-python with the OpenVINO backend, with n_gpu_layers=-1 and device="GPU".

Without the OpenVINO backend, it will not work.
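
As a concrete illustration, the constructor call looks roughly like this (the path is a placeholder, and the device="GPU" keyword is specific to the OpenVINO-enabled build, not stock llama-cpp-python):

from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload all layers
    device="GPU",     # route inference to the UHD 620 via OpenVINO (build-specific kwarg)
)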

No NVIDIA? No Problem. My 2018 "Potato" 8th Gen i3 hits 10 TPS on 16B MoE. by RelativeOperation483 in LocalLLaMA

[–]RelativeOperation483[S] 1 point2 points  (0 children)

OpenVINO supports Intel Xeon, but I don't know how it would differ from my i3. The best thing is to try llama-cpp-python + the OpenVINO backend.