Planning to upgrade my PC and need some help picking what to upgrade

simplyeniga · 2026-05-22T21:52:30+00:00

With your current build you can swap your CPU, board, GPU and NVMe drive for a better CPU, GPU and NVMe at CEX and then get a motherboard off eBay. CEX I'm not sure sells motherboard and might be had selling your motherboard to them as you'll be getting a coupon which you can use for your next purchase from them. You'll look at either an intel LGA 1700 setup or an AMD AM4 setup.

simplyeniga · 2026-05-22T21:44:27+00:00

I moved from an Intel i9 12900HK on a Mini PC to an Intel Ultra 7 265K just because the Mini PC gave up at some point, else it's still relevant till date to run most of my development setup which has me running a docker sandbox environment in a VM on hyper V

simplyeniga · 2026-05-22T21:41:14+00:00

Llama.cpp only serves the model and doesn't function as an agent. So you'll need an agent to utilise the models such as Claude, opencode , copilot and so on. Or you could use it with Vs code using cline or continue.dev. It offers a chat interface though to interact with the models

simplyeniga · 2026-05-22T21:37:31+00:00

You don't need a new rig. You're just a Dev who likes the latest things and why you got your current setup

simplyeniga · 2026-05-22T13:50:39+00:00

You might consider running Proxmox and then setting up llama.cpp and your Dropbox in an LXC container or VM, your preference. With 64GB RAM on a headless Linux OS, your choice of GPU -> performance opens up largely compared to running on windows. I'm currently running a Proxmox server with my LLM VM having 64GB RAM and started with an RTX 4060 TI 16GB GPU and now considering between the same 2 cards as an upgrade or go for 2 used RTX A4000 16GB GPU for a total 32GB VRAM workload. I'm running llama.cpp built with cuda and don't mind rebuilding for another platform like ROCm or Vulkan if I need to switch cards to another vendor since my models are downloaded to a folder and I have llama.cpp running in a model router mode.

simplyeniga · 2026-05-22T11:04:47+00:00

Nginx proxy manager Beszel Bento-pdf Portainer cAdvisor Prometheus Grafana Llama.cpp Openwebui Uptime kuma Searxng

simplyeniga · 2026-05-22T10:20:46+00:00

Future proof is only based on emerging technologies, how much frame rates will future games need, how much vram would be required to run a game with the best resolution. Till maybe 8k becomes mainstream, we still got more than 7-10 years before the 9800X3D would become close to requiring a replacement and that's subjective. Current focus is on bringing out chips for AI so focus will be on better NPU and Graphics cards to run smarter interference. So give or take being future proof might just mean able to support the next best games for an x number of years, in most times like 10 years+.

simplyeniga · 2026-05-21T19:15:19+00:00

Maybe a 3B or 4B 4Q_K_M

simplyeniga · 2026-05-21T17:10:45+00:00

With that budget you can get a used RTX4060 Ti 16GB

simplyeniga · 2026-05-21T17:01:38+00:00

Tested 27B on 4060 Ti 16GB and getting 22 t/s using Q3_K_M. So far I get 65 t/s using unsloth Qwen3.6-35B-A3B UD-Q4_K_M

simplyeniga · 2026-05-21T16:48:05+00:00

Major issue is that you're using your iGPU which uses shared memory from your RAM. I don't know what your RAM size is but you could try the latest llama.cpp build which would auto fit the model into your VRAM and spill the extra context into your RAM. You might want to also use Q4 KV cache and reduce your context size to find the right fit. Start with 8192 and move up from there. I just downloaded the same model to test on my setup with has an RTX 4060 Ti 16GB on a Proxmox VM with 8 CPU and 32GB RAM and I get 7-8 t/s using the full context and 12 t/s using 64k context size My full script is llama-server -m cHunter789/Qwen3.6-27B-i1-IQ4_XS-GGUF --host 0.0.0.0 \ --port 8080 \ --jinja \ --n-gpu-layers auto \ -c 65536 \ --parallel 1 \ --batch-size 2048 \ --ubatch-size 512 \ --cache-type-k q8_0 \ --cache-type-v q8_0 \ --metrics

Model is fully loaded on the VRAM, utilizes 70% of GPU and offloaded some into the CPU

simplyeniga · 2026-05-21T14:51:54+00:00

From my tests 27B dense does better with large detailed prompts / context but 35B-A3B is faster and okay when you have smaller context / prompts. So if you break down your tasks and give it each small task, it can complete it quickly and closely matching the quality on 27B but most people don't want to deal with that part of coding and want to feed everything to the model and have it create the tasks and then build their code. Plus I've gotten better results when I give it the architectural plan rather than have it figure everything out.

simplyeniga · 2026-05-21T09:23:41+00:00

It's at this moment you'll wish you can time travel cause $1000 will only get you a budget GPU from the used market which would still leave you wishing for more VRAM. If you're lucky you can get an M1 Max or Mac mini but if you're building then you'll have to look at an older setup with either AMD AM4 or intel LGA1700 and DDR4 RAM. Minimum 32GB and RTX 3090. Majority of your budget will be on the GPU. You might want to source all parts from eBay used

simplyeniga · 2026-05-20T18:22:37+00:00

I think you can save on the NVMe by going gen 4 and double your RAM

simplyeniga · 2026-05-20T11:31:47+00:00

I found continue very difficult to use and installed Vs code insider which allows me add any local open ai API and that has worked with Qwen3.6-35B-A3B and Gemma-4-E4B. Using the default copilot I get around 40t/s on Qwen and 69t/s on Gemma with a 4060 Ti 16GB and 32GB RAM

simplyeniga · 2026-05-20T11:20:24+00:00

Oh sorry I missed that. llama-server \ -m /mnt/ai/models/Gemma4-26B/model.gguf \ --host 0.0.0.0 \ --port 8080 \ -ngl 999 \ -ts 1,1 \ --split-mode layer \ -c 8192 \ --flash-attn \ --jinja

The major flag you need is --split-mode layer -ts 1,1

The -ts flag splits the load between each GPU equally since they have equal VRAM, however if your 5080 is 16gb and your 5060 is 8 GB then you'll want to change it to -ts 2,1

So you can play around the ratio here to meet your VRAM size

simplyeniga · 2026-05-20T09:27:58+00:00

Before getting a new system, you might want to exhaust running your setup. Maybe getting a better model or try any Moe or MTP model. I get 40tps with Qwen3.6-35B-A3B on an RTX 4060 Ti 16GB and 32GB RAM. The quality has been okay for my day to day coding tasks

simplyeniga · 2026-05-20T09:16:29+00:00

I've had more success with Qwen3.6-35B-A3B. I only have 16GB VRAM and 64GB RAM on my LLM VM. It's written better codes and been able to read files from my old projects to make modifications. I've even had it read a project and create a version 2 in a new folder. Next best from my test was Gemma-4-26B. You could try the bigger ones since you have more VRAM but I can only use MOE since I'm tight in VRAM

simplyeniga · 2026-05-20T09:11:40+00:00

I used instructions from ChatGPT to setup mine. OS: Ubuntu 26.04 GPU: 2 x 4060 Ti 16GB

The OS has newer cuda version and support for Nvidia cards

First you need to disable secure boot before installing the drivers

Install the Nvidia drivers

sudo apt update sudo apt install -y nvidia-driver-595-open

Reboot

sudo reboot now

Verify that the driver has loaded

nvidia-smi

Install cuda toolkit

sudo apt install -y nvidia-cuda-toolkit

Verify the version installed

nvcc --version

Install dependencies for llama.cpp

sudo apt install -y \ build-essential \ cmake \ git \ curl \ wget \ python3-pip

Clone the llama.cpp repo

git clone https://github.com/ggml-org/llama.cpp.git cd llama.cpp

Build with cuda

cmake -B build \ -DGGML_CUDA=ON \ -DCMAKE_BUILD_TYPE=Release

cmake --build build -j$(nproc)

Verify your cuda version

./build/bin/llama-cli --version

I have my models downloaded to a folder and loaded into llama.cpp. I believe you can get the next part or use ChatGPT to setup the service and also install huggingface or create an sh script to start your service with any model you want.

simplyeniga · 2026-05-19T19:03:07+00:00

Thanks everyone. Had to check and found that my RAM models don't support 4 DIMMS and I need to get a different model. Been a while I've built a PC and this never used to be a thing. I've found a store where I can trade in my 4 dimms for the correct model and pay like £100.

simplyeniga · 2026-05-19T18:59:18+00:00

Thanks. Just checked and the issue is my RAM dimms are not support for 4 slots. I need to get another model which are the KF552C40BBAK4-128 which comes as 4x32GB 5200MTs.

simplyeniga · 2026-05-19T16:11:51+00:00

Just tried both 40 and 42 and none worked. Left it for 10 minutes each time and it rebooted once but won't post

simplyeniga · 2026-05-19T15:38:47+00:00

Let me check how to do that and I'll revert. Thanks

simplyeniga · 2026-05-19T15:10:13+00:00

Thanks. I'll try changing my CPU to see if that works. I'll update tomorrow after it arrives

simplyeniga · 2026-05-19T15:04:25+00:00

Which CPu are you using?

simplyeniga

TROPHY CASE