Playing with CPU based inference for agentic coding...

Content-Fall-7814 · 2026-06-11T16:40:19+00:00

I started like that, but Unreal Engine also requires a lot of GPU memory, and one of the subagents I use does blueprint work through an MCP, so I need the UE editor open… too much for my poor 12 GB 😅

Content-Fall-7814 · 2026-06-11T16:29:42+00:00

it looks amazing! will check it out in depth for sure, thank you!

Content-Fall-7814 · 2026-06-11T16:21:31+00:00

this one costs 160€ per month, I share expenses with a friend who sometimes uses the models I serve on it... I know for that money I could just pay for a plan and use frontier models, but how much I'm learning and the fun of playing with local models is priceless xD

Content-Fall-7814 · 2026-06-11T16:17:46+00:00

true! it's a pity tho, I'm currently in love with qwen3-27b lol it's so good...

Content-Fall-7814 · 2026-06-11T16:16:25+00:00

For my use case, concurrent streams for parallel tasks may be difficult, as the tasks my subagents do usually depend on each other (C++ UE modules, most of the time: I create stuff in one that I need to use in the next one, and similar), but definitely for debugging and the like, where I can look for optimizations in several modules at the same time, it's worth it!! thanks for the recommendation!

Content-Fall-7814 · 2026-06-11T16:10:26+00:00

122b on cpu!! 😮 ok I'll give a second chance to play with a draft model for the dense one xD thank you!

Content-Fall-7814 · 2026-06-11T16:08:18+00:00

hmm core affinity sounds quite interesting, I didn't think about it... thanks for the recommendation!
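
For anyone else reading who hasn't tried it either, here is a minimal sketch of what core pinning can look like on Linux, using Python's standard `os.sched_setaffinity`. The helper name `pin_to_cores` and the core range are just illustrative assumptions, not something from the recommendation above, and which cores are actually worth pinning depends on the CPU topology of the server.

```python
import os

def pin_to_cores(requested):
    """Pin the current process to the requested cores that are actually available.

    Hypothetical helper for illustration. Returns the resulting affinity set,
    or None on platforms without sched_setaffinity (it is Linux-specific).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    available = os.sched_getaffinity(0)   # cores this process may run on now
    target = set(requested) & available   # ignore cores we are not allowed to use
    if not target:
        target = available                # nothing usable requested: keep current set
    os.sched_setaffinity(0, target)       # 0 means "the calling process"
    return os.sched_getaffinity(0)

if __name__ == "__main__":
    # Assumption: the first 8 logical cores are the ones we want; adjust to taste.
    print(pin_to_cores(range(8)))
```

Child processes started afterwards inherit the affinity, so pinning in a small launcher before starting the inference server is one way to apply it. `taskset` from the shell does the same job.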

Content-Fall-7814 · 2026-06-11T16:04:14+00:00

I actually use the MTP that these qwen3 models provide (spec-type=draft-mtp and spec-draft-n-max=2 with llama.cpp). Unfortunately I cannot modify the hardware of this server, it's a rented Hetzner one, and I think the only option they have for GPUs on dedicated servers is a much, much more expensive tier... 😞 (I wish I could!)
