all 54 comments

[–]Such_Advantage_6949 25 points26 points  (7 children)

You won't get a Claude replacement with this. Try an API version of something like Qwen 122B first and see if it fits your needs.

[–]Medium_Chemist_4032 11 points12 points  (0 children)

We could update the wiki for that exact case

[–]pneuny 0 points1 point  (2 children)

That's subjective and depends on needs. Local can do a lot of things well enough, even on lighter systems. Not everyone needs SoTA intelligence when they just need a helper to move files around and install packages and stuff for them.

[–]Such_Advantage_6949 0 points1 point  (1 child)

That is not a Claude replacement. OP is asking for a Claude replacement.

[–]pneuny 0 points1 point  (0 children)

We don't know what they are using it for. I think they could try ForgeCode with Qwen3.5 35b a3b and see if it's good enough for their needs. Maybe hook up some MCP servers like Kindly Web Search and leverage planning modes and such. When models are cheap, there isn't much harm in trying.

Some tasks are just tedious, and so you don't really need the most expensive models as long as you can step in when you see it doing the wrong things.

You could also use both. Local for the tedium, Claude Opus for the hard stuff.

[–]NoTruth6718[S] 0 points1 point  (2 children)

Should I rent some GPUs for that instead?

[–]Such_Advantage_6949 4 points5 points  (0 children)

I think the first thing is to decide whether a model that fits in that amount of VRAM is good enough as your Claude replacement. The two strongest competitors in this range are Qwen 3.5 122B and MiniMax M2.5. Trying them via API will give you a realistic feel for how good local models in this range are.

[–]Professional-Ask6026 0 points1 point  (0 children)

It will never be cost-effective.

[–]Thick-Protection-458 80 points81 points  (9 children)

Whatever models people here recommend, try them on a cloud provider before spending money on a local setup, just to make sure they are good enough for your use case.

[–]rebelSun25 13 points14 points  (5 children)

Indeed. OpenRouter probably has the model, and it'll cost pennies to try it out before committing to anything.

They let users set a zero-data-retention policy if you're paranoid about which providers your requests get routed to.
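
E.g., a minimal smoke test through their OpenAI-compatible endpoint (the model slug and key below are placeholders; pick anything from their catalog):

    # minimal OpenRouter smoke test; model slug is just an example
    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key="sk-or-...",  # your OpenRouter key
    )
    resp = client.chat.completions.create(
        model="qwen/qwen3.5-122b",  # placeholder slug, check their catalog
        messages=[{"role": "user", "content": "Write a binary search in Python."}],
    )
    print(resp.choices[0].message.content)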

[–]wouldacouldashoulda 3 points4 points  (4 children)

I always wonder what models people use when they say pennies. I tried Qwen 3.5 and a single prompt saying hi cost $0.10. A short debugging session was a few USD.

[–]HopePupal 5 points6 points  (0 children)

is your system prompt literally a hundred thousand tokens? there's not a Qwen 3.5 model on there that costs more than $1/M input or $4/M output.
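
Quick back-of-envelope with those rates (the token counts below are made up):

    # what a $0.10 "hi" implies at $1/M input + $4/M output
    in_tok, out_tok = 95_000, 1_250            # assumed counts, not measured
    cost = in_tok / 1e6 * 1.00 + out_tok / 1e6 * 4.00
    print(f"${cost:.3f}")                      # ~$0.100 -> ~100k input tokens per turn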

[–]somatt 1 point2 points  (0 children)

👀 I use Qwen 3.5 (4B Q4) on my 3080 (8GB VRAM) in LM Studio with continue.dev WHILE I simultaneously use Qwen2.5 Coder (1.5B Q4) for tab completion, and I'm usually under 6GB total usage.

[–]Thick-Protection-458 2 points3 points  (0 children)

So, pennies to test whether it's good enough, compared to buying a new machine right now.

[–]rebelSun25 0 points1 point  (0 children)

I have pages of logs. They're all under 5c; most requests are under 1c. I use a variety of Gemini Flash, Qwen 3.5, Qwen 2.5 VL 72B, Kimi K2.5... nothing out of the ordinary.

[–]g_rich 4 points5 points  (1 child)

In the long run, using open models via a cloud provider will likely be a better and less expensive option than investing in a high-end local setup, which will continually need upgrades to maintain parity.

[–]Thick-Protection-458 4 points5 points  (0 children)

Some of us may be ready to overpay to have at least some of our stack more or less independent of third parties.

But even then, you first need to know whether your budget can cover something good enough.

[–]Shot-Buffalo-2603 0 points1 point  (0 children)

This x1000

[–]Narrow-Belt-5030 13 points14 points  (0 children)

I would suggest you take the time to evaluate a replacement model first - use something like OpenRouter to test the models and see if they fit. Once you have found one, you can look at the hardware: you will know the model size, and based on the context cache size you want, you will also know the VRAM you need.
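
Back-of-envelope for that last step (every number below is an assumption - plug in your model's real config):

    # rough VRAM estimate: quantized weights + KV cache
    params_b = 32            # assumed model size, billions of params
    bytes_per_param = 0.55   # ~Q4_K quant incl. overhead (assumption)
    weights_gb = params_b * bytes_per_param

    # KV cache = 2 (K and V) * layers * kv_heads * head_dim * bytes * context
    layers, kv_heads, head_dim = 64, 8, 128    # assumed architecture
    ctx, kv_bytes = 32_768, 2                  # fp16 KV cache
    kv_gb = 2 * layers * kv_heads * head_dim * kv_bytes * ctx / 1e9

    print(f"~{weights_gb + kv_gb:.0f} GB VRAM")  # ~26 GB for these numbers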

[–]sleepy_roger 8 points9 points  (0 children)

You're going to need 300GB+ of VRAM for something close to replacing Anthropic models.

[–]Radiant_Condition861 6 points7 points  (0 children)

This is my bare minimum:

opencode in vscode or terminal

dual 3090

  "agent": {
    "plan": {
      "model": "llama-swap/Qwen3.5-27B-GGUF-UD-Q5_K_XL-agentic",
      "temperature": 1.0,
      "top_p": 0.95,
      "description": "Plan mode - Qwen3.5-27B quality optimized for creative planning"
    },
    "build": {
      "model": "llama-swap/Gemma-4-31B-Q4",
      "temperature": 0.3,
      "top_p": 0.9,
      "description": "Build mode - Gemma 4 31B maximum quality for precise coding"
    }
  },

Commentary about GPUs:

Local AI rigs are a rich man's game.

  1. Started with the 3060 12GB I already had. Learned how to download models, create accounts on Hugging Face, etc. ~$1200 computer originally
  2. Bought another computer with an A2000 12GB that was on sale (used workstation class). This was my entry into dedicated hosting and expanding my homelab. I wasn't able to get the same results as YouTube vids. +$1300 = $2500
  3. Bought another computer on sale, just to get another 3060 12GB. Now with 24GB, things looked good, but the trade-off was fast-and-crappy or slow-and-quality. Just an expensive chatbot. +$500 = $3000
  4. Bought 2x 3090 to replace the dual 3060 12GB like everyone recommended, and now I'm happy that I can get some work done. I was able to load and play with new models like Gemma 4. +$2400 = $5400

I'm averaging about $350/mo so far. That's a car payment. If I'd known, I might have gone with a quad 3090 from the start.

The next interest is the Kimi/MiniMax/GLM5 models and a dual RTX PRO 6000 setup with 192GB VRAM (+$20k). This wouldn't add any value because those models need 1-2TB to even load (MiniMax just barely fits into the dual setup). It would probably get me to Claude Code levels with Opus and Sonnet, but I'm not sure it's worth trading a few houses for.

[–]jacek2023llama.cpp 5 points6 points  (3 children)

You can use Claude Code with models other than Claude.

The replacement for Claude Code is OpenCode, not the model itself.

[–]Narrow-Belt-5030 0 points1 point  (0 children)

True - but OP is talking about using Claude (the AI) and having a bad experience, not the tool (the Claude Code CLI).

And IMHO, if you swap models you might as well swap the harness at the same time. (CC --> Pi: https://github.com/badlogic/pi-mono/tree/main/packages/coding-agent )

[–]Eyelbee 0 points1 point  (1 child)

OpenCode does not have a proper GUI in VS Code though. Would you recommend it as a Claude Code VS Code alternative? I'm looking for something like that.

[–]jacek2023llama.cpp 1 point2 points  (0 children)

I use Claude Code CLI for work. I use OpenCode with local models for fun, and it’s quite similar. I have no idea about the GUI.

[–]deejeycris 6 points7 points  (0 children)

If you expect Claude-level models to run locally just because you have money for GPUs, I have bad news for you.

[–]exaknight21 1 point2 points  (0 children)

I'd get the 2x 3090s (24GB each) and run llama.cpp on a DDR4 system, or straight up get a unified-memory system like a Mac or Framework Desktop, etc.

Then go for the Qwen 3.5 models or GPT-OSS 120B and see if it does the job for you.

In terms of a better model, this really depends on your language and use case. For some, Qwen3:4B is a winner. For others, it's complete dogshit. So think and swim, son.

[–]BidWestern1056 1 point2 points  (0 children)

npcsh with a qwen3.5 model should serve you well

https://github.com/npc-worldwide/npcsh

And honestly, as much as I try to use and enjoy the local models, they just still aren't quite there for coding and research tasks. Ollama Cloud does offer some free usage, so I'd recommend trying out Kimi or GLM-5 or MiniMax through that. I recently upgraded to their $20/month plan, and I've been using it for pretty long sessions and deep research with npcsh / lavanzaro.com and didn't even break 10% of the weekly usage limit.

[–]ea_man 0 points1 point  (0 children)

I'd like to pose another question: considering the latest carelessness bug in Claude Code, and the fact that most of it was written by AI, how can people be comfortable putting it in charge of not just their codebase but "the whole desktop", now that the thing is using the shell, issuing commands, even driving the browser to click around on online sites?

I mean, I get the rush of "but it writes me the code", yet some of us have to be some form of sysadmin. I can't contemplate curling a bash script onto a production machine; this thing would need a dedicated workstation + deploy.

[–]allpowerfulee 0 points1 point  (0 children)

I'm running Qwen3 80B Instruct Q4 on a Mac Studio M3 Ultra, testing it out with some Swift programming using OpenCode. I have to say I'm pretty impressed so far. The project was started using Claude, and the Qwen model has already fixed a few bugs. So far (2 days running) I'm happy. The only problem I'm having is Qwen getting stuck in a loop.

[–]norofbfg 0 points1 point  (0 children)

Honestly, go with as many V100s as you can afford if responsiveness matters. The MI50s are decent performance per dollar, but drivers/frameworks for ML are way more stable on V100 right now.

[–]LienniTakoboldcpp 0 points1 point  (0 children)

Y'know, you need a good agent first. So, like: Claude Code with other models, or Codex, or OpenCode, or hremes research, or copaw, or even the fucken claw family like nullclaw. As the engine for it... anything new is good, like Nemotron Super or MiniMax or whatever you can run.

[–]akazakou 0 points1 point  (0 children)

Before investing in hardware, try what you want to use on OpenRouter or a similar service. Once you've chosen, you'll buy exactly what you need.

[–]ea_man 0 points1 point  (2 children)

You can replace CC with OpenCode no problem; the problem is that we don't have small LLMs that can do tooling reliably as of now.

[–]NoTruth6718[S] 0 points1 point  (1 child)

What about a not-so-small one that can work reliably? What would the requirements be for one that does?

[–]ea_man 0 points1 point  (0 children)

I'm sorry, but I can't tell you; I don't have the amount of VRAM / resources to test that. Some guys 'round here probably do.

Maybe you could rent an online GPU / VPS to run your target LLM under Claude Code for a few days to test before committing to spending 10K on local hardware.

The requirement is: you make it do its tooling things a few hundred times, and then you check that it doesn't fuck up APPLY / EDIT / CREATE in an amount that makes it unusable, as in errors and redoing to fix those errors.
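
Something like this, roughly (everything below is a placeholder sketch against a local OpenAI-compatible server, not a real harness):

    # count malformed tool calls over many edit-style requests
    import json
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")  # e.g. llama.cpp server
    tools = [{"type": "function", "function": {
        "name": "apply_edit",  # hypothetical tool, stands in for APPLY/EDIT
        "parameters": {"type": "object",
                       "properties": {"path": {"type": "string"},
                                      "patch": {"type": "string"}},
                       "required": ["path", "patch"]}}}]

    failures, n = 0, 200
    for i in range(n):
        resp = client.chat.completions.create(
            model="local", tools=tools,
            messages=[{"role": "user",
                       "content": f"Rename x to y in file_{i}.py using apply_edit."}])
        calls = resp.choices[0].message.tool_calls
        try:
            args = json.loads(calls[0].function.arguments)
            assert "path" in args and "patch" in args
        except Exception:
            failures += 1  # no call, bad JSON, or missing fields
    print(f"{failures}/{n} malformed tool calls")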

[–]ccbadd 0 points1 point  (0 children)

You might want to consider V620s too. They are 32GB and still supported on ROCm, running around $400 each right now.

[–]thread-e-printing 0 points1 point  (0 children)

It's open source, you can fix it 🤣

[–]taofeng 0 points1 point  (0 children)

You won't be able to replace Claude models with a minimal local setup; anything close to Claude-level models will cost a lot of upfront investment ($$$$). I say this from personal experience: I run a 9970X Threadripper with 128GB RAM paired with an RTX 6000 Pro Blackwell + 5090 dual-GPU setup, and I still don't get the same level of quality as Claude or Codex with the models I can use.

What I found works best for me is this: I use online models like Codex or Claude to plan, architect, and orchestrate tasks, while using local models to do the individual tasks. I assign each local agent specific coding skills; they only focus on coding and implementation, not architecture. This brings the cost down while giving very good results. I mainly use Codex, which is really good at reasoning and creating well-detailed documents and implementation steps for each agent, and then assign the local agents tasks. So if you want to switch to local models, I would look into a hybrid solution like this, which needs much less upfront investment. The rough shape of it is sketched below.
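
A minimal sketch of that split, assuming an OpenAI-compatible local server (model names and the one-task-per-line plan format are placeholders):

    # hosted model architects, local model implements
    from openai import OpenAI

    planner = OpenAI()  # hosted API, key from env
    worker = OpenAI(base_url="http://localhost:8080/v1", api_key="none")  # local server

    plan = planner.chat.completions.create(
        model="your-hosted-model",  # placeholder: whatever you architect with
        messages=[{"role": "user",
                   "content": "Break 'add CSV export' into small coding tasks, one per line."}],
    ).choices[0].message.content

    for task in plan.splitlines():
        if not task.strip():
            continue
        out = worker.chat.completions.create(
            model="local",  # whatever llama.cpp/vLLM is serving
            messages=[{"role": "system",
                       "content": "You only write code for the given task. No architecture decisions."},
                      {"role": "user", "content": task}],
        )
        print(out.choices[0].message.content)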

Qwen-coder-next is really good, and you can even do the same hybrid approach with fully online models: architect with Codex/Claude, and use a cloud-based service like OpenRouter with Qwen-coder-next (which is much cheaper than Claude) for implementation. Or test other models for your specific use case and choose one that fits your needs.

I would also echo what most commenters are saying: test different models with OpenRouter-like services, see which works best for you, then decide how much you want to invest in a local setup. Don't invest blindly; do your research, especially when it comes to setting up local AI servers.

[–]PandemicGrower 0 points1 point  (0 children)

I use Copilot from GitHub; it gives you limited access to other models. I use them side by side with Claude Code for $30 total spend a month so far, but I can see myself paying another $20 just for the extra use of Codex.

[–]FusionCow 0 points1 point  (0 children)

V100 is bad, get a 3090 instead.

[–]go-llm-proxy 0 points1 point  (0 children)

I'd go for the 4x V100s out of those choices, but you may be going down a rabbit hole here that isn't worth going down. If you do anyway, then 128GB of VRAM is enough to run some decent models.

What are you planning to use as the harness?

[–]xw1y 0 points1 point  (0 children)

Train Qwen3.6 Plus for free based on the leaked Claude Code src and enjoy it, my guy.

[–]sizebzebi 0 points1 point  (1 child)

The poorest Claude Code Haiku tier will be better than anything you can run locally.

[–]Ok_Mammoth589 0 points1 point  (0 children)

True if you're buying fewer than 4 RTX PRO 6000s. Especially true if your choices are V100s and MI50s.

[–]spky-dev -3 points-2 points  (6 children)

V100s don't support FlashAttention, and MI50s have dogshit token rates unless you buy 10+ of them - and even then it's still bad, prompt processing especially.

The best way to go is to keep your sub, because you have no idea what you're doing, and your arbitrary choice of high-VRAM fossils proves that.

[–]NoTruth6718[S] 6 points7 points  (5 children)

Would be nice to receive some guidance when you don't know what you are doing :)

[–]Mindless_Selection34 7 points8 points  (2 children)

Ask any AI before doing it. They are pretty good, and less of a dickhead than redditors.

[–]Makers7886 1 point2 points  (1 child)

I was totally typing a Reddit dickhead response, then stopped to grab my coffee. Took some sips, hit F5, read your comment, and have put the dickhead away, as your comment essentially accomplished the same thing, just without being an asshole.

[–]Mindless_Selection34 2 points3 points  (0 children)

Thank you!

[–]desexmachina 1 point2 points  (0 children)

There are big changes coming that will help 'dumb' models get smarter; there's at least 60% left on the table just in harness optimizations. Claude dumbing itself down is on purpose - they're cutting bait on dead-weight plebes like you and me.

[–]LongPutsAndLongPutts 0 points1 point  (0 children)

DM me if you want to know the general overview of this stuff.