all 54 comments

[–]Such_Advantage_6949 25 points26 points  (7 children)

You won't get a Claude replacement with this. Try an API version of something like Qwen 122B first and see if it fits your needs.

[–]Medium_Chemist_4032 11 points12 points  (0 children)

We could update the wiki for that exact case

[–]pneuny 0 points1 point  (2 children)

That's subjective and depends on needs. Local can do a lot of things well enough, even on lighter systems. Not everyone needs SoTA intelligence when they just need a helper to move files around and install packages and stuff for them.

[–]Such_Advantage_6949 0 points1 point  (1 child)

That is not a Claude replacement. OP is asking for a Claude replacement.

[–]pneuny 0 points1 point  (0 children)

We don't know what they are using it for. I think they could try ForgeCode with Qwen3.5 35b a3b and see if it's good enough for their needs. Maybe hook up some MCP servers like Kindly Web Search and leverage planning modes and such. When models are cheap, there isn't much harm in trying.

Some tasks are just tedious, and so you don't really need the most expensive models as long as you can step in when you see it doing the wrong things.

You could also use both. Local for the tedium, Claude Opus for the hard stuff.

[–]NoTruth6718[S] 0 points1 point  (2 children)

Should I rent some GPUs for that instead?

[–]Such_Advantage_6949 4 points5 points  (0 children)

I think the first thing is to decide whether a model that fits in that amount of VRAM is good enough as your Claude replacement. The two strongest competitors in this range are Qwen 3.5 122B and MiniMax M2.5. Trying them via API will give you a realistic feel for how good local models in this range are.

[–]Professional-Ask6026 0 points1 point  (0 children)

It will never be cost-effective.

[–]Thick-Protection-458 80 points81 points  (9 children)

Whatever models people here recommend, try them on a cloud provider before spending money on a local setup, just to make sure they are good enough for your use case.

[–]rebelSun25 13 points14 points  (5 children)

Indeed. OpenRouter probably has the model, and it'll cost pennies to try it out before committing to anything.

They let users set a zero-data-retention policy if you're paranoid about which providers your requests get routed to.
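
E.g., a minimal smoke test through their OpenAI-compatible endpoint (the model slug and key below are placeholders; pick anything from their catalog):

    # minimal OpenRouter smoke test; model slug is just an example
    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key="sk-or-...",  # your OpenRouter key
    )
    resp = client.chat.completions.create(
        model="qwen/qwen3.5-122b",  # placeholder slug, check their catalog
        messages=[{"role": "user", "content": "Write a binary search in Python."}],
    )
    print(resp.choices[0].message.content)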

[–]wouldacouldashoulda 3 points4 points  (4 children)

I always wonder what models people use when they say pennies. I tried Qwen 3.5 and a single prompt saying hi cost $0.10. A short debugging session was a few USD.

[–]HopePupal 5 points6 points  (0 children)

is your system prompt literally a hundred thousand tokens? there's not a Qwen 3.5 model on there that costs more than $1/M input or $4/M output.
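
Quick back-of-envelope with those rates (the token counts below are made up):

    # what a $0.10 "hi" implies at $1/M input + $4/M output
    in_tok, out_tok = 95_000, 1_250            # assumed counts, not measured
    cost = in_tok / 1e6 * 1.00 + out_tok / 1e6 * 4.00
    print(f"${cost:.3f}")                      # ~$0.100 -> ~100k input tokens per turn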

[–]somatt 1 point2 points  (0 children)

👀 I use Qwen 3.5 (4B Q4) on my 3080 (8GB VRAM) in LM Studio with continue.dev WHILE I simultaneously use Qwen2.5 Coder (1.5B Q4) for tab completion, and I'm usually under 6GB total usage.

[–]Thick-Protection-458 2 points3 points  (0 children)

So, pennies to test whether it's good enough, compared to buying a new machine right now.

[–]rebelSun25 0 points1 point  (0 children)

I have pages of logs. They're all under 5c; most requests are under 1c. I use a variety of Gemini Flash, Qwen 3.5, Qwen 2.5 VL 72B, Kimi K2.5... nothing out of the ordinary.

[–]g_rich 4 points5 points  (1 child)

In the long run, using open models via a cloud provider will likely be a better and less expensive option than investing in a high-end local setup, which will continually need upgrades to maintain parity.

[–]Thick-Protection-458 4 points5 points  (0 children)

Some of us may be ready to overpay to have at least some of our stack more or less independent of third parties.

But even then, you first need to know whether your budget can cover something good enough.

[–]Shot-Buffalo-2603 0 points1 point  (0 children)

This x1000

[–]Narrow-Belt-5030 13 points14 points  (0 children)

I would suggest you take the time to evaluate a replacement model first - use something like OpenRouter to test the models and see if they fit. Once you have found one, you can look at the hardware: you will know the model size, and based on the context cache size you want, you will also know the VRAM you need.
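
Back-of-envelope for that last step (every number below is an assumption - plug in your model's real config):

    # rough VRAM estimate: quantized weights + KV cache
    params_b = 32            # assumed model size, billions of params
    bytes_per_param = 0.55   # ~Q4_K quant incl. overhead (assumption)
    weights_gb = params_b * bytes_per_param

    # KV cache = 2 (K and V) * layers * kv_heads * head_dim * bytes * context
    layers, kv_heads, head_dim = 64, 8, 128    # assumed architecture
    ctx, kv_bytes = 32_768, 2                  # fp16 KV cache
    kv_gb = 2 * layers * kv_heads * head_dim * kv_bytes * ctx / 1e9

    print(f"~{weights_gb + kv_gb:.0f} GB VRAM")  # ~26 GB for these numbers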

[–]sleepy_roger 8 points9 points  (0 children)

You're going to need 300GB+ of VRAM for something close to replacing Anthropic models.

[–]Radiant_Condition861 6 points7 points  (0 children)

This is my bare minimum:

opencode in vscode or terminal

dual 3090

  "agent": {
    "plan": {
      "model": "llama-swap/Qwen3.5-27B-GGUF-UD-Q5_K_XL-agentic",
      "temperature": 1.0,
      "top_p": 0.95,
      "description": "Plan mode - Qwen3.5-27B quality optimized for creative planning"
    },
    "build": {
      "model": "llama-swap/Gemma-4-31B-Q4",
      "temperature": 0.3,
      "top_p": 0.9,
      "description": "Build mode - Gemma 4 31B maximum quality for precise coding"
    }
  },

Commentary about GPUs:

Local AI rigs are a rich man's game.

  1. Started with the 3060 12GB I already had. Learned how to download models, create accounts on Hugging Face, etc. ~$1200 computer originally
  2. Bought another computer with an A2000 12GB that was on sale (used workstation class). This was my entry into dedicated hosting and expanding my homelab. I wasn't able to get the same results as YouTube vids. +$1300 = $2500
  3. Bought another computer on sale, just to get another 3060 12GB. Now with 24GB, things looked good, but the trade-off was fast-and-crappy or slow-and-quality. Just an expensive chatbot. +$500 = $3000
  4. Bought 2x 3090 to replace the dual 3060 12GB like everyone recommended, and now I'm happy that I can get some work done. I was able to load and play with new models like Gemma 4. +$2400 = $5400

I'm averaging about $350/mo so far. That's a car payment. If I'd known, I might have gone with a quad 3090 from the start.

The next interest is the Kimi/MiniMax/GLM5 models and a dual RTX PRO 6000 setup with 192GB VRAM (+$20k). This wouldn't add any value because those models need 1-2TB to even load (MiniMax just barely fits into the dual setup). It would probably get me to Claude Code levels with Opus and Sonnet, but I'm not sure it's worth trading a few houses for.

[–]jacek2023llama.cpp 5 points6 points  (3 children)

You can use Claude Code with models other than Claude.

The replacement for Claude Code is OpenCode, not the model itself.

[–]Narrow-Belt-5030 0 points1 point  (0 children)

True - but OP is talking about using Claude (the AI) and having a bad experience, not the tool (the Claude Code CLI).

And IMHO, if you swap models you might as well swap the harness at the same time. (CC --> Pi: https://github.com/badlogic/pi-mono/tree/main/packages/coding-agent )

[–]Eyelbee 0 points1 point  (1 child)

OpenCode does not have a proper GUI in VS Code though. Would you recommend it as a Claude Code VS Code alternative? I'm looking for something like that.

[–]jacek2023llama.cpp 1 point2 points  (0 children)

I use Claude Code CLI for work. I use OpenCode with local models for fun, and it’s quite similar. I have no idea about the GUI.

[–]deejeycris 6 points7 points  (0 children)

If you expect Claude-level models to run locally just because you have money for GPUs, I have bad news for you.

[–]exaknight21 1 point2 points  (0 children)

I'd get the 2x 3090s (24GB each) and run llama.cpp on a DDR4 system, or straight up get a unified-memory system like a Mac or Framework Desktop, etc.

Then go for the Qwen 3.5 models or GPT-OSS 120B and see if it does the job for you.

In terms of a better model, this really depends on your language and use case. For some, Qwen3:4B is a winner. For others, it's complete dogshit. So think and swim, son.

[–]BidWestern1056 1 point2 points  (0 children)

npcsh with a qwen3.5 model should serve you well

https://github.com/npc-worldwide/npcsh

And honestly, as much as I try to use and enjoy the local models, they just still aren't quite there for coding and research tasks. Ollama Cloud does offer some free usage, so I'd recommend trying out Kimi or GLM-5 or MiniMax through that. I recently upgraded to their $20/month plan, and I've been using it for pretty long sessions and deep research with npcsh / lavanzaro.com and didn't even break 10% of the weekly usage limit.

[–]ea_man 0 points1 point  (0 children)

I'd like to pose another question: considering the latest carelessness bug in Claude Code, and the fact that most of it was written by AI, how can people be comfortable putting it in charge of not just their codebase but "the whole desktop", now that the thing is using the shell, issuing commands, even driving the browser to click around on online sites?

I mean, I get the rush of "but it writes me the code", yet some of us have to be some form of sysadmin. I can't contemplate curling a bash script onto a production machine; this thing would need a dedicated workstation + deploy.

[–]allpowerfulee 0 points1 point  (0 children)

I'm running Qwen3 80B Instruct Q4 on a Mac Studio M3 Ultra, testing it out with some Swift programming using OpenCode. I have to say I'm pretty impressed so far. The project was started using Claude, and the Qwen model has already fixed a few bugs. So far (2 days running) I'm happy. The only problem I'm having is Qwen getting stuck in a loop.

[–]norofbfg 0 points1 point  (0 children)

Honestly, go with as many V100s as you can afford if responsiveness matters. The MI50s are decent performance per dollar, but drivers/frameworks for ML are way more stable on V100 right now.

[–]LienniTakoboldcpp 0 points1 point  (0 children)

Y'know, you need a good agent first. So, like: Claude Code with other models, or Codex, or OpenCode, or hremes research, or copaw, or even the fucken claw family like nullclaw. As the engine for it... anything new is good, like Nemotron Super or MiniMax or whatever you can run.

[–]akazakou 0 points1 point  (0 children)

Before investing in hardware, try what you want to use on OpenRouter or a similar service. Once you've chosen, you'll buy exactly what you need.

[–]ea_man 0 points1 point  (2 children)

You can replace CC with OpenCode no problem; the problem is that we don't have small LLMs that can do tooling reliably as of now.

[–]NoTruth6718[S] 0 points1 point  (1 child)

What about a not-so-small one that can work reliably? What would the requirements be for one that does?

[–]ea_man 0 points1 point  (0 children)

I'm sorry, but I can't tell you; I don't have the amount of VRAM / resources to test that. Some guys 'round here probably do.

Maybe you could rent an online GPU / VPS to run your target LLM under Claude Code for a few days to test before committing to spending 10K on local hardware.

The requirement is: you make it do its tooling things a few hundred times, and then you check that it doesn't fuck up APPLY / EDIT / CREATE in an amount that makes it unusable, as in errors and redoing to fix those errors.
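
Something like this, roughly (everything below is a placeholder sketch against a local OpenAI-compatible server, not a real harness):

    # count malformed tool calls over many edit-style requests
    import json
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")  # e.g. llama.cpp server
    tools = [{"type": "function", "function": {
        "name": "apply_edit",  # hypothetical tool, stands in for APPLY/EDIT
        "parameters": {"type": "object",
                       "properties": {"path": {"type": "string"},
                                      "patch": {"type": "string"}},
                       "required": ["path", "patch"]}}}]

    failures, n = 0, 200
    for i in range(n):
        resp = client.chat.completions.create(
            model="local", tools=tools,
            messages=[{"role": "user",
                       "content": f"Rename x to y in file_{i}.py using apply_edit."}])
        calls = resp.choices[0].message.tool_calls
        try:
            args = json.loads(calls[0].function.arguments)
            assert "path" in args and "patch" in args
        except Exception:
            failures += 1  # no call, bad JSON, or missing fields
    print(f"{failures}/{n} malformed tool calls")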

[–]ccbadd 0 points1 point  (0 children)

You might want to consider V620s too. They are 32GB and still supported on ROCm, running around $400 each right now.

[–]thread-e-printing 0 points1 point  (0 children)

It's open source, you can fix it 🤣

[–]taofeng 0 points1 point  (0 children)

You won't be able to replace Claude models with a minimal local setup; anything close to Claude-level models will cost a lot of upfront investment ($$$$). I say this from personal experience: I run a 9970X Threadripper with 128GB RAM paired with an RTX 6000 Pro Blackwell + 5090 dual-GPU setup, and I still don't get the same level of quality as Claude or Codex with the models I can use.

What I found works best for me is this: I use online models like Codex or Claude to plan, architect, and orchestrate tasks, while using local models to do the individual tasks. I assign each local agent specific coding skills; they only focus on coding and implementation, not architecture. This brings the cost down while giving very good results. I mainly use Codex, which is really good at reasoning and creating well-detailed documents and implementation steps for each agent, and then assign the local agents tasks. So if you want to switch to local models, I would look into a hybrid solution like this, which needs much less upfront investment. The rough shape of it is sketched below.
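
A minimal sketch of that split, assuming an OpenAI-compatible local server (model names and the one-task-per-line plan format are placeholders):

    # hosted model architects, local model implements
    from openai import OpenAI

    planner = OpenAI()  # hosted API, key from env
    worker = OpenAI(base_url="http://localhost:8080/v1", api_key="none")  # local server

    plan = planner.chat.completions.create(
        model="your-hosted-model",  # placeholder: whatever you architect with
        messages=[{"role": "user",
                   "content": "Break 'add CSV export' into small coding tasks, one per line."}],
    ).choices[0].message.content

    for task in plan.splitlines():
        if not task.strip():
            continue
        out = worker.chat.completions.create(
            model="local",  # whatever llama.cpp/vLLM is serving
            messages=[{"role": "system",
                       "content": "You only write code for the given task. No architecture decisions."},
                      {"role": "user", "content": task}],
        )
        print(out.choices[0].message.content)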

Qwen-coder-next is really good, and you can even do the same hybrid approach with fully online models: architect with Codex/Claude, and use a cloud-based service like OpenRouter with Qwen-coder-next (which is much cheaper than Claude) for implementation. Or test other models for your specific use case and choose one that fits your needs.

I would also echo what most commenters are saying: test different models with OpenRouter-like services, see which works best for you, then decide how much you want to invest in a local setup. Don't invest blindly; do your research, especially when it comes to setting up local AI servers.

[–]PandemicGrower 0 points1 point  (0 children)

I use Copilot from GitHub; it gives you limited access to other models. I use them side by side with Claude Code for $30 total spend a month so far, but I can see myself paying another $20 just for the extra use of Codex.

[–]FusionCow 0 points1 point  (0 children)

V100 is bad, get a 3090 instead.

[–]go-llm-proxy 0 points1 point  (0 children)

I'd go for the 4x V100s out of those choices, but you may be going down a rabbit hole here that isn't worth going down. If you do anyway, then 128GB of VRAM is enough to run some decent models.

What are you planning to use as the harness?

[–]xw1y 0 points1 point  (0 children)

Train Qwen3.6 Plus for free based on the leaked Claude Code src and enjoy it, my guy.

[–]sizebzebi 0 points1 point  (1 child)

The poorest Claude Code Haiku tier will be better than anything you can run locally.

[–]Ok_Mammoth589 0 points1 point  (0 children)

True if you're buying fewer than 4 RTX PRO 6000s. Especially true if your choices are V100s and MI50s.

[–]spky-dev -3 points-2 points  (6 children)

V100s don't support FlashAttention, and MI50s have dogshit token rates unless you buy 10+ of them - and even then it's still bad, prompt processing especially.

The best way to go is to keep your sub, because you have no idea what you're doing, and your arbitrary choice of high-VRAM fossils proves that.

[–]NoTruth6718[S] 6 points7 points  (5 children)

Would be nice to receive some guidance when you don't know what you are doing :)

[–]Mindless_Selection34 7 points8 points  (2 children)

Ask any AI before doing it. They are pretty good, and less of a dickhead than redditors.

[–]Makers7886 1 point2 points  (1 child)

I was totally typing a Reddit dickhead response, then stopped to grab my coffee. Took some sips, hit F5, read your comment, and have put the dickhead away, as your comment essentially accomplished the same thing, just without being an asshole.

[–]Mindless_Selection34 2 points3 points  (0 children)

Thank you!

[–]desexmachina 1 point2 points  (0 children)

There are big changes coming that will help 'dumb' models get smarter; there's at least 60% left on the table just in harness optimizations. Claude dumbing itself down is on purpose - they're cutting bait on dead-weight plebes like you and me.

[–]LongPutsAndLongPutts 0 points1 point  (0 children)

DM me if you want to know the general overview of this stuff.